Methodology
A model-agnostic framework for measuring the quality of AI-generated images and video. Nine dimensions, three gates, automated scorecards.
In video QA, “artifact” often names a defect. HarteFact scores outputs anyway — assets, streams, pixels, facts.
Local-first. No cloud dependencies. Designed to run on Apple Silicon using open-source components. The framework is incremental — each phase produces infrastructure consumed by later phases.
Core principles
Model-agnostic by design
Most metrics measure properties of the output file — resolution, texture, temporal stability, color accuracy, identity consistency — regardless of which model produced it. Scoring does not require recalibration when models change.
Algorithmic vs. AI-evaluated
Every score is labeled algorithmic or ai_evaluated. VLM scores are reported with mean and variance and are never presented as equivalent to deterministic metrics.
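A minimal sketch of what this labeling could look like in a score record; the field names are illustrative, not the framework's actual schema:

```python
from dataclasses import dataclass
from statistics import mean, variance

@dataclass
class Score:
    dimension: str
    metric: str
    kind: str                 # "algorithmic" or "ai_evaluated"
    value: float
    var: float | None = None  # populated only for ai_evaluated scores

# A deterministic metric carries a single value and no variance.
vmaf = Score("technical_delivery", "vmaf", "algorithmic", 94.2)

# A VLM judgment is sampled repeatedly and reported as mean + variance.
samples = [0.78, 0.81, 0.74, 0.80, 0.77]
adherence = Score("prompt_adherence", "framing", "ai_evaluated",
                  mean(samples), variance(samples))
```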
Tiered gating
Three gates avoid wasting compute on content that has already failed. A clip with the wrong codec never consumes GPU cycles on identity-drift analysis.
Versioned, reproducible
Every run logs framework version, calibration version, and model versions. Re-evaluations are new runs, not silent replacements. Score history is queryable per asset.
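A minimal sketch of that run log in SQLite; the table layout and version strings are placeholders, not the framework's actual schema:

```python
import datetime, json, sqlite3

conn = sqlite3.connect("scores.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS runs (
    asset_id        TEXT,
    run_at          TEXT,
    framework_ver   TEXT,
    calibration_ver TEXT,
    model_vers      TEXT,  -- JSON: per-model versions
    scorecard       TEXT   -- JSON: per-dimension scores
)""")

# A re-evaluation appends a new row; nothing is silently replaced.
conn.execute(
    "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)",
    ("clip_0042",
     datetime.datetime.now(datetime.timezone.utc).isoformat(),
     "0.3.1", "cal-2025-11",
     json.dumps({"insightface": "0.7.3"}),
     json.dumps({"gate1": "pass"})))
conn.commit()

# Score history per asset is a plain query.
history = conn.execute(
    "SELECT run_at, framework_ver, scorecard FROM runs"
    " WHERE asset_id = ? ORDER BY run_at", ("clip_0042",)).fetchall()
```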
Pipeline architecture
Three gates separate fast, cheap checks from expensive deep analysis. Failed content gets immediate, specific feedback identifying the failure dimension — without the cost of downstream scoring.
- Gate 1: Technical specs (Dimension 1). Pass/fail on file specs, codec, resolution, audio packaging.
- Gate 2: Spatial quality (Dimension 2). Pass/fail on catastrophic spatial failures (severe artifacts, banding).
- Gate 3: Temporal & audio basics (Dimensions 3 + 4, run in parallel). Pass/fail on flicker, scene-cut sanity, audio levels, sync offset.
- Deep analysis: Identity, lighting, brand, prompt adherence (Dimensions 5-9). Per-character analysis, scene integrity, client-compliance scoring.
- Output: Versioned scorecard. Pass/fail summary, per-dimension detail, annotated frame thumbnails, timeline visualization, per-frame metric trends, client threshold reference.
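A minimal sketch of this sequencing, with stubbed checks standing in for the real metrics; names and thresholds are illustrative:

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    passed: bool
    detail: str = ""

# Stub checks; in practice these wrap ffprobe, BRISQUE, optical flow, LUFS, etc.
def check_specs(clip):          return GateResult(clip["codec"] == "h264", "codec")
def check_spatial(clip):        return GateResult(clip["brisque"] < 60, "brisque")
def check_temporal_audio(clip): return GateResult(abs(clip["sync_ms"]) < 45, "sync")

def evaluate(clip):
    """Run gates in order; the first failure short-circuits before any GPU work."""
    for name, gate in [("gate1_specs", check_specs),
                       ("gate2_spatial", check_spatial),
                       ("gate3_temporal_audio", check_temporal_audio)]:
        result = gate(clip)
        if not result.passed:
            return {"asset": clip["id"], "failed_gate": name, "detail": result.detail}
    # Only survivors reach the expensive dimensions 5-9.
    return {"asset": clip["id"], "failed_gate": None, "deep": "dimensions 5-9 here"}

print(evaluate({"id": "clip_0042", "codec": "prores", "brisque": 30, "sync_ms": 10}))
# -> fails gate1_specs; spatial and temporal checks never run
```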
The nine dimensions
Each dimension owns a distinct axis of output quality. Build phases follow the dependency map: each phase produces infrastructure later phases reuse, so no work is thrown away.
Dimension 1: Technical Delivery Compliance
File specs, codecs, container, color space, VMAF, audio packaging. The non-negotiable foundation.
- Resolution / frame rate
- Codec & container
- VMAF score
- Color space
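A sketch of the Gate 1 check built on ffprobe (which ships with FFmpeg); the spec values here are examples, and VMAF would run separately, e.g. via FFmpeg's libvmaf filter:

```python
import json
import subprocess

def probe_video(path):
    """Read basic facts about the first video stream with ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,width,height,pix_fmt,color_space",
         "-of", "json", path],
        capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]

def gate1(path, spec):
    stream = probe_video(path)
    failures = {k: stream.get(k) for k, v in spec.items() if stream.get(k) != v}
    return len(failures) == 0, failures

ok, failures = gate1("render.mp4",  # hypothetical deliverable
                     {"codec_name": "h264", "width": 1920, "height": 1080})
```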
Dimension 2: Spatial & Texture Integrity
Per-frame visual quality. Compression artifacts, texture noise, banding, VAE seam detection.
- BRISQUE / NIQE
- Laplacian sharpness
- Color banding
- Wavelet noise analysis
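As one example of a cheap per-frame signal, a sketch of Laplacian sharpness scoring with OpenCV; the threshold is illustrative and would come from calibration:

```python
import cv2

cap = cv2.VideoCapture("render.mp4")  # hypothetical deliverable
sharpness = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Variance of the Laplacian: low values indicate soft or blurred frames.
    sharpness.append(cv2.Laplacian(gray, cv2.CV_64F).var())
cap.release()

soft_frames = [i for i, s in enumerate(sharpness) if s < 50.0]  # example cutoff
```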
Dimension 3: Temporal Consistency & Motion
Stability across frames. Background flicker, optical flow consistency, scene-cut detection.
- Background SSIM
- Optical flow
- Flicker detection
- Scene cuts
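A sketch of frame-to-frame SSIM used as a combined flicker and scene-cut signal; thresholds are illustrative:

```python
import cv2
from skimage.metrics import structural_similarity as ssim

cap = cv2.VideoCapture("render.mp4")  # hypothetical deliverable
prev, sims = None, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev is not None:
        sims.append(ssim(prev, gray))  # similarity to the previous frame
    prev = gray
cap.release()

# A deep single-frame dip suggests a cut; sustained oscillation suggests flicker.
cuts = [i + 1 for i, s in enumerate(sims) if s < 0.4]
```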
Dimension 4: Audio Quality
Loudness, clipping, sync offset. Runs in parallel with the temporal pipeline.
- LUFS measurement
- Clipping detection
- Sync offset
- Spectral integrity
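A sketch of the loudness and clipping checks using soundfile and pyloudnorm; the target window is an example, not a client spec:

```python
import numpy as np
import pyloudnorm as pyln
import soundfile as sf

data, rate = sf.read("render.wav")  # hypothetical extracted audio track

meter = pyln.Meter(rate)            # ITU-R BS.1770 loudness meter
lufs = meter.integrated_loudness(data)

# Naive clipping check: fraction of samples pinned at full scale.
clipped = float(np.mean(np.abs(data) >= 0.999))

passed = (-16.0 <= lufs <= -12.0) and clipped < 1e-4
```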
Dimension 5: Lip Sync Precision
Combines mouth aspect ratio (MAR) with audio phoneme timing via DTW alignment.
- MAR extraction
- DTW alignment
- WhisperX phonemes
- Sync drift over time
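A sketch of the MAR half of the check, assuming dlib-style 68-point landmarks (inner mouth = points 60-67); the landmark detector itself is omitted:

```python
import numpy as np

def mouth_aspect_ratio(inner_mouth):
    """inner_mouth: (8, 2) array of landmark points 60-67, left corner first."""
    p = np.asarray(inner_mouth, dtype=float)
    vertical = (np.linalg.norm(p[1] - p[7]) +   # 61-67
                np.linalg.norm(p[2] - p[6]) +   # 62-66
                np.linalg.norm(p[3] - p[5]))    # 63-65
    horizontal = np.linalg.norm(p[0] - p[4])    # 60-64
    return vertical / (3.0 * horizontal)

# Per-frame MAR yields a time series; DTW aligns it against phoneme timings
# (e.g. from WhisperX) to estimate sync drift over the clip.
```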
Dimension 6: Character & Identity Integrity
Face identity drift, hand failures, body proportions, teeth, clothing consistency.
- InsightFace cosine similarity
- Hand failure logging
- Body proportions
- Skin tone stability
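A sketch of the identity-drift measurement with InsightFace; the 0.35 similarity floor is illustrative, and real thresholds come from calibration:

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis()
app.prepare(ctx_id=0)

def face_embedding(image_bgr):
    faces = app.get(image_bgr)
    return faces[0].normed_embedding if faces else None

ref = face_embedding(cv2.imread("reference.png"))   # approved character still
cur = face_embedding(cv2.imread("frame_0120.png"))  # sampled video frame

if ref is not None and cur is not None:
    similarity = float(np.dot(ref, cur))  # embeddings are normed: dot == cosine
    drifted = similarity < 0.35
```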
Dimension 7: Lighting & Scene Integrity
Shadow coherence, luminance tracking, color temperature stability, reflection plausibility.
- Shadow masking
- Luminance per region
- Color temperature drift
- Reflection flagging
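A sketch of a cheap color-temperature-drift proxy: tracking the frame-mean red/blue balance over time. A fuller pipeline would estimate correlated color temperature properly; this only shows the per-frame trend being monitored:

```python
import cv2

cap = cv2.VideoCapture("render.mp4")  # hypothetical deliverable
rb_ratio = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    r = frame[..., 2].mean()  # OpenCV frames are BGR
    b = frame[..., 0].mean()
    rb_ratio.append(r / max(b, 1e-6))
cap.release()

# A large warm/cool swing across the clip is flagged for review.
drift = max(rb_ratio) - min(rb_ratio)
```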
Dimension 8: Brand & Client Compliance
Per-client palette, talent reference, logo placement, LUT comparison, typography.
- Brand HEX Delta-E
- Talent face match
- LUT comparison
- Logo / wordmark presence
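A sketch of the brand-palette check as Delta-E 2000 via scikit-image; the hex values are hypothetical, and the 3.0 cutoff is a common just-noticeable-difference rule of thumb, not a client spec:

```python
import numpy as np
from skimage.color import deltaE_ciede2000, rgb2lab

def hex_to_lab(hex_code):
    rgb = np.array([int(hex_code[i:i + 2], 16) / 255.0 for i in (1, 3, 5)])
    return rgb2lab(rgb.reshape(1, 1, 3))[0, 0]

brand   = hex_to_lab("#E4002B")  # client's brand red (hypothetical)
sampled = hex_to_lab("#D8072F")  # mean color of the detected logo region

delta_e = float(deltaE_ciede2000(brand, sampled))
on_brand = delta_e < 3.0
```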
Dimension 9: Prompt & Action Adherence
VLM-evaluated framing, composition, physics plausibility, object/spatial flagging.
- VLM scene description
- Framing & composition
- Physics flags
- Slideshow detection
Includes ai_evaluated scores; reported with mean + variance.
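Slideshow detection is the algorithmic outlier in this otherwise VLM-heavy dimension; a sketch using mean absolute frame difference, with illustrative thresholds:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("render.mp4")  # hypothetical deliverable
prev, run, longest = None, 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev is not None and float(np.mean(cv2.absdiff(prev, gray))) < 0.5:
        run += 1              # frame is near-identical to the previous one
    else:
        run = 0
    longest = max(longest, run)
    prev = gray
cap.release()

slideshow = longest > 24  # e.g. more than ~1 s of static frames at 24 fps
```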
What this framework is not
- Not a scoring rubric for taste, creativity, or commercial appeal. Aesthetic judgment remains human.
- Not a model leaderboard. The framework benchmarks output properties; model comparisons are a separate activity built on top of the same infrastructure.
- Not a SaaS dashboard. Phase 1 ships a local pipeline and a versioned scorecard format, not a hosted product.
- Not a substitute for human QC on edge cases. The system is designed to scale review, not to replace the final sign-off on high-stakes deliverables.
Print-on-demand extension
A separate addendum extends the framework with print-specific quality metrics: CMYK gamut warnings, ink coverage limits, transparency edge fringing, design placement safety, and pre-generation input validation.
Read the POD addendum

Pilot engagements
Phase 1 (Technical Delivery) and Phase 1b (Identity Consistency) are in active build. We're scoping a small number of pilot engagements with production studios, agencies, and POD operators for the second half of 2026.
Get in touch