Framework v2.1 · Last updated April 2026

Methodology

A model-agnostic framework for measuring the quality of AI-generated images and video. Nine dimensions, three gates, automated scorecards.

In video QA, an “artifact” usually names a defect. HarteFact scores artifacts of every kind: assets, streams, pixels, facts.

Local-first. No cloud dependencies. Designed to run on Apple Silicon using open-source components. The framework is incremental — each phase produces infrastructure consumed by later phases.

Core principles

Model-agnostic by design

Most metrics measure properties of the output file — resolution, texture, temporal stability, color accuracy, identity consistency — regardless of which model produced it. Scoring does not require recalibration when models change.

Algorithmic vs. AI-evaluated

Every score is labeled algorithmic or ai_evaluated. VLM scores are reported with mean and variance and are never presented as equivalent to deterministic metrics.
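The labeling rule above can be sketched in a few lines. The `Score` shape and function names here are illustrative, not the framework's actual schema: deterministic metrics get a single value, while VLM metrics aggregate repeated samples into mean and variance.

```python
from dataclasses import dataclass
from statistics import mean, pvariance

@dataclass
class Score:
    dimension: str
    value: float
    kind: str            # "algorithmic" or "ai_evaluated"
    variance: float = 0.0

def algorithmic(dimension: str, value: float) -> Score:
    # Deterministic metric: one value, no variance to report.
    return Score(dimension, value, "algorithmic")

def ai_evaluated(dimension: str, samples: list[float]) -> Score:
    # VLM metric: repeated samples collapse to mean + variance,
    # so the uncertainty travels with the score.
    return Score(dimension, mean(samples), "ai_evaluated", pvariance(samples))
```

Keeping the label on the score object itself means downstream consumers never have to guess which numbers are deterministic.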

Tiered gating

Three gates avoid wasting compute on content that has already failed. A clip with the wrong codec never consumes GPU cycles on identity-drift analysis.
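The short-circuit behavior can be sketched as a loop over gate functions; `run_pipeline` and the result shape are illustrative, assuming each gate returns its pass/fail status and its dimension name:

```python
def run_pipeline(clip, gates, deep_checks):
    # Gates run in order; the first failure stops the pipeline and
    # reports the failing dimension, so deep checks never execute.
    for gate in gates:
        result = gate(clip)
        if not result["passed"]:
            return {"passed": False, "failed_at": result["dimension"]}
    # Only clips that clear every gate pay for deep analysis.
    return {"passed": True, "deep": [check(clip) for check in deep_checks]}
```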

Versioned, reproducible

Every run logs framework version, calibration version, and model versions. Re-evaluations are new runs, not silent replacements. Score history is queryable per asset.
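An append-only run log captures this contract: re-evaluations append, nothing overwrites. The `RunLog` class below is a minimal in-memory sketch, not the framework's storage layer:

```python
import time

class RunLog:
    # Append-only: a re-evaluation is a new entry, never a replacement.
    def __init__(self):
        self._runs = []

    def record(self, asset_id, framework_version, calibration_version,
               model_versions, scores):
        self._runs.append({
            "asset_id": asset_id,
            "framework_version": framework_version,
            "calibration_version": calibration_version,
            "model_versions": model_versions,
            "scores": scores,
            "timestamp": time.time(),
        })

    def history(self, asset_id):
        # Score history is queryable per asset, in run order.
        return [r for r in self._runs if r["asset_id"] == asset_id]
```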

Pipeline architecture

Three gates separate fast, cheap checks from expensive deep analysis. Failed content gets immediate, specific feedback identifying the failure dimension — without the cost of downstream scoring.

  1. Gate 1: Technical specs
    Dimension 1

    Pass / fail on file specs, codec, resolution, audio packaging.

  2. Gate 2: Spatial quality
    Dimension 2

    Pass / fail on catastrophic spatial failures (severe artifacts, banding).

  3. Gate 3: Temporal & audio basics
    Dimensions 3 + 4 (parallel)

    Pass / fail on flicker, scene-cut sanity, audio levels, sync offset.

  4. Deep: Identity, lighting, brand, prompt adherence
    Dimensions 5–9

    Per-character analysis, scene integrity, client-compliance scoring.

  5. Output: Versioned scorecard

    Pass/fail summary, per-dimension detail, annotated frame thumbnails, timeline visualization, per-frame metric trends, client threshold reference.
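A versioned scorecard might serialize as JSON along these lines. Every field name and value below is an illustrative placeholder, not the framework's published schema:

```python
import json

# Hypothetical scorecard payload; field names and values are placeholders.
scorecard = {
    "framework_version": "2.1",
    "calibration_version": "2026-04",
    "asset_id": "clip-0042",
    "passed": False,
    "failed_at": "D03",
    "dimensions": {
        "D01": {"passed": True, "kind": "algorithmic", "value": 96.4},
        "D03": {"passed": False, "kind": "algorithmic", "value": 0.61},
    },
}
print(json.dumps(scorecard, indent=2))
```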

The nine dimensions

Each dimension owns a distinct axis of output quality. Build phases follow the dependency map: each phase produces infrastructure later phases reuse, so no work is thrown away.

D01

Technical Delivery Compliance

Phase 1

File specs, codecs, container, color space, VMAF, audio packaging. The non-negotiable foundation.

  • Resolution / frame rate
  • Codec & container
  • VMAF score
  • Color space
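The Gate 1 comparison reduces to matching probed stream properties against delivery requirements. A minimal sketch, assuming the spec dict has already been parsed from something like ffprobe's JSON output; `check_specs` and the key names are illustrative:

```python
def check_specs(spec: dict, required: dict) -> dict:
    # Compare probed stream properties against delivery requirements;
    # any mismatch fails fast, with the offending values reported.
    failures = {k: spec.get(k) for k in required if spec.get(k) != required[k]}
    return {"passed": not failures, "failures": failures}
```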
D02

Spatial & Texture Integrity

Phase 2

Per-frame visual quality. Compression artifacts, texture noise, banding, VAE seam detection.

  • BRISQUE / NIQE
  • Laplacian sharpness
  • Color banding
  • Wavelet noise analysis
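The Laplacian sharpness metric is variance of the Laplacian response over a grayscale frame: soft or blurry frames score low. This plain-NumPy sketch avoids any vision library; a production version would use an optimized convolution:

```python
import numpy as np

def laplacian_sharpness(gray: np.ndarray) -> float:
    # Variance of the Laplacian: low values indicate a soft / blurry frame.
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):          # correlate the 3x3 kernel over the interior
        for dx in range(3):
            out += k[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())
```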
D03

Temporal Consistency & Motion

Phase 3

Stability across frames. Background flicker, optical flow consistency, scene-cut detection.

  • Background SSIM
  • Optical flow
  • Flicker detection
  • Scene cuts
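One of the cheapest temporal signals is global-luminance flicker: the mean absolute change in frame brightness between consecutive frames. The function and 0-threshold behavior below are a simplified sketch, not the full detector:

```python
import numpy as np

def flicker_score(frames) -> float:
    # Mean absolute change in global luminance between consecutive frames;
    # stable footage stays near zero, flicker drives it up.
    if len(frames) < 2:
        return 0.0
    means = [float(f.mean()) for f in frames]
    deltas = [abs(b - a) for a, b in zip(means, means[1:])]
    return sum(deltas) / len(deltas)
```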
D04

Audio Quality

Phase 4

Loudness, clipping, sync offset. Runs in parallel with the temporal pipeline.

  • LUFS measurement
  • Clipping detection
  • Sync offset
  • Spectral integrity
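Integrated loudness (LUFS) needs the gated measurement defined in ITU-R BS.1770 and is best left to a dedicated library, but clipping detection is simple enough to sketch directly, assuming float PCM normalized to [-1, 1]; the 0.999 threshold is an illustrative choice:

```python
import numpy as np

def clipping_ratio(samples: np.ndarray, threshold: float = 0.999) -> float:
    # Fraction of samples at or beyond full scale (float PCM in [-1, 1]).
    # A nonzero ratio on delivered audio is an immediate red flag.
    return float((np.abs(samples) >= threshold).mean())
```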
D05

Lip Sync Precision

Phase 5

Combines mouth aspect ratio (MAR) with audio phoneme timing via DTW alignment.

  • MAR extraction
  • DTW alignment
  • WhisperX phonemes
  • Sync drift over time
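MAR itself is a small geometric ratio: vertical lip opening over horizontal mouth width. The sketch below assumes four mouth landmarks have already been extracted by a face-landmark model; the key names are illustrative:

```python
import math

def mar(landmarks: dict) -> float:
    # Mouth aspect ratio: vertical lip opening over horizontal mouth width.
    # High MAR = open mouth; the per-frame series is what gets DTW-aligned
    # against phoneme timing.
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    vertical = dist(landmarks["top"], landmarks["bottom"])
    horizontal = dist(landmarks["left"], landmarks["right"])
    return vertical / horizontal
```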
D06

Character & Identity Integrity

Phase 6

Face identity drift, hand failures, body proportions, teeth, clothing consistency.

  • InsightFace cosine similarity
  • Hand failure logging
  • Body proportions
  • Skin tone stability
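Identity drift reduces to cosine similarity between a reference embedding and per-frame embeddings. The sketch assumes face embeddings (e.g. 512-d InsightFace vectors) are already extracted; the 0.35 floor is an illustrative threshold, not a calibrated one:

```python
import numpy as np

def identity_drift(ref: np.ndarray, frames, floor: float = 0.35):
    # Flag frame indices whose embedding cosine similarity to the
    # reference drops below the floor.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return [i for i, f in enumerate(frames) if cos(ref, f) < floor]
```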
D07

Lighting & Scene Integrity

Phase 7

Shadow coherence, luminance tracking, color temperature stability, reflection plausibility.

  • Shadow masking
  • Luminance per region
  • Color temperature drift
  • Reflection flagging
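Color temperature drift can be approximated by tracking a warmth proxy across the clip. The red-to-blue ratio below is a crude proxy, not a calibrated correlated-color-temperature estimate; both functions are illustrative:

```python
import numpy as np

def warmth(frame: np.ndarray) -> float:
    # Crude warmth proxy: mean red over mean blue (frame is H x W x 3, RGB).
    return float(frame[..., 0].mean() / (frame[..., 2].mean() + 1e-6))

def temperature_drift(frames) -> float:
    # Spread of the warmth proxy across the clip; a large spread suggests
    # the color temperature is wandering between frames.
    vals = [warmth(f) for f in frames]
    return max(vals) - min(vals)
```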
D08

Brand & Client Compliance

Phase 8

Per-client palette, talent reference, logo placement, LUT comparison, typography.

  • Brand HEX Delta-E
  • Talent face match
  • LUT comparison
  • Logo / wordmark presence
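The brand-color check compares a sampled color against the brand's reference in Lab space. Shown here is the simple CIE76 formula; a production gate might prefer CIEDE2000, and the thresholds mentioned in the comment are rules of thumb, not the framework's calibrated values:

```python
import math

def delta_e76(lab1, lab2) -> float:
    # CIE76 Delta-E: Euclidean distance in Lab space. Roughly 2.3 is a
    # just-noticeable difference; brand gates often sit a little above that.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))
```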
D09

Prompt & Action Adherence

Phase 9

VLM-evaluated framing, composition, physics plausibility, object/spatial flagging.

  • VLM scene description
  • Framing & composition
  • Physics flags
  • Slideshow detection

Includes ai_evaluated scores; reported with mean + variance.
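Of the D09 checks, slideshow detection is the one that needs no VLM at all: count how many consecutive frame pairs are near-identical. The 1.0 gray-level tolerance and 0.5 ratio below are illustrative thresholds:

```python
import numpy as np

def is_slideshow(frames, still_threshold: float = 0.5) -> bool:
    # Flags clips where most consecutive frame pairs are near-identical,
    # i.e. the "video" is really a sequence of held stills.
    if len(frames) < 2:
        return False
    stills = sum(
        1 for a, b in zip(frames, frames[1:])
        if float(np.abs(b.astype(float) - a.astype(float)).mean()) < 1.0
    )
    return stills / (len(frames) - 1) > still_threshold
```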

What this framework is not

  • Not a scoring rubric for taste, creativity, or commercial appeal. Aesthetic judgment remains human.
  • Not a model leaderboard. The framework benchmarks output properties; model comparisons are a separate activity built on top of the same infrastructure.
  • Not a SaaS dashboard. Phase 1 ships a local pipeline and a versioned scorecard format, not a hosted product.
  • Not a substitute for human QC on edge cases. The system is designed to scale review, not to replace the final sign-off on high-stakes deliverables.

Print-on-demand extension

A separate addendum extends the framework with print-specific quality metrics: CMYK gamut warnings, ink coverage limits, transparency edge fringing, design placement safety, and pre-generation input validation.

Read the POD addendum

Pilot engagements

Phase 1 (Technical Delivery) and Phase 1b (Identity Consistency) are in active build. We're scoping a small number of pilot engagements with production studios, agencies, and POD operators for the second half of 2026.

Get in touch