The H4 for AI workflows.
In the 18th century, sailors had no way to fix longitude at sea. Then Harrison built the H4 chronometer, and navigation had truth. We're building the H4 for AI workflows.

Define real tasks. Explore tasks. Submit a task. Run an eval.
Real job descriptions (JDs), real résumés, real SEC filings: never synthetic, never gameable. Trap-street probes are seeded into the held-out set.
Public tasks are open data. Anyone can browse what the world is being measured on. Held-out and Live Mode tasks remain private.
Three submission tiers: Bronze (CLI), Silver (audit-eligible API), Gold (we run it ourselves on held-out + Live Mode).
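A sketch of what a Silver-tier submission could look like over an audit-eligible API. Everything here is a hypothetical illustration: the endpoint, payload fields, and auth header are assumptions, not a documented interface.

```python
# Hypothetical Silver-tier submission. The endpoint, payload shape, and auth
# scheme are illustrative assumptions, not a documented API.
import os

import requests

SUBMIT_URL = "https://api.example.com/v1/submissions"  # hypothetical endpoint

payload = {
    "tier": "silver",                    # Bronze = CLI, Silver = audit-eligible API
    "workflow": "acme-resume-screener",  # hypothetical workflow identifier
    "results": [
        {"task_id": "public-0042", "output": "...", "trace_id": "tr_abc123"},
    ],
}

resp = requests.post(
    SUBMIT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['EVAL_API_KEY']}"},  # hypothetical key
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. a submission id and the tier badge it earned
```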
Pydantic Evals + LLM-as-judge wrapped in Langfuse traces. 200 tasks. 5 minutes. Scores published with full provenance.
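A minimal sketch of the harness shape using the Pydantic Evals primitives (Case, Dataset, LLMJudge). The task function, case contents, and rubric are placeholders, and exporting the resulting spans to Langfuse (via its SDK or OpenTelemetry) is assumed rather than shown.

```python
# Sketch of the eval harness: Pydantic Evals cases scored by an LLM judge.
# Case contents, the rubric, and the task under test are placeholders.
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge


def screen_resume(inputs: str) -> str:
    """Placeholder for the workflow under test (e.g. a résumé-screening agent)."""
    return f"Shortlist for: {inputs}"


dataset = Dataset(
    cases=[
        Case(
            name="jd-0001",
            inputs="Senior data engineer JD, posted today",
            expected_output="A shortlist grounded only in the supplied résumés",
        ),
    ],
    evaluators=[
        LLMJudge(rubric="The output cites only facts present in the source documents."),
    ],
)

# Run every case against the workflow and score it; the emitted spans are what
# gets published as the score's provenance.
report = dataset.evaluate_sync(screen_resume)
report.print()
```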
Every score wears its evidence on its sleeve.
The user's question is "how do I know this score is real?", not "how good is the tool?" The tier badge answers the first question. The score answers the second.
Truth, not theories.
Every score is re-runnable. Anyone can clone the harness, replay our traces, and verify the verdict. That's the H4 standard.
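A sketch of the replay step, assuming the cloned harness ships its public cases as a Pydantic Evals dataset file; the filename, the workflow stub, and the published score you would compare against are all placeholders.

```python
# Replay sketch: load the cloned public cases, re-run the workflow, and compare
# the per-case results and averages against the published score.
from pydantic_evals import Dataset


def workflow_under_test(inputs: str) -> str:
    """Placeholder for the submission whose score is being verified."""
    return f"answer for: {inputs}"


dataset = Dataset.from_file("public_tasks.yaml")  # hypothetical cloned dataset file
report = dataset.evaluate_sync(workflow_under_test)
report.print()  # compare against the published score and its traces
```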
Today's SEC filings. Today's LinkedIn JDs. Tasks that did not exist in any model's training data. A moat that renews itself daily.
We seed verifiable falsehoods inside held-out tasks. Workflows that fabricate trip the trap and land on the public Wall.
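One way a probe like this could work, sketched under assumptions: each held-out task carries planted canary "facts" that appear in no source document, and any output that asserts one is flagged as fabrication. The names, data, and plain substring check below are illustrative; a production check might use the LLM judge instead.

```python
# Trap-street sketch: canaries are planted falsehoods no grounded workflow
# should ever assert. All names and data below are illustrative.
from dataclasses import dataclass


@dataclass
class TrapProbe:
    task_id: str
    canaries: list[str]  # planted falsehoods absent from every source document


def tripped_canaries(probe: TrapProbe, output: str) -> list[str]:
    """Return the planted falsehoods that the submission asserted anyway."""
    text = output.lower()
    return [c for c in probe.canaries if c.lower() in text]


probe = TrapProbe(
    task_id="heldout-0117",
    canaries=["acquired Initech in 2019", "CFO since March 2021"],
)

output = "The filing notes the company acquired Initech in 2019 ..."  # fabricated claim
hits = tripped_canaries(probe, output)
if hits:
    print(f"{probe.task_id}: fabrication detected -> {hits}")  # lands on the public Wall
```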
Our LLM judges are open. Their prompts are versioned. Agreement with human annotators is published. No black boxes.
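For example, the published agreement number could be a statistic such as Cohen's kappa over paired verdicts. A minimal sketch, assuming pass/fail labels on the same outputs and using scikit-learn's cohen_kappa_score; the labels below are made up.

```python
# Sketch: agreement between the LLM judge and human annotators on the same
# verdicts, reported as Cohen's kappa. The labels are illustrative only.
from sklearn.metrics import cohen_kappa_score

# Paired pass/fail verdicts for the same task outputs (1 = pass, 0 = fail).
human_labels = [1, 1, 0, 1, 0, 0, 1, 1]
judge_labels = [1, 1, 0, 1, 1, 0, 1, 0]

kappa = cohen_kappa_score(human_labels, judge_labels)
print(f"Judge vs. human agreement (Cohen's kappa): {kappa:.2f}")
```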
Find the fakes.
Built on Pydantic Evals + Langfuse. Open source. The community edition is free forever. The hosted edition is how we eat.