Spec-driven, AI-first delivery
The philosophy behind ClosedLoop.ai and why specs, not chats, are the durable unit of AI-assisted work.
ClosedLoop.ai is built around a specific belief: AI-assisted software delivery only scales when the specification is the source of truth. Every loop is grounded in an artifact — a PRD, an implementation plan, or a feature description — and every agent output is judged against that artifact.
This is what we mean by spec-driven, AI-first delivery.
What "spec-driven" means here
Three commitments:
- Intent is written down. A PRD, a plan, or a ticket captures the problem, the desired outcome, and the acceptance criteria. This is what the AI reads.
- Execution is bounded by the spec. A loop inherits the artifact's scope and can only succeed by satisfying it. The orchestrator emits a `<promise>COMPLETE</promise>` only when the plan's `pendingTasks` are empty and the acceptance criteria validate.
- Review is against the spec. Judges score the plan or the code against the artifact that launched the loop. Human review closes against the same artifact.
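The completion gate in the second commitment can be sketched as follows. This is an illustrative model, not the actual ClosedLoop.ai API: the `Plan` shape, the `emit_promise` name, and the non-complete branch are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    # Tasks the loop has not yet finished (mirrors the plan's pendingTasks).
    pending_tasks: list
    # Acceptance criteria expressed as predicates over the loop's output.
    acceptance_criteria: list

def emit_promise(plan: Plan, output: str) -> str:
    """Emit COMPLETE only when the plan is exhausted AND every criterion holds."""
    done = not plan.pending_tasks
    validated = all(check(output) for check in plan.acceptance_criteria)
    if done and validated:
        return "<promise>COMPLETE</promise>"
    # Hypothetical: anything else keeps the loop open for another iteration.
    return "<promise>INCOMPLETE</promise>"
```

The point of the sketch is that completion is a conjunction of two deterministic checks, not a judgment call by the agent.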
Spec-driven
The specification is the durable object. Agents, runs, reviews, and patches are all derived from it.
- If the spec is precise, the output is reviewable.
- If the spec is ambiguous, no amount of prompting will produce a repeatable result.
- If the spec changes, every downstream artifact gets an auditable revision.
In practice this means:
- PRDs live as first-class artifacts, not as chat history.
- Implementation plans are generated from PRDs and reviewed by critics before any code is written.
- Acceptance criteria are explicit and used by judges to score outputs.
- Learnings (patterns, failures, pitfalls) are captured back against the spec and the run, not just in someone's head.
AI-first
Humans define scope and make judgment calls. Agents produce the intermediate artifacts. The system gates both.
ClosedLoop.ai treats AI agents as the default execution path:
- Agents draft, explore, and implement.
- Agents review agents — critics run in parallel, judges run in batches.
- Deterministic tooling gates every phase (lint, typecheck, build, tests, sandbox enforcement, promise validation).
- Humans review only at checkpoints where judgment is required.
This is different from "AI-assisted" or "copilot" workflows, where a human is in the loop on every keystroke. AI-first is more like managing a team: you set expectations, you review outputs, and you trust the gates.
Why this produces repeatable correctness
LLMs are non-deterministic content generators. Without structure, they are also unreliable. ClosedLoop.ai adds three kinds of structure:
- Artifact binding — every run is attached to an object you can version, diff, and review.
- Phased gates — every phase has a promise and a validation that must pass before the next phase starts.
- Judgment — every output is graded against the artifact by LLM-as-judge evaluators that produce structured scores.
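The structured scores from LLM-as-judge evaluators can be aggregated roughly like this. The score shape, the `aggregate_verdict` name, and the 0.8 threshold are illustrative assumptions, not ClosedLoop.ai defaults.

```python
from statistics import mean

def aggregate_verdict(scores: list, threshold: float = 0.8) -> dict:
    """Fold per-criterion judge scores into one reviewable verdict.

    `scores` is a list of dicts like {"criterion": str, "score": float in [0, 1]},
    one entry per acceptance criterion in the artifact.
    """
    failing = [s["criterion"] for s in scores if s["score"] < threshold]
    return {
        "average": mean(s["score"] for s in scores),
        "passes": not failing,          # every criterion must clear the bar
        "failing_criteria": failing,    # what the fix cycle should target
    }
```

Because the verdict names the failing criteria rather than emitting a single opaque grade, a fix cycle (or a human reviewer) knows exactly which part of the spec is unmet.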
Together, these let you trust AI-produced output enough to ship it.
What this replaces
ClosedLoop.ai replaces three common failure modes:
- Chat-driven coding — fast at the start, but produces no shared artifact, no review trail, and no pattern capture.
- One-shot agent tools — produce a patch but do not gate, judge, or learn.
- Manual ticket-to-PR flows — predictable but slow, and do not benefit from AI leverage.
The operating model
When a team adopts ClosedLoop.ai it usually evolves into a rhythm like this:
- A product owner drafts a PRD. PRD judges and teammates iterate until it passes.
- The PRD is decomposed into features. Each feature is a shippable, testable chunk of work.
- A plan is generated for each feature. Plan judges and the team iterate until the plan is review-grade.
- The plan is executed as a loop. Implementation, testing, and visual QA run inside the loop.
- Post-loop code review produces a verdict. A fix cycle resolves non-approving findings.
- A PR is opened. Human review closes against the original artifact.
- Self-learning captures what worked and what did not, so the next loop starts smarter.
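The rhythm above can be read as a staged state machine: an artifact advances only when its current stage's gate has passed. This is a hypothetical sketch of that shape; the stage names mirror the list but the functions are not a real API.

```python
# Stages mirror the operating-model list above (illustrative names).
STAGES = ["prd", "features", "plan", "loop", "code_review", "pr", "learnings"]

def advance(state: dict) -> dict:
    """Move an artifact to the next stage once the current stage's gate passed."""
    if not state.get("gate_passed"):
        return state  # stay and iterate until judges and teammates approve
    i = STAGES.index(state["stage"])
    nxt = STAGES[min(i + 1, len(STAGES) - 1)]
    return {**state, "stage": nxt, "gate_passed": False}
```

The invariant is the same at every stage: no forward motion without a passed gate, so the spec-to-output trail never has an unreviewed hop.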
That is spec-driven, AI-first delivery, and it is what the rest of this documentation describes in detail.
Philosophical shift
Engineering shifts from writing code to specifying outcomes and reviewing artifacts. Teams get faster not by coding faster, but by reviewing bigger scopes with more confidence, because the spec-to-output trail is complete and every step is graded.
Tickets become tasks. Epics become features. Sections of a quarterly roadmap land in a few PRs, with judgment attached.