Spec-First Execution
AC at the start, AC at the end.
The brief is the work. Read the story, write the acceptance criteria, only then start typing code. Spec-First is the work-unit version of P1 Test-First: at the code scale you write the test before the implementation; at the work scale you write the AC before the change-set.
Opinion
The single biggest source of waste I have watched on engineering teams is starting work without a written AC. Not vague work, not exploratory work: ordinary feature work, where the engineer reads the ticket, has a chat in the hallway, and starts coding from a mental sketch. The mental sketch always misses two things: the edge case the product owner has not articulated, and the dependency the architect spotted six months ago.[1] Adzic's Specification by Example calls this out by name: the conversation that produces the AC is more valuable than the AC itself, but the AC has to survive the conversation.
Where this row earns its keep is the second half of its sub-line: AC at the start, AC at the end. The AC is the input to the work; it is also the input to P3 Definition of Done. Skipping it converts agency into rework. The team that writes AC at the start ships fewer surprises at the end; the team that does not calls every misalignment a “requirements problem” and treats the product owner as the bottleneck, when in fact the bottleneck is the unwritten AC the engineer walked past.
The agent layer hardens this further. An LLM with no AC will generate plausible code at machine speed and pick the most popular implementation, not the one the business needs. With AC in front of it (ideally in Gherkin per AI4 Verifiable Specs) the agent has a target. Without it, the team is paying tokens for the model's priors. The discipline is the same one that protected the human-only team; the cost of skipping it is now paid in milliseconds rather than weeks.
Copy a note and link
Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.
AC at the start, AC at the end. Read the brief, write the acceptance criteria as bullets a reviewer can mark met or unmet, walk them past the Three Amigos, then start coding. The AC is the brief; the Definition of Done is the AC restated. Skip the AC at the start and you are paying for it at the end. /tenet/spec-first-execution/P2
AI eyes only
Rule: AC at the start, AC at the end. Refuse work without acceptance criteria.
Reject: starting work from a prose ticket with no falsifiable AC. Reject: declaring done against criteria invented by the agent rather than agreed up front. Reject: pattern-matching to the most common implementation in training data.
Generate: copy the AC into the plan verbatim. Implement against each line. Re-read the AC after the change and confirm each line passes a check.
Diagnostic: the work is not started until the AC is enumerated. The work is not done until every AC line has a passing check.
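The diagnostic above reads as a two-sided gate, which can be sketched in a few lines. This is a minimal illustration, not part of the tenet: `AcLine`, `CheckResult`, `GateVerdict`, and `evaluateGate` are hypothetical names, and the rule that an AC line counts as met only when it has at least one check and none of its checks fail is an assumption.

```typescript
// Hypothetical shapes for illustration; the tenet does not prescribe them.
interface AcLine {
  id: string;   // e.g. "AC-1"
  text: string; // the criterion, copied verbatim from the brief
}

interface CheckResult {
  acId: string; // which AC line this check covers
  passed: boolean;
}

type GateVerdict =
  | { state: "not-started"; reason: string }
  | { state: "in-progress"; unmet: string[] }
  | { state: "done" };

// Not started until the AC is enumerated; not done until every AC line
// has at least one check and all of its checks pass (assumed rule).
function evaluateGate(ac: AcLine[], checks: CheckResult[]): GateVerdict {
  if (ac.length === 0) {
    return { state: "not-started", reason: "no acceptance criteria enumerated" };
  }
  const unmet = ac
    .filter((line) => {
      const own = checks.filter((c) => c.acId === line.id);
      return own.length === 0 || own.some((c) => !c.passed);
    })
    .map((line) => line.id);
  return unmet.length === 0 ? { state: "done" } : { state: "in-progress", unmet };
}

const ac: AcLine[] = [
  { id: "AC-1", text: "rejects an unverified volunteer" },
  { id: "AC-2", text: "rejects a verified volunteer when the sanctuary is at capacity" },
];

// Only AC-1 has a passing check, so the gate reports AC-2 as unmet.
console.log(evaluateGate(ac, [{ acId: "AC-1", passed: true }]));
```

An agent or a CI step could call a function like this before declaring a change complete; the point is that "done" is computed from the agreed AC lines rather than asserted.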
Why?
- AC at the start gives the team a shared finish line. The engineer, the product owner, the QA, and the reviewer can all point at the same bullets and agree what done means.
- Skipping the AC turns agency into rework. The misread requirement found at PR time is the feature delivered late; the misread requirement found pre-code is a five-minute conversation.
- Spec-First and P3 Definition of Done are bookends. The AC at the start is the brief; the AC at the end is the gate. Without P2, P3 verifies against nothing.
- The AC scopes the test in P1 Test-First. The work-unit spec produces the code-unit spec; the chain from story to test is unbroken.
- For the agent, the AC is the falsifiable target. Pair with AI4 Verifiable Specs and the agent has something to grade itself against, not just a paragraph to interpret.
- PRs land faster when the AC was written first. The reviewer reads the AC to learn the intent, reads the test to verify the AC, reads the code to verify the test — three artefacts, one chain.
- Written AC catches scope creep early. The AC bullet that wasn't there last week is visible in the diff; the AC bullet that grew is a conversation, not an argument.
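The scope-creep point in the last bullet can be made concrete: once the AC is written down, a change to it is a diff. The sketch below assumes the AC is stored as a map from an id to its wording; `AcSet`, `AcDiff`, and `diffAc` are hypothetical names introduced only for this illustration.

```typescript
// Hypothetical representation: AC id -> criterion wording.
type AcSet = Record<string, string>;

interface AcDiff {
  added: string[];   // ids present now but not in the agreed version
  removed: string[]; // ids that were agreed and have since disappeared
  changed: string[]; // ids whose wording differs from the agreed version
}

function diffAc(agreed: AcSet, current: AcSet): AcDiff {
  const added = Object.keys(current).filter((id) => agreed[id] === undefined);
  const removed = Object.keys(agreed).filter((id) => current[id] === undefined);
  const changed = Object.keys(current).filter((id) => {
    const before = agreed[id];
    const after = current[id];
    return before !== undefined && after !== undefined && before.trim() !== after.trim();
  });
  return { added, removed, changed };
}

const agreedAc: AcSet = {
  "AC-1": "rejects an unverified volunteer",
  "AC-2": "rejects when at capacity",
};

const currentAc: AcSet = {
  "AC-1": "rejects an unverified volunteer",
  "AC-2": "rejects when at or above capacity",
  "AC-3": "sends a welcome email",
};

// AC-3 is new and AC-2 was reworded: both are visible before any code changes.
console.log(diffAc(agreedAc, currentAc));
```

In practice the same effect usually comes for free from keeping the AC in version control; the function only shows what the reviewer is looking for in that diff.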
Origins
The intellectual root is older than agile. Hoare's 1969 paper An Axiomatic Basis for Computer Programming set out the precondition / program / postcondition triple that is, in modern dress, what an AC is.[5] (“An Axiomatic Basis for Computer Programming”, Communications of the ACM 12(10), 1969: the deepest published formal source for “specify before doing”. Modern AI-tool work, e.g. ToolGate 2026, explicitly revives Hoare contracts for tools an LLM can call.) Parnas and Madey's Functional Documents for Computer Systems (1995) generalised the same shape to whole-system specifications.[6] (“Functional Documents for Computer Systems”, Science of Computer Programming 25(1), 1995: the source most often cited by formal-methods practitioners as the bridge between Hoare and modern AC discipline.) The throughline from Hoare to Parnas to Cohn is the same claim: the writing of the contract precedes the work.
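To make the triple concrete, here is a runtime approximation of {P} C {Q}: if the precondition P holds before the command C runs, the postcondition Q must hold afterwards. This is a sketch under stated assumptions. `withContract`, `Roster`, and `admit` are hypothetical names, and checking the conditions at run time is much weaker than the proof Hoare had in mind.

```typescript
// {P} C {Q}: precondition, command, postcondition.
// `withContract` is a hypothetical helper written for this sketch, not a library API.
function withContract<A, R>(
  pre: (arg: A) => boolean,
  command: (arg: A) => R,
  post: (arg: A, result: R) => boolean,
): (arg: A) => R {
  return (arg: A): R => {
    if (!pre(arg)) throw new Error("precondition violated");
    const result = command(arg);
    if (!post(arg, result)) throw new Error("postcondition violated");
    return result;
  };
}

// Hypothetical domain type for the example.
interface Roster {
  count: number;
  capacity: number;
}

// An AC in Hoare form:
//   P: 0 <= count <= capacity
//   C: admit one volunteer if there is room
//   Q: count never exceeds capacity and grows by at most one
const admit = withContract<Roster, Roster>(
  (r) => r.count >= 0 && r.count <= r.capacity,
  (r) => (r.count < r.capacity ? { count: r.count + 1, capacity: r.capacity } : r),
  (before, after) => after.count <= after.capacity && after.count - before.count <= 1,
);

console.log(admit({ count: 1, capacity: 2 }));
```

The AC bullet a product owner accepts is usually the prose reading of `pre` and `post`; the contract form only becomes worth writing when a machine has to grade the result.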
The agile-era restatement has three load-bearing authors. Mike Cohn's User Stories Applied (Addison-Wesley, 2004) defines the AC as “notes about what the story must do in order for the product owner to accept it as complete.”[7] (User Stories Applied: For Agile Software Development, Addison-Wesley, 2004: the canonical definition of acceptance criteria, and the convention every modern Jira board still descends from.) Dan North's “Introducing BDD” (Better Software, 2006) reframes the AC in business language and gives us Given/When/Then.[8] (“Introducing BDD”, Better Software, March 2006: the shift from thinking in tests to thinking in behaviour, and the source for Given/When/Then as the AC convention.) Gojko Adzic's Specification by Example (Manning, 2011) packages the discipline as a seven-pattern process: derive scope from goals; specify collaboratively; illustrate with examples; refine the specification; automate; validate frequently; evolve a living documentation.[1] The seven patterns are the standard playbook the modern Spec-First row points at.
The Three Amigos (product, dev, QA in one room) is Mike Cohn's framing, sharpened by Lisa Crispin and Janet Gregory's Agile Testing.[2] The conversation is the load-bearing artefact; the AC captures what the conversation agreed. Liz Keogh's ordering is the published version of the same point: conversations are more important than capturing conversations, which is more important than automating conversations.[4] (“What is BDD?”, lizkeogh.com, 2015: the published statement of the conversational ordering, and the corrective for the polish-the-AC failure mode of Spec-First.)
The 2025–2026 spec-driven-development literature is the AI-era extension. Birgitta Böckeler's Spec-Driven Development essay names three levels of practice (spec-first, spec-anchored, spec-as-source) and explicitly positions SDD as continuous with BDD/SBE rather than as a replacement.[9] (“Spec-Driven Development: Kiro, spec-kit, and Tessl”, martinfowler.com, 2026: the framing of SDD as continuous with BDD/SBE plus an AI-era addition, specs as executable contracts that constrain what AI agents generate.) GitHub spec-kit codifies the same loop into four commands: specify, plan, tasks, implement.[10] (Repository and methodology essay for the four-stage Spec-Driven Development workflow; the closest published toolchain that operationalises Spec-First for the agent loop.) The artefact is the same one Adzic named in 2011; the consumer has changed.
Quotes
Specifying with examples enables teams to build the right software, in the right way, by forcing the conversation that surfaces the assumption.
Acceptance criteria are notes about what the story must do in order for the product owner to accept it as complete.
I had discovered something exciting. The first lesson was that the language of testing confused programmers; the second was that test names ought to be sentences.
Having conversations is more important than capturing conversations is more important than automating conversations.
Evidence
Twenty external sources, ranked by author authority. The first five are the canon; the rest include the qualifiers and the named opposers. Each links out to its primary source.
- 01. Specification by Example (Supports): The single most-cited modern source. Seven-pattern process; specifications produce living documentation; collaboration is load-bearing.
- 02. Bridging the Communication Gap (Supports): Predecessor to SBE. Establishes that AC and acceptance tests are one artefact in two readings; the BDD claim that the test and the spec collapse.
- 03. “Introducing BDD” (Supports): The shift from “tests” to “behaviour” and the canonical Given/When/Then convention. Originates the form most modern AC are written in.
- 04. User Stories Applied (Supports): Defines AC as “notes the product owner can accept”. The ancestor of every modern card-with-AC convention.
- 05. Writing Effective Use Cases (Supports): Pre-Cohn formulation of the same claim. Use case = scenario = AC. The pre-agile origin of structured AC writing.
Twenty sources, three stances. The supporters are Adzic, North, Cohn, Cockburn: the BDD/SBE/use-case canon. The qualifiers further down push the line that conversation matters more than capture, the steelman the case file has to address. The opposers argue that specification is one artefact among many and should not be promoted to a discipline. The honest debate is between the qualifiers and the opposers, not between the supporters and the sceptics.
Examples
// Before: no AC. 'Done' is whatever the engineer remembered.
function admitVolunteer(volunteer: Volunteer, sanctuary: Sanctuary): AdmitDecision {
  if (volunteer.verified) {
    sanctuary.volunteers.push(volunteer);
    return { admitted: true };
  }
  return { admitted: false };
}
// After: the AC is written first, then implemented and checked.
// Given a sanctuary at capacity and a verified volunteer applying
// When admitVolunteer runs
// Then it returns { admitted: false, reason: "at-capacity" }
function admitVolunteer(volunteer: Volunteer, sanctuary: Sanctuary): AdmitDecision {
  if (!volunteer.verified) return { admitted: false, reason: "unverified" };
  if (sanctuary.volunteers.length >= sanctuary.capacity) return { admitted: false, reason: "at-capacity" };
  sanctuary.volunteers.push(volunteer);
  return { admitted: true };
}

// `verified` and `full` are test fixtures: a verified volunteer and a sanctuary already at capacity.
it("rejects when at capacity", () => {
  expect(admitVolunteer(verified, full)).toEqual<AdmitDecision>({ admitted: false, reason: "at-capacity" });
});
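The example above leaves `Volunteer`, `Sanctuary`, and `AdmitDecision` undefined and checks only one AC line. A self-contained variant might look like the sketch below. The type definitions, the `acCases` table, and the two extra AC lines (unverified rejected, verified admitted when there is room) are assumptions inferred from the code, not criteria stated in the source; the checks use plain comparisons so the block runs without a test framework.

```typescript
// Hypothetical types: the original example uses these names without defining them.
interface Volunteer {
  name: string;
  verified: boolean;
}

interface Sanctuary {
  capacity: number;
  volunteers: Volunteer[];
}

// The AC outcomes are encoded in the return type, so the only rejection
// reasons the compiler accepts are the ones the AC names.
type AdmitDecision =
  | { admitted: true }
  | { admitted: false; reason: "unverified" | "at-capacity" };

function admitVolunteer(volunteer: Volunteer, sanctuary: Sanctuary): AdmitDecision {
  if (!volunteer.verified) return { admitted: false, reason: "unverified" };
  if (sanctuary.volunteers.length >= sanctuary.capacity) {
    return { admitted: false, reason: "at-capacity" };
  }
  sanctuary.volunteers.push(volunteer);
  return { admitted: true };
}

// One row per AC line: Given (fresh inputs), Then (expected decision).
interface AcCase {
  ac: string;
  given: () => { volunteer: Volunteer; sanctuary: Sanctuary };
  expected: AdmitDecision;
}

const existingMember: Volunteer = { name: "existing", verified: true };

const acCases: AcCase[] = [
  {
    ac: "an unverified volunteer is rejected",
    given: () => ({
      volunteer: { name: "new", verified: false },
      sanctuary: { capacity: 2, volunteers: [] },
    }),
    expected: { admitted: false, reason: "unverified" },
  },
  {
    ac: "a verified volunteer is rejected when the sanctuary is at capacity",
    given: () => ({
      volunteer: { name: "new", verified: true },
      sanctuary: { capacity: 1, volunteers: [existingMember] },
    }),
    expected: { admitted: false, reason: "at-capacity" },
  },
  {
    ac: "a verified volunteer is admitted when there is room",
    given: () => ({
      volunteer: { name: "new", verified: true },
      sanctuary: { capacity: 2, volunteers: [existingMember] },
    }),
    expected: { admitted: true },
  },
];

// Re-read the AC after the change: each line is either met or unmet.
const acResults = acCases.map(({ ac, given, expected }) => {
  const { volunteer, sanctuary } = given();
  const actual = admitVolunteer(volunteer, sanctuary);
  return { ac, met: JSON.stringify(actual) === JSON.stringify(expected) };
});

console.log(acResults);
```

Keeping one table row per AC bullet means a reviewer can read the table against the ticket and see immediately which bullet has no check.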
Enforcement
Apply the ESLint rules below in eslint.config.mjs; the last two rows are test-runner options rather than ESLint rules. The full enforcement across every tenet lives on the implementation page.
| Rule | Tool | Catches |
|---|---|---|
| cucumber/async-then | eslint-plugin-cucumber | Then steps that return a promise without awaiting it — the assertion never fires. |
| cucumber/expression-type | eslint-plugin-cucumber | Step text that doesn’t match the parameter type — silently skipped at runtime. |
| cucumber/no-restricted-tags | eslint-plugin-cucumber | @wip / @only / @skip tags committed to main — partial scenarios pretending to pass. |
| cucumber/no-arrow-functions | eslint-plugin-cucumber | arrow-function step definitions — break Cucumber’s World binding; common subtle bug. |
| cucumber-js --fail-fast | Cucumber.js | long suites where the first scenario failure is the AC under change; surfaces the failure faster. |
| playwright-bdd --strict | playwright-bdd | scenarios with undefined or pending steps — protects the AC from quietly missing coverage. |
eslint.config.mjs configuration snippet
import tseslint from 'typescript-eslint';
import cucumber from 'eslint-plugin-cucumber';
export default tseslint.config({
  files: ['features/**/*.feature.ts', 'src/**/*.steps.ts'],
  plugins: { cucumber },
  rules: {
    'cucumber/async-then': 'error',
    'cucumber/expression-type': 'error',
    'cucumber/no-restricted-tags': ['error', { tags: ['@wip', '@only'] }],
    'cucumber/no-arrow-functions': 'error',
  }
});
AI rules
.cursor/rules/p2-spec-first.mdc
---
description: Prickles P2 — Spec-First Execution
globs: "**/*.{ts,tsx,js,jsx,py,java,php,feature}"
alwaysApply: false
---
## Prickles P2 — Spec-First Execution
Read the brief twice. Write the AC as bullets a reviewer can verify as met or unmet — not “works correctly” but “rejects an empty cart with 400 and the message cart cannot be empty”.
Walk the AC past the product owner and the QA in a five-minute Three Amigos. Capture what survives the conversation; discard what doesn't.
Refuse to start coding until the AC is written. Refuse to merge until the AC is verified.
Pair Spec-First with AI4 Verifiable Specs: write the AC in a form a machine can grade, such as Gherkin, a type, a contract, or a schema. Aspirational prose loses to the agent at machine speed.
Repo layout, CI, and ESLint wiring for these paths live on /implementation; they are not repeated on every tenet.
Counter-argument
The strongest steelman is Liz Keogh's, drawn from a decade of BDD practice: the conversation matters more than the artefact, and rigid AC discipline ends with engineers polishing scenarios while skipping the talk that would have surfaced the real risk.[4] Hoare's formal-methods tradition extends the point: a sufficiently rigorous AC becomes a Hoare triple, which is the most precise and least useful artefact in the practitioner's toolkit.[5] Ousterhout's reading is the broadest critique: specification is one artefact among many; promoting it to a discipline crowds out the deep-module thinking that actually shapes the design.
Counter-argument retort
Keogh's point is part of the reply, not a refutation of the tenet: conversation does matter more than capture. Spec-First survives the point because the alternative she warns about (AC polished in isolation, conversation skipped) is not Spec-First; it is the cargo cult of Spec-First. The Three Amigos is the conversation; the AC is what survives the conversation; the test is what survives the AC. Each artefact carries the previous step forward in a form the next step can use.
Hoare's formal-methods objection is real for the upper tail of the rigour spectrum: a Hoare triple sufficient for a proof is too heavy for a feature ticket. The reply is to keep the discipline and lower the rigour: Cohn's “notes the product owner can accept” is enough at the AC level, Adzic's “example” is enough for a scenario, and AI4 Verifiable Specs hardens the form so the agent can grade it. Hoare-strength rigour is a tool, not a default.
Ousterhout's broadest critique — that specification is one artefact among many — is true, but it is also the steelman that most often hides the failure mode. Specification is the cheapest place to make a design mistake recoverable; deep-module thinking happens in code, where the cost of the mistake is the implementation, the test, and the migration. Both disciplines belong; ranking specification below module design ignores the lifecycle position where each lives.
In production work, the row that ships the AC at the start is the row whose Definition of Done can mean anything more than “the engineer thinks it's done.” P3 Definition of Done is the back half of this loop; without P2 at the front, P3 has nothing to verify against. The two rows are bookends, not duplicates.
Notes
- [1] Gojko Adzic, Specification by Example (Manning, 2011). The seven-pattern process: derive scope from goals; specify collaboratively; illustrate with examples; refine the specification; automate; validate frequently; evolve a living documentation. The single most-cited modern source for the discipline.
- [2] Lisa Crispin & Janet Gregory, Agile Testing: A Practical Guide for Testers and Agile Teams (Addison-Wesley, 2009). The Three Amigos framing (product owner, developer, tester in one room) and the canonical Agile Testing Quadrants placement of acceptance tests as Q2 (tests written before coding starts).
- [3] Noah Shinn et al., “Reflexion: Language Agents with Verbal Reinforcement Learning”, NeurIPS 2023. HumanEval pass@1 lifts from ~80% to 91% when the agent grounds self-feedback in test runs rather than introspection. Generalised by Madaan et al. (Self-Refine, 2023) and Gou et al. (CRITIC, 2024).