Definition of Done
Done means the checklist passed. Coverage, lint, type-check, format, dup-check, AC, a11y: written down, machine-checked, signed off. Judgement is too lenient on a Friday afternoon; the list is the gate, and the gate does not flex.
Opinion
Every team I have worked with that didn't have a written Definition of Done shipped half a feature and called it done at the same rate, and the rate did not fall with seniority. The most senior engineer on a Friday afternoon shipped a half-typed change with the same conviction as the most junior on Monday. Gawande's point about ICU central lines is the same point about feature merges: the experts know all the steps, the experts skip steps when they're tired, and the checklist they hate at first is the checklist that saves them later.[1]
What I want to plant a flag on is that the DoD is the back half of P2 Spec-First Execution. The AC at the start is the brief; the AC at the end is the gate. If the AC was written verifiably per AI4 Verifiable Specs, the DoD has something machine-readable to verify against. If not, the DoD becomes judgement, which is the failure mode the row is built to displace. The DoD without a verifiable AC is theatre; the AC without a DoD is aspiration.
The agent layer is where the row earns its weight. An LLM left unchecked will declare itself done at the moment the code compiles; that is the prior, and no amount of CLAUDE.md scolding changes the prior. The fix is not to ask the agent to be more careful. The fix is to give the agent a list it has to tick. Reflexion, Self-Refine, CRITIC: all the same shape. The checklist is the external signal the model corrects against.[2] Without it, the agent grades itself; with it, the agent has something it cannot fake.
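The loop described above can be sketched roughly as follows. This is a minimal illustration, not the method of Reflexion, Self-Refine, or CRITIC: `Generate`, `Check`, and `refineUntilChecked` are hypothetical names, and the agent and the checklist are passed in as plain functions so the shape is visible without any real model or tooling.

```typescript
// Hypothetical types; none of these names come from a real agent framework.
type CheckResult = { passed: boolean; failures: string[] };
type Generate = (feedback: string[]) => string; // the agent: produces a candidate change
type Check = (candidate: string) => CheckResult; // the checklist: external, machine-checkable

interface LoopOutcome {
  done: boolean; // true only when the external check passed
  attempts: number;
  candidate: string;
  failures: string[]; // failures reported by the last check
}

function refineUntilChecked(
  generate: Generate,
  check: Check,
  maxAttempts = 3,
): LoopOutcome {
  let feedback: string[] = [];
  let candidate = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    candidate = generate(feedback);
    const result = check(candidate);
    if (result.passed) {
      // "Done" is decided by the check, never by the generator.
      return { done: true, attempts: attempt, candidate, failures: [] };
    }
    // Failed rows become the feedback for the next attempt.
    feedback = result.failures;
  }
  return { done: false, attempts: maxAttempts, candidate, failures: feedback };
}
```

The design point is that `generate` has no way to set `done`; only `check` can, which is the difference between an agent grading itself and an agent being graded.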
AI eyes only
Rule: done means the checklist passed. Define done as a single command.
Reject: declaring done because the build is green. Reject: declaring done before lint, type-check, tests, format, duplication, coverage all pass. Reject: skipping the script “to save time”.
Generate: run npm run quality (or the project equivalent) before handing back. Paste the result. If the script fails, fix and re-run. Done is the script's exit code, not the agent's opinion.
Diagnostic: cite the exit code of the quality script in every handover. No exit code, no done.
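One way to picture "done is the script's exit code" is the sketch below. It is an assumption-laden illustration, not this repo's actual `npm run quality` script: `Row`, `Exec`, and `runChecklist` are hypothetical names, the row list is a subset built from the commands named on this page, and `vitest run --coverage` is used as the non-watch form of the coverage command. The command runner is injected so the aggregation logic can be read and tested on its own.

```typescript
// Hypothetical row shape; the commands mirror tools named on this page.
interface Row {
  name: string;
  command: string;
}

// Runs a shell command and returns its exit code (0 means success).
type Exec = (command: string) => number;

const rows: Row[] = [
  { name: 'lint', command: 'eslint .' },
  { name: 'type-check', command: 'tsc --noEmit' },
  { name: 'format', command: 'prettier --check .' },
  { name: 'dup-check', command: 'jscpd .' },
  { name: 'test+coverage', command: 'vitest run --coverage' },
];

interface Handover {
  exitCode: number; // 0 only if every row passed
  lines: string[]; // one line per row plus a summary, ready to paste
}

function runChecklist(checklist: Row[], exec: Exec): Handover {
  const lines: string[] = [];
  let exitCode = 0;
  // Run every row rather than stopping at the first failure,
  // so the handover shows all failing rows at once.
  for (const row of checklist) {
    const code = exec(row.command);
    const status = code === 0 ? 'PASS' : 'FAIL';
    lines.push(`${status} ${row.name} (${row.command}) exit=${code}`);
    if (code !== 0 && exitCode === 0) {
      exitCode = code; // report the first failing code
    }
  }
  lines.push(`quality exit code: ${exitCode}`);
  return { exitCode, lines };
}
```

In a Node script, `exec` could plausibly wrap `spawnSync` from `node:child_process` and return its `status`; the last entry of `lines` is the exit code the handover should cite.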
Why?
- A written checklist beats personal judgement. Senior engineers on Friday afternoons skip the same steps as junior engineers on Monday mornings; the list is the corrective.
- The AC is the bookend. P2 Spec-First Execution writes the AC at the start; P3 verifies it at the end. The two rows compose; without P3, the AC is aspiration.
- Total coverage is one row on the list, not a tenet of its own. Promoting it inflated the canon; absorbing it sharpens the list and matches Fowler and Marick on coverage as diagnostic, not target.
- The DoD is the agent's grader. The model declares itself done when the build compiles; the list is what stops it. CRITIC and the evaluator-optimizer pattern both ground the agent in something machine-checkable.
- PRs land faster when the reviewer can read down the DoD instead of reverse-engineering the intent. Each row of the list is a row the reviewer doesn't have to think about.
- DORA / Accelerate names the capabilities the DoD rows implement (continuous testing, test automation, deployment automation) as predictors of delivery performance. The list isn't bureaucracy; it's a measured driver of throughput.
- The DoD is what “done” means on this team. New starters know what to verify; existing members can't lower the bar without a conversation; the floor moves up over time, not down.
Origins
The agile-era origin is Scrum. Ken Schwaber and Jeff Sutherland's Scrum Guide (scrumguides.org) has carried the term “Definition of Done” across its revisions; the current edition treats the DoD as “a formal description of the state of the Increment when it meets the quality measures required for the product.”[5] Mike Cohn's Succeeding with Agile: Software Development Using Scrum (Addison-Wesley, 2010) ch.10 is the practitioner playbook for assembling one, the operational counterweight to the Scrum Guide’s definition.[6] Robert C. Martin's The Clean Coder: A Code of Conduct for Professional Programmers (Pearson, 2011) ch.3 (“Saying Yes”) and ch.8 (“Testing Strategies”) carry the discipline-first flavour: the checklist defines what a professional commitment means.[7]
The deeper intellectual root is Atul Gawande's Checklist Manifesto (Metropolitan Books, 2009).[1] Gawande's case is built on aviation pilot checklists and Peter Pronovost's ICU central-line checklist (Johns Hopkins, 2001): experts on hard problems beat their own judgement when they read from a list. Gawande's claim transfers directly to engineering: the senior engineer on a Friday afternoon is the ICU doctor at the end of a shift; the checklist is the corrective.
The DORA / Forsgren research programme makes the DoD a measured driver of delivery performance. Accelerate: The Science of Lean Software and DevOps (IT Revolution, 2018) names the 24 capabilities of high performers; the DoD checklist sits inside several: continuous testing, test automation, deployment automation.[8] The 2024 DORA report continues to list these as predictors. The Microsoft Engineering Playbook (microsoft.github.io/code-with-engineering-playbook) codifies the same row-by-row shape for an industrial team and is the most-cited operational template on GitHub.[9]
The total-coverage row in the previous canon merges into the DoD here. Brian Marick's coverage taxonomy and Martin Fowler's “Test Coverage” bliki entry (martinfowler.com, 2012) both argue the same point: coverage is a metric, not a target.[4] Promoting it to a tenet inflated the list; absorbing it into the DoD as one row with named exclusions (.types.ts, .spec.ts, .stories.tsx, .config.*, __mocks__/) is the move P4 Continuous Quality Feedback and the canonical literature both prefer.
Quotes
Good checklists are precise. They are efficient, to the point, and easy to use even in the most difficult situations. They do not try to spell out everything; they provide reminders of only the most critical and important steps.
The Definition of Done is a formal description of the state of the Increment when it meets the quality measures required for the product. The moment a Product Backlog item meets the Definition of Done, an Increment is born.
Professionals do not commit to anything they aren't sure they can deliver. The Definition of Done is the line between “sure” and “hopeful”.
Test coverage is a useful tool for finding untested parts of a codebase. Test coverage is of little use as a numeric statement of how good your tests are.
Evidence
Twenty external sources, ranked by author authority. The first five, listed here, are the canon; the rest include the qualifiers and the named opposers.
- 01 The Checklist Manifesto. Supports. The intellectual root. Aviation pilot checklists and Pronovost’s ICU central-line checklist; experts on hard problems beat their own judgement when they read from a list.
- 02 The Scrum Guide. Supports. The agile-era origin of the term. The DoD as a formal description of the state of the Increment when it meets the quality measures required for the product.
- 03 Succeeding with Agile. Supports. Ch.10 is the practitioner playbook for assembling a DoD. Walks through team-level vs release-level definitions and the difference each makes.
- 04 The Clean Coder. Supports. Ch.3 “Saying Yes” and ch.8 “Testing Strategies”. The checklist defines what a professional commitment means; the discipline-first flavour.
- 05 Accelerate. Supports. Names the 24 capabilities of high-performing teams. The DoD sits inside continuous testing, test automation, and deployment automation; an empirical case for the row.
Twenty sources, three stances. The supporters are Gawande, Schwaber & Sutherland, Cohn, Martin's Clean Coder, and Forsgren: the checklist canon. The qualifiers further down the ranking push the line that coverage and DoD rows are diagnostics, not goals. The opposers argue checklists are bureaucracy; that is the steelman the case has to address.
Enforcement
Apply the ESLint rows in eslint.config.mjs; the other rows run as their own commands in the quality script. The full enforcement across every tenet lives on the implementation page.
| Rule | Tool | Catches |
|---|---|---|
| vitest --coverage | Vitest | the coverage row of the DoD. Configure thresholds.lines / branches / functions to fail the script when coverage drops below the team floor. |
| tsc --noEmit | TypeScript compiler (tsc) | the type-check row. Strict mode plus noUncheckedIndexedAccess plus exactOptionalPropertyTypes catch the silent contract failures. |
| eslint . | ESLint core | the lint row. Every error in the config is a row on the DoD; warnings are the rows the team chose not to enforce yet. |
| jscpd . | jscpd | the dup-check row. Threshold 0 in this repo; non-zero is a refactoring task before merge. |
| prettier --check . | Prettier | the format row. Format failures are auto-fixable but never auto-merged. |
| husky pre-push | Husky | the gate. Runs the quality script on every push; a failing hook aborts the push, so a red change never reaches the merge button. |
| lint-staged | lint-staged | the per-commit subset. Runs the DoD on the staged files only so the loop stays under a second. |
| no-warning-comments | ESLint core | TODO / FIXME / WIP markers — the unfinished-work signal that wasn’t triaged before merge. |
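The coverage row in the table could be configured along these lines. This is a sketch under assumptions, not this repo's actual config: the file name vitest.config.ts and the v8 provider are guesses, the 90 figures reuse the floor quoted elsewhere on this page, and the exclusion globs are one possible translation of the named exclusions. Depending on the Vitest version, setting `coverage.exclude` may replace the built-in defaults rather than extend them, so check the behaviour for the installed version.

```typescript
// vitest.config.ts (hypothetical file name for this sketch)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8', // assumption: the repo may use a different provider
      // Named exclusions from the DoD's coverage row, written as globs.
      exclude: [
        '**/*.types.ts',
        '**/*.spec.ts',
        '**/*.stories.tsx',
        '**/*.config.*',
        '**/__mocks__/**',
      ],
      // The coverage run exits non-zero when any figure drops below the floor.
      thresholds: {
        lines: 90,
        branches: 90,
        functions: 90,
      },
    },
  },
});
```

With thresholds set, the coverage command itself carries the pass/fail signal, so the quality script needs no separate step to compare numbers.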
eslint.config.mjs configuration snippet
import tseslint from 'typescript-eslint';

export default tseslint.config({
  files: ['**/*.{ts,tsx}'],
  rules: {
    // Unfinished-work markers fail the lint row.
    'no-warning-comments': ['error', { terms: ['todo', 'fixme', 'wip'], location: 'anywhere' }],
    // An it('...') call with a title but no callback is a pending test.
    'no-restricted-syntax': ['error', {
      selector: "CallExpression[callee.name='it'][arguments.length=1]",
      message: 'Tests without an implementation block are pending; finish before merge.',
    }],
  },
});
AI rules
.cursor/rules/p3-definition-of-done.mdc
---
description: Prickles P3 — Definition of Done
globs: "**/*.{ts,tsx,js,jsx,py,java,php,md}"
alwaysApply: false
---
## Prickles P3 — Definition of Done
Done means the checklist passed. Coverage, lint, type-check, format, dup-check, AC verification, accessibility — every row machine-checked, every row signed off.
The list is the gate, not a guideline. The moment a row is treated as advisory, the row is dead.
Revisit the list every quarter. Remove rows that haven't caught anything; add rows that catch failures the team is paying for.
The AC at the end is the AC at the start (P2 Spec-First). Without P2's AC, the DoD has nothing to verify against.
Repo layout, CI, and ESLint wiring for these paths live on /implementation, not repeated on every tenet.
Counter-argument
The strongest steelman is the lean-throughput reading: every checklist row is a unit of overhead, and the team that ships fastest is the team that has stripped its DoD to the bone. Donald Reinertsen's Principles of Product Development Flow argues that batch-size discipline beats list discipline for delivery throughput.[3] Marick's coverage taxonomy is the qualifier inside the row: target-driven coverage produces local optima.[4] Ousterhout's critique is broadest: checklists are a bureaucratic answer to a design problem; the time spent on the list is time not spent on the design.
Counter-argument retort
Reinertsen's lean-throughput point lands when the DoD has become a backlog of compliance items the team can't justify item-by-item.[3] The reply is to keep the list short, every row earning its keep, every row tied to a measurable failure mode. The seven-row npm run quality script in this repo is the working example: lint, sonar, type-check, format, dup-check, test+coverage, plus the AC verification on the PR. Each row catches a class of failure that has cost the team an hour in the past quarter; the list is short because the list is empirical.
Marick's coverage point is the qualifier inside the row, not a refutation of the row. Coverage as a target produces local optima: the engineer adds a meaningless test to the file with low coverage; the suite grows; nothing is checked. Coverage as a diagnostic surfaces the file with no tests at all, which is a different question. The DoD treats coverage as the diagnostic, with named exclusions for files that are already covered by the surrounding code (.types.ts, .config.*) and a floor (90%) that the team has chosen because the cost of dropping below it is measurable.
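The diagnostic reading of coverage can be sketched as a small report filter. Everything here is illustrative: `CoverageReport` is a hypothetical shape (file path mapped to line-coverage percentage), `diagnose` is a hypothetical name, the regular expressions are one reading of the exclusions named on this page, and applying the 90% floor per file is an assumption, since the page does not say whether the floor is per file or overall.

```typescript
// Hypothetical report shape: file path -> line coverage percentage (0 to 100).
type CoverageReport = Record<string, number>;

// Exclusions named on this page, expressed as regular expressions.
const EXCLUDED: RegExp[] = [
  /\.types\.ts$/,
  /\.spec\.ts$/,
  /\.stories\.tsx$/,
  /\.config\.[^/]+$/,
  /(^|\/)__mocks__\//,
];

const isExcluded = (path: string): boolean =>
  EXCLUDED.some((pattern) => pattern.test(path));

interface Diagnosis {
  untested: string[]; // files with no coverage at all: the diagnostic question
  belowFloor: string[]; // files with some coverage, but under the floor
}

function diagnose(report: CoverageReport, floor = 90): Diagnosis {
  const untested: string[] = [];
  const belowFloor: string[] = [];
  for (const path of Object.keys(report)) {
    if (isExcluded(path)) continue; // excluded files never count against the row
    const pct = report[path];
    if (pct === 0) {
      untested.push(path);
    } else if (pct < floor) {
      belowFloor.push(path);
    }
  }
  return { untested: untested.sort(), belowFloor: belowFloor.sort() };
}
```

Keeping `untested` separate from `belowFloor` is the point of the sketch: the first list asks "what has no tests at all?", while the second is the number that invites gaming if it is treated as a target.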
Ousterhout's critique — that checklists are bureaucracy — is true for the DoD that has grown without pruning. The fix is not to drop the DoD; it is to revisit the DoD every quarter and remove the rows that haven't caught anything. A short DoD is a working DoD; a long DoD is theatre. The discipline survives Ousterhout; the bureaucratic growth doesn't.
In production work, the DoD is the row that lets the team merge to main on a Friday afternoon without every reviewer holding their breath. P2 writes the AC; P4 runs the gates as the change is being written; P3 is the final read of the list before the merge button is pressed. The three rows compose; collapsing them drops the failure mode that the missing row catches.
Notes
- [1] Atul Gawande, The Checklist Manifesto: How to Get Things Right (Metropolitan Books, 2009). Built on aviation pilot checklists and Peter Pronovost’s ICU central-line checklist. Experts on hard problems beat their own judgement when they read from a list.
- [2] Noah Shinn et al., “Reflexion: Language Agents with Verbal Reinforcement Learning”, NeurIPS 2023. The agent grounded in machine-checkable feedback improves; the agent grading itself does not. Generalised by Madaan (Self-Refine) and Gou (CRITIC).
- [3] Donald G. Reinertsen, The Principles of Product Development Flow (Celeritas, 2009). The lean-throughput steelman: every checklist row is overhead; batch-size discipline beats list discipline for delivery throughput.