Case file — TS1

Living Documentation

Tests don't lie.

Prose lies. Tests fail. The test file is the spec because the spec is the thing that goes red when the code changes; nothing else in the repository has that property.

By Adam Lewis · Published 3 May 2026 · Reading 12 min · Version v1.0 · Confidence High
§0b

Opinion

I have read enough README files lying about the API beneath them to stop trusting prose altogether. The doc says POST /orders accepts an idempotency key; the handler has not accepted one since the rewrite three sprints ago. The doc says this method returns null on miss; it throws. The doc was right when it was written and wrong by the next commit, and the next reader either trusts it and ships a bug, or distrusts every doc on the way past. TS1 applies the instinct behind F5 Self-Documenting Code (let the artefact carry the meaning) to the testing surface.

The test in CI cannot drift. It either passes or it fails, and a green build is the only signal in the codebase that says this is still true today. Gojko Adzic named this the seventh of his seven Specification by Example patterns: evolving living documentation as the exhaust of an SBE engine.[1] Cyrille Martraire's 2019 book widens it past Gherkin: living glossaries, code-derived diagrams, runtime documentation, every artefact that updates because it has to or breaks because it must.[2] Same idea on different surfaces.

The practical consequence is the one a working developer feels: stop writing the README first and the tests later. Write the tests; let the README be a paragraph that points at the test file and the Gherkin folder. The reader who needs the truth runs the suite. The reader who needs the gist reads the test names. Neither reader is reading prose written six months ago by somebody who has since left.
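As a minimal sketch of that consequence, the null-versus-throw drift described earlier can be pinned down by a named test instead of a sentence in a README. Everything here is hypothetical: `findOrder`, the in-memory `orders` map, and the tiny `scenario` runner, which stands in for a real runner's `it` so the snippet runs without dependencies.

```typescript
// Stand-in for a test runner's `it` (an assumption, so the sketch needs no dependencies).
type ScenarioResult = { title: string; passed: boolean };
const scenarioResults: ScenarioResult[] = [];

function scenario(title: string, body: () => void): void {
  try {
    body();
    scenarioResults.push({ title, passed: true });
  } catch {
    scenarioResults.push({ title, passed: false });
  }
}

// Hypothetical unit under test: a lookup that throws on a miss instead of returning null.
type Order = { id: string; total: number };
const orders = new Map<string, Order>([["o-1", { id: "o-1", total: 40 }]]);

function findOrder(id: string): Order {
  const order = orders.get(id);
  if (order === undefined) {
    throw new Error(`order not found: ${id}`);
  }
  return order;
}

scenario("returns the stored order when the id exists", () => {
  if (findOrder("o-1").total !== 40) throw new Error("wrong order returned");
});

scenario("throws when the order id is unknown", () => {
  let threw = false;
  try {
    findOrder("missing");
  } catch {
    threw = true;
  }
  if (!threw) throw new Error("expected findOrder to throw on a miss");
});

// The titles double as a behavioural index of the unit.
const behaviourIndex = scenarioResults.map(
  (r) => `${r.passed ? "PASS" : "FAIL"} ${r.title}`,
);
```

If someone later changes `findOrder` to return null on a miss, the second scenario goes red; a README sentence describing the old behaviour would have stayed silent.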

Tests don't lie. The test file is the most reliable description of behaviour you have — prose drifts, scenarios fail. If a behaviour matters, prove it with a test; if a doc explains what a test would have caught, replace the doc with the test.

§0c

AI eyes only

Rule: tests are the spec. Test names read as full sentences describing one behaviour.

Reject: test names like handlesEdgeCase, worksCorrectly, test1. Reject: prose docs that re-explain what the tests already prove.

Generate: test names that read as sentences (rejects an order whose total is negative). Each test isolates one behaviour the spec calls out. The test list is the behavioural index of the unit.

Diagnostic: read the test names aloud as a list. They must read as the behavioural spec of the unit, with no help from the implementation.
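The diagnostic above can be approximated mechanically. The sketch below is a rough heuristic, not a rule from any linter; `readsAsSentence`, `auditTitles`, and the vague-phrase pattern are hypothetical names chosen for illustration.

```typescript
// Hypothetical pattern for phrases that name no behaviour even when spaced out.
const VAGUE_TITLES = /^(handles edge case|works correctly|works|test ?\d*)$/i;

// Heuristic: a behavioural title has at least three words and is not a vague phrase.
function readsAsSentence(title: string): boolean {
  const trimmed = title.trim();
  const words = trimmed.split(/\s+/).filter((w) => w.length > 0);
  // A single token such as "handlesEdgeCase" or "test1" is an identifier, not a sentence.
  if (words.length < 3) return false;
  return !VAGUE_TITLES.test(trimmed);
}

// Returns the titles a reviewer should push back on.
function auditTitles(titles: readonly string[]): string[] {
  return titles.filter((title) => !readsAsSentence(title));
}

const flagged = auditTitles([
  "rejects an order whose total is negative",
  "handlesEdgeCase",
  "worksCorrectly",
  "test1",
  "handles edge case",
]);
```

Only the first title survives the audit. A word count cannot judge whether a sentence describes the right behaviour, so this is a first filter at most; reading the list aloud is still the real check.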

§0d

Why?

  • Documentation that fails when it lies. The build is red until the prose and the behaviour agree, so the next reader sees the truth or sees a broken pipeline.
  • The spec lives where the code lives. The test file ships in the same PR as the change it describes, so the diff carries both the behaviour and the assertion of the behaviour.
  • Reviewers read the test names first. A directory of spec files reads as a feature list; a directory of READMEs reads as historical fiction.
  • Same operating principle as F5 Self-Documenting Code on the testing surface. The artefact carries the meaning, not the prose beside it.
  • Coding agents read the test as a specification they can satisfy. A green run is the only signal in the repo that says “this is still true today,” and the model treats it as such.
  • Pair with TS2 Specification by Example when the reader is a stakeholder; the same scenario doubles as feature documentation and acceptance check.
  • Cuts the class of bugs caused by stale prose — the README that lies about the API, the comment that describes the previous behaviour, the wiki page nobody updated after the rewrite.
The receipts
Origins, quoted passages, evidence, the strongest counter-argument and the reply.
§1

Origins

The phrase living documentation enters the working vocabulary through Gojko Adzic and the Specification by Example community. Adzic's 2011 book names “evolving living documentation” as the seventh of his seven SBE process patterns: the exhaust of an engine in which automated examples are written, refined, and kept in human-readable form alongside the code they describe.[1] The framing is downstream of Dan North's 2006 introduction of BDD, where Given/When/Then was already proposed as a documentation grammar more than a testing grammar.[8: Dan North, “Introducing BDD” (dannorth.net, 2006). The article that named the practice and proposed Given/When/Then first as a documentation grammar, second as a testing grammar; TS1 inherits that prioritisation.]

Cyrille Martraire's Living Documentation (2019) is the first dedicated book on the term, and it widens the scope past BDD scenarios into a 15-chapter taxonomy: living glossaries, runtime documentation, refactorable architecture diagrams, code-derived explanations, generated context maps.[2] Behaviour-driven scenarios are one chapter inside that book; the rule is bigger. Anything that is authoritative and co-located with the artefact it describes counts. The test file is the canonical instance because it has the second property by construction.

Kent Beck's Test Desiderata names the underlying property as Predictive: when tests pass, the code works; when they fail, you know.[9: Kent Beck, Test Desiderata (kentbeck.github.io, 2019).] Gerard Meszaros's xUnit Test Patterns catalogues the same property under Tests as Documentation, and frames the inverse smell, Obscure Test, as the failure mode that erodes it.[10: Gerard Meszaros, xUnit Test Patterns (Addison-Wesley, 2007). The reference text on the mechanics that earn the property.] Ward Cunningham's 1992 paper on the WyCash portfolio system carries the ancestor instinct: documentation that lives where the code lives, edited as the code is edited, never far enough to drift.[11: Ward Cunningham, “The WyCash Portfolio Management System” (OOPSLA 1992). Origin of the wiki instinct; ancestor of every co-located-spec doctrine since.] The doctrine is not new; the testing form is the one that bites because tests are the only artefact whose drift is announced by a build failure.

The standard the principle most directly displaces is the comment that explains a method. F5 Self-Documenting Code is the foundations-side rule: improve the name and refactor instead of annotating. TS1 is the testing-side rule: write the test instead of the paragraph. The two tenets are one operating principle on different surfaces — the artefact carries the meaning, not the prose beside it.

§2

Quotes

An automated specification with examples, still in a human-readable form and easily accessible to all team members, becomes an executable specification.

Gojko Adzic · Specification by Example (2011), pattern #7

You don’t have to choose between working software and comprehensive, high-quality documentation: you can have both.

Cyrille Martraire · Living Documentation (2019)

The code reflects the documentation, and the documentation reflects the team’s shared understanding of the problem domain.

Cucumber Docs · BDD Overview

Predictive: if the tests all pass, then the code is good enough to deploy. Tests are the first piece of evidence, not the last.

Kent Beck · Test Desiderata (2019)
§3

Evidence

Twenty external sources, ranked by author authority. The first five, listed here, are the canon; the remainder includes the qualifiers and the named opposers.

  1. Gojko Adzic · 2011
    The first verbatim use of “living documentation” as the named outcome of automated examples. Pattern #7 of seven: keep the executable specification human-readable and co-located with the code.
  2. Cyrille Martraire · 2019
    The only book-length treatment of the subject. Widens the scope past Gherkin into glossaries, runtime documentation, refactorable diagrams, code-derived prose. BDD scenarios are one chapter inside a 15-chapter taxonomy.
  3. BDD Overview · Supports
    Cucumber documentation · 2020+
    “The code reflects the documentation, and the documentation reflects the team’s shared understanding.” The canonical Cucumber framing of the property TS1 names.
  4. Kent Beck · 2003
    The foundational instinct: tests as runnable specifications. Read together with the “Predictive” entry in Beck’s later Test Desiderata for the property TS1 abstracts.
  5. Kent Beck · 2019
    Names the property as Predictive: when tests pass, the code works; when they fail, you know. The truth-axis claim TS1 leans on.

Twenty sources, three stances. The supports are the canon: Adzic, Martraire, Cucumber, and Beck. The qualifiers further down carve out the part of documentation tests cannot carry: conceptual explanation, exploratory questioning. The opposers sharpen the steelman the reply has to address.

§4

Examples

Viewing: TypeScript.
Avoid
File sanctuary-census.ts
// Returns the count of hedgehogs in the sanctuary.
export function countActiveHedgehogs(sanctuary: Sanctuary): number {
  return sanctuary.intake.filter((h) => h.status !== "released").length;
}
Prefer
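File sanctuary-census.spec.ts
// After: no comment to lie. The test name is the specification.
import { expect, it } from "vitest";

// Assumed shape of Sanctuary; the original snippet leaves the type undeclared.
type Hedgehog = { name: string; status: "resident" | "recovering" | "released" };
type Sanctuary = { intake: Hedgehog[] };

export function countActiveHedgehogs(sanctuary: Sanctuary): number {
  return sanctuary.intake.filter((h) => h.status !== "released").length;
}

it("excludes released hedgehogs from the active count", () => {
  const sanctuary: Sanctuary = { intake: [
    { name: "boris", status: "resident" },
    { name: "prickles", status: "released" },
    { name: "spike", status: "recovering" },
  ] };
  // Two of the three hedgehogs are still in care.
  expect(countActiveHedgehogs(sanctuary)).toBe(2);
});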
§4b

Enforcement

Viewing: TypeScript.

Apply these rules in eslint.config.mjs. The full enforcement across every tenet lives on the implementation page.

Rule | Tool | Catches
vitest/valid-title | @vitest/eslint-plugin | Malformed titles (empty, padded, or non-string). With mustNotMatch or disallowedWords configured, also test names that start with “should”, “test that” or other regex-shaped fragments, which pushes titles toward behavioural sentences.
vitest/no-disabled-tests | @vitest/eslint-plugin | “.skip” or “xit” calls left in main. A disabled test is documentation that lies; the suite says behaviour is asserted when it isn’t.
vitest/no-focused-tests | @vitest/eslint-plugin | “.only” left in committed tests. Same failure mode as a skip: the suite reports green but most of it never ran.
vitest/expect-expect | @vitest/eslint-plugin | Tests with no assertion. A test that asserts nothing is a sentence with no verb; it can’t be documentation.
vitest/no-identical-title | @vitest/eslint-plugin | Duplicate test names. The suite output is a feature list; duplicate entries make it useless as documentation.
vitest/prefer-each | @vitest/eslint-plugin | Loops of duplicated it() blocks. it.each turns each row into its own line in the reporter, which reads as documentation.
vitest --reporter=verbose | Vitest | Missing test-level visibility. The verbose reporter prints every describe/it name: the artefact you point reviewers at as the spec.
eslint.config.mjs configuration snippet
import vitest from '@vitest/eslint-plugin';
import jestFormatting from 'eslint-plugin-jest-formatting';

export default [
  {
    files: ['**/*.{spec,test}.{ts,tsx}'],
    plugins: { vitest, 'jest-formatting': jestFormatting },
    rules: {
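      // mustNotMatch is added on the assumption that titles starting with "should" or "test that" are meant to be rejected.
      'vitest/valid-title': ['error', { ignoreTypeOfDescribeName: false, mustNotMatch: '^(should|test that)\\b' }],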
      'vitest/valid-describe-callback': 'error',
      'vitest/no-disabled-tests': 'error',
      'vitest/no-focused-tests': 'error',
      'vitest/no-identical-title': 'error',
      'vitest/expect-expect': 'error',
      'vitest/no-conditional-tests': 'error',
      'vitest/prefer-each': 'warn',
      'jest-formatting/padding-around-describe-blocks': 'error',
      'jest-formatting/padding-around-test-blocks': 'error',
    },
  },
];
§4c

AI rules

File .cursor/rules/ts1-living-documentation.mdc
---
description: Prickles TS1 — Living Documentation
globs: "**/*.{ts,tsx,js,jsx,py,java,php,feature}"
alwaysApply: false
---

## Prickles TS1 — Living Documentation

If a behaviour matters, write a test that proves it. The test is the spec; prose docs are a paraphrase.

When the code changes, the test fails or it doesn't. Prose can drift in silence; a test in CI cannot. The first signal is a red build, not a stale paragraph.

Name the test for the behaviour, not the method. `it('rejects an order whose total is negative')` reads as a sentence; `it('throws on negative total')` reads as a regex.

Where stakeholders read the spec, write Gherkin. Where developers read the spec, write Vitest. Both are living documentation; neither is the comment that rotted.

Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.

§5

Counter-argument

Counter

The honest steelman is Daniele Procida's.[3] Diátaxis splits documentation into four kinds (tutorials, how-tos, references, explanations) and tests primarily produce one of them. The reference is automatable; the tutorial that walks a new hire through the system, the how-to that explains the migration play, the explanation that traces why the architecture decided one way and not the other: none of those is a test. Brian Marick sharpens it from a different angle: treating the test file as the spec rots the test from asking questions into checking known answers, and the suite becomes a regression net rather than a probe.[4: Brian Marick, “Coverage-Driven Test Design” (exampler.com, 2003). Coined the “checking versus testing” distinction; sharpened in Kaner, Bach & Pettichord (2002).] A reader who needs the why has nowhere to go.

§6

Counter-argument retort

Reply

Procida is right that tests primarily produce one of his four documentation modes. The reply isn't to deny the other three; it's to refuse the comment that pretends to carry them. A tutorial is its own artefact, written for new hires, kept in sync because somebody reads it on day one. A how-to is its own artefact, owned by the team that runs the migration. An explanation lives in an ADR, dated, signed, and superseded rather than edited.[5: Michael Nygard, Documenting Architecture Decisions (cognitect.com, 2011). The ADR pattern is the artefact for the explanation half of Procida’s taxonomy.] The class of prose TS1 displaces is the in-line comment about behaviour and the paragraph in the README that paraphrases an API: the prose that should have been a test and was written instead because the test was hard.

Marick's checking versus testing distinction[4] is the strongest sharpening: an automated test that becomes a regression net stops asking new questions of the system. The reply concedes and adds two artefacts. First, property-based tests (QuickCheck, Hypothesis, fast-check) restore the questioning lens by generating inputs the developer hasn't imagined.[6: Koen Claessen & John Hughes, “QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs” (ICFP 2000). The original property-based testing paper.] Second, exploratory testing kept on the team's books as a separate activity, not pretending to be regression. The living-documentation property survives both: properties fail loudly when reality drifts; exploratory notes are dated and short-lived by design.
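To make the property-based idea concrete, here is a hand-rolled sketch that needs no library. It is not how fast-check or QuickCheck work internally (those add shrinking and much richer generators). The `Sanctuary` shape, the generator, and the seeded random function are assumptions for illustration. The property checked: admitting an already-released hedgehog never changes the active count.

```typescript
type Status = "resident" | "recovering" | "released";
type Hedgehog = { name: string; status: Status };
type Sanctuary = { intake: Hedgehog[] };

function countActiveHedgehogs(sanctuary: Sanctuary): number {
  return sanctuary.intake.filter((h) => h.status !== "released").length;
}

// Small deterministic generator (linear congruential) so failures are repeatable.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000; // value in [0, 1)
  };
}

const STATUSES: Status[] = ["resident", "recovering", "released"];

// Builds a sanctuary with 0 to 19 hedgehogs of random status.
function randomSanctuary(random: () => number): Sanctuary {
  const size = Math.floor(random() * 20);
  const intake: Hedgehog[] = [];
  for (let i = 0; i < size; i++) {
    const status = STATUSES[Math.floor(random() * STATUSES.length)];
    intake.push({ name: `hog-${i}`, status });
  }
  return { intake };
}

// Returns the first generated input that breaks the property, or undefined if none does.
function findCounterexample(runs: number, seed: number): Sanctuary | undefined {
  const random = makeRandom(seed);
  for (let run = 0; run < runs; run++) {
    const before = randomSanctuary(random);
    const after: Sanctuary = {
      intake: [...before.intake, { name: "newly-released", status: "released" }],
    };
    if (countActiveHedgehogs(after) !== countActiveHedgehogs(before)) {
      return before;
    }
  }
  return undefined;
}

const counterexample = findCounterexample(200, 42);
```

A property like this asks a question of inputs nobody wrote down by hand, which a fixed example cannot do. In a real suite a property-based library would replace the generator and the loop.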

Hillel Wayne's objection,[7: Hillel Wayne, “The Myth of Self-Documenting Code” (buttondown.com/hillelwayne, 2021). Argues self-documentation is impossible because the writer brings invisible context the reader lacks. Extended to TS1: tests cannot encode the why on their own.] that “tests as spec” flattens the modal richness of a real specification (TLA+, formal methods, invariants stronger than any single example), is conceded inside its scope and refused outside it. For the systems whose correctness needs formal proof, write the formal spec; for the systems most working developers ship most days, a Vitest file or a Gherkin scenario carries more truth than the README ever did. The honest residue is published-API surface where the contract is irreducibly prose: cite the OpenAPI, the JSDoc tag, the typed signature (see T1 Domain-Driven Types). Even there, the living-documentation property is the property the artefact has to earn; if the OpenAPI lies, the contract test catches it.
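A contract test in this sense can be very small. In the sketch below the documented field list stands in for one response schema from an OpenAPI document, and `createOrder` stands in for the real handler. Both, along with `missingDocumentedFields`, are hypothetical names, and a real setup would load the schema from the published contract instead of hard-coding it.

```typescript
// Hypothetical published contract: the fields the documentation promises in an order response.
const documentedOrderFields: readonly string[] = ["id", "total", "status"];

// Hypothetical handler whose real behaviour the contract claims to describe.
function createOrder(total: number): Record<string, unknown> {
  return { id: "o-1", total, status: "pending" };
}

// Returns every documented field the live response fails to provide.
function missingDocumentedFields(
  response: Record<string, unknown>,
  documented: readonly string[],
): string[] {
  return documented.filter((field) => !(field in response));
}

// An empty list means the prose contract and the behaviour still agree.
const drift = missingDocumentedFields(createOrder(25), documentedOrderFields);
```

Asserting that `drift` is empty in CI is what turns the published contract into living documentation: when the handler drops a documented field, the build fails, and the doc cannot keep promising it unnoticed.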

In production code the doc that explains is the doc that lies. The next reader trusts the doc; the doc is wrong; a bug ships. Write the test — the test is the spec, the spec is the doc, the doc is in CI. The discipline is the same one F5 enforces on the comment surface; TS1 is its testing-side corollary.

§7

Notes

  1. [1] Gojko Adzic, Specification by Example: How Successful Teams Deliver the Right Software (Manning, 2011). Pattern #7 of the seven SBE process patterns: “Evolving Living Documentation.” The earliest verbatim use of the term as a named outcome of automated examples.
  2. [2] Cyrille Martraire, Living Documentation: Continuous Knowledge Sharing by Design (Addison-Wesley, 2019). The first standalone book on the subject; widens it past Gherkin into a 15-chapter taxonomy of techniques whose property is co-located, evolving, machine-checkable.
  3. [3] Daniele Procida, Diátaxis (diataxis.fr, 2017+). Splits documentation into tutorials, how-tos, references and explanations. The honest steelman: tests primarily produce one (reference), not all four.
Last revised: 2026-04-27