Case file — TS2

Specification by Example

Domain-language Gherkin.

The scenario that survives a redesign talks about what the system does, not what the page looks like. Domain language survives the next CSS commit; click-coordinates do not.

By Adam Lewis · Published 3 May 2026 · Reading 11 min · Version v1.0 · Confidence High
§0b

Opinion

I've sat in too many sprint reviews where the demo passed acceptance and shipped a bug. The scenario said the user can register; the implementation registered the user and then quietly dropped the welcome email. Acceptance was met by a paraphrase. The fix is the same one Adzic and North have been arguing for twenty years: write the example, in the domain's words, in a form a machine can run.[1] If the example doesn't cover the welcome email, the example was wrong; the conversation has to happen before the code, not after.

The declarative-versus-imperative line is where most teams get this wrong. `When I submit the form` is what the user does; `When I click the third button` is what the front-end happens to be wired to today. The first survives a redesign because it talks about behaviour; the second breaks the morning a designer changes the layout. Cucumber's “Writing Better Gherkin” guide makes the rule explicit,[2] and it is the rule almost every legacy step file ignores. Same shape as F2 Intention-Revealing Names: the domain noun beats the implementation noun every time.

The pipeline that makes the scenario load-bearing is what stops it being theatre. P2 Spec-First Execution commands writing the AC up front; AI4 Verifiable Specs commands writing it as something a machine can grade; TS2 commands writing it declaratively in domain words; TS1 takes the resulting check and turns it into the documentation. Four tenets, one artefact, four checkpoints. Skip any one and the spec rots into prose nobody runs.

Copy a note and link

Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.

Write the scenario in domain language. `When I submit the form` is the spec; `When I click the third button` is the implementation leaking through. Declarative not imperative; the scenario survives a redesign because it talks about what the system does, not what the DOM looks like.

/tenet/specification-by-example/TS2
§0c

AI eyes only

Rule: write Gherkin in domain language. No DOM selectors, no implementation detail.

Reject: `When I click "#submit-btn"`. Reject: scenarios that use technical IDs the product owner does not recognise. Reject: imperative click-by-click tutorials masquerading as scenarios.

Generate: declarative scenarios in the ubiquitous language (`When I submit my registration`). Step definitions handle the mechanics; the .feature file stays domain-only.

Diagnostic: a stakeholder reads the feature file and recognises every term. If a developer has to translate, the term does not belong in the scenario.
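
The reject rules above can be approximated mechanically. What follows is a minimal sketch, assuming the feature file is available as a string; `findImperativeSteps` and its pattern list are hypothetical illustrations, not part of gherkin-lint or Cucumber, and the patterns will produce false positives in domains where words such as "type" or "press" are genuine domain terms.

```typescript
// Hypothetical helper: flags Gherkin steps that leak implementation detail.
// The patterns are illustrative assumptions, not an exhaustive rule set.
const STEP_KEYWORD = /^\s*(Given|When|Then|And|But)\s+(.*)$/;

const IMPERATIVE_PATTERNS: { name: string; pattern: RegExp }[] = [
  // CSS-style id or class selectors inside quotes, e.g. "#submit-btn"
  { name: "dom-selector", pattern: /["'][#.][\w-]+["']/ },
  // UI-mechanics verbs rather than domain verbs
  { name: "ui-verb", pattern: /\b(click|tap|press|type|scroll|hover)s?\b/i },
];

interface Finding {
  line: number; // 1-based line number in the feature text
  rule: string; // name of the pattern that matched
  step: string; // the offending step, trimmed
}

function findImperativeSteps(featureText: string): Finding[] {
  const findings: Finding[] = [];
  featureText.split("\n").forEach((raw, index) => {
    const match = STEP_KEYWORD.exec(raw);
    if (!match) return; // not a step line (Feature, Scenario, blank, comment)
    for (const { name, pattern } of IMPERATIVE_PATTERNS) {
      if (pattern.test(match[2])) {
        findings.push({ line: index + 1, rule: name, step: raw.trim() });
      }
    }
  });
  return findings;
}
```

A check like this is only a tripwire. The diagnostic that matters is still the stakeholder reading the file and recognising every term.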

§0d

Why?

  • Stakeholder, developer and tester share one vocabulary — Evans's ubiquitous language by another name. The scenario is the artefact all three sign.
  • The AC in the ticket is the scenario in the feature file. The story moves from analysis to development without anybody rewriting the requirement in code-flavoured prose.
  • Declarative steps survive a redesign. `When I submit the form` still passes after the button moves; `When I click the third button` breaks the morning the layout changes.
  • Slots into the four-tenet spec pipeline: P2 Spec-First Execution → AI4 Verifiable Specs → TS2 → TS1 Living Documentation. One artefact, four checkpoints, each owned by the pillar that polices it.
  • Coding agents treat the Gherkin as the canonical spec. A declarative scenario in domain words is a target the model can satisfy; an imperative one is a UI script the model reproduces step-by-step.
  • One scenario, one behaviour. The and-counting rule keeps the document diff-friendly, the failure modes attributable, and the regression net legible.
  • A declarative scenario is parameterisation-ready. Pair with TS3 Parameterised Scenarios when the example becomes a table, with property-based tests when the table becomes infinite.
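
The and-counting rule in the list above lends itself to a small check. This is a sketch under stated assumptions: `countAndSteps` and `scenariosToSplit` are hypothetical names, the parser only recognises English keywords, and the default threshold of two is taken from the "more than twice, split it" wording in the TS2 rules file.

```typescript
// Hypothetical helper: counts `And` steps per scenario in a feature text.
interface ScenarioCount {
  name: string;
  andSteps: number;
}

function countAndSteps(featureText: string): ScenarioCount[] {
  const scenarios: ScenarioCount[] = [];
  let current: ScenarioCount | undefined;
  for (const raw of featureText.split("\n")) {
    const line = raw.trim();
    const header = /^(Scenario Outline|Scenario|Example):\s*(.*)$/.exec(line);
    if (header) {
      // A new scenario starts; later steps are counted against it.
      current = { name: header[2], andSteps: 0 };
      scenarios.push(current);
    } else if (current && /^And\s/.test(line)) {
      current.andSteps += 1;
    }
  }
  return scenarios;
}

// Returns the names of scenarios that exceed the assumed threshold.
function scenariosToSplit(featureText: string, maxAnd = 2): string[] {
  return countAndSteps(featureText)
    .filter((scenario) => scenario.andSteps > maxAnd)
    .map((scenario) => scenario.name);
}
```

The count is a proxy for "one behaviour", not a definition of it; a scenario under the threshold can still describe two behaviours.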
The receipts
Origins, quoted passages, evidence, the strongest counter-argument and the reply.
§1

Origins

Specification by Example crystallised in 2009–2011. Gojko Adzic's Bridging the Communication Gap (2009) set the practice [6: Gojko Adzic, Bridging the Communication Gap: Specification by Example and Agile Acceptance Testing (Neuri, 2009)]; Specification by Example (2011) codified it as seven process patterns.[1] Adzic chose the name deliberately over “BDD” or “ATDD”: he wanted the term with the least baggage and the cleanest description of the artefact, a specification, illustrated by examples, that the team agreed on. [7: iBorn / SBE community. Adzic on the choice of name: “the one with the least amount of negative baggage”; Aslak Hellesøy agreed it was a better name than “Behaviour Driven Development”.]

The lineage runs back through Dan North's 2006 introduction of BDD, where Given/When/Then was already proposed as a documentation grammar [8: Dan North, “Introducing BDD” (dannorth.net, 2006); Given/When/Then proposed first as a documentation grammar, second as a testing grammar], and further back to Eric Evans's Domain-Driven Design (2003), whose ubiquitous language is the source of the “domain language” clause in the TS2 punch. [9: Eric Evans, Domain-Driven Design (Addison-Wesley, 2003); the scenario is the language's written form.] Cucumber's “BDD History” page traces the chain explicitly. [10: Cucumber documentation, “BDD History”.] Liz Keogh's “What is BDD?” is the cleanest modern compression: examples in conversation to illustrate behaviour. [11: Liz Keogh, “What is BDD?” (lizkeogh.com, 2015).]

The declarative-not-imperative rule, often the part working developers most need, comes from Cucumber's own “Writing Better Gherkin” guide.[2] The Three Amigos pattern (Dinwiddie, 2009) names the conversation half: business, development, testing in one room before the code starts. [12: George Dinwiddie, “If you don’t automate acceptance tests...” (blog.gdinwiddie.com, 2009).] Wynne and Hellesøy's Cucumber Book is the reference text on putting the practice into a working test runner. [13: Matt Wynne & Aslak Hellesøy, The Cucumber Book, 2nd ed. (Pragmatic Bookshelf, 2017).]

In the Prickles canon, TS2 is the third link in a four-tenet pipeline. P2 Spec-First Execution commands writing the AC up front; AI4 Verifiable Specs commands writing it as something a machine can grade; TS2 commands writing it declaratively in domain words; TS1 takes the resulting check and turns it into the documentation. The chain is the operating principle; TS2 is its testing-side phrasing rule.

§2

Quotes

Your scenarios should describe the intended behaviour of the system, not the implementation.

Cucumber Docs · Writing Better Gherkin

Using examples in conversation to illustrate behaviour.

Liz Keogh · What is BDD? (2015)

Given some initial context (the givens), When an event occurs, Then ensure some outcomes.

Dan North · Introducing BDD (2006)

Specification by Example is a set of process patterns that facilitate change in software products to ensure that the right product is delivered efficiently.

Gojko Adzic · Specification by Example (2011)
§3

Evidence

Twenty external sources, ranked by author authority. The first five are the canon; the rest include the qualifiers and the named opposers. Each links out to its primary source.

  1. Gojko Adzic · 2011
    Codifies the practice as seven process patterns; the canonical reference. Adzic chose the term over BDD or ATDD for the cleanest description of the artefact: a specification, illustrated by examples, that the team agreed on.
  2. Gojko Adzic · 2009
    The earlier book that established the practice. Acceptance tests as agreed truth between business and developers; SBE’s grounding in the conversation half.
  3. Dan North · 2006
    The article that named BDD; Given/When/Then proposed first as a documentation grammar, second as a testing grammar. The structural ancestor of Cucumber’s Gherkin.
  4. What is BDD? · Supports
    Liz Keogh · 2015
    “Using examples in conversation to illustrate behaviour.” The shortest accurate definition of the practice TS2 names.
  5. Liz Keogh · 2011
    The conversation-first framing. “BDD isn’t about the tools”; the spec is the discussion the team has, with the Gherkin file as its written record.

Eighteen sources, three stances. The supports above are the canon: Adzic, North, and Keogh on Specification by Example and BDD. The qualifiers further down carve out the part of agreement that no scenario can carry: the open question, the exploratory probe. The opposers carry the steelman the reply has to address.

§4

Examples

Viewing: TypeScript.
Avoid
File: rescue-steps.ts
// Before: imperative step. Couples to the button copy "Submit".
When("I click the button with text {string}", async (label: string) => {
  await page.getByRole("button", { name: label }).click();
});

// Scenario:
// Given a hedgehog has arrived at the sanctuary
// When I click the button with text "Submit"
// Then the rescue is recorded
Prefer
File: rescue-steps.ts
// After: declarative step. Describes the volunteer's intent.
When("I submit the hedgehog rescue form", async () => {
  await rescueForm.submit(page);
});

// Scenario:
// Given a hedgehog has arrived at the sanctuary
// When I submit the hedgehog rescue form
// Then the rescue is recorded
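
The Prefer example delegates to `rescueForm.submit(page)` without showing it. Here is a sketch of what such a helper might look like, assuming a Playwright-style page object. `FormPage`, `Clickable` and `SUBMIT_BUTTON_NAME` are hypothetical names, and the interfaces are minimal stand-ins so the sketch runs without Playwright installed; a real project would presumably pass Playwright's own `Page`.

```typescript
// Minimal stand-in for the slice of a Playwright-style page this helper needs.
// Assumption: the real step file would pass the runner's page object instead.
interface Clickable {
  click(): Promise<void>;
}

interface FormPage {
  getByRole(role: "button", options: { name: string }): Clickable;
}

// Hypothetical constant: the one place that knows the button copy.
// A redesign that renames the button changes this line, not the scenario.
const SUBMIT_BUTTON_NAME = "Submit";

// Hypothetical page-object helper used by the declarative step.
const rescueForm = {
  async submit(page: FormPage): Promise<void> {
    await page.getByRole("button", { name: SUBMIT_BUTTON_NAME }).click();
  },
};
```

The mechanics have not disappeared; they have moved behind a name the step definition can call, which is what keeps the .feature file domain-only.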
§4b

Enforcement

Viewing: TypeScript.

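Apply the gherkin-lint rules in .gherkin-lintrc; the playwright-bdd and @cucumber/cucumber checks are set in each runner's own configuration. The full enforcement across every tenet lives on the implementation page.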

| Rule | Tool | Catches |
| --- | --- | --- |
| no-unnamed-scenarios | gherkin-lint | scenarios without a name. The name is the headline; an unnamed scenario is documentation that doesn’t announce itself. |
| no-files-without-scenarios | gherkin-lint | feature files containing only background or rule blocks. A feature without a scenario doesn’t document a behaviour. |
| name-length | gherkin-lint | scenario names longer than 70 chars, the threshold beyond which the headline stops working as documentation. Adjust per house style. |
| use-and | gherkin-lint | consecutive Given/When/Then steps that should have used `And` for readability. The shape of the scenario is part of the spec. |
| playwright-bdd defineBddConfig | playwright-bdd | scenarios with no matching step definition. The build fails before the test runs; the spec stays honest about what is and isn’t automated. |
| @cucumber/cucumber --strict | @cucumber/cucumber | undefined or pending steps left in main. Strict mode fails the build until every step has an implementation. |
.gherkin-lintrc configuration snippet
{
  "no-unnamed-features": "on",
  "no-unnamed-scenarios": "on",
  "no-empty-file": "on",
  "no-files-without-scenarios": "on",
  "indentation": ["on", { "Feature": 0, "Scenario": 2, "Step": 4, "Examples": 4 }],
  "use-and": "on",
  "name-length": ["on", { "Feature": 70, "Scenario": 70 }],
  "no-multiple-empty-lines": "on",
  "no-trailing-spaces": "on"
}
§4c

AI rules

File: .cursor/rules/ts2-specification-by-example.mdc
---
description: Prickles TS2 — Specification by Example
globs: "**/*.feature"
alwaysApply: false
---

## Prickles TS2 — Specification by Example

Write the scenario in the language a stakeholder uses. `When I submit the form` is the spec; `When I click the third button` is the implementation leaking through.

Declarative steps describe behaviour; imperative steps describe interaction. The first survives a redesign; the second breaks on every CSS edit.

The Given/When/Then is the conversation, then the documentation, then the test. If it can't be read aloud at a Three Amigos meeting, rewrite it.

One scenario, one behaviour. If the scenario uses `and` more than twice, split it.

Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.

§5

Counter-argument

Counter

The strongest pushback is Hillel Wayne's:[3] natural-language specs are ambiguous by design. A Gherkin scenario reads cleanly aloud and still means three things to three readers; the precision the test buys is precision against an example, not against the behaviour. For systems whose correctness is genuinely bit-precise (cryptography, distributed consensus, billing) the scenario is a ladder, not a roof. Brian Marick sharpens the same point: an automated example becomes a regression net the moment the team stops asking “what else?” about it. [4: Brian Marick, “Coverage-Driven Test Design” (exampler.com, 2003). Coined the “checking versus testing” distinction; treating an automated example as the spec rots the test from asking new questions into checking known answers.] J. B. Rainsberger's broader objection is that the integrated test SBE produces gives false confidence; collaborators can drift while the scenario passes. [5: J. B. Rainsberger, “Integrated Tests Are A Scam” (thecodewhisperer.com, 2010).]

§6

Counter-argument retort

Reply

Hillel Wayne's objection[3] is conceded inside its scope and refused outside it. For the systems whose correctness is genuinely bit-precise (cryptographic primitives, distributed consensus, financial settlement) the right artefact is a formal spec (TLA+, Alloy, a typed invariant) and the Gherkin sits below it as the customer-facing illustration. For the systems most working developers ship most days (checkout flows, registration, scheduling, content management) the ambiguity Wayne names is real and small, and the upside of a stakeholder-readable spec dominates the residual ambiguity. The scenario is not the roof; it is the ceiling tile that covers the room you actually live in.

Marick's checking-versus-testing distinction[4] is the strongest sharpening. The reply concedes and adds the practices that restore the questioning lens. TS3 Parameterised Scenarios generalises the example into a table, and property-based testing generalises the table into a generator that produces examples the developer hadn't imagined. [14: David R. MacIver, “What is Property-Based Testing?” (Hypothesis docs).] Exploratory testing kept on the team's books as a separate activity, not pretending to be regression, completes the picture. SBE is the regression-and-documentation half; the questioning half is its companion practice, not a substitute for it.
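
The three rungs named above (example, table, generator) can be made concrete. This is a dependency-free sketch under stated assumptions: `normaliseEmail` is a hypothetical system under test, the idempotence property is chosen only for illustration, and a small seeded random-number function stands in for a real property-based testing library.

```typescript
// Hypothetical system under test: normalises an email before registration.
function normaliseEmail(input: string): string {
  return input.trim().toLowerCase();
}

// Rung 1: a single example, as one scenario would state it.
const example = { input: "  Ada@Example.COM ", expected: "ada@example.com" };

// Rung 2: the example becomes a table (the parameterised-scenario shape).
const table = [
  example,
  { input: "bob@example.com", expected: "bob@example.com" },
  { input: "\tCAROL@EXAMPLE.COM\n", expected: "carol@example.com" },
];

// Rung 3: the table becomes a generator. This seeded PRNG is an assumption
// standing in for a property-based testing library; it returns values in [0, 1).
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Builds a messy address: mixed case, digits, stray spaces and tabs.
function randomEmail(next: () => number): string {
  const alphabet = "abcXYZ019 \t";
  const pick = () => alphabet[Math.floor(next() * alphabet.length)];
  const local = Array.from({ length: 1 + Math.floor(next() * 8) }, pick).join("");
  return `${local}@Example.com`;
}

// Property: normalising twice gives the same result as normalising once.
// Returns the inputs that violate the property (empty when it holds).
function checkIdempotent(runs: number, seed: number): string[] {
  const next = seededRandom(seed);
  const failures: string[] = [];
  for (let i = 0; i < runs; i++) {
    const input = randomEmail(next);
    const once = normaliseEmail(input);
    if (normaliseEmail(once) !== once) failures.push(input);
  }
  return failures;
}
```

The table checks answers the team already knows; the generator asks about inputs nobody wrote down, which is the part of Marick's objection the table alone cannot answer.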

Rainsberger's “integrated tests are a scam” objection[5] runs against treating SBE’s test as the only fidelity check. The reply concedes and adds contract testing at the boundaries; the scenario covers behaviour end-to-end, the contract test covers the interface. Both ship. Neither replaces the other.

The discipline reduces to four verbs on the conference room wall: write the example, agree the example, automate the example, run the example. The scenario is the spec; the spec is the contract; the contract is in CI. Same operating principle as TS1, focused on the form the artefact takes when stakeholders read it.

§7

Notes

  1. [1] Gojko Adzic, Specification by Example: How Successful Teams Deliver the Right Software (Manning, 2011). Codifies the practice as seven process patterns and is the canonical reference for the term itself.
  2. [2] Cucumber documentation, “Writing Better Gherkin”. The declarative-versus-imperative rule in Cucumber’s own words: scenarios should describe the intended behaviour, not the implementation.
  3. [3] Hillel Wayne, “Why Don’t People Use Formal Methods?” (hillelwayne.com, 2018). The natural-language-spec critique: examples cannot encode invariants, only their instances.
Last revised: 2026-04-27