Specification by Example
Domain-language Gherkin.
The scenario that survives a redesign talks about what the system does, not what the page looks like. Domain language survives the next CSS commit; click-coordinates do not.
Opinion
I've sat in too many sprint reviews where the demo passed acceptance and shipped a bug. The scenario said the user can register; the implementation registered the user and then quietly dropped the welcome email. Acceptance was met by a paraphrase. The fix is the same one Adzic and North have been arguing for twenty years: write the example, in the domain's words, in a form a machine can run.[1] If the example doesn't cover the welcome email, the example was wrong; the conversation has to happen before the code, not after.
The declarative-versus-imperative line is where most teams get this wrong. `When I submit the form` is what the user does; `When I click the third button` is what the front-end happens to be wired to today. The first survives a redesign because it talks about behaviour; the second breaks the morning a designer changes the layout. Cucumber's “Writing Better Gherkin” guide makes the rule explicit,[2] and it is the rule almost every legacy step file ignores. Same shape as F2 Intention-Revealing Names: the domain noun beats the implementation noun every time.
The pipeline that makes the scenario load-bearing is what stops it being theatre. P2 Spec-First Execution commands writing the AC up front; AI4 Verifiable Specs commands writing it as something a machine can grade; TS2 commands writing it declaratively in domain words; TS1 takes the resulting check and turns it into the documentation. Four tenets, one artefact, four checkpoints. Skip any one and the spec rots into prose nobody runs.
Copy a note and link
Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.
Write the scenario in domain language. `When I submit the form` is the spec; `When I click the third button` is the implementation leaking through. Declarative not imperative; the scenario survives a redesign because it talks about what the system does, not what the DOM looks like. /tenet/specification-by-example/TS2
AI eyes only
Rule: write Gherkin in domain language. No DOM selectors, no implementation detail.
Reject: `When I click "#submit-btn"`. Reject: scenarios that use technical IDs the product owner does not recognise. Reject: imperative click-by-click tutorials masquerading as scenarios.
Generate: declarative scenarios in the ubiquitous language (`When I submit my registration`). Step definitions handle the mechanics; the .feature file stays domain-only.
Diagnostic: a stakeholder reads the feature file and recognises every term. If a developer has to translate, the term does not belong in the scenario.
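The Reject rules above can be approximated mechanically before a human reviews the file. What follows is a minimal sketch, not a real linter: the pattern list, the `ImperativeFinding` shape and the `findImperativeSteps` name are all assumptions made for illustration, and a heuristic like this will produce false positives that the stakeholder diagnostic still has to settle.

```typescript
// Hypothetical heuristics: phrases that suggest implementation detail
// has leaked into a step. Tune the list to the team's own vocabulary.
const IMPERATIVE_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "css-selector", pattern: /["'][#.][\w-]+["']/ },
  { name: "ui-verb", pattern: /\b(click|tap|type|scroll|hover)\b/i },
  {
    name: "position",
    pattern: /\b(first|second|third|\d+(st|nd|rd|th))\s+(button|link|field|row)\b/i,
  },
];

// A step line starts with a Gherkin step keyword.
const STEP_LINE = /^\s*(Given|When|Then|And|But)\s+(.*)$/;

interface ImperativeFinding {
  line: number; // 1-based line number in the feature text
  step: string; // step text without its keyword
  reasons: string[]; // names of the patterns that matched
}

// Scans feature text and reports every step that matches a pattern.
function findImperativeSteps(featureText: string): ImperativeFinding[] {
  const findings: ImperativeFinding[] = [];
  featureText.split(/\r?\n/).forEach((raw, index) => {
    const match = STEP_LINE.exec(raw);
    if (!match) return;
    const step = match[2] ?? "";
    const reasons = IMPERATIVE_PATTERNS
      .filter((candidate) => candidate.pattern.test(step))
      .map((candidate) => candidate.name);
    if (reasons.length > 0) {
      findings.push({ line: index + 1, step, reasons });
    }
  });
  return findings;
}
```

Run over a feature file, `When I click "#submit-btn"` is reported with both the selector and the UI-verb reason, while `When I submit my registration` passes untouched.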
Why?
- Stakeholder, developer and tester share one vocabulary — Evans's ubiquitous language by another name. The scenario is the artefact all three sign.
- The AC in the ticket is the scenario in the feature file. The story moves from analysis to development without anybody rewriting the requirement in code-flavoured prose.
- Declarative steps survive a redesign. `When I submit the form` still passes after the button moves; `When I click the third button` breaks the morning the layout changes.
- Slots into the four-tenet spec pipeline — P2 Spec-First Execution → AI4 Verifiable Specs → TS2 → TS1 Living Documentation. One artefact, four checkpoints, each owned by the pillar that polices it.
- Coding agents treat the Gherkin as the canonical spec. A declarative scenario in domain words is a target the model can satisfy; an imperative one is a UI script the model reproduces step-by-step.
- One scenario, one behaviour. The `and`-counting rule keeps the document diff-friendly, the failure modes attributable, and the regression net legible.
- A declarative scenario is parameterisation-ready. Pair with TS3 Parameterised Scenarios when the example becomes a table, with property-based tests when the table becomes infinite.
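The `and`-counting rule is simple enough to check in a script. The sketch below assumes the threshold used elsewhere on this page (more than two `And` steps means split); `countAndSteps` and `scenariosToSplit` are hypothetical names, and the line-based parsing is deliberately naive, not a substitute for a real Gherkin parser.

```typescript
interface AndCount {
  scenario: string; // scenario name as written after the keyword
  andSteps: number; // number of And steps inside that scenario
}

const SCENARIO_HEADER = /^(Scenario Outline|Scenario Template|Scenario|Example):\s*(.*)$/;
const OTHER_BLOCK = /^(Feature|Background|Rule):/;
const AND_STEP = /^And\s/;

// Counts And steps per scenario; steps outside a scenario are ignored.
function countAndSteps(featureText: string): AndCount[] {
  const counts: AndCount[] = [];
  let current: AndCount | undefined;
  for (const raw of featureText.split(/\r?\n/)) {
    const line = raw.trim();
    const header = SCENARIO_HEADER.exec(line);
    if (header) {
      current = { scenario: header[2] ?? "", andSteps: 0 };
      counts.push(current);
    } else if (OTHER_BLOCK.test(line)) {
      current = undefined; // Background and Rule steps are not scenario steps
    } else if (current && AND_STEP.test(line)) {
      current.andSteps += 1;
    }
  }
  return counts;
}

// Names of scenarios whose And count exceeds the threshold.
function scenariosToSplit(featureText: string, maxAnd = 2): string[] {
  return countAndSteps(featureText)
    .filter((entry) => entry.andSteps > maxAnd)
    .map((entry) => entry.scenario);
}
```

A scenario that trips the threshold is usually describing two behaviours; the script only points at it, the Three Amigos conversation decides where to cut.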
Origins
Specification by Example crystallised in 2009–2011. Gojko Adzic's Bridging the Communication Gap: Specification by Example and Agile Acceptance Testing (Neuri, 2009)[6] established the practice, treating acceptance tests as agreed truth between business and developers; Specification by Example (2011)[1] codified it as seven process patterns. Adzic chose the name deliberately over “BDD” or “ATDD”: in his words it was “the one with the least amount of negative baggage”, and Aslak Hellesøy agreed it was a better name than “Behaviour Driven Development”.[7] He wanted the term with the least baggage and the cleanest description of the artefact: a specification, illustrated by examples, that the team agreed on.
The lineage runs back through Dan North's 2006 article “Introducing BDD” (dannorth.net),[8] which named the practice and proposed Given/When/Then first as a documentation grammar and second as a testing grammar, and further back to Eric Evans's Domain-Driven Design (Addison-Wesley, 2003),[9] whose ubiquitous language is the source of the “domain language” clause in the TS2 punch; the scenario is that language's written form. Cucumber's “BDD History” page traces the chain explicitly, from Evans's ubiquitous language through North's BDD into the modern Cucumber-era practice.[10] Liz Keogh's “What is BDD?” (lizkeogh.com, 2015)[11] is the cleanest modern compression: examples in conversation to illustrate behaviour.
The declarative-not-imperative rule, often the part working developers most need, comes from Cucumber's own “Writing Better Gherkin” guide.[2] The Three Amigos pattern (Dinwiddie, “If you don't automate acceptance tests...”, blog.gdinwiddie.com, 2009)[12] names the conversation half: business, development, testing in one room before the code starts. Wynne and Hellesøy's The Cucumber Book, 2nd ed. (Pragmatic Bookshelf, 2017)[13] is the reference text on putting the practice into a working test runner.
In the Prickles canon, TS2 is the third link in a four-tenet pipeline. P2 Spec-First Execution commands writing the AC up front; AI4 Verifiable Specs commands writing it as something a machine can grade; TS2 commands writing it declaratively in domain words; TS1 takes the resulting check and turns it into the documentation. The chain is the operating principle; TS2 is its testing-side phrasing rule.
Quotes
Your scenarios should describe the intended behaviour of the system, not the implementation.
Using examples in conversation to illustrate behaviour.
Given some initial context (the givens), When an event occurs, Then ensure some outcomes.
Specification by Example is a set of process patterns that facilitate change in software products to ensure that the right product is delivered efficiently.
Evidence
Twenty external sources, ranked by author authority. The first five are the canon; expand to see the rest, including the qualifiers and the named opposers. Each links out to its primary source.
- 01 Specification by Example (Supports). Codifies the practice as seven process patterns; the canonical reference. Adzic chose the term over BDD or ATDD for the cleanest description of the artefact: a specification, illustrated by examples, that the team agreed on.
- 02 Bridging the Communication Gap (Supports). The earlier book that established the practice. Acceptance tests as agreed truth between business and developers; SBE's grounding in the conversation half.
- 03 Introducing BDD (Supports). The article that named BDD; Given/When/Then proposed first as a documentation grammar, second as a testing grammar. The structural ancestor of the Gherkin that Specification by Example is written in.
- 04 What is BDD? (Supports). “Using examples in conversation to illustrate behaviour.” The shortest accurate definition of the practice TS2 names.
- 05 Conversational Patterns in BDD (Supports). The conversation-first framing. “BDD isn't about the tools”; the spec is the discussion the team has, with the Gherkin file as its written record.
Eighteen sources, three stances. The supports above are the canon: Adzic, North, and Keogh on Specification by Example and BDD. The qualifiers further down carve out the part of agreement that no scenario can carry: the open question, the exploratory probe. The opposers carry the steelman the reply has to address.
Examples
// Before: imperative step. Couples to the button copy "Submit".
When("I click the button with text {string}", async (label: string) => {
  await page.getByRole("button", { name: label }).click();
});

// Scenario:
//   Given a hedgehog has arrived at the sanctuary
//   When I click the button with text "Submit"
//   Then the rescue is recorded

// After: declarative step. Describes the volunteer's intent.
When("I submit the hedgehog rescue form", async () => {
  await rescueForm.submit(page);
});

// Scenario:
//   Given a hedgehog has arrived at the sanctuary
//   When I submit the hedgehog rescue form
//   Then the rescue is recorded
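The After step delegates to `rescueForm.submit(page)` without showing where the button label went. One plausible shape for that helper is sketched below. The body of `rescueForm`, the `SUBMIT_LABEL` constant and the `PageLike` interface are assumptions; `PageLike` is a narrow structural slice that a Playwright `Page` is expected to satisfy, which also lets the helper run against a fake in a unit test.

```typescript
// Narrow slice of the page API the helper needs (assumed shape).
interface ButtonLike {
  click(): Promise<void>;
}
interface PageLike {
  getByRole(role: "button", options: { name: string }): ButtonLike;
}

// The button copy lives in one place. A redesign that renames the
// button changes this constant, not the scenario or the step text.
const SUBMIT_LABEL = "Submit";

const rescueForm = {
  // Submits the rescue form; the mechanics stay out of the feature file.
  async submit(page: PageLike): Promise<void> {
    await page.getByRole("button", { name: SUBMIT_LABEL }).click();
  },
};
```

The design choice is the same one the tenet makes at the Gherkin level: the scenario names the intent, the step definition names the helper, and only the helper knows what the DOM looks like today.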
Enforcement
Apply these rules in .gherkin-lintrc. The full enforcement across every tenet lives on the implementation page.
| Rule | Tool | Catches |
|---|---|---|
| no-unnamed-scenarios | gherkin-lint | scenarios without a name. The name is the headline; an unnamed scenario is documentation that doesn’t announce itself. |
| no-files-without-scenarios | gherkin-lint | feature files containing only background or rule blocks. A feature without a scenario doesn’t document a behaviour. |
| name-length | gherkin-lint | scenario names longer than 70 chars — the threshold beyond which the headline stops working as documentation. Adjust per house style. |
| use-and | gherkin-lint | consecutive Given/When/Then steps that should have used `And` for readability. The shape of the scenario is part of the spec. |
| playwright-bdd defineBddConfig | playwright-bdd | scenarios with no matching step definition. The build fails before the test runs; the spec stays honest about what is and isn’t automated. |
| @cucumber/cucumber --strict | @cucumber/cucumber | undefined or pending steps left in main. Strict mode fails the build until every step has an implementation. |
.gherkin-lintrc configuration snippet
{
"no-unnamed-features": "on",
"no-unnamed-scenarios": "on",
"no-empty-file": "on",
"no-files-without-scenarios": "on",
"indentation": ["on", { "Feature": 0, "Scenario": 2, "Step": 4, "Examples": 4 }],
"use-and": "on",
"name-length": ["on", { "Feature": 70, "Scenario": 70 }],
"no-multiple-empty-lines": "on",
"no-trailing-spaces": "on"
}
AI rules
.cursor/rules/ts2-specification-by-example.mdc
---
description: Prickles TS2 — Specification by Example
globs: "**/*.feature"
alwaysApply: false
---
## Prickles TS2 — Specification by Example

Write the scenario in the language a stakeholder uses. `When I submit the form` is the spec; `When I click the third button` is the implementation leaking through.

Declarative steps describe behaviour; imperative steps describe interaction. The first survives a redesign; the second breaks on every CSS edit.

The Given/When/Then is the conversation, then the documentation, then the test. If it can't be read aloud at a Three Amigos meeting, rewrite it.

One scenario, one behaviour. If the scenario uses `and` more than twice, split it.
Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.
Counter-argument
The strongest pushback is Hillel Wayne's:[3] natural-language specs are ambiguous by design. A Gherkin scenario reads cleanly aloud and still means three things to three readers; the precision the test buys is precision against an example, not against the behaviour. For systems whose correctness is genuinely bit-precise (cryptography, distributed consensus, billing) the scenario is a ladder, not a roof. Brian Marick sharpens the same point: an automated example becomes a regression net the moment the team stops asking “what else?” about it.[4] His “Coverage-Driven Test Design” (exampler.com, 2003) coined the “checking versus testing” distinction: treating an automated example as the spec rots the test from asking new questions into checking known answers. J. B. Rainsberger's broader objection, in “Integrated Tests Are A Scam” (thecodewhisperer.com, 2010),[5] is that the integrated test SBE produces gives false confidence; collaborators can drift while the scenario passes.
Counter-argument retort
Hillel Wayne's objection[3] is conceded inside its scope and refused outside it. For the systems whose correctness is genuinely bit-precise (cryptographic primitives, distributed consensus, financial settlement) the right artefact is a formal spec (TLA+, Alloy, a typed invariant) and the Gherkin sits below it as the customer-facing illustration. For the systems most working developers ship most days (checkout flows, registration, scheduling, content management) the ambiguity Wayne names is real and small, and the upside of a stakeholder-readable spec dominates the residual ambiguity. The scenario is not the roof; it is the ceiling tile that covers the room you actually live in.
Marick's checking-versus-testing distinction[4] is the strongest sharpening. The reply concedes and adds the practices that restore the questioning lens. TS3 Parameterised Scenarios generalises the example into a table, and property-based testing generalises the table into a generator that produces examples the developer hadn't imagined (“What is Property-Based Testing?”, Hypothesis docs).[14] Exploratory testing, kept on the team's books as a separate activity and not pretending to be regression, completes the picture. SBE is the regression-and-documentation half; the questioning half is its companion practice, not a substitute for it.
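The two rungs named above, table then generator, look roughly like this in code. Everything here is a sketch under stated assumptions: `triageByWeight` and its thresholds are an invented domain rule, and the hand-rolled seeded generator stands in for a real property-based testing library, which a production suite would more likely use for its shrinking and reporting.

```typescript
type Triage = "critical" | "underweight" | "healthy";

// Hypothetical domain rule: triage a hedgehog by weight in grams.
function triageByWeight(grams: number): Triage {
  if (grams < 300) return "critical";
  if (grams < 600) return "underweight";
  return "healthy";
}

// Rung one: the parameterised table, a finite list of agreed examples.
const agreedExamples: ReadonlyArray<{ grams: number; expected: Triage }> = [
  { grams: 250, expected: "critical" },
  { grams: 300, expected: "underweight" },
  { grams: 599, expected: "underweight" },
  { grams: 600, expected: "healthy" },
];

// Returns the weights whose triage disagrees with the table.
function failingExamples(): number[] {
  return agreedExamples
    .filter((example) => triageByWeight(example.grams) !== example.expected)
    .map((example) => example.grams);
}

// Rung two: a generator plus a property. A small linear congruential
// generator keeps runs reproducible from a seed.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

const healthRank: Record<Triage, number> = { critical: 0, underweight: 1, healthy: 2 };

// Property: a heavier hedgehog is never triaged as less healthy than a
// lighter one. Returns the first counterexample found, if any.
function findMonotonicityCounterexample(
  runs: number,
  seed: number,
): { lighter: number; heavier: number } | undefined {
  const random = seededRandom(seed);
  for (let i = 0; i < runs; i++) {
    const x = Math.floor(random() * 2000);
    const y = Math.floor(random() * 2000);
    const lighter = Math.min(x, y);
    const heavier = Math.max(x, y);
    if (healthRank[triageByWeight(lighter)] > healthRank[triageByWeight(heavier)]) {
      return { lighter, heavier };
    }
  }
  return undefined;
}
```

The table checks the answers the team already agreed; the property asks a question nobody wrote down as an example, which is the part of Marick's objection the table alone cannot answer.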
Rainsberger's “integrated tests are a scam” objection[5] runs against treating SBE's test as the only fidelity check. The reply concedes and adds contract testing at the boundaries; the scenario covers behaviour end-to-end, the contract test covers the interface. Both ship. Neither replaces the other.
The discipline reduces to four verbs on the conference room wall: write the example, agree the example, automate the example, run the example. The scenario is the spec; the spec is the contract; the contract is in CI. Same operating principle as TS1, focused on the form the artefact takes when stakeholders read it.
Notes
- [1] Gojko Adzic. Specification by Example: How Successful Teams Deliver the Right Software (Manning, 2011). Codifies the practice as seven process patterns and is the canonical reference for the term itself.
- [2] Cucumber documentation. “Writing Better Gherkin”. The declarative-versus-imperative rule in Cucumber's own words: scenarios should describe the intended behaviour, not the implementation.
- [3] Hillel Wayne. “Why Don't People Use Formal Methods?” (hillelwayne.com, 2018). The natural-language-spec critique: examples cannot encode invariants, only their instances.