TS6 Behaviour Testing

§0b

Opinion

I've watched too many teams confuse code coverage with confidence. They mock five collaborators, spy on a private helper, and assert that the helper got called with { id: 1 }. The bar turns green and the team feels safe. Then a junior refactors the helper from fetchUser(id) to fetchUser({ id }), the test goes red, and the team blames the junior. The junior didn't break anything; the test was wrong from the moment it was written. The contract was “the user shows up on the page” and the test asserted “a function got called with a particular shape of argument.”

Adam's punch (brittle tests are the testing equivalent of premature abstraction) is the sharpest framing this principle has had in print. Sandi Metz's 3×3 matrix named which messages to test (incoming queries, outgoing commands) and which to leave alone (sent-to-self).[1] Kent C. Dodds named the failure mode: implementation details are things users don't see, hear, or know about, and tests that assert on them are tests that lock in decisions the design should be free to walk away from.[2] Both arguments converge on the same answer: the test addresses the unit by what callers can perceive, then asks the public surface a question.

The query-as-user half is the operational version. Testing Library's priority list puts getByRole first because the role tree is what a screen reader, a keyboard user, and a sighted user all see in different forms.[3] Playwright ports the same priority to E2E. Same axiom (test the public) on different surfaces. The unit test asserts on the rendered output; the component test asserts on the role tree; the E2E asserts on the page. None of them ask whether the helper function was called. F1 Single Responsibility is the design-side cousin: a unit that does one thing has one public surface, and one public surface is one test.
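The refactor story above can be made concrete. What follows is a minimal sketch, not taken from any real codebase: `directory`, `greetingV1`, `greetingV2` and `passesBehaviourCheck` are hypothetical names invented for illustration. The two versions differ only in the argument shape of the internal helper, and a single check written against the public signature covers both.

```typescript
// Hypothetical in-memory directory standing in for a real data source.
const directory = new Map<number, string>([[1, "Spike"]]);

// Version 1: the internal helper takes a bare id.
function greetingV1(id: number): string {
  const fetchUser = (userId: number): string | undefined => directory.get(userId);
  const found = fetchUser(id);
  return found === undefined ? "Unknown user" : `Hello, ${found}`;
}

// Version 2: the helper now takes an object. Callers see no difference.
function greetingV2(id: number): string {
  const fetchUser = (query: { id: number }): string | undefined => directory.get(query.id);
  const found = fetchUser({ id });
  return found === undefined ? "Unknown user" : `Hello, ${found}`;
}

// One behaviour check, written against the public signature only.
// It never mentions fetchUser, so the helper refactor cannot turn it red.
function passesBehaviourCheck(greeting: (id: number) => string): boolean {
  return greeting(1) === "Hello, Spike" && greeting(99) === "Unknown user";
}
```

Passing either `greetingV1` or `greetingV2` to `passesBehaviourCheck` exercises the same contract. A spy on `fetchUser` would have had to change between the two versions even though nothing a caller can see did.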

Copy a note and link

Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.

Test the public, not the private. Render and query like a user — getByRole, getByLabelText, getByText. Don't reach into private state, don't assert on internal helper calls, don't mock collaborators inside the same module. The test should break when behaviour changes, not when an internal name does.

/tenet/behaviour-testing/TS6

§0c

AI eyes only

Rule: test behaviour, not implementation. Drive the same surface the user uses.

Reject: tests that pin internal collaborators with vi.mock. Reject: tests that break on a refactor that does not change behaviour. Reject: testing-library/no-node-access violations.

Generate: tests that drive the public surface (function call, screen render, HTTP call) and assert observable outcomes. Use screen.getByRole first.

Diagnostic: refactor the implementation without touching the test. If the test breaks but the behaviour is unchanged, the test is wrong.

§0d

Why?

Tests survive refactoring. Renaming a private helper, splitting a function, swapping the internal data structure — the test still passes because the public surface is unchanged.
When a test fails, it's a real regression. The signal-to-noise ratio rises sharply once implementation-detail tests are gone; the suite stops crying wolf.
Painful tests become design feedback. Public-surface tests that strain — e.g., the unit needs five mocks — are signalling an SRP failure per F1 Single Responsibility, not a testing failure.
Accessibility improves as a side effect. getByRole traverses the role tree a screen reader uses; if the test passes, the elements it queried are at minimum reachable by role.
Reviews focus on behaviour. The reviewer doesn't need to ask “is this asserting the right private detail?” — the assertion is the public outcome.
Test names describe products, not code. “Submitting an empty form shows an error” beats “handleSubmit calls validate with the form payload”.
Coding agents stop writing brittle tests. With the rule in CLAUDE.md and Testing Library lint plugins on, the model writes screen.getByRole first because that's the cheapest path through review.
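The point about test names can be shown in a few lines. This is a hedged sketch: `submitSignup`, `isBlank`, `SignupResult` and the tiny `check` runner are all hypothetical stand-ins, and the error text is an assumption chosen for the example. The case is named after what the product does, and its assertion reads only the outcome a caller receives.

```typescript
// Hypothetical form handler. submitSignup is the public surface;
// isBlank is an internal detail the test never mentions.
type SignupResult = { ok: true } | { ok: false; message: string };

function isBlank(value: string): boolean {
  return value.trim().length === 0;
}

function submitSignup(form: { email: string }): SignupResult {
  if (isBlank(form.email)) {
    return { ok: false, message: "Email is required" };
  }
  return { ok: true };
}

// Tiny stand-in for a test runner's it(): records the name and the verdict.
function check(description: string, body: () => boolean): { description: string; passed: boolean } {
  return { description, passed: body() };
}

// Named after the behaviour, asserting on the visible outcome.
const emptyFormCase = check("submitting an empty form shows an error", () => {
  const outcome = submitSignup({ email: "   " });
  return !outcome.ok && outcome.message === "Email is required";
});
```

Renaming or inlining `isBlank` leaves both the test name and the assertion untouched, which is the property the list above describes.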

The receipts

Origins, quoted passages, evidence, the strongest counter-argument and the reply.

§1

Origins

The principle has the longest authority lineage in the entire Testing pillar. Sandi Metz's Magic Tricks of Testing talk[1] at RailsConf 2013 introduced the canonical 3×3 matrix (incoming query, incoming command, outgoing query, outgoing command, sent-to-self) and named the rule explicitly: test only what crosses the public boundary; ignore sent-to-self entirely. Steve Freeman and Nat Pryce's Growing Object-Oriented Software, Guided by Tests gave it the most-quoted single-line statement in the literature: “only mock types you own.”[5]
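The message types in the matrix are easier to see on a small unit. The sketch below is an illustration under stated assumptions: `HedgehogReserve` and its methods are hypothetical, and the plain boolean constants stand in for what a test runner's assertions would check.

```typescript
// Hypothetical unit used only to illustrate the message types.
class HedgehogReserve {
  private readonly capacity: number;
  private residents: string[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Incoming query: returns a value and changes nothing.
  occupancy(): number {
    return this.residents.length;
  }

  // Incoming command: changes state that callers can observe.
  admit(resident: string): boolean {
    if (this.isFull()) {
      return false;
    }
    this.residents.push(resident);
    return true;
  }

  // Sent-to-self: private, reachable only through admit(). No direct test.
  private isFull(): boolean {
    return this.residents.length >= this.capacity;
  }
}

// Incoming query: assert on the returned value.
const emptyReserve = new HedgehogReserve(1);
const startsEmpty = emptyReserve.occupancy() === 0;

// Incoming command: assert on the direct public side effect.
const smallReserve = new HedgehogReserve(1);
smallReserve.admit("Spike");
const admitIsVisible = smallReserve.occupancy() === 1;

// isFull() is covered indirectly: a second admit is refused at capacity.
const refusedAtCapacity = smallReserve.admit("Quill") === false && smallReserve.occupancy() === 1;
```

Nothing here spies on `isFull`. Its behaviour is pinned by the refusal that callers can observe, so it can be renamed or inlined freely.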

Kent Beck's Test Desiderata[8] distilled it into the “behavioural” property: tests describe behaviours, not procedures. Michael Feathers's seam concept in Working Effectively with Legacy Code identified the right place for the test boundary before identifying that there should be one.[9] Google's testing blog ran the consensus piece in 2013, Test Behaviour, Not Implementation,[10] which turned the rule into folklore inside FAANG-scale codebases.

The component-testing flavour is younger but converged independently. Kent C. Dodds's Testing Implementation Details[2] (2018) became the canonical title-essay for the principle in the React tradition, and Testing Library's priority list[3] made it executable. Marcy Sutton showed why it lands accessibility as a side effect: querying by role traverses the same tree a screen reader walks. The PUP standards/unit-testing.md file owns the on-canon formulation, citing Testing Library's guiding principle verbatim: “the more your tests resemble the way your software is used, the more confidence they can give you.”

§2

Quotes

Implementation details are things which users of your code will not typically use, see, or know about.

Kent C. Dodds · Testing Implementation Details (2018)

Test incoming queries by asserting on the value they return. Test incoming commands by asserting on direct public side-effects. Send-to-self messages: do not test.

Sandi Metz · The Magic Tricks of Testing (RailsConf 2013)

Only mock types you own.

Steve Freeman & Nat Pryce · GOOS (2009)

The more your tests resemble the way your software is used, the more confidence they can give you.

Testing Library · Guiding Principles

§3

Evidence

Twenty external sources, ranked by author authority. The first five are the canon; expand to see the rest, including the qualifiers and the named opposers. Each links out to its primary source.

01 · “Testing Implementation Details” · Supports
Kent C. Dodds · 2018
The canonical title-essay for the principle in the React era. Implementation details are anything users wouldn’t see, hear, or know about; tests should not assert on them.

02 · “The Magic Tricks of Testing” (RailsConf talk) · Supports
Sandi Metz · 2013
The canonical 3×3 matrix: incoming query / incoming command / sent-to-self / outgoing query / outgoing command. Test only what crosses the public boundary; ignore sent-to-self entirely.

03 · Practical Object-Oriented Design in Ruby (POODR), Ch. 9 · Supports
Sandi Metz · 2012, 2018
Designing Cost-Effective Tests. The book-length treatment of the matrix, with the boundary-rule and the sent-to-self exclusion expressed as design heuristics.

04 · Growing Object-Oriented Software, Guided by Tests · Supports
Steve Freeman & Nat Pryce · 2009
The London-school reference. “Only mock types you own” is the most-quoted single line in the boundary-mocking literature.

05 · Test Desiderata · Supports
Kent Beck · 2019
The Behavioural property: tests describe behaviours, not procedures. Beck’s mature position is the public-surface position.

Seventeen sources, three lineages. The OOP-design tradition (Metz, Freeman & Pryce, Beck) frames it as a design rule. The component-testing tradition (Dodds at the top of the row) frames it as a query rule. The Google testing-blog tradition further down frames it as a maintenance rule. They agree on the answer.

§4

Examples

Viewing: TypeScript.

Avoid

File: rescue-hedgehog.spec.ts

// Before: pokes private cache and a private helper.
it("caches the lookup and calls the formatter", () => {
  rescueHedgehog({ name: "Spike" });
  expect(rescueHedgehog.__internal.cache).toEqual({ Spike: "burrow-7" });
  expect(formatHedgehogStatus).toHaveBeenCalledWith("Spike", "burrow-7");
});

Prefer

File: rescue-hedgehog.spec.ts

// After: calls the public function, asserts the public result.
it("schedules a rescue and returns the assigned reserve slot", () => {
  const result = rescueHedgehog({ name: "Spike" });
  expect(result).toEqual({ status: "scheduled", reserveSlot: "burrow-7" });
});

it("returns no slot when the reserve is at capacity", () => {
  fillReserveToCapacity();
  const result = rescueHedgehog({ name: "Spike" });
  expect(result.status).toBe("waitlisted");
});
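The page shows only the tests, so here is one way the unit under test might look. Everything in this sketch is an assumption made for illustration: the `RescueResult` type, the slot names beyond "burrow-7", the internal `assignedSlots` map and the `resetReserve` helper are hypothetical. The point is that the internal map and queue can change shape without either Prefer test noticing.

```typescript
// Hypothetical implementation; the original page does not show one.
type RescueResult =
  | { status: "scheduled"; reserveSlot: string }
  | { status: "waitlisted" };

// Internal state. Tests never read these directly.
const ALL_SLOTS: readonly string[] = ["burrow-7", "burrow-8"];
let freeSlots: string[] = [...ALL_SLOTS];
const assignedSlots = new Map<string, string>();

function rescueHedgehog(hedgehog: { name: string }): RescueResult {
  const reserveSlot = freeSlots.shift();
  if (reserveSlot === undefined) {
    return { status: "waitlisted" };
  }
  assignedSlots.set(hedgehog.name, reserveSlot);
  return { status: "scheduled", reserveSlot };
}

// Arrange helpers: tests set up state through functions,
// not by reaching into freeSlots or assignedSlots.
function fillReserveToCapacity(): void {
  freeSlots = [];
}

function resetReserve(): void {
  freeSlots = [...ALL_SLOTS];
  assignedSlots.clear();
}
```

Swapping `assignedSlots` for a plain object, or dropping it altogether, would break the Avoid test and leave both Prefer tests green.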

§4b

Enforcement

Apply these rules in eslint.config.mjs. The full enforcement across every tenet lives on the implementation page.

Rule	Tool	Catches
testing-library/no-node-access	eslint-plugin-testing-library	.firstChild, .parentNode, .children walks inside tests — DOM traversal masquerading as a query.
testing-library/no-container	eslint-plugin-testing-library	container.querySelector — the canonical content-coupling escape hatch the rule rules out.
testing-library/prefer-screen-queries	eslint-plugin-testing-library	destructured render() helpers — keeps every query traceable through the screen object.
testing-library/no-render-in-lifecycle	eslint-plugin-testing-library	render() called inside beforeEach/beforeAll. A signal that the test is asserting on setup state, not behaviour.
jest-dom/prefer-in-document	eslint-plugin-jest-dom	expect(...).toHaveLength(1) and similar — pushes assertions toward the public toBeInTheDocument matcher.
jest-dom/prefer-to-have-text-content	eslint-plugin-jest-dom	raw .textContent === checks. The matcher keeps the test focused on what users perceive.
vitest/expect-expect	eslint-plugin-vitest	tests with no expect() call — the silent green that is worse than a red one. Catches empty placeholder tests left after a refactor.

eslint.config.mjs configuration snippet

import tseslint from 'typescript-eslint';
import testingLibrary from 'eslint-plugin-testing-library';
import jestDom from 'eslint-plugin-jest-dom';
import vitest from 'eslint-plugin-vitest';

export default tseslint.config({
  files: ['**/*.spec.{ts,tsx}', '**/*.test.{ts,tsx}'],
  plugins: { 'testing-library': testingLibrary, 'jest-dom': jestDom, vitest },
  rules: {
    'testing-library/no-node-access': 'error',
    'testing-library/no-container': 'error',
    'testing-library/prefer-screen-queries': 'error',
    'testing-library/await-async-events': 'error',
    'testing-library/no-render-in-lifecycle': 'error',
    'testing-library/no-debugging-utils': 'error',
    'jest-dom/prefer-in-document': 'error',
    'jest-dom/prefer-to-have-text-content': 'error',
    'vitest/no-conditional-tests': 'error',
    'vitest/no-conditional-in-test': 'error',
    'vitest/expect-expect': 'error',
  }
});

§4c

AI rules

Paste destination

File: .cursor/rules/ts6-behaviour-testing.mdc

---
description: Prickles TS6 — Behaviour Testing
globs: "**/*.{spec,test}.{ts,tsx,js,jsx}"
alwaysApply: false
---

## Prickles TS6 — Behaviour Testing

Test through the public surface only. The signature, the rendered output, the response payload — anything a caller or user can see. Internal helpers and private collaborators stay invisible.

Render and query like a user. Locate by role, label, or text — the same tree a screen reader walks. Implementation-detail selectors (CSS, data-testid as default, DOM traversal) are anti-patterns.

If the test breaks because an internal name changed, the test was wrong. If it breaks because the user-visible behaviour changed, the test was right.

Don't reach into private state to verify. Drive the public action; assert the public outcome.

Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.

§5

Counter-argument

Counter

The honest pushback comes from the strict London-school reading of GOOS: every interaction with collaborators should be tested with mocks, including internal ones, because that's how tests drive design.[5] If you only test the public surface, the argument runs, you defer the design feedback the test was supposed to give you. Tests-as-design-pressure (Beck's Tidy First?, Feathers's seam concept) extends the point: the moment a private helper becomes painful to test through the public surface is the moment the helper wanted to be its own unit. Refusing to mock it loses the signal.[6]

§6

Counter-argument retort

The London-school reading is half right. The mature classicist response (Fowler in Mocks Aren't Stubs, Seemann's Mocks for Commands, Stubs for Queries, Khorikov's Unit Testing) conceded the point that mocks are design pressure but moved the line: mock at the boundary, not in the middle.[7] A mock at the port is a question about the system's shape; a mock on a private helper is an answer the test agreed to before the design did. The classicist consensus, the Google testing-blog material, and Khorikov's book all converge: only mock unmanaged dependencies that cross a process boundary.
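"Mock at the boundary, not in the middle" can be sketched without a mocking library. The names below (`AlertPort`, `formatAlert`, `confirmRescue`, `RecordingAlertPort`, and the recipient address) are hypothetical and chosen for illustration. The fake replaces only the port that would leave the process, and the internal formatter runs for real.

```typescript
// Hypothetical port: the one place this unit talks to another process.
interface AlertPort {
  send(recipient: string, text: string): void;
}

// Internal helper: stays real in tests. No mock, no spy.
function formatAlert(hedgehogName: string, slot: string): string {
  return `${hedgehogName} is booked into ${slot}`;
}

// Public surface: the port is injected, the formatter is not.
function confirmRescue(port: AlertPort, hedgehogName: string, slot: string): void {
  port.send("warden@example.test", formatAlert(hedgehogName, slot));
}

// Hand-rolled fake for the boundary. It records what left the system.
class RecordingAlertPort implements AlertPort {
  readonly sent: Array<{ recipient: string; text: string }> = [];

  send(recipient: string, text: string): void {
    this.sent.push({ recipient, text });
  }
}

// Drive the public action, then inspect what crossed the boundary.
const fakePort = new RecordingAlertPort();
confirmRescue(fakePort, "Spike", "burrow-7");
```

A test built this way asserts on `fakePort.sent`, the outgoing command, and stays silent about whether `formatAlert` exists. With a library such as Vitest the fake could be a `vi.fn()` on the port; the placement of the double is what matters, not the tool.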

The tests-as-design-pressure argument also lands, but on the design side, not the test side. When a private helper becomes painful to test through the public surface, that is real signal: the helper wants to be a unit. Beck's answer in Tidy First? is to tidy first: extract the unit, give it its own public surface, then test it through that. The design moves; the test follows. The wrong move is to leave the helper private and reach in with a spy. That captures the pain without doing anything about it.[6]
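A small sketch of the extraction move, under stated assumptions: `pickSlot` and `scheduleRescue` are hypothetical, and the slot-preference rule is invented purely to give the helper some logic worth testing. Before the tidy, the selection logic would have been a private helper inside the scheduler. After it, the logic is a unit with its own signature.

```typescript
// Hypothetical extracted unit. It now has a public signature of its own,
// so it can be tested directly with no spy.
function pickSlot(freeSlots: readonly string[], preferred?: string): string | undefined {
  if (preferred !== undefined && freeSlots.indexOf(preferred) !== -1) {
    return preferred;
  }
  return freeSlots[0];
}

// The original caller delegates to the extracted unit and keeps
// its own public contract.
function scheduleRescue(freeSlots: readonly string[], preferred?: string): string {
  const slot = pickSlot(freeSlots, preferred);
  return slot === undefined ? "waitlisted" : `scheduled:${slot}`;
}
```

Edge cases such as an unavailable preference are now checked against `pickSlot` itself, while `scheduleRescue` keeps a small set of tests on its own outcome.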

The query-as-user half is what makes the rule stick in component code. Without it, “test the public” is hand-wavy and you get tests that assert on component.state.formData.email. With it, the locator priority list is the enforcement: getByRole first, then getByLabelText, getByText, and getByTestId as the last resort.[3] Pair the rule with TS5 Structural Assertions for the E2E surface and TS7 Test Isolation for the unit boundary. Three rules, three surfaces, one principle: the test exercises what callers can see and nothing else.

§7

Notes

[1]Sandi Metz — “The Magic Tricks of Testing”, RailsConf 2013. The canonical 3×3 matrix: incoming query / incoming command / sent-to-self / outgoing query / outgoing command, with the rule “test only what crosses the public boundary.”
[2]Kent C. Dodds — “Testing Implementation Details” (kentcdodds.com, 2018). “Implementation details are things which users of your code will not typically use, see, or know about.”
[3]Testing Library maintainers — “About Queries — locator priority”. The order: getByRole, getByLabelText, getByPlaceholderText, getByText, getByDisplayValue, getByAltText, getByTitle, getByTestId.

Last revised: 2026-04-27