TS4 Real-Dependency E2E

§0b

Opinion

I've cleaned up too many post-mortems where the test suite was green and the database migration shipped a broken column. The unit tests passed because the mock returned what the mock was told to return; the integration tests passed because they shared the same fakes; the bug was in the rename that the ORM accepted and the migration didn't. The fix is the Kent C. Dodds line written down twenty times in the back of the runbook: write an E2E test that uses the real DB [1]. The same instinct generalises (real queue, real cache, real upstream API) the moment the dependency is part of the production data path.

The unit-vs-E2E inversion is the part most teams misread. Unit tests should mock everything outside the unit, the discipline TS7 Test Isolation enforces. E2E tests should mock nothing inside the system, the discipline TS4 enforces. The two are paired tenets, not contradictions. The boundary between the layers is what counts as the unit; the rules either side of that boundary are different by design.

Determinism is the operational discipline that makes the rule survive contact with CI. The database is rebuilt from a fixed seed before each run; the queue starts from a known commit; the upstream is pinned to a tagged release. Fowler's Eradicating Non-Determinism in Tests [2] is the playbook; Google's “hermetic servers” [3] is the architectural pattern. The real dependency is sealed inside a boundary the test owns, not the world. The third-party SaaS that won't cooperate (a payment processor, a card-present terminal) gets a contract test plus a recorded interaction, and the narrow exception is documented as such, not as a green light to re-introduce mocks elsewhere.
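A minimal sketch of the "rebuilt from a fixed seed" idea, assuming a seed loader that generates its rows instead of reading a dump. The `UserRow` shape, the `buildSeedRows` name, and the choice of the mulberry32 generator are illustrative assumptions, not part of the tenet; the point is only that the same seed always produces the same rows for the real database.

```typescript
// Hypothetical fixture shape; the real schema will differ.
interface UserRow {
  id: number;
  email: string;
  balanceCents: number;
}

// mulberry32: a small seeded PRNG, so the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build the rows a seed loader would insert into the real database before each run.
function buildSeedRows(seed: number, count: number): UserRow[] {
  const next = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    email: `user${i + 1}@example.test`,
    balanceCents: Math.floor(next() * 100000),
  }));
}
```

Avoiding `Math.random()` and wall-clock values in the loader is the design choice that matters here: two CI runs with the same seed start from byte-identical rows, so a failure can be reproduced locally.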

Copy a note and link

Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.

End-to-end tests run against the real dependencies — real database, real queue, real API. Mocked dependencies pass mocked tests; only the real thing catches the migration that breaks production. Determinism comes from rebuilding state per run, not from replacing the dependency. The unit-tier mocks-everything contract is preserved; the E2E tier inverts it.

/tenet/real-dependency-e2e/TS4

§0c

AI eyes only

Rule: E2E tests touch real dependencies. Unit tests mock outside the unit; E2E tests mock nothing inside the system.

Reject: vi.mock of any internal service inside an E2E test. Reject: an E2E test that passes against an in-memory fake. Reject: collapsing the unit and E2E tiers because the unit tier is easier to write.

Generate: E2E tests that run against the real database, real HTTP, real filesystem. Migration fixtures, seed loaders, and teardown helpers are part of the test harness.

Diagnostic: take the database offline. If the E2E suite still passes, it is not E2E; the tier is mislabelled.
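The diagnostic can be sketched as a comparison between two suite runs, one normal and one made with the database deliberately offline. The `SuiteRun` shape and the `findMislabelledTests` helper are hypothetical; a real harness would fill them from the test runner's report.

```typescript
type Outcome = "passed" | "failed";
type SuiteRun = Record<string, Outcome>;

// Compare a normal run with a run made while the database was offline.
// Any test that passes in both runs never needed the database.
function findMislabelledTests(dbOnline: SuiteRun, dbOffline: SuiteRun): string[] {
  return Object.keys(dbOnline)
    .filter((name) => dbOnline[name] === "passed" && dbOffline[name] === "passed")
    .sort();
}

// Example data (made up): only "signup flow" survives the outage, so only it is reported.
const onlineRun: SuiteRun = { "checkout flow": "passed", "signup flow": "passed" };
const offlineRun: SuiteRun = { "checkout flow": "failed", "signup flow": "passed" };
console.log(findMislabelledTests(onlineRun, offlineRun));
```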

§0d

Why?

Catches the migration that breaks production. The ORM accepts the column rename; the database doesn't. Only the test that runs against the real database surfaces the mismatch in CI rather than at deploy.
Surfaces network-shaped failures: the queue that times out, the cache that returns stale data, the upstream that rate-limits at 100 RPS. None of these reproduce against a fake.
The hermetic boundary keeps the apex of the pyramid affordable. Service containers seed from a known commit; the suite stays minutes, not hours, because the run is engineered for determinism.
Pairs cleanly with TS7 Test Isolation. The two rules carve the unit-vs-E2E boundary at the right altitude; both ship, both stay honest about their layer.
Enables structural E2E. With the data real, the locator-side discipline of TS5 Structural Assertions can run against the live build — sitemap discovery, role-tree assertions, the works.
Stops coding agents from collapsing the E2E tier into the unit tier. A clear rule in AGENTS.md plus a CI workflow with real services keeps the model writing fixtures and seed scripts rather than mocks.
The narrow third-party carve-out forces honesty. Each unmockable dependency is documented with its contract test, recorded interaction, and rationale — not a green light to mock elsewhere.

The receipts

Origins, quoted passages, evidence, the strongest counter-argument and the reply.

§1

Origins

The “real database in E2E” line has three load-bearing primary sources. Kent C. Dodds's 2020 tweet [1] is the snappiest formulation: write an E2E test that uses the real DB. Martin Fowler's Eradicating Non-Determinism in Tests [2] set out the production-fidelity argument: a test that doesn't exercise production-shaped infrastructure cannot catch production-shaped failures. Google's “hermetic servers” series on the Testing Blog [3] formalised the architectural pattern: the real backend, sealed behind a hermetic boundary the test owns.

The framing generalises past the database. Justin Searls's “Don't mock what you don't own” line [6] is the boundary half: the third-party SaaS gets a recorded interaction or a contract test, but the dependencies you do own go in real. Software Engineering at Google's Test Sizes taxonomy [7] places real DB, real queue, real network in “Large” tests as the fidelity ceiling of the pyramid. The Postgres-specific framing in the original Prickles standard is one instance of a broader rule that covers every real dependency in the production data path.

Vladimir Khorikov's classical-school chapter [8] documents the in-process variant: integration tests mock only the out-of-process dependencies the team does not manage, and touch the real database. The chapter is the bridge between the unit-mock-everything discipline and the E2E-mock-nothing discipline; both ship, at different tiers, against the same architectural backbone. “Hermetic but realistic” [3] is the operational shorthand.

In the Prickles canon, TS4 is one half of a paired-tenets contrast. TS7 Test Isolation is the inverse rule at the unit tier: the unit touches nothing outside the unit. The two rows together draw the unit-vs-E2E boundary cleanly — the testing-pillar equivalent of the F3 DRY ↔ S2 Wait for Three contrast in style. Apparent contradiction, different layers, both true.

§2

Quotes

Write an E2E test that uses the real DB.

Kent C. Dodds · 2020

For tests to run reliably, they need to start from a known good state and run to a known good state. Each test should rebuild that starting state from scratch, eradicating any leftover state from earlier tests.

Martin Fowler · Eradicating Non-Determinism in Tests (2011)

A hermetic server contains all of the software dependencies it needs in order to run, and no others. The benefit is reliability: the test does not depend on production state, and production state does not depend on the test.

Google Testing Blog · Hermetic Servers (2012)

Use mocks only for unmanaged dependencies. For managed dependencies, use the real thing.

Vladimir Khorikov · Unit Testing (2020), ch. 8

§3

Evidence

Twenty external sources, ranked by author authority. The first five, listed here, are the canon; the rest include the qualifiers and the named opposers. Each links out to its primary source.

01
Write an E2E test that uses the real DB · Supports
Kent C. Dodds · 2020
The snappiest formulation. The mainstream JavaScript-ecosystem articulation of TS4’s rule, in one sentence.
02
Eradicating Non-Determinism in Tests · Supports
Martin Fowler · 2011
The production-fidelity argument paired with the determinism playbook. Rebuild starting state from scratch for each test environment so results are deterministic.
03
Hermetic Servers (Google Testing Blog) · Supports
Google Testing Blog · 2012
The architectural pattern: real backends sealed behind a hermetic boundary the test owns. The phrase “hermetic but realistic” is the operational shorthand.
04
Software Engineering at Google, ch. 11, Testing Overview · Supports
Titus Winters, Tom Manshreck & Hyrum Wright · 2020
FAANG-scale evidence. The Test Sizes taxonomy explicitly puts “real DB”, “real network between processes” in Large tests as the fidelity ceiling of the pyramid.
05
What Makes a Good Test? (Google Testing Blog) · Supports
Mike Bland · 2014
Sharpens the fidelity-over-speed argument at the integration tier. A test that passes with mocks but fails in production is a false positive at the worst time.

Twenty sources, three stances. The supports are the canon: Kent C. Dodds's real-DB tweet, Fowler on Eradicating Non-Determinism, the Google “hermetic servers” line, and the SwE at Google Test Sizes taxonomy. The qualifiers further down carve out the cost angle: real-dependency E2E is the apex of the pyramid, not the base. The opposers carry the steelman the reply has to address.

§4b

Enforcement

Viewing: TypeScript.

Apply these rules in .github/workflows/e2e.yml. The full enforcement across every tenet lives on the implementation page.

Rule	Tool	Catches
services: container	GitHub Actions service containers	in-memory database swaps. The services block in the workflow runs the real database, queue, or cache; the test process talks to it over the same driver and port shape as production.
playwright webServer + baseURL	Playwright	tests that bypass the deployed build. The webServer block makes Playwright start the actual application; the baseURL points at the assembled system, not a fixture.
playwright test.use({ trace })	Playwright	missing diagnostics on real-dependency failures. Trace mode captures the request log and DOM snapshots so a failed E2E run is debuggable from the report.
testcontainers	Testcontainers for Node.js (npm package testcontainers)	local-dev parity issues. Testcontainers spins up the same image CI uses; the developer’s laptop and the workflow run agree on the dependency surface.
pact provider verification	Pact	the third-party carve-out. The contract test runs against the vendor’s sandbox; the recorded interaction stands in for the dependency the team didn’t buy.
no in-memory database in E2E	ESLint custom rule (recommended)	better-sqlite3, :memory:, or in-process database imports inside e2e/ directories. The lint rule fails the PR before the suite runs.
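The last row recommends a custom lint rule without showing one. A minimal sketch of the check behind it, assuming a plain line-by-line text scan rather than a real ESLint rule (an actual rule would inspect the AST through ESLint's rule API). The `findViolations` name and the pattern list are illustrative and not exhaustive; the `vi.mock` pattern is added here because the tenet rejects it in E2E tests.

```typescript
// Patterns the check looks for inside e2e/ directories (illustrative, not exhaustive).
const FORBIDDEN: { label: string; pattern: RegExp }[] = [
  { label: "in-memory SQLite", pattern: /:memory:/ },
  {
    label: "better-sqlite3 import",
    pattern: /from\s+['"]better-sqlite3['"]|require\(\s*['"]better-sqlite3['"]\s*\)/,
  },
  { label: "vi.mock call", pattern: /\bvi\.mock\s*\(/ },
];

interface Violation {
  line: number; // 1-based line number in the source file
  label: string;
}

function findViolations(filePath: string, source: string): Violation[] {
  // Only files under an e2e/ directory are in scope; unit tests may mock freely.
  if (!/(^|\/)e2e\//.test(filePath)) return [];
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { label, pattern } of FORBIDDEN) {
      if (pattern.test(text)) violations.push({ line: index + 1, label });
    }
  });
  return violations;
}
```

A CI step could run this over changed files and fail the PR when the returned list is non-empty, which keeps the feedback ahead of the slower E2E run.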

.github/workflows/e2e.yml · configuration snippet

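name: e2e
on: [pull_request]
jobs:
  e2e:
    runs-on: ubuntu-latest
    # Job-level env so migrate, seed, and the tests all reach the same service containers.
    # Assumption: the app accepts a mysql:// connection URL in this shape.
    env:
      DATABASE_URL: mysql://root:root@127.0.0.1:3306/app_test
      REDIS_URL: redis://127.0.0.1:6379
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: app_test
        ports: ['3306:3306']
        options: --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=5
      redis:
        image: redis:7
        ports: ['6379:6379']
        options: --health-cmd="redis-cli ping" --health-interval=10s --health-timeout=5s --health-retries=5
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v3   # reads the pnpm version from packageManager in package.json
      - uses: actions/setup-node@v4
        with:
          node-version: 20   # assumption: match the project's runtime
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm migrate --url "$DATABASE_URL"
      - run: pnpm seed
      - run: pnpm exec playwright install --with-deps
      - run: pnpm test:e2e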
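
The two Playwright rows in the enforcement table could be wired in a playwright.config.ts along these lines. This is a sketch under assumptions: the `pnpm start` command, port 3000, and the `./e2e` test directory are placeholders for whatever the real project uses.

```typescript
import { defineConfig } from "@playwright/test";

// Assumptions: `pnpm start` launches the built app on port 3000, and tests live in ./e2e.
export default defineConfig({
  testDir: "./e2e",
  // One retry in CI so that "on-first-retry" tracing has a retry to record.
  retries: process.env.CI ? 1 : 0,
  use: {
    // Tests navigate with relative paths against the assembled system.
    baseURL: "http://127.0.0.1:3000",
    // Capture a trace when a failed test is retried.
    trace: "on-first-retry",
  },
  webServer: {
    // Playwright starts the real application and waits for the URL to respond.
    command: "pnpm start",
    url: "http://127.0.0.1:3000",
    // Locally, reuse an app that is already running; in CI, always start fresh.
    reuseExistingServer: !process.env.CI,
  },
});
```

The application started by `webServer` inherits the environment of the Playwright process, so a `DATABASE_URL` set in the workflow reaches the real database without any test-side wiring.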

§4c

AI rules

Paste destination

File: .cursor/rules/ts4-real-dependency-e2e.mdc

---
description: Prickles TS4 — Real-Dependency E2E
globs: "**/*.{e2e,spec,test}.{ts,tsx,js,jsx,py,java,php}"
alwaysApply: false
---

## Prickles TS4 — Real-Dependency E2E

E2E tests touch the real dependencies the production system touches. Real database, real message queue, real API surface — sealed inside a hermetic boundary, never replaced.

Mocks belong below the E2E line. Unit tests mock everything outside the unit; E2E tests mock nothing inside the system under test. The two layers do different jobs.

Determinism comes from rebuilding state, not from replacing the dependency. Seed the database fresh per run; bring up the queue from a known commit; pin the upstream version.

If a dependency cannot be run hermetically (third-party SaaS, payment processor sandbox, real card-present terminal), wrap it in a contract test plus a recorded interaction; reach for in-process replacements only when the third party itself sells one.

Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.

§5

Counter-argument

Counter

The strongest pushback is J. B. Rainsberger's [4]: integrated tests give false confidence. They can pass while internal contracts are broken; the suite reports green and the next unrelated change perturbs the wrong cell. The test-pyramid school [5] sharpens the cost angle: real-dependency E2E is slow, flaky, and expensive at scale; the bulk of the suite has to be unit-shaped or the pipeline grinds to a halt. A third voice argues that “real DB” in CI is still not production DB (different load, different data volumes, different network topology) so the fidelity gain is theatre.

§6

Counter-argument retort

Rainsberger's objection [4] is conceded inside its scope and refused outside it. Yes, the integrated test can pass while internal contracts are broken; the answer is to add contract tests at the boundaries, not to remove the E2E that catches the migration the contract test can't see. Both ship. Contract testing covers the interface; E2E covers the assembled system. The two coverages are orthogonal, not substitutes.

The test-pyramid objection [5] is a quantity claim disguised as a fidelity claim. The pyramid says the bulk of the suite has to be unit-shaped because integration tests are slow; TS4 doesn't argue against that. TS4 is about the few E2E tests you do run; it doesn't say everything should be E2E. The pyramid stays; TS4 is the rule for the apex. Fowler's own non-determinism playbook [2] handles the slow-and-flaky concern: rebuild state from a fixed seed; pin the upstream; seal the boundary. The pipeline does not have to grind to a halt; it does have to be engineered.

The “real DB is theatre” objection inverts itself the moment a migration breaks. Production fidelity is a continuum: the CI MySQL that the application reaches via the same driver and the same SQL surface catches the migration that the ORM accepted and the database didn't. The objection is correct that CI is not production; it's wrong that half-fidelity is no fidelity. The half that matters is the half the failure mode lives in.

The narrow residue is the third-party dependency that genuinely cannot be run hermetically. The card-present terminal, the upstream SaaS that bills per request, the certificate authority that signs against a real HSM. For those, write a contract test against the vendor’s sandbox or staging endpoint, record the interaction with a tool the team owns (Pact, WireMock, MSW for HTTP), and document the gap explicitly. That's not a licence to mock elsewhere; it's an explicit, audited carve-out for a dependency the team didn't buy and can't replicate.
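The carve-out pairs a recorded interaction with a contract check. A minimal sketch of that pairing, assuming a hand-rolled recording format; `RecordedInteraction`, `replay`, `contractViolations`, and the `/v1/charges` path are hypothetical names for illustration, and Pact, WireMock, and MSW each define their own formats and APIs.

```typescript
// Hypothetical recording format for one request/response pair.
interface RecordedInteraction {
  request: { method: string; path: string };
  response: { status: number; body: Record<string, unknown> };
}

// The fields and primitive types the team relies on from the vendor response.
type Contract = Record<string, "string" | "number" | "boolean">;

// Replay: return the recorded response for a request, and fail loudly when
// none exists so an unrecorded call can never silently pass.
function replay(recordings: RecordedInteraction[], method: string, path: string) {
  const hit = recordings.find((r) => r.request.method === method && r.request.path === path);
  if (!hit) throw new Error(`No recorded interaction for ${method} ${path}`);
  return hit.response;
}

// Contract check: meant to run against both the recording and the vendor
// sandbox response, so drift between the two shows up as a violation.
function contractViolations(body: Record<string, unknown>, contract: Contract): string[] {
  return Object.entries(contract)
    .filter(([field, expected]) => typeof body[field] !== expected)
    .map(([field, expected]) => `${field}: expected ${expected}, got ${typeof body[field]}`);
}

// Example recording (made-up data).
const chargeRecordings: RecordedInteraction[] = [
  {
    request: { method: "POST", path: "/v1/charges" },
    response: { status: 201, body: { id: "ch_1", amount: 1200, captured: true } },
  },
];
```

Running the same `contractViolations` check against the sandbox on a schedule is what keeps the recording honest: when the vendor changes a field, the contract check fails before the replayed E2E test starts lying.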

The discipline reduces to two lines on the wall: real dependency or contract test, never both-mocked. The unit tier mocks everything outside the unit; the E2E tier mocks nothing inside the system. Same operating principle as TS7 on its mirrored surface.

§7

Notes

[1] Kent C. Dodds. Tweet, 2020-04-20: “Write an E2E test that uses the real DB.” The snappiest formulation of TS4’s rule and the one the original Prickles standard cited.
[2] Martin Fowler. Eradicating Non-Determinism in Tests (martinfowler.com, 2011). The production-fidelity argument paired with the determinism playbook: rebuild starting state from scratch per run.
[3] Google Testing Blog. “Hermetic Servers” (testing.googleblog.com, 2012). The architectural pattern: real backends sealed behind a hermetic boundary the test owns.


Last revised: 2026-04-27