Real-Dependency E2E
Mocks hide the bugs that ship.
The bug that ships is the one the test never exercised. Mocked dependencies pass mocked tests; the migration that breaks production breaks the real database, the real queue, the real upstream. The only test that catches it is the one that touches them.
Opinion
I've cleaned up too many post-mortems where the test suite was green and the database migration shipped a broken column. The unit tests passed because the mock returned what the mock was told to return; the integration tests passed because they shared the same fakes; the bug was in the rename that the ORM accepted and the migration didn't. The fix is the Kent C. Dodds line written down twenty times in the back of the runbook: write an E2E test that uses the real DB.[1] The same instinct generalises (real queue, real cache, real upstream API) the moment the dependency is part of the production data path.
The unit-vs-E2E inversion is the part most teams misread. Unit tests should mock everything outside the unit, the discipline TS7 Test Isolation enforces. E2E tests should mock nothing inside the system, the discipline TS4 enforces. The two are paired tenets, not contradictions. The boundary between the layers is what counts as the unit; the rules either side of that boundary are different by design.
Determinism is the operational discipline that makes the rule survive contact with CI. The database is rebuilt from a fixed seed before each run; the queue starts from a known commit; the upstream is pinned to a tagged release. Fowler's Eradicating Non-Determinism in Tests[2] is the playbook; Google's “hermetic servers” pattern[3] is the architectural pattern. The real dependency is sealed inside a boundary the test owns, not the world. The third-party SaaS that won't cooperate (a payment processor, a card-present terminal) gets a contract test plus a recorded interaction, and the narrow exception is documented as such, not as a green light to re-introduce mocks elsewhere.
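One way to make "rebuilt from a fixed seed" concrete is to derive every fixture value from a seeded generator, so that two runs insert identical rows into the real database. The following is a minimal TypeScript sketch under stated assumptions: the `UserRow` shape, the `buildSeedRows` helper, and the fixed base date are hypothetical names chosen for illustration, not part of any Prickles tooling.

```typescript
// Hypothetical fixture shape; table and column names are illustrative only.
interface UserRow {
  id: number;
  email: string;
  createdAtUnix: number; // seconds since epoch, derived from the seed, never from the clock
}

// A mulberry32-style PRNG: the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build the rows the harness would insert into the real database before a run.
function buildSeedRows(seed: number, count: number): UserRow[] {
  const random = mulberry32(seed);
  const baseUnix = Date.UTC(2020, 0, 1) / 1000; // fixed base time (assumption)
  const secondsPerYear = 86400 * 365;
  const rows: UserRow[] = [];
  for (let i = 0; i < count; i++) {
    rows.push({
      id: i + 1,
      email: `user${i + 1}@example.test`,
      createdAtUnix: baseUnix + Math.floor(random() * secondsPerYear),
    });
  }
  return rows;
}
```

In a harness, these rows would be written through the production driver after truncating the table; the generator only fixes the starting state and replaces nothing in the data path.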
Copy a note and link
Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.
End-to-end tests run against the real dependencies — real database, real queue, real API. Mocked dependencies pass mocked tests; only the real thing catches the migration that breaks production. Determinism comes from rebuilding state per run, not from replacing the dependency. The unit-tier mocks-everything contract is preserved; the E2E tier inverts it. /tenet/real-dependency-e2e/TS4
AI eyes only
Rule: E2E tests touch real dependencies. Unit tests mock outside the unit; E2E tests mock nothing inside the system.
Reject: vi.mock of any internal service inside an E2E test. Reject: an E2E test that passes against an in-memory fake. Reject: collapsing the unit and E2E tiers because the unit tier is easier to write.
Generate: E2E tests that run against the real database, real HTTP, real filesystem. Migration fixtures, seed loaders, and teardown helpers are part of the test harness.
Diagnostic: take the database offline. If the E2E suite still passes, it is not E2E; the tier is mislabelled.
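The Reject list can be approximated with a plain text scan that runs before the suite. The sketch below is a stand-in rather than an ESLint rule: the file conventions (`e2e/` directories, `*.e2e.*` names), the rule names, and `findE2eViolations` are assumptions made for illustration. It flags every `vi.mock` call in an E2E file, which is deliberately stricter than the rule's "internal service" wording.

```typescript
interface Violation {
  file: string;
  line: number; // 1-based line number
  rule: string;
  text: string;
}

// Patterns drawn from the Reject list; extend per stack. Rule names are hypothetical.
const BANNED_PATTERNS: Array<{ rule: string; pattern: RegExp }> = [
  { rule: "no-vi-mock-in-e2e", pattern: /\bvi\.mock\s*\(/ },
  { rule: "no-in-memory-database", pattern: /:memory:/ },
  { rule: "no-in-memory-database", pattern: /['"]better-sqlite3['"]/ },
];

// Assumed convention: E2E files live under e2e/ or are named *.e2e.<ext>.
function isE2eFile(path: string): boolean {
  return /(^|\/)e2e\//.test(path) || /\.e2e\.[a-z]+$/.test(path);
}

// Return one violation per banned pattern per line; non-E2E files are ignored.
function findE2eViolations(file: string, source: string): Violation[] {
  if (!isE2eFile(file)) return [];
  const violations: Violation[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { rule, pattern } of BANNED_PATTERNS) {
      if (pattern.test(text)) {
        violations.push({ file, line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return violations;
}
```

A text scan cannot tell an internal service from a third-party carve-out, so any hit still needs a human or a documented exception list to adjudicate.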
Why?
- Catches the migration that breaks production. The ORM accepts the column rename; the database doesn't. Only the test that runs against the real database surfaces the mismatch in CI rather than at deploy.
- Surfaces network-shaped failures: the queue that times out, the cache that returns stale data, the upstream that rate-limits at 100 RPS. None of these reproduce against a fake.
- The hermetic boundary keeps the apex of the pyramid affordable. Service containers seed from a known commit; the suite stays minutes, not hours, because the run is engineered for determinism.
- Pairs cleanly with TS7 Test Isolation. The two rules carve the unit-vs-E2E boundary at the right altitude; both ship, both stay honest about their layer.
- Enables structural E2E. With the data real, the locator-side discipline of TS5 Structural Assertions can run against the live build — sitemap discovery, role-tree assertions, the works.
- Stops coding agents from collapsing the E2E tier into the unit tier. A clear rule in AGENTS.md plus a CI workflow with real services keeps the model writing fixtures and seed scripts rather than mocks.
- The narrow third-party carve-out forces honesty. Each unmockable dependency is documented with its contract test, recorded interaction, and rationale — not a green light to mock elsewhere.
Origins
The “real database in E2E” line has three load-bearing primary sources. Kent C. Dodds's 2020 tweet[1] is the snappiest formulation: write an E2E test that uses the real DB. Martin Fowler's Eradicating Non-Determinism in Tests[2] set out the production-fidelity argument: a test that doesn't exercise production-shaped infrastructure cannot catch production-shaped failures. Google's “hermetic servers” series on the Testing Blog[3] formalised the architectural pattern: the real backend, sealed behind a hermetic boundary the test owns.
The framing generalises past the database. Justin Searls's “Don't mock what you don't own” line[6] (testdouble.js README, 2014) is the boundary half: the third-party SaaS gets a recorded interaction or a contract test, but the dependencies you do own go in real. Software Engineering at Google's Test Sizes taxonomy[7] (O’Reilly, 2020, ch. 11 Testing Overview) places real DB, real queue, real network in “Large” tests as the fidelity ceiling of the pyramid. The Postgres-specific framing in the original Prickles standard is one instance of a broader rule that covers every real dependency in the production data path.
Vladimir Khorikov's classical school chapter[8] (Unit Testing: Principles, Practices, and Patterns, Manning, 2020, ch. 8 Why integration testing?) documents the integration-tier variant: integration tests mock only unmanaged dependencies across process boundaries and use the real database. The chapter is the bridge between the unit-mock-everything discipline and the E2E-mock-nothing discipline; both ship, at different tiers, against the same architectural backbone. “Hermetic but realistic”[3] is the operational shorthand.
In the Prickles canon, TS4 is one half of a paired-tenets contrast. TS7 Test Isolation is the inverse rule at the unit tier: the unit touches nothing outside the unit. The two rows together draw the unit-vs-E2E boundary cleanly — the testing-pillar equivalent of the F3 DRY ↔ S2 Wait for Three contrast in style. Apparent contradiction, different layers, both true.
Quotes
Write an E2E test that uses the real DB.
For tests to run reliably, they need to start from a known good state and run to a known good state. Each test should rebuild that starting state from scratch, eradicating any leftover state from earlier tests.
A hermetic server contains all of the software dependencies it needs in order to run, and no others. The benefit is reliability: the test does not depend on production state, and production state does not depend on the test.
Use mocks only for unmanaged dependencies. For managed dependencies, use the real thing.
Evidence
Twenty external sources, ranked by author authority. The first five are the canon; the rest include the qualifiers and the named opposers. Each links out to its primary source.
- 01 The snappiest formulation. The mainstream JavaScript-ecosystem articulation of TS4’s rule, in one sentence.
- 02 The production-fidelity argument paired with the determinism playbook. Rebuild starting state from scratch for each test environment so results are deterministic.
- 03 The architectural pattern: real backends sealed behind a hermetic boundary the test owns. The phrase “hermetic but realistic” is the operational shorthand.
- 04 FAANG-scale evidence. The Test Sizes taxonomy explicitly puts “real DB”, “real network between processes” in Large tests as the fidelity ceiling of the pyramid.
- 05 Sharpens the fidelity-over-speed argument at the integration tier. A test that passes with mocks but fails in production is a false positive at the worst time.
Twenty sources, three stances. The supports are the canon: Kent C. Dodds's real-DB tweet, Fowler on Eradicating Non-Determinism, the Google “hermetic servers” line, and the Software Engineering at Google Test Sizes taxonomy. The qualifiers further down carve out the cost angle: real-dependency E2E is the apex of the pyramid, not the base. The opposers carry the steelman the reply has to address.
Enforcement
Apply these rules in .github/workflows/e2e.yml. The full enforcement across every tenet lives on the implementation page.
| Rule | Tool | Catches |
|---|---|---|
| services: container | GitHub Actions service containers | in-memory database swaps. The <code>services</code> block in the workflow runs the real database, queue, or cache; the test process talks to it over the same driver and port shape as production. |
| playwright webServer + baseURL | Playwright | tests that bypass the deployed build. The <code>webServer</code> block makes Playwright start the actual application; the <code>baseURL</code> points at the assembled system, not a fixture. |
| playwright test.use({ trace }) | Playwright | missing diagnostics on real-dependency failures. Trace mode captures the request log and DOM snapshots so a failed E2E run is debuggable from the report. |
| @testcontainers/node | @testcontainers/node | local-dev parity issues. Testcontainers spins up the same image CI uses; the developer’s laptop and the workflow run agree on the dependency surface. |
| pact provider verification | Pact | the third-party carve-out. The contract test runs against the vendor’s sandbox; the recorded interaction stands in for the dependency the team didn’t buy. |
| no in-memory database in E2E | ESLint custom rule (recommended) | <code>better-sqlite3</code>, <code>:memory:</code>, or in-process database imports inside <code>e2e/</code> directories. The lint rule fails the PR before the suite runs. |
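The two Playwright rows in the table might be wired together roughly as follows. This is a sketch, not the project's actual configuration: the `pnpm start` command, the port, the `./e2e` test directory, and the `E2E_BASE_URL` variable are assumptions, while the option names (`webServer`, `use.baseURL`, `use.trace`) are Playwright's own.

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

// Assumed default; CI can override it to point at the assembled system.
const baseURL = process.env.E2E_BASE_URL ?? "http://127.0.0.1:3000";

export default defineConfig({
  testDir: "./e2e", // hypothetical location of the E2E specs
  use: {
    baseURL, // relative page.goto("/path") calls resolve against the real build
    trace: "retain-on-failure", // keep request log and DOM snapshots for failed runs
  },
  webServer: {
    command: "pnpm start", // hypothetical script that starts the actual application
    url: baseURL, // Playwright waits until this URL responds before running tests
    reuseExistingServer: !process.env.CI, // always start a fresh server on CI
    timeout: 120000,
  },
});
```

With this shape, the application process reads the same `DATABASE_URL` and `REDIS_URL` the workflow exports, so the browser, the server, and the service containers form one sealed system.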
.github/workflows/e2e.yml configuration snippet
```yaml
name: e2e
on: [pull_request]
jobs:
  e2e:
    runs-on: ubuntu-latest
    # Job-level env so the migrate, seed, and test steps all see the same connection settings.
    env:
      DATABASE_URL: ${{ secrets.TEST_DATABASE_URL }}
      REDIS_URL: redis://127.0.0.1:6379
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: root
          MYSQL_DATABASE: app_test
        ports: ['3306:3306']
        options: --health-cmd="mysqladmin ping" --health-interval=10s --health-timeout=5s --health-retries=5
      redis:
        image: redis:7
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v3
      - run: pnpm install --frozen-lockfile
      - run: pnpm migrate --url ${{ env.DATABASE_URL }}
      - run: pnpm seed
      - run: pnpm exec playwright install --with-deps
      - run: pnpm test:e2e
```
AI rules
.cursor/rules/ts4-real-dependency-e2e.mdc
---
description: Prickles TS4 — Real-Dependency E2E
globs: "**/*.{e2e,spec,test}.{ts,tsx,js,jsx,py,java,php}"
alwaysApply: false
---
## Prickles TS4 — Real-Dependency E2E
E2E tests touch the real dependencies the production system touches. Real database, real message queue, real API surface — sealed inside a hermetic boundary, never replaced.
Mocks belong below the E2E line. Unit tests mock everything outside the unit; E2E tests mock nothing inside the system under test. The two layers do different jobs.
Determinism comes from rebuilding state, not from replacing the dependency. Seed the database fresh per run; bring up the queue from a known commit; pin the upstream version.
If a dependency cannot be run hermetically (third-party SaaS, payment processor sandbox, real card-present terminal), wrap it in a contract test plus a recorded interaction; reach for in-process replacements only when the third party itself sells one.
Repo layout, CI, and ESLint wiring for these paths live on /implementation and are not repeated on every tenet.
Counter-argument
The strongest pushback is J. B. Rainsberger's “Integrated Tests Are A Scam”[4] (thecodewhisperer.com, 2010): integrated tests give false confidence. They can pass while internal contracts are broken; the suite reports green and the next unrelated change perturbs the wrong cell. The test-pyramid school[5] (Cohn’s Succeeding with Agile, 2009, introduced the test pyramid; Fowler’s 2012 bliki entry codified it) sharpens the cost angle: real-dependency E2E is slow, flaky, and expensive at scale; the bulk of the suite has to be unit-shaped or the pipeline grinds to a halt. A third voice argues that “real DB” in CI is still not production DB (different load, different data volumes, different network topology) so the fidelity gain is theatre.
Counter-argument retort
Rainsberger's objection[4] is conceded inside its scope and refused outside it. Yes, the integrated test can pass while internal contracts are broken; the answer is to add contract tests at the boundaries, not to remove the E2E that catches the migration the contract test can't see. Both ship. Contract testing covers the interface; E2E covers the assembled system. The two coverages are orthogonal, not substitutes.
The test-pyramid objection[5] is a quantity claim disguised as a fidelity claim. The pyramid says the bulk of the suite has to be unit-shaped because integration tests are slow; TS4 doesn't argue against that. TS4 is about the few E2E tests you do run; it doesn't say everything should be E2E. The pyramid stays; TS4 is the rule for the apex. Fowler's own non-determinism playbook[2] handles the slow-and-flaky concern: rebuild state from a fixed seed; pin the upstream; seal the boundary. The pipeline does not have to grind to a halt; it does have to be engineered.
The “real DB is theatre” objection inverts itself the moment a migration breaks. Production fidelity is a continuum: the CI MySQL that the application reaches via the same driver and the same SQL surface catches the migration that the ORM accepted and the database didn't. The objection is correct that CI is not production; it's wrong that half-fidelity is no fidelity. The half that matters is the half the failure mode lives in.
The narrow residue is the third-party dependency that genuinely cannot be run hermetically. The card-present terminal, the upstream SaaS that bills per request, the certificate authority that signs against a real HSM. For those, write a contract test against the vendor’s sandbox or staging endpoint, record the interaction with a tool the team owns (Pact, WireMock, MSW for HTTP), and document the gap explicitly. That's not a licence to mock elsewhere; it's an explicit, audited carve-out for a dependency the team didn't buy and can't replicate.
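For the carve-out, the pairing of a recorded interaction with a contract check can be sketched without any vendor tooling. The example below is illustrative only and is not the Pact, WireMock, or MSW API: `RecordedInteraction`, `shapeOf`, and `findContractDrift` are hypothetical names. It compares the structure of a live sandbox response against the recording, so drift in field names or types surfaces even though the recording stands in for the dependency during the E2E run.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical record kept in the repo next to the documented rationale.
interface RecordedInteraction {
  request: { method: string; path: string };
  response: { status: number; body: Json };
  rationale: string; // why this dependency cannot be run hermetically
}

// Reduce a JSON value to its structure so volatile values (ids, amounts) do not cause false failures.
// Arrays are described by their first element only (a simplification for this sketch).
function shapeOf(value: Json): Json {
  if (value === null) return "null";
  if (Array.isArray(value)) return value.length > 0 ? [shapeOf(value[0])] : [];
  if (typeof value === "object") {
    const shape: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) shape[key] = shapeOf(value[key]);
    return shape;
  }
  return typeof value; // "string", "number", or "boolean"
}

// Return a list of human-readable differences; an empty list means the recording still matches.
function findContractDrift(
  recorded: RecordedInteraction,
  live: { status: number; body: Json },
): string[] {
  const problems: string[] = [];
  if (recorded.response.status !== live.status) {
    problems.push(`status: recorded ${recorded.response.status}, live ${live.status}`);
  }
  const want = JSON.stringify(shapeOf(recorded.response.body));
  const got = JSON.stringify(shapeOf(live.body));
  if (want !== got) problems.push(`body shape: recorded ${want}, live ${got}`);
  return problems;
}
```

The contract check would run against the vendor's sandbox on its own schedule; when it reports drift, the recording is refreshed and the E2E tests that replay it are re-run.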
The discipline reduces to two lines on the wall: real dependency or contract test, never both-mocked. The unit tier mocks everything outside the unit; the E2E tier mocks nothing inside the system. Same operating principle as TS7 on its mirrored surface.
Notes
- [1] Kent C. Dodds. Tweet, 2020-04-20: “Write an E2E test that uses the real DB.” The snappiest formulation of TS4’s rule and the one the original Prickles standard cited.
- [2] Martin Fowler. Eradicating Non-Determinism in Tests (martinfowler.com, 2011). The production-fidelity argument paired with the determinism playbook: rebuild starting state from scratch per run.
- [3] Google Testing Blog. “Hermetic Servers” (testing.googleblog.com, 2012). The architectural pattern: real backends sealed behind a hermetic boundary the test owns.