AI3 Working Context

§0b

Opinion

The 1M-token context window is the most misread feature of the year. The lay reading is “paste in anything”; the literature reads it as “the room to design a context that fits.” Anthropic's own framing is the right one: context as a precious, finite resource.1 A model running near its window limit makes worse decisions than the same model running at half the window with the right tokens in the right order. Bigger windows did not retire the problem; they enlarged the surface across which the problem operates.

Drew Breunig's catalogue of context failures is the cleanest taxonomy in print: poisoning (a hallucination becomes load-bearing because it sits in the transcript), distraction (volume overwhelms the training signal), confusion (irrelevant tokens skew the response), clash (contradictory information later in the context overrides the earlier instruction).2 The long agent sessions I've sat with manifest the same shapes — usually three of the four, sometimes all of them. The fix is the discipline Anthropic ship into Claude Code as /compact and /clear: when the context starts working against the agent, take an action.

The other half of the rule is the surface itself. Anthropic's engineering team wrote it down explicitly: “the only channel from parent to subagent is the Agent tool's prompt string, so include any file paths, error messages, or decisions the subagent needs directly in that prompt.”3 Translate: anything outside the surface is outside the agent's mind. AI2 Persistent Brief goes into every session; volatile per-feature scratch comes out at the boundary; tools belong inside the surface so the agent never has to leave it (MCP-first). The merged principle is the discipline that makes the rest of the AI pillar work.

Copy a note and link

Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.

Context is a finite resource — even at 1M tokens. Compact, clear, or fork when the context starts working against you; bring tools to the agent rather than switching the agent to other surfaces; spawn subagents for parallel or large-output work so the orchestrator's window stays clean.

/tenet/working-context/AI3

§0c

AI eyes only

Rule: context is a finite resource. Compact, fork, or delegate before it works against you.

Reject: continuing in a transcript past the compact threshold. Reject: doing long-running or large-output work in the orchestrator's window. Reject: context-switching to surfaces the agent cannot observe.

Generate: offer to compact at the threshold. Spawn a subagent for parallel, long-running, or large-output work and return only the summary. Use MCP tools so results return to the same surface, not a different chat.

Diagnostic: estimate the working-set tokens for the next step. If they exceed half the window, route the step to a subagent.

§0d

Why?

Names the failure modes. Poisoning, distraction, confusion, clash — once teams have the vocabulary, they can spot the bug in the moment instead of blaming “the model got tired”.
Three verbs do the work. Compact when the transcript bloats; clear when the next task is unrelated; fork into a subagent when the work will produce a wall of output. No further abstractions needed.
Bring the tool to the agent. MCP servers for GitHub, Figma, the database, the linter mean the agent never has to be told to look outside its surface — the result is in the transcript, and the next turn can reason about it.
Subagent isolation is the modern session boundary. Heavy work runs in a child agent with its own clean context; only the summary returns. The orchestrator stays sharp.
Compounds with AI2 Persistent Brief. The brief loads on every session; per-feature context is volatile and disposable. Two halves of the same context-engineering discipline.
Survives the next window-size jump. The principle is the property (signal-to-noise) not the prescription (one feature per session). Whatever the window is next year, the rule still applies.
Aligned with Anthropic's first-party engineering posts. The vocabulary, the verbs, and the subagent boundary all come from Effective Context Engineering — not Prickles inventions.

The receipts

Origins, quoted passages, evidence, the strongest counter-argument and the reply.

§1

Origins

The first version of this discipline was “one feature per session” — the pre-2025 framing when context windows were small and agents drifted predictably as the transcript grew. The framing held for the 8K and 32K-token era. It held less well for the 128K era. By the time Claude 4.5 shipped 1M tokens it had become anachronistic; the agent could hold the entire feature plus the brief plus the plan plus the test output without coming close to the limit.

Drew Breunig's two-part essay on context failure (June 2025) is the work that reframed the problem.2 Long contexts don't fail because they hit the limit; they fail because the training signal degrades as irrelevant tokens accumulate. Breunig catalogues four mechanisms: poisoning (a hallucination earlier in the context becomes a load-bearing fact for later turns), distraction (volume overwhelms the relevant signal), confusion (the model overweights tokens that are unrelated to the task), clash (later instructions contradict earlier ones and the model picks the wrong one). All four get worse with session length, not better, and they get worse faster than the context window grows.

Anthropic's Effective Context Engineering for AI Agents (October 2025) is the first-party version of the same insight. The paper introduces the framing that drives the headline of this tenet: “context as a finite resource.”1 The same paper documents three primary tools for managing the surface: the committed brief loaded up front; primitives like glob and grep for just-in-time retrieval; and a memory tool (public beta) that stores information “outside the context window through a file-based system.”4

The other half of the merged tenet — the surface itself — comes from the multi-tool tradition that built up around MCP and subagents. Anthropic's subagent documentation is explicit: “subagents use their own isolated context windows, and only send relevant information back to the orchestrator.”3 The Agent SDK paper makes the prompt-string-as-only-channel claim that gives the surface its operational meaning: tools live inside the agent's surface, or they live in a place the agent can't reach. There is no third option.

§2

Quotes

Context is a precious, finite resource. Subagents use their own isolated context windows, and only send relevant information back to the orchestrator, rather than their full context.

Anthropic · Effective Context Engineering for AI Agents (October 2025)

Long contexts can be poisoned by hallucinations, distracted by volume, confused by irrelevance, and forced into clash by contradictory instructions. The window size is not the bottleneck; the signal-to-noise ratio inside it is.

Drew Breunig · How Long Contexts Fail (June 2025)

The only channel from parent to subagent is the Agent tool's prompt string, so include any file paths, error messages, or decisions the subagent needs directly in that prompt.

Anthropic · Building Agents with the Claude Agent SDK

Context Quarantine, Context Pruning, Context Summarization, Context Offloading. Manage the surface like you manage RAM.

Drew Breunig · How to Fix Your Context (June 2025)

§3

Evidence

Twenty external sources, ranked by author authority. The first five are the canon; expand to see the rest, including the qualifiers and the named opposers. Each links out to its primary source.

01
Effective Context Engineering for AI AgentsSupports
Anthropic · 2025
The first-party statement of the headline framing: context as a precious, finite resource. Source for the persistent-brief / just-in-time-retrieval / memory-tool triad.
02
How Long Contexts FailSupports
Drew Breunig · 2025
Names the four failure modes: Poisoning, Distraction, Confusion, Clash. The taxonomy practitioners actually use to spot the bug in real sessions.
03
How to Fix Your ContextSupports
Drew Breunig · 2025
Companion piece to How Long Contexts Fail. Documents the verbs: Quarantine (subagents), Pruning, Summarization, Offloading (memory tools, RAG).
04
Why “Context Engineering” MattersSupports
Drew Breunig · 2025
Frames the discipline as the next layer past prompt engineering. The argument that context is the right unit of design for agentic work.
05
Building Agents with the Claude Agent SDKSupports
Anthropic · 2025
Subagent isolation as the modern session boundary. The only channel between parent and child is the prompt string — the surface-as-discipline reading at architectural scale.

Sixteen sources. The supporters cluster on Anthropic's engineering posts (Effective Context Engineering, Claude Agent SDK) and on Drew Breunig's three-essay series on context failure and repair. The qualifiers further down frame the same ideas in framework-neutral terms. The opposing voice (that the 1M-token window has retired the problem) is the reply's primary target.

§4b

Enforcement

Viewing: TypeScript.

Apply these rules in eslint.config.mjs. The full enforcement across every tenet lives on the implementation page.

Rule	Tool	Catches
max-lines	ESLint core	files that have grown past the model's cheap-edit budget. 150 lines is Prickles' working cap; tune to your team's preferred ceiling.
max-lines-per-function	ESLint core	monolithic functions the agent has to read in full to edit safely. Small functions keep context per edit small.
max-params	ESLint core	functions with too many parameters — high parameter count is a context-hostile signal even when the function body is small.
GitHub Actions — context budget	GitHub Actions	PRs that touch enough files to blow the agent's cheap-context budget. Token counting in CI makes the cost visible.

eslint.config.mjsconfiguration snippet

import tseslint from 'typescript-eslint';

export default tseslint.config({
  files: ['**/*.{ts,tsx}'],
  rules: {
    'max-lines': ['warn', { max: 150, skipBlankLines: true, skipComments: true }],
    'max-lines-per-function': ['warn', { max: 15, skipBlankLines: true }],
    'max-params': ['error', 3],
  }
});

§4c

AI rules

Paste destination

File.cursor/rules/ai3-working-context.mdc

---
description: Prickles AI3 — Working Context
globs: "**/*"
alwaysApply: true
---

## Prickles AI3 — Working Context

Treat the agent's context window as a precious, finite resource — Anthropic's own framing. Context is not free; tokens that crowd the model degrade its work.

Compact, clear, or fork when the context starts working against you. Long sessions accumulate poisoning, distraction, confusion, and clash (Breunig's four failure modes); the fix is to manage the surface, not to power through.

Bring tools to the agent (MCP-first). Don't context-switch the agent out of its surface; if it can't see the result, it can't reason about it.

Subagents are the modern session boundary. Long-running, large-output, parallel work belongs in a subagent with its own isolated context — only the relevant summary returns to the orchestrator.

Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.

§5

Counter-argument

Counter

The honest steelman is that the rule is now anachronistic. Claude 4.5 ships 1M tokens; GPT-5 ships 1M+; Gemini 2.5 Pro ships 2M+.5 “One feature per session” sounded like wisdom in the 8K-token era; today, the next feature's context fits twenty times over. The agent can hold the persistent brief, the planning thread, the in-flight diff, the test output, and the documentation page simultaneously. If the context window has grown faster than the work, the discipline of aggressively pruning it is solving a problem that no longer obtains.

§6

Counter-argument retort

The 1M-token argument concedes the surface and asks for the discipline to vanish. The surface is the discipline.

Bigger windows did not retire Breunig's four failure modes; they amplified them. A context that is 30% relevant and 70% noise is still 70% noise no matter how much fits inside it. Anthropic's own October 2025 paper is explicit on this point: the recommendation to treat context as a finite resource was written after the 1M window shipped, not before.1 The mechanism that motivates the rule is degradation of the training signal as irrelevant tokens accumulate; that mechanism does not care about the window size.

The empirical evidence is on Anthropic's side. Long-context benchmarks consistently show capability degradation past the “needle in a haystack” level — the model can find a fact in 1M tokens, but reasoning about that fact in the presence of 999,999 other tokens is materially worse than reasoning about it in the presence of a thousand. Hugging Face's benchmark suite, the LongBench papers, and Anthropic's own internal evaluations all point the same way. The window is not the bottleneck; the signal-to-noise ratio inside it is.

The compromise the steelman reaches for — “just paste everything in” — is the failure mode the rule names. Compact, clear, and fork are the verbs that work because they are the verbs that change the ratio. /compact in Claude Code is not arbitrary; it is the model summarising the transcript so the next turn starts with a higher signal-to-noise ratio than the last turn ended on. Subagents are not decorative; they are how you keep the orchestrator's context clean while the work itself runs in a surface that can absorb the noise. Pair this with AI1 The Intern Pattern and AI2 Persistent Brief and the surface is small, intentional, and reproducible across every session.

§7

Notes

[1]Anthropic — Effective Context Engineering for AI Agents (October 2025). Context as a precious, finite resource. The paper that ships the framing this tenet is built around — written after the 1M-token window, not before.
[2]Drew Breunig — How Long Contexts Fail (June 2025). Catalogues the four failure modes: Context Poisoning, Context Distraction, Context Confusion, Context Clash. Companion piece How to Fix Your Context (June 2025) lists the verbs: Quarantine, Pruning, Summarization, Offloading.
[3]Anthropic — Building Agents with the Claude Agent SDK. Subagents use their own isolated context windows; the only channel between parent and subagent is the prompt string. The architectural source for the surface-as-discipline reading.

Disagree? Found a hole in the argument? Take issue with this tenet →

Last revised: 2026-04-27