AI6 Self-Review Pass

§0b

Opinion

For years I've told juniors to read their own pull requests before requesting review. The intern's instinct is to type, hit publish, and watch for feedback; the senior engineer's instinct is to draft, then read the diff like a stranger, then fix what was missed, then request review. The agent's instinct is the intern's, only faster: a thousand lines typed, done declared, the human waiting to find the bug. The fix is to give the agent the senior engineer's habit, in a form the protocol can enforce.

The peer-reviewed cluster is the strongest in the AI pillar. Reflexion (Shinn et al., NeurIPS 2023): adding a verbal self-reflection loop lifts HumanEval pass@1 to 91%, against the 80% prior state of the art.[1] Self-Refine (Madaan et al., 2023): the same finding, the same shape; an LLM provides feedback on its own output and refines iteratively.[2] CRITIC (Gou et al., ICLR 2024): tool-interactive self-critique, where the model validates its output against external tools before committing.[3] Constitutional AI (Bai et al., Anthropic 2022): the model learns to critique and revise its outputs against a written constitution rather than against human labels of harmful outputs.[4] Anthropic's Building Effective Agents formalises the production pattern as evaluator-optimizer: one call generates, another evaluates, the loop iterates until the artefact passes grading.[5]

The qualifier is essential. Huang et al. (ICLR 2024) showed that pure introspective self-correction can decrease accuracy: without an external grader, the model's own judgement of a revision is not a reliable signal of whether the revision is correct.[6] The implication is sharper than “ask the agent to check itself”: the self-review must run against external evidence. Tests, the persistent brief (AI2), the verifiable spec (AI4), types, current docs (AI5): the grader has to be something the model's confidence cannot edit.
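To make the rule concrete, here is a minimal TypeScript sketch of deciding whether a revision replaces a draft using external evidence only. The names (`Candidate`, `ExternalGrader`, `acceptRevision`) are hypothetical, and the grader is assumed to be something like the fraction of spec tests that pass. The model's self-reported confidence travels with each candidate but is deliberately never read.

```typescript
// Hypothetical shapes, for illustration only.
interface Candidate {
  text: string;
  // The model's self-reported confidence (0..1). Carried along, never consulted.
  modelConfidence: number;
}

// A hypothetical external grader, e.g. the fraction of spec tests that pass (0..1).
type ExternalGrader = (text: string) => number;

// Keep the revision only if the external grader scores it at least as well
// as the draft. Ties go to the revision.
function acceptRevision(
  draft: Candidate,
  revision: Candidate,
  grade: ExternalGrader,
): Candidate {
  const before = grade(draft.text);
  const after = grade(revision.text);
  return after >= before ? revision : draft;
}

// Usage: a highly confident revision that breaks a check is rejected.
const passesSpec: ExternalGrader = (text) => (text.includes("n + 1") ? 1 : 0);
const draft: Candidate = { text: "return n + 1;", modelConfidence: 0.4 };
const revision: Candidate = { text: "return n + 2;", modelConfidence: 0.99 };
console.log(acceptRevision(draft, revision, passesSpec) === draft); // true
```

Sending ties to the revision is a judgement call; a team that wants less churn could keep the draft on ties instead.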

The closing rule is editorial. The author edits last. The model can draft prose, plans, schemas, code; the brand voice and the editorial responsibility do not delegate. AI text on Prickles is always edited by a human before it ships; the same discipline applies in the codebase: the human reads the diff, the human approves the merge, the human takes responsibility for the change. The self-review pass is not a substitute for the human review; it is the work the agent does so the human review does not have to find the obvious bugs.

Copy a note and link

Grab this short comment and drop it into a PR comment or an LLM chat to prompt the right change.

Draft, then critique, then fix. The agent grades its own work against the brief, the spec, and the tests before handing back; the human grades the result. Pure introspection isn't enough — ground the self-review in external evidence (Reflexion, Self-Refine, CRITIC, Constitutional AI; Huang 2024 caveat).

/tenet/self-review-pass/AI6

§0c

AI eyes only

Rule: draft, then critique, then fix. Run an external grader before handing back.

Reject: handing back work that has not been re-checked. Reject: using the model's own sense that “this looks right” as the grader. Reject: silently swallowing a failing check.

Generate: after every change, run the external graders (spec, type-check, lint, build) in that order. Fix any failure before handing back. Name any remaining uncertainty in the handover.

Diagnostic: the handover must cite the grader output. If the grader did not run, the work is not ready.
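A minimal sketch of that protocol, assuming each grader is wrapped as a synchronous function (for example around `tsc --noEmit` or `vitest run`). `Grader`, `runGraders`, and `buildHandover` are hypothetical names, not an existing tool. Graders run in the given order, the run stops at the first failure, and the handover cites every grader, marking any that failed or never ran, so the work is only `ready` when all of them passed.

```typescript
// Hypothetical shapes: each grader wraps one external check, e.g. a child
// process running `tsc --noEmit` or `vitest run`, and reports its output.
interface GraderResult {
  name: string;
  passed: boolean;
  output: string;
}

interface Grader {
  name: string;
  run: () => GraderResult;
}

// Run graders in the given order and stop at the first failure, so the agent
// fixes it before spending time on later checks.
function runGraders(graders: Grader[]): GraderResult[] {
  const results: GraderResult[] = [];
  for (const grader of graders) {
    const result = grader.run();
    results.push(result);
    if (!result.passed) break;
  }
  return results;
}

// Build the handover. Every grader is cited; one that failed or never ran
// makes the work "not ready". Remaining uncertainty is named explicitly.
function buildHandover(
  graders: Grader[],
  results: GraderResult[],
  uncertainty: string[],
): { ready: boolean; text: string } {
  const lines: string[] = ["## Self-Review"];
  let ready = true;
  for (const grader of graders) {
    const result = results.find((r) => r.name === grader.name);
    if (result === undefined) {
      ready = false;
      lines.push(`- ${grader.name} did not run`);
    } else {
      if (!result.passed) ready = false;
      lines.push(`- ${grader.name} ${result.passed ? "passes" : "FAILS"}: ${result.output}`);
    }
  }
  for (const note of uncertainty) lines.push(`- Uncertain: ${note}`);
  return { ready, text: lines.join("\n") };
}
```

With grader names such as `spec` and `type-check`, the handover lines read "spec passes" and "type-check passes", the kind of phrase the Danger check in §4b looks for.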

§0d

Why?

Backed by the strongest peer-reviewed cluster in the AI pillar: Reflexion, Self-Refine, CRITIC, and Constitutional AI, with Anthropic's evaluator-optimizer pattern as the production form. Four peer-reviewed lines of evidence converging on one finding.
Filters the diff before it reaches the human reviewer. Obvious bugs are caught before review starts; the human review focuses on judgement rather than typo-hunting.
Absorbs the Huang 2024 qualifier rather than ignoring it. The self-review is grounded in external evidence (tests, brief, types) — not in the model's sense that the new draft “looks better.”
Compounds with AI1 The Intern Pattern. The self-review is the AI side of the review leg; the human review is the human side; both gates close before merge.
Compounds with AI4 Verifiable Specs. The runnable spec is the grader the self-review runs against; the spec exists for exactly this loop to close.
The author edits last. AI prose, plans, schemas, code — all drafts. The brand voice and the editorial responsibility do not delegate; the human takes the final pass.
Runs at machine speed. The agent's self-review is seconds, not minutes; the cost is paid by the agent, not the human. Cheap insurance against the agent's signature failure mode.

The receipts

Origins, quoted passages, evidence, the strongest counter-argument and the reply.

§1

Origins

The pre-agent ancestor is the senior-engineer habit of reading your own diff before posting it. Code-review folklore has carried this for decades; the move from junior to senior is partly the move from “I typed it; done” to “I typed it; let me read it again like a stranger.” Addy Osmani restates the same idea for the agent era: “treat AI-generated code as a helpful draft that must be verified ... never commit code you can't explain.”[7]

Anthropic's 2022 Constitutional AI paper opened the formal academic line.[4] The model is trained to critique and revise its own outputs against a constitution; the constitution is the external grader. The principle generalised quickly: Reflexion (Shinn et al., NeurIPS 2023) showed verbal self-reflection on task feedback produces large benchmark gains;[1] Self-Refine (Madaan et al., 2023) showed iterative self-feedback works across domains;[2] CRITIC (Gou et al., ICLR 2024) showed self-critique works when it is grounded in tool feedback such as search results and code execution.[3]

Huang et al. (2024) is the qualifier the row is built around.[6] Pure introspective self-correction without external grounding can decrease accuracy. The model's confidence in its revision is not a reliable signal of the revision's correctness. The implication for the row is sharp: the self-review must ground in external evidence — tests, the persistent brief, types, the runnable spec — not in the model's sense that the new draft “looks better.” The finding strengthens the rule rather than weakening it.

Anthropic's Building Effective Agents (December 2024) shipped the production pattern.[5] The evaluator-optimizer workflow names the cycle: one call generates, another evaluates, the loop iterates until the artefact passes grading. The pattern is the agent analogue of pair programming where one engineer types and the other catches bugs — except the typing engineer is also the catching engineer, on a separate pass, against an external rubric.
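The evaluator-optimizer cycle can be sketched as a small loop. This is an illustration under stated assumptions, not Anthropic's code: `generate` and `evaluate` are hypothetical async hooks, the evaluator is assumed to wrap external evidence (tests, types, the spec) rather than the generator's own opinion, and the default cap of three iterations is arbitrary. The generator receives all feedback so far, loosely echoing Reflexion's memory of earlier attempts.

```typescript
// Hypothetical result of one evaluation pass. In this sketch the evaluator is
// assumed to wrap external evidence (tests, types, spec), not introspection.
interface Evaluation {
  pass: boolean;
  feedback: string;
}

interface LoopResult {
  output: string;
  passed: boolean;
  iterations: number;
  feedbackLog: string[];
}

// One call generates, another evaluates; iterate until the output passes
// grading or the iteration budget runs out.
async function evaluatorOptimizer(
  task: string,
  generate: (task: string, feedback: string[]) => Promise<string>,
  evaluate: (output: string) => Promise<Evaluation>,
  maxIterations = 3,
): Promise<LoopResult> {
  const feedbackLog: string[] = [];
  let output = "";
  for (let i = 1; i <= maxIterations; i++) {
    // The generator sees every round of feedback so far, not just the last one.
    output = await generate(task, feedbackLog);
    const evaluation = await evaluate(output);
    if (evaluation.pass) {
      return { output, passed: true, iterations: i, feedbackLog };
    }
    feedbackLog.push(evaluation.feedback);
  }
  // Out of budget: report the last attempt as failing rather than pretending it passed.
  return { output, passed: false, iterations: maxIterations, feedbackLog };
}
```

Returning `passed: false` when the budget runs out, rather than the last draft alone, keeps a failing check visible in the handover.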

§2

Quotes

Reflexion agents verbally reflect on task feedback signals, then maintain their own reflective text in an episodic memory buffer to induce better decision-making in subsequent trials.

Shinn et al. · Reflexion (NeurIPS 2023)

Generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself, iteratively.

Madaan et al. · Self-Refine (NeurIPS 2023)

One LLM call generates a response while another provides evaluation and feedback in a loop. This workflow is particularly effective when there are clear evaluation criteria.

Anthropic · Building Effective Agents (December 2024)

Self-correction without external feedback can degrade performance. The qualifier the rule is built around: ground the self-review in tests, docs, and types — not in the model's sense that the revision looks better.

Huang et al. · Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)

§3

Evidence

Twenty external sources, ranked by author authority. The first five are the canon; the rest include the qualifiers and the named opposers. Each links out to its primary source.

01
Reflexion: Language Agents with Verbal Reinforcement Learning · Supports
Noah Shinn et al. · 2023
NeurIPS paper. HumanEval pass@1 91% with Reflexion vs 80% prior SOTA. The single most-cited paper for the self-review pattern at the agent layer.
02
Self-Refine: Iterative Refinement with Self-Feedback · Supports
Aman Madaan et al. · 2023
Cross-domain confirmation of the Reflexion finding. The same LLM provides feedback on its own output and refines iteratively across math, code, and creative tasks.
03
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing · Supports
Zhibin Gou et al. · 2024
ICLR paper. Tool-interactive self-critique. The external-grader requirement the row depends on; tools are the grader, not introspection.
04
Constitutional AI: Harmlessness from AI Feedback · Supports
Yuntao Bai et al. (Anthropic) · 2022
Anthropic 2022. The model learns to critique and revise against a constitution. Earliest of the agent-self-critique cluster and the source of the framing.
05
Building Effective Agents · Supports
Anthropic · 2024
Names the evaluator-optimizer pattern as the production workflow for the row. Direct architectural source.

The supports are the strongest peer-reviewed cluster on the site: Reflexion (Shinn), Self-Refine (Madaan), CRITIC (Gou), Constitutional AI (Bai), and Anthropic's evaluator-optimizer pattern. The qualifier among the remaining sources is load-bearing: pure introspection without external grounding fails; the rule is built around that finding, not against it. The opposing voice (review fatigue) gets the reply.

§4

Examples


Avoid

File: write-rescue-hedgehog.ts

// Before: single-shot. Agent drafts, ships, waits for the human to find the bug.
async function writeRescueHedgehog(spec: Spec): Promise<string> {
  const draft = await agent.generate(spec);
  return draft;
}

Prefer

File: write-rescue-hedgehog.ts

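// After: draft, critique, refine. The author edits last; the agent self-edits first.
// `agent`, `Spec`, and `runTests` are illustrative; runTests is assumed to resolve to { passed, output }.
async function writeRescueHedgehog(spec: Spec): Promise<string> {
  // 1. Draft against the spec.
  const draft = await agent.generate(spec);
  // 2. Critique against external evidence: run the graders first and hand their output to the critic.
  const firstRun = await runTests(draft);
  const critique = await agent.critique(draft, spec,
    "find every off-by-one and every assumption that contradicts the spec",
    firstRun.output);
  // 3. Refine. The agent edits before the author does.
  const refined = await agent.refine(draft, critique);
  // 4. Re-run the external grader (Huang 2024): introspection alone can decrease accuracy.
  const finalRun = await runTests(refined);
  if (!finalRun.passed) {
    // Never swallow a failing check: surface it instead of handing back.
    throw new Error(`AI6: tests still failing after refine:\n${finalRun.output}`);
  }
  return refined;
}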

§4b

Enforcement


Apply these rules in dangerfile.ts. The full enforcement across every tenet lives on the implementation page.

Rule	Tool	Catches
tsc --noEmit (grader 1)	tsc	imaginary signatures, missing fields, wrong arity. The first external grader for the self-review.
vitest run (grader 2)	Vitest	the verifiable spec from AI4. The grader is the test suite; the self-review must report a green run.
ESLint (grader 3)	ESLint	rule violations the agent introduced and didn't notice. The lint output is part of the grader package.
Danger PR body section	Danger JS	PRs missing the Self-Review section, or missing the external-grader naming inside it. Without the section, the loop never closed.

dangerfile.ts · configuration snippet

import { danger, fail, message } from 'danger';

const body = (danger.github.pr.body ?? '').toLowerCase();
// Capture the Self-Review section: from its heading up to the next "## " heading or the end of the body.
const section = body.match(/## self[- ]review\b([\s\S]*?)(?=\n## |$)/);
const graderNamed = /spec passes|tests pass|type-check passes|lint passes/;

if (!section) {
  fail('AI6: PR body must include a "## Self-Review" section.');
} else if (!graderNamed.test(section[1] ?? '')) {
  // Look for the grader inside the Self-Review section, not anywhere in the body.
  fail('AI6: self-review must name the external grader (spec / tests / type-check / lint).');
}
message('AI6: human review follows the agent self-review. Both gates close before merge.');
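Danger rules are awkward to unit-test inside the dangerfile itself, so one option is to pull the check into a pure function. A sketch, assuming a "## Self-Review" heading and the same grader phrases as the snippet above; `checkSelfReview` is a hypothetical helper, not part of Danger JS. The grader phrase is looked for only inside the Self-Review section, matching the table's "inside it" wording.

```typescript
// Assumed section format: a "## Self-Review" heading followed by text up to
// the next "## " heading or the end of the body.
const SELF_REVIEW_SECTION = /## self[- ]review\b([\s\S]*?)(?=\n## |$)/;
const GRADER_PHRASES = /spec passes|tests pass|type-check passes|lint passes/;

// Hypothetical helper: returns the failure messages for a PR body (empty if it passes).
function checkSelfReview(prBody: string | null | undefined): string[] {
  const body = (prBody ?? "").toLowerCase();
  const match = body.match(SELF_REVIEW_SECTION);
  if (!match) {
    return ['AI6: PR body must include a "## Self-Review" section.'];
  }
  // Only the section's own text counts as evidence that a grader ran.
  if (!GRADER_PHRASES.test(match[1] ?? "")) {
    return ["AI6: self-review must name the external grader (spec / tests / type-check / lint)."];
  }
  return [];
}
```

In the dangerfile, the call would then be `checkSelfReview(danger.github.pr.body).forEach((msg) => fail(msg));`, and the helper can be covered by ordinary unit tests.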

§4c

AI rules

Paste destination

File: .cursor/rules/ai6-self-review-pass.mdc

---
description: Prickles AI6 — Self-Review Pass
globs: "**/*"
alwaysApply: true
---

## Prickles AI6 — Self-Review Pass

Draft, then critique, then fix. The model can draft. The model also has to grade what it drafted before handing back. Reflexion + Self-Refine + CRITIC + Anthropic's evaluator-optimizer.

Pure introspection isn't enough — Huang et al. 2024. Ground the self-review in tests, docs, types, and the persistent brief; introspection without external grounding can decrease accuracy.

Every change opens its own self-PR. The agent reviews against the brief, the verifiable spec, and the test runner before declaring done. The human reviews after.

The author edits last. AI prose, AI plans, AI schemas — all drafts. The final edit is human; the brand voice and the editorial responsibility do not delegate.

Repo layout, CI, and ESLint wiring for these paths live on /implementation — not repeated on every tenet.

§5

Counter-argument

Counter

The honest steelman is review fatigue. Two reviews per change (the agent's self-review then the human's) doubles the friction without doubling the value, and teams under deadline pressure will rationalise the self-review away as ceremony. The Cognition / Devin school argues for the opposite extreme: full autonomy with retroactive review at the merge gate, on the bet that the agent's self-grading approaches a human's quickly enough that the upfront pass becomes overhead. If the agent's next-generation self-review is good enough, the human review is the only one the rule needs.

§6

Counter-argument retort

The review-fatigue argument concedes the rule for the present and asks for it to retire as models improve. Two responses; both empirical.

First, the cost is overstated. The agent's self-review is fast (seconds, not minutes) and the cost is paid by the agent, not the human. The human review that follows is faster precisely because the obvious bugs have already been caught; the self-review filters the diff so the human review can focus on judgement rather than typo-hunting. The two passes catch different things. The literature cited above points the same way: grounded in external evidence, a self-review pass measurably improves the first draft (Reflexion, Self-Refine, CRITIC), and a better first draft is what makes the human pass cheaper.

Second, the optimistic future does not retire the rule; it changes who runs the second pass. If the agent's self-review approaches human-quality, then the human review can become a sampling rather than a comprehensive reading — closer to a pull-request audit than a pull-request approval. Anthropic's evaluator-optimizer pattern already anticipates this; in production agent systems the “evaluator” is sometimes a separate model rather than a human. But until that future arrives reliably, the human remains the second grader, and the agent's self-review is the cheapest way to make the human's grading job easier.
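If human review does shift toward sampling, the selection should be deterministic so a PR cannot drift in and out of the audit set between CI runs. A minimal sketch, assuming PRs are identified by a string id and an audit rate between 0 and 1; `fnv1a` is the standard 32-bit FNV-1a hash and `shouldAudit` is a hypothetical helper.

```typescript
// FNV-1a, 32-bit: a small deterministic string hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical helper: decide whether a PR gets a full human read.
// `rate` is the target share of PRs to audit (0..1); the same PR id always
// gets the same answer.
function shouldAudit(prId: string, rate: number): boolean {
  if (rate <= 0) return false;
  if (rate >= 1) return true;
  return fnv1a(prId) / 0x100000000 < rate;
}
```

Hashing instead of calling `Math.random()` means a re-run of CI makes the same call for the same PR.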

The Devin-flavoured full-autonomy bet does not survive the Huang qualifier. Without the external grader, the agent's confidence in its own work is uncorrelated with the work's correctness. Long-running autonomous agents that grade their own output against their own intuition produce fluent confusion at machine speed; the loop only works when the grader is something the model's confidence cannot edit. Pair this with AI1 The Intern Pattern — the four moves and the self-review reinforce each other; the agent's self-review is the AI side of the review leg.

§7

Notes

[1]Noah Shinn et al. — Reflexion: Language Agents with Verbal Reinforcement Learning (NeurIPS 2023). HumanEval pass@1 91% with Reflexion vs 80% prior SOTA. Reflexive self-evaluation produces large, repeatable benchmark gains.
[2]Aman Madaan et al. — Self-Refine: Iterative Refinement with Self-Feedback (NeurIPS 2023). The same LLM generates output, provides feedback, and refines iteratively. Cross-domain confirmation of the Reflexion finding.
[3]Zhibin Gou et al. — CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (ICLR 2024). The model validates its output by interacting with appropriate tools — the external grader the rule depends on.


Last revised: 2026-04-27