Skip to main content
manual page/ Start here

Reviewing output

sourceSource: suspec/docs/08-reviewing-output.mdModified: 2026-06-29
Sections
10
Format
Markdown
Order
08 / 156

Review checks the worker's output against the task and spec.

Do not read a large diff from line 1. Start with:

  • requirement coverage
  • failed, unverified, or blocked rows
  • human-attention items
  • changed files

Then open the diff where the packet points.

Reviewer rule

The implementer does not render the review result. The reviewer is not the implementer — but the spec or task author may review, as long as they did not implement the change.

Use a fresh session, another agent, or a human reviewer.

The independent judgment is the invariant; the formal packet scales with risk. A substantial or high-diffusion change gets the full packet below; a trivial change the owner verified needs the judgment, not the paperwork.

When the change warrants the formal packet, a review lead orchestrates it: it reads the task, cited spec, run summary, and diff, then sends at least three independent lens reviewers — requirement correctness, verification/evidence, and maintainability/design by default; security, migration, performance, and others as the change warrants. Each lens returns findings and evidence only. The lead reconciles them, deduplicates, and writes the one packet; a lens reviewer never renders the status or suggested decision.

Review packet

The packet lives in reviews/ and follows the kit template.

Required sections:

  • Summary
  • Changed files
  • Requirement coverage
  • Change-plan coverage, when relevant
  • Human attention
  • Task status
  • Suggested decision

Coverage table

One row per scoped requirement:

code
| ID | Result | Evidence | Human attention |
| --- | --- | --- | --- |
| AC-001 | Pass | `npm test -- expired-session` -> `3 passed` | yes |

Results:

  • Pass: evidence shows the requirement is met.
  • Fail: evidence shows the requirement is not met.
  • Unverified: evidence is missing or insufficient.
  • Blocked: the requirement cannot be judged yet.

Empty evidence means Unverified, never Pass.

Evidence

Valid evidence:

  • pasted command output
  • CI link with the relevant job
  • named manual observation

Invalid evidence:

  • tests passed
  • the worker's summary alone
  • a screenshot with no stated check
  • a claim that a command was run, without output or link

Human attention

Route anything a reviewer must inspect:

  • failed, unverified, or blocked requirements
  • out-of-scope edits
  • changed Do not change paths
  • risky files
  • public interface changes
  • migrations
  • security-sensitive changes
  • missing or weak test output
  • candidate findings
  • unresolved questions

If no trigger applies, say None.

Spot-check

Spot-check at least one green row.

Record what you checked:

code
Spot-checked: AC-001 - reran `npm test -- expired-session`; output matched the evidence row.

Suggested decision

Use the table:

StateDecision
All scoped rows Pass, exceptions routedMerge
Any FailDo not merge
Any Unverified without waiverDo not merge
Any BlockedDo not merge
Fail or Unverified waived by ownerMerge with waiver

Waivers name:

  • who accepted it
  • which rows it covers
  • why
  • expiry or follow-up

Review status

Use:

  • draft
  • pass
  • waived
  • blocked
  • needs-human

The status is about the review packet, not the PR platform state.

App-run evidence

An agent driving the app can produce evidence: actions taken, screen state, logs, or screenshots.

It does not produce the verdict. A reviewer still judges the evidence against the requirement.

Starter kit: Set up a workspace