For agent rebuilds
Use this when version 1 looked promising but did not survive real operations: tool failures, retry loops, unclear recovery, cost spikes, brittle handoffs, or work that still depends on hidden human decisions.
V1 failed in production
You need to know whether the failure came from the model, the workflow, the tools, the data path, or the surrounding operating environment.
V2 needs a map
The output is a failure map rather than a generic recommendation. It covers recovery points, external checks, handoff risks, and the next action that is safe to take.
Human gates stay explicit
High-impact decisions should escalate early. A circuit breaker is not a weakness; it is how an agent system becomes trustworthy enough to run.
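As a rough illustration of an explicit human gate combined with a circuit breaker, here is a minimal Python sketch. The class name `GatedRunner`, the `max_failures` threshold, and the returned status strings are all hypothetical and chosen for this example only; a real system would define its own escalation path and thresholds.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GatedRunner:
    """Hypothetical runner: escalates high-impact actions, trips after repeated failures."""
    max_failures: int = 3                      # assumed threshold, not a recommendation
    failures: int = 0                          # consecutive failures seen so far
    escalations: list = field(default_factory=list)

    @property
    def is_open(self) -> bool:
        # The breaker is "open" (blocking) once failures reach the threshold.
        return self.failures >= self.max_failures

    def run(self, name: str, action: Callable[[], None], high_impact: bool = False) -> str:
        if high_impact:
            # High-impact work is never executed here; it is queued for a human.
            self.escalations.append(name)
            return "escalated"
        if self.is_open:
            # Breaker tripped: refuse further automatic work until someone resets it.
            return "blocked"
        try:
            action()
        except Exception:
            self.failures += 1
            return "failed"
        self.failures = 0                      # a success resets the failure streak
        return "ok"
```

In this sketch the high-impact check runs before anything else, so escalation happens early rather than after the agent has already tried and failed.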
What gets checked
A trace can look clean while the route is dead, unsafe, unpaid, or built on repeated weak signals. The review checks not only local execution but also the environment around the agent.
False Autonomy
When a process looks autonomous but actually depends on hidden manual decisions, assumptions, or unverifiable steps.
Route Risk
Whether the market, task, buyer, payment path, and route to a real result are viable.
Coordination Failure
Where subagents duplicate work, amplify weak signals, or converge on an internal consensus that reality does not confirm.
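One way to picture the gap between internal consensus and confirmed reality is a check that refuses to treat agreement alone as evidence. The sketch below is illustrative only: the function name `accept_claim`, the `quorum` parameter, and the status labels are assumptions made for this example.

```python
from typing import Callable


def accept_claim(votes: list[bool],
                 external_check: Callable[[], bool],
                 quorum: float = 0.5) -> str:
    """Classify a claim using subagent votes plus one external verification."""
    if not votes:
        return "no-signal"
    agreement = sum(votes) / len(votes)        # share of subagents that agree
    if agreement <= quorum:
        return "rejected"
    # Agreement alone is not enough: something outside the agents must confirm it.
    return "confirmed" if external_check() else "unverified-consensus"
```

A unanimous vote with a failing external check comes back as `"unverified-consensus"`, which is the case this heading describes.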
Input / Output / Constraints
Input
An agent-generated plan, workflow, trace, market route, architecture sketch, or multi-agent role setup that you want to test before execution.
Output
A failure map with a verdict, missing evidence, hidden constraints, route risks, and the next action that is safe to take.
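The output fields listed above could be represented as a simple record. This is a sketch of one possible shape, not the actual deliverable format: the class `FailureMap`, its field names, and the `is_complete` helper are hypothetical. The example values reuse the sample verdict and hidden gates published on this page, while the next action shown is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMap:
    """Hypothetical container mirroring the output fields named above."""
    verdict: str
    missing_evidence: list = field(default_factory=list)
    hidden_constraints: list = field(default_factory=list)
    route_risks: list = field(default_factory=list)
    next_safe_action: str = ""

    def is_complete(self) -> bool:
        # A map without a verdict or a safe next action is not yet usable.
        return bool(self.verdict and self.next_safe_action)


example = FailureMap(
    verdict="DOWNGRADE",
    hidden_constraints=[
        "account actions",
        "payment route",
        "acceptance criteria",
        "operator dependency",
    ],
    # Invented for illustration; a real map would state the reviewer's actual finding.
    next_safe_action="confirm the payment route manually before any execution",
)
```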
Constraints
No confidential data. No legal, financial, or security advice. No public naming unless explicitly allowed. No guarantees.
Packages
Agent Output Red-Team
One-page teardown of an agent-generated plan, workflow, trace, or result.
Corrected Action Plan
Teardown plus a corrected next step with explicit evidence and control points.
Agentic SLAM Audit
Workflow topology with inter-agent boundaries, handoff failure matrix, metric degradation matrix, control metric gaps, and route continuity map.
Base prices are for initial validation. Full control-plane and benchmark work is scoped separately.
Who this is for
I have an agent-generated plan
You want to know whether it preserves the real constraints: budget, time, buyer, route, autonomy, and evidence.
I have an agent workflow
You want to find hidden human decisions, unclear acceptance criteria, weak routes, and coordination failure.
I want my agent to earn
You need to know which marketplaces or task routes are live, payable, low-friction, and worth testing first.
Proof library
The first public sample is live. It shows the expected shape of the EUR 99 tier: verdict, what is sound, failure modes, repair, and next allowed action. Additional public or anonymized examples are being added as real submissions are cleared for publication.
Example verdict: DOWNGRADE
The plan is directionally interesting, but not execution-ready. It sounds autonomous while hiding live-world gates: account actions, payment route, acceptance criteria, and operator dependency.
False Consensus
Unguided multi-agent debate can collapse into agreement without external verification.
Payment Gates
Agent earnings routes still depend on setup, acceptance, escrow release, and payout gates.
Schema Drift
Browser agents fail when page structure is treated as a stable contract.
Live Credentials
Production-impacting credentials can turn a small agent mistake into a business incident.
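To make the schema drift case concrete, a browser agent can verify that the fields it expects were actually extracted before it acts on a page. The sketch below assumes the page has already been parsed into a plain dictionary; the function names and the field names in the usage example are hypothetical.

```python
def missing_fields(page_fields: dict, required: set) -> set:
    """Return required field names that are absent or empty in the extracted page data."""
    return {name for name in required if not page_fields.get(name)}


def safe_to_act(page_fields: dict, required: set) -> bool:
    # Act only when every field the agent relies on is present and non-empty.
    return not missing_fields(page_fields, required)


# Usage: the page layout changed and the "price" element no longer resolves.
extracted = {"title": "Task listing", "price": None}
drifted = missing_fields(extracted, {"title", "price", "submit_button"})
```

Treating a non-empty `drifted` set as a stop condition keeps the agent from acting on a page whose structure it no longer understands.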
Submit a public teardown candidate
Selected public or anonymized teardowns are used to expand the proof library. Send one agent-generated plan or workflow you do not fully trust and get a short diagnostic map if it fits the current review queue.
Best fit
Agent-generated business plans, automation workflows, agent marketplace routes, and multi-agent role setups where the main question is whether the route survives contact with reality, especially where you suspect the agent is skipping real-world gates: accounts, payments, approvals, or SLAs.