# Proof Library

Status: `backtest_open`

Purpose: publish and validate public or anonymized teardown examples of recurring AI-agent blind spots before the M2M API is sold as a production dependency.

## Published Entries

| Entry | Verdict | Classes | Source |
| --- | --- | --- | --- |
| `pl-2026-05-28-001` | downgrade | false autonomy, route risk, capital-regime mismatch, missing buyer self-identification | `../artifacts/public_sample_teardown_agent_plan.md` |
| `bt-002` | downgrade | coordination failure, false consensus, token cost overrun, missing external verification | `https://arxiv.org/abs/2605.00914` |
| `bt-003` | rework | route risk, false autonomy, payment gate dependency, acceptance gate dependency | `https://www.agentgigs.io/` |
| `bt-004` | rework | schema drift, route risk, missing adapter policy, brittle observation layer | `https://playwright.dev/docs/locators` |
| `bt-005` | stop | live-world gates, liability void, credential scope failure, destructive action missing confirmation | `https://www.livescience.com/technology/artificial-intelligence/i-violated-every-principle-i-was-given-ai-agent-deletes-companys-entire-database-in-9-seconds-then-confesses` |

## Backtest Queue

See `backtest_queue.csv`.

## Public Teardown Pages

- `bt-002`: `pages/bt-002-consensus-entropy-multi-agent-debate.md`
- `bt-003`: `pages/bt-003-agent-marketplace-route-continuity.md`
- `bt-004`: `pages/bt-004-schema-drift-web-automation.md`
- `bt-005`: `pages/bt-005-destructive-agent-liability-void.md`

## Publication Gates

- public-source check;
- no-confidential-data check;
- public-surface terminology check;
- semantic-density check;
- source-specific evidence check;
- sentinel spot-check before high-impact claims.
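
The gates above read as an all-must-pass checklist. The sketch below shows one way to run them over a candidate entry. It is illustrative only: the `GATE_NAMES` labels, the `failed_gates` and `can_publish` helpers, and the idea that each gate is a callable returning a boolean are assumptions, since this document does not show how the checks are implemented.

```python
from typing import Callable, Dict, List

# Hypothetical labels mirroring the gate list above, in the same order.
GATE_NAMES = [
    "public-source",
    "no-confidential-data",
    "public-surface terminology",
    "semantic-density",
    "source-specific evidence",
    "sentinel spot-check",
]

# Assumption: each gate is a function that takes an entry and returns True on pass.
Gate = Callable[[dict], bool]


def failed_gates(entry: dict, gates: Dict[str, Gate]) -> List[str]:
    """Return the names of gates that are missing or that fail, in list order."""
    failed = []
    for name in GATE_NAMES:
        check = gates.get(name)
        # A gate with no registered check counts as a failure, not a pass.
        if check is None or not check(entry):
            failed.append(name)
    return failed


def can_publish(entry: dict, gates: Dict[str, Gate]) -> bool:
    """An entry is publishable only when every gate passes."""
    return not failed_gates(entry, gates)
```

Treating a missing check as a failure keeps the checklist fail-closed: an entry cannot be published just because a gate was never wired up.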

## Next Production Step

Use the reviewed entries `bt-002` through `bt-005` as the first calibration set for the consensus, route-continuity, schema-drift, and production-permission failure checks. Add future entries only when they validate against the same JSON schema and pass the public-surface scan.
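
As a rough illustration of the schema requirement for future entries, the sketch below checks a JSON entry for basic shape. The actual JSON schema is not reproduced in this document, so the field names here are assumptions taken from the columns of the Published Entries table, and the verdict set contains only the verdicts that appear in that table. `entry_problems` is a hypothetical helper, not part of any existing tooling.

```python
import json

# Assumed field names, derived from the Published Entries table columns.
REQUIRED_FIELDS = ("entry", "verdict", "classes", "source")
# Verdicts that appear in the table; the real schema may allow others.
KNOWN_VERDICTS = {"downgrade", "rework", "stop"}


def entry_problems(raw: str) -> list:
    """Return a list of human-readable problems; an empty list means the entry passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["entry must be a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]

    if "verdict" in data:
        verdict = data["verdict"]
        if not (isinstance(verdict, str) and verdict in KNOWN_VERDICTS):
            problems.append(f"unknown verdict: {verdict!r}")

    if "classes" in data:
        classes = data["classes"]
        if not (isinstance(classes, list) and classes):
            problems.append("classes must be a non-empty list")

    return problems
```

A candidate entry would be added to the library only when `entry_problems` returns an empty list and the separate public-surface scan also passes.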
