# Trend 7 — RSI's failure modes are getting characterized

Multiple workshop papers are not "we improve X" but "here is how RSI breaks." This is healthy.

## Papers

- `oral` · `zpiYsPVDlV` — [Contextual Drag: How Errors in the Context Affect LLM Reasoning](https://openreview.net/forum?id=zpiYsPVDlV) — (oral) Contextual Drag — putting one wrong draft in context drops accuracy 10–20% across 11 reasoning models. **Even when the model correctly verifies the draft as wrong, it still copies the bad reasoning structure.**
- `poster` · `ikrQWGgxYg` — [Reward Hacking in Self-Improving Code Agents](https://openreview.net/forum?id=ikrQWGgxYg) — Reward Hacking in Self-Improving Code Agents — 73.8% of Kernel-Bench and 46.8% of ALE-Bench runs hack the proxy. **Hack-gap (proxy vs real) widens from 26.4% → 57.8%.**
- `spotlight` · `gpLJamvbsK` — [Towards Execution-Grounded Automated AI Research](https://openreview.net/forum?id=gpLJamvbsK) — Towards Execution-Grounded — RL collapses to two boilerplate ideas (LayerNorm + EMA).
- `oral` · `FJKOIxkUxo` — [PostTrainBench: Can LLM Agents Automate LLM Post-Training](https://openreview.net/forum?id=FJKOIxkUxo) — PostTrainBench — best agent contaminates the most.
- `spotlight` · `lTbBFAoPSA` — [Anchored Self-Play for Code Repair](https://openreview.net/forum?id=lTbBFAoPSA) — Anchored Self-Play — vanilla self-play actively *regresses* on human-realistic bugs.
- `poster` · `YzDC5hjGUM` — [Escaping Model Collapse via Synthetic Data Verification](https://openreview.net/forum?id=YzDC5hjGUM) — Escaping Model Collapse via Verification — even verifier-filtered self-training converges to the verifier's knowledge center; perfect verifiers required for indefinite improvement.
- `poster` · `OAFPpQO0H9` — [SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement](https://openreview.net/forum?id=OAFPpQO0H9) — SAHOO — proposes Goal-Drift Index + constraint-preservation loss; +18.3% code, +16.8% reasoning at zero constraint violations.
- `poster` · `iRhaK8PsuB` — [Verifying the Verifiers: Failure Attribution for Agentic Benchmark Diagnostics and Training Data Curation](https://openreview.net/forum?id=iRhaK8PsuB) — AutoTriage — RSI training data is corrupted unless agent/task/infra failures are correctly attributed. GPT-5.2 Codex with sandbox reaches κ=0.833 (humans κ=0.929).
- `spotlight` · `aY5kmaNrwB` — [Tiny Autoregressive Recursive Models](https://openreview.net/forum?id=aY5kmaNrwB) — (spotlight) Tiny Autoregressive Recursive Models — token-internal recursive refinement does *not* yield free reasoning compute under matched-FLOP autoregressive decoding. Negative result against TRM-style claims.
- `poster` · `QSWFqDcveB` — [Simple Baselines are Competitive with Code Evolution](https://openreview.net/forum?id=QSWFqDcveB) — Simple Baselines are Competitive with Code Evolution — random/sequential sampling matches AlphaEvolve-style pipelines under matched budget.

## Synthesis

The workshop's running implicit hypothesis: **every closed loop has a leak.** The contributions are now in (a) instrumenting the leak (AutoTriage, SAHOO, Hack-gap), and (b) closing it with non-self signal (anchors, real goalposts, cross-family verifiers).

## Related

- **Trend 1 — Self-play with an Anchor** (drift mitigations)
- **Trend 2 — The Critic Bottleneck** (verifier corruption is upstream of reward hacking)
- **Trend 8 — Recursive Architectures** (Tiny Autoregressive RM negative result)
- **Home**