
PAPER 002 JUNE 2026 · UPCOMING

Reward hacking patterns in LLM agent benchmarks.

We catalog reward-hacking strategies observed across 200+ trajectory annotations on Terminal-Bench, propose a taxonomy of these strategies, and recommend evaluation patterns that resist gaming.

  • Evaluation
  • ML systems
