June 25, 2026 Research Brief
Agent control gets explicit.
Today’s strongest papers replace prompt-only agent design with governed memory, formal verification, and system-level security evaluation, while more realistic benchmarks expose where long-horizon agents still break.
Persistent memory is no longer just a convenience layer; it is a control plane for future actions. Several papers show that failures arise at write-time authority assignment, retrieval-time surfacing, cross-agent propagation, and experience consolidation.
As traces get longer and tasks more consequential, post hoc answer checking is too coarse. The strongest systems now verify steps, localize decisive faults, or maintain explicit beliefs over correctness before acting.
Security failures increasingly emerge from end-to-end pipelines rather than isolated prompts. Today’s papers show that evaluation must include retrieval, memory, tool execution, multimodal judges, and sandbox boundaries.