Daily AI Paper Report (2026-04-13)

Run stats

  • Candidates: 3253
  • Selected: 30
  • Deepread completed: 30
  • Window (UTC): 2026-04-10T00:00:00Z → 2026-04-11T00:00:00Z (weekend_backlog_sat, expanded=0)
Selected papers

  • 2604.07835 Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation [PDF]
    Categories: cs.AI | Score: 96
    Why: New efficient inference-time jailbreak method via hidden-state subspace ablation; high safety relevance
    Tags: jailbreaks, inference-time attacks, representation engineering, guardrails, robustness
  • 2604.08401 Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing [PDF]
    Categories: cs.AI, cs.CL | Score: 93
    Why: Self-auditing verification for LLM-agent beliefs to prevent drift in long-horizon tool use.
    Tags: llm-agents, self-auditing, faithful-reasoning, verification, agent-safety, long-horizon
  • 2604.08291 VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery [PDF]
    Categories: cs.GT, cs.CR, cs.OS | Score: 92
    Why: Agentic orchestration with verifiers for OS vuln discovery; strong security/agent workflow framing
    Tags: agents, cybersecurity, vulnerability discovery, verification, tool-augmented LLMs, game theory
  • 2604.08304 Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions [PDF]
    Categories: cs.CR, cs.AI | Score: 92
    Why: Clear secure-RAG framing + taxonomy across pipeline stages; useful for audits/defenses.
    Tags: RAG, security, taxonomy, prompt-injection, data-poisoning, threat-modeling, LLM-systems
  • 2604.08455 KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation [PDF]
    Categories: cs.AI | Score: 92
    Why: Interactive benchmark for proactive/personalized mobile agents incl. consent/when-to-act decisions
    Tags: agents, evaluation, mobile, personalization, proactivity, human-in-the-loop, safety
  • 2604.08388 Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover [PDF]
    Categories: cs.AI | Score: 92
    Why: Shows agentic tool-use can collapse after SFT and be restored with ~100 traces
    Tags: agents, tool-use, capability-recovery, fine-tuning, formal-math, function-calling
  • 2604.07778 The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives [PDF]
    Categories: cs.AI | Score: 90
    Why: Formal impossibility result for accountability in human-agent collectives as autonomy grows
    Tags: agent-governance, accountability, formal-methods, causal-models, multi-agent
  • 2604.07745 The Cartesian Cut in Agentic AI [PDF]
    Categories: cs.AI, q-bio.NC | Score: 90
    Why: Conceptual framework for where control lives in LLM agents; governance implications.
    Tags: agents, agent-architecture, governance, control, conceptual
  • 2604.08326 ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection [PDF]
    Categories: cs.AI | Score: 89
    Why: Fine-grained medical alignment w/ explicit criteria + multidimensional reward model; useful safety pattern
    Tags: alignment, medical LLMs, reward modeling, rubrics, safety constraints, datasets
  • 2604.07853 QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch [PDF]
    Categories: cs.LG, cs.AI | Score: 89
    Why: Quantization-aware RL aligns rollout precision to stabilize LLM RL and speed training
    Tags: LLM-RL, post-training, quantization, efficiency, training-inference-mismatch
  • 2604.05795 Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation [PDF]
    Categories: cs.CL | Score: 88
    Why: FAITH-M benchmark scores therapist responses on expert therapeutic principles; strong safety eval value.
    Tags: evaluation, mental-health, alignment, safety, benchmark, rubric, clinical
  • 2604.08457 CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning [PDF]
    Categories: cs.CV, cs.AI, cs.RO | Score: 88
    Why: Safety-critical VLM benchmark for real crash videos; tests grounding + causal/mechanistic reasoning
    Tags: evaluation, vlm, autonomous-driving, safety, video, reasoning, benchmark
  • 2604.08124 Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search [PDF]
    Categories: cs.AI | Score: 88
    Why: Improves RL-trained LLM search agents via hierarchical experience; targets stability/efficiency.
    Tags: agents, search, reinforcement-learning, reasoning, training-stability
  • 2604.07264 Validated Intent Compilation for Constrained Routing in LEO Mega-Constellations [PDF]
    Categories: cs.CR, cs.AI | Score: 86
    Why: LLM intent compiler + verifier loop for safety-critical routing constraints; strong benchmark results
    Tags: LLM, program-synthesis, verification, tool-use, networking, constraints, reliability
  • 2604.06693 Aegon: Auditable AI Content Access with Ledger-Bound Tokens and Hardware-Attested Mobile Receipts [PDF]
    Categories: cs.CR, cs.CY | Score: 86
    Why: Auditable content-access protocol with append-only ledger proofs; practical governance infra.
    Tags: AI-governance, auditing, content-licensing, cryptography, transparency-logs, attestation, JWT
  • 2604.08340 PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models [PDF]
    Categories: cs.CV, cs.AI | Score: 86
    Why: Long-horizon VLM benchmark in complex 3D game with strict RGB-only isolation and evaluator
    Tags: benchmarks, embodied-agents, vision-language-models, long-horizon, evaluation
  • 2604.04604 AI Agents Under EU Law [PDF]
    Categories: cs.CY, cs.AI, cs.CR, cs.MA | Score: 86
    Why: Systematic mapping of EU AI Act+GDPR etc. obligations for autonomous AI agents.
    Tags: ai-agents, governance, regulation, EU-AI-Act, GDPR, compliance, risk-management
  • 2604.07054 Sell More, Play Less: Benchmarking LLM Realistic Selling Skill [PDF]
    Categories: cs.CL | Score: 86
    Why: Realistic sales benchmark + auto eval; tests goal-directed persuasion in multi-turn dialogs
    Tags: benchmark, dialogue, persuasion, evaluation, user-simulation, DPO
  • 2604.08519 Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts [PDF]
    Categories: cs.CL, stat.ML | Score: 86
    Why: Data pruning to improve factual memorization; info-theoretic framing of capacity limits
    Tags: factuality, hallucinations, data-selection, memorization, scaling, information-theory
  • 2604.07007 AgentCity: Constitutional Governance for Autonomous Agent Economies via Separation of Power [PDF]
    Categories: cs.MA, cs.AI, cs.CY | Score: 84
    Why: Governance architecture for open agent economies using separation-of-powers; novel but blockchain-heavy
    Tags: agent governance, multi-agent systems, auditing, mechanism design, smart contracts
  • 2604.07967 AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification [PDF]
    Categories: cs.CL, cs.AI | Score: 84
    Why: AtomEval detects semantic corruption in adversarial claim rewrites; improves fact-checking robustness eval.
    Tags: fact-verification, adversarial-evaluation, metrics, robustness, nlp-eval
  • 2604.08003 Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs [PDF]
    Categories: eess.AS, cs.CL, cs.SD | Score: 84
    Why: Entropy-allocation view for LLM-ASR; targets hallucinations + latency with principled training strategy
    Tags: hallucinations, ASR, LLM, reliability, training, evaluation-metrics
  • 2604.08539 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks [PDF]
    Categories: cs.CV, cs.AI, cs.CL | Score: 84
    Why: New RL objective (Gaussian GRPO) to stabilize multi-task reward topologies for multimodal reasoning
    Tags: multimodal, rl, post-training, optimization, reasoning, grpo
  • 2603.17692 Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization [PDF]
    Categories: cs.LG, cs.AI, q-fin.CP, q-fin.PM | Score: 84
    Why: Anonymization-first eval for LLM trading agents to reduce memorization/survivorship bias.
    Tags: llm-agents, evaluation, data-leakage, memorization, finance, multi-agent
  • 2604.06148 Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries [PDF]
    Categories: cs.CR, cs.AI, cs.MA | Score: 84
    Why: Taxonomy for machine identity governance; relevant to agent credentials, tokens, and abuse.
    Tags: agent-security, machine-identities, access-control, credentials, governance, risk-taxonomy, enterprise
  • 2604.07892 Data Selection for Multi-turn Dialogue Instruction Tuning [PDF]
    Categories: cs.CL, cs.AI | Score: 84
    Why: Dialogue-level data selection for multi-turn instruction tuning; tackles noise and drift.
    Tags: instruction-tuning, data-selection, multi-turn, post-training, dataset-quality
  • 2603.22709 Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics [PDF]
    Categories: cs.CL, eess.AS | Score: 84
    Why: New semantic+overlap-aware ASR metrics; probes LLM robustness in multi-speaker settings
    Tags: evaluation, speech, ASR, LLM-robustness, metrics, overlap, semantic-fidelity
  • 2604.06814 OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale [PDF]
    Categories: cs.LG, cs.AI | Score: 84
    Why: 3030-dataset tabular benchmark; large-scale comparison of GBDT/NN/foundation models
    Tags: benchmark, tabular, evaluation, foundation-models, GBDT
  • 2604.08417 Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs [PDF]
    Categories: cs.SE, cs.CR | Score: 84
    Why: Empirical study of LLM vuln detection with interprocedural context; cost vs accuracy
    Tags: security, vulnerability-detection, code-LLMs, evaluation, interprocedural-analysis
  • 2604.01554 EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild [PDF]
    Categories: cs.CR, cs.LG, cs.SE | Score: 82
    Why: EXHIB benchmark for binary function similarity; broad, realistic security eval suite with model comparisons
    Tags: benchmarks, software security, binary analysis, evaluation, vulnerability analysis

AI Paper Insight Brief

2026-04-13

1) Executive takeaways (read this first)

  • Evaluation is shifting from surface metrics to “meaning-/structure-preserving” metrics: tcpSemER for conversational ASR and AtomEval for adversarial fact verification both show that common metrics can dramatically misstate progress/robustness when paraphrase or semantic corruption is involved.
  • Agent safety is increasingly about the interfaces and governance layers around models: EU-law mapping for agents, machine-identity governance (MIGT), and RAG security taxonomies all converge on “external actions + toolchains + identity + auditability” as the real compliance/security boundary.
  • Inference-time and training-time mismatches are a recurring failure mode: quantized rollouts destabilize RL (QaRL/TBPO), LLM-ASR joint training can drift into hallucination (entropy allocation + IA-SFT), and heavy SFT can suppress tool-use (“agentic collapse”)—all pointing to the need for explicit alignment between what’s optimized and what’s deployed.
  • Long-horizon embodied/GUI agents still fail on low-level recovery and initiative calibration: PokeGym identifies deadlock/collision recovery as the dominant bottleneck; KnowU-Bench shows large drops on personalized/proactive tasks even for strong models.
  • Security research is becoming more “systems + economics”: VCAO’s game-theoretic orchestration improves validated vuln yield per budget; EXHIB exposes BFSD generalization gaps across firmware/semantic variation; interprocedural context in LLM vuln detection often hurts while doubling cost.

2) Key themes (clusters)

Theme: Validity-aware evaluation (semantic/structure > surface form)

Theme: Agent governance, compliance, and identity as first-class engineering

Theme: Stabilizing agent training & deployment under mismatch and drift

Theme: Long-horizon interactive agents: recovery, proactivity, and persuasion

Theme: Security & robustness in the wild (benchmarks + orchestration + cost)

3) Technical synthesis

  • Multiple papers converge on “pipeline-level invariants”: tcpSemER preserves time collars + permutation invariance; AtomEval enforces relation-structure consistency; RAG security frames threats by pipeline stage; EU agent compliance centers on external-action inventories.
  • Decomposition is the new default: overlap vs non-overlap error attribution (CASR), low/mid/high binary variation (EXHIB), metafeature-conditioned winners (OmniTabBench), failure taxonomies (PokeGym deadlocks; KnowU clarify/partial; CrashSight category gaps).
  • Mismatch correction appears in three distinct forms:
    • Systems mismatch (quantized sampler vs BF16 learner → QaRL aligned low-bit forward).
    • Representation drift mismatch (speech encoder becomes too semantic → CTC pretrain + IA-SFT hot-swapping).
    • Capability suppression mismatch (domain SFT suppresses tool use → tiny agentic trace reactivation).
  • Robustness often requires “hard gates” + “soft scores”: AtomEval hard relation gate + soft degradations; SAVER typed violations + minimal repair; LEO intent compiler uses deterministic 8-pass validator with ACCEPT/REJECT/ABSTAIN.
  • Graph structure keeps showing up as a stabilizer/accelerator: SemGAT in anonymized trading; GAT router distilled from Dijkstra for LEO; attack graphs in VCAO; semantic edges in finance and routing both used to propagate relational constraints.
  • Cost-aware evaluation is becoming standard: vulnerability detection paper reports token-cost totals and shows context doubles tokens; QaRL reports per-step speedups; VCAO reports MILP solve time (<5s for ~75k vars).
  • “Overlap / concurrency” is a core unsolved regime: CASR shows overlap regions dominate errors (~90% of error from ~32% overlap); similar “concurrency” issues appear in multi-agent governance (accountability horizon) and toolchains (RAG trust boundaries).
  • Inference-time attacks are moving into representation space: CRA uses gradient-attributed masking to suppress refusal subspaces, suggesting defenses must consider activation integrity, not just prompt filtering.
  • Benchmarks increasingly include intervention studies (PokeGym forced recovery improves SR; MDS shows long-dialogue robustness; CrashSight shows fine-tuning gains but persistent perceptual bottlenecks).
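
The "hard gates + soft scores" pattern recurring above can be sketched as follows. This is a generic illustration under assumed names (`GateResult`, `evaluate`, the arity invariant, and the soft-check list are all hypothetical), not any paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    valid: bool    # hard gate: structural/relational invariants hold
    score: float   # soft score: graded quality, meaningful only if valid

def evaluate(candidate: dict) -> GateResult:
    """Hard-gate-then-soft-score: a candidate that breaks a structural
    invariant is rejected outright (score 0.0); otherwise it receives a
    graded score averaged over soft degradation checks."""
    # Hard gate: hypothetical relational invariant (e.g., a rewritten
    # claim must preserve the original subject/relation/object arity).
    if candidate.get("relation_arity") != candidate.get("expected_arity"):
        return GateResult(valid=False, score=0.0)
    # Soft score: mean of graded checks, each in [0, 1].
    soft_checks = candidate.get("soft_checks", [])
    score = sum(soft_checks) / len(soft_checks) if soft_checks else 1.0
    return GateResult(valid=True, score=score)
```

The design point is that soft scores never rescue a structurally invalid candidate, so attacks (or noise) that corrupt structure cannot be counted as graded partial successes.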

4) Top 5 papers (with “why now”)

1) QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch

  • Aligns learner forward-pass arithmetic with quantized rollout engines to reduce PPO instability from mismatch.
  • TBPO introduces sequence-level ratios + dual clipping to suppress “error-token” ratio explosions under quantized decoding.
  • Demonstrates near-BF16 recovery while keeping most throughput gains (e.g., Qwen3-30B-A3B: 45.7 → 51.2 vs BF16 52.1).
  • Skepticism: still slower than pure quantized-rollout training (1.3× vs 1.4× on MoE) and relies on low-bit kernel availability.
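
The sequence-level ratio with dual clipping can be sketched roughly as below. This is an illustrative reconstruction, not the exact QaRL/TBPO objective; the clip bounds and the hard ratio cap are assumed values:

```python
import math

def tbpo_style_weight(logp_new: list[float], logp_old: list[float],
                      advantage: float,
                      clip_lo: float = 0.8, clip_hi: float = 1.2,
                      ratio_cap: float = 4.0) -> float:
    """Sketch of a sequence-level importance weight with dual clipping.

    Token log-probs are summed into one sequence-level ratio (per-token
    ratio products can explode under a quantized sampler); the ratio is
    then (1) hard-capped to mask pathological sequences and (2) PPO-style
    clipped before weighting the advantage."""
    # Sequence-level ratio: exp of the summed token log-prob differences.
    log_ratio = sum(n - o for n, o in zip(logp_new, logp_old))
    ratio = math.exp(log_ratio)
    # Clip 1: drop sequences whose ratio is pathologically large
    # (e.g., dominated by a few mismatch-corrupted "error tokens").
    if ratio > ratio_cap:
        return 0.0
    # Clip 2: standard PPO clipped surrogate (for maximization).
    clipped = min(max(ratio, clip_lo), clip_hi)
    return min(ratio * advantage, clipped * advantage)
```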

2) Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation

  • Training-free, inference-time jailbreak that targets refusal subspaces via gradient attribution and masking.
  • Large ASR gains reported across multiple 7B aligned models (e.g., Llama-2-7B-Chat ASR-O 53.0%; λ≈1.0 gives RRSR 96.3%).
  • Highlights a concrete latent-space attack surface distinct from prompt-only jailbreaks.
  • Skepticism: assumes white-box access to activations/gradients; quality degrades at high suppression strengths.
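
The underlying mechanism, removing a "refusal direction" from hidden states, can be shown with a generic rank-1 ablation. This is not CRA's actual algorithm (which selects positions and layers dynamically via gradient attribution); it only illustrates the latent-space operation:

```python
import numpy as np

def ablate_direction(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each hidden state along one direction.

    Generic rank-1 ablation: h' = h - (h . d_hat) d_hat, applied per
    sequence position. hidden: (seq_len, d_model)."""
    d = direction / np.linalg.norm(direction)
    # Subtract each row's projection onto the unit direction d.
    return hidden - np.outer(hidden @ d, d)
```

After ablation the residual states are orthogonal to the direction, which is why defenses that only filter prompts cannot detect this class of attack.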

3) Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics

  • Introduces tcpSemER (time-constrained, permutation-invariant semantic error) and overlap-aware tcpWER decomposition.
  • Shows overlap dominates errors (NSF1: ~32% overlap accounts for ~90% of error), and semantic metrics reduce sensitivity to normalization.
  • Provides a realistic comparison of modular vs LLM-based CASR under increasing overlap/speaker counts.
  • Skepticism: primarily evaluation; does not propose architectural fixes for overlap handling.
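
The two ingredients of tcpSemER (time collars and permutation invariance) can be illustrated with a toy metric. This is not the paper's definition: the distance here is exact text match, segment counts are assumed equal, and the assignment is brute force (real metrics use an efficient assignment algorithm):

```python
from itertools import permutations

def toy_perm_invariant_error(refs: list[tuple[float, float, str]],
                             hyps: list[tuple[float, float, str]],
                             collar: float = 2.0) -> float:
    """Toy permutation-invariant, time-collared error between reference
    and hypothesis segments, each given as (start, end, text).

    A hypothesis may only match a reference whose start time lies within
    `collar` seconds; over all stream permutations, the assignment with
    the fewest mismatches wins."""
    def cost(r, h):
        if abs(r[0] - h[0]) > collar:        # outside the time collar
            return 1.0
        return 0.0 if r[2] == h[2] else 1.0  # toy "semantic" distance

    n = len(refs)
    assert len(hyps) == n, "toy version assumes equal segment counts"
    best = min(sum(cost(refs[i], hyps[p[i]]) for i in range(n))
               for p in permutations(range(n)))
    return best / n
```

The collar is what keeps permutation invariance honest: without it, a metric could credit a hypothesis that says the right words at entirely the wrong time.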

4) KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

  • Online Android benchmark that tests preference elicitation, proactivity/consent, and post-rejection restraint—beyond navigation.
  • Shows strong models drop sharply on hard personalized tasks (e.g., Claude Sonnet 4.6: 60.4% overall vs 44.2% hard personalized).
  • Hybrid evaluation (rule checks + LLM judge) better aligns with human ratings than rules alone.
  • Skepticism: simulator dependence (LLM user simulator) and synthetic/curated profiles/logs may limit ecological validity.
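
The hybrid rule-check + LLM-judge scoring can be sketched as below. The function, field names, and 50/50 weighting are illustrative assumptions; the judge is stubbed as a callable returning a score in [0, 1]:

```python
from typing import Callable

def hybrid_score(episode: dict,
                 rule_checks: list[Callable[[dict], bool]],
                 llm_judge: Callable[[dict], float],
                 rule_weight: float = 0.5) -> float:
    """Combine deterministic rule checks (objective, cheap, brittle)
    with an LLM-judge score (flexible, subjective) into one score.

    Rules verify hard episode facts (e.g., consent was obtained before
    acting); the judge rates qualities rules cannot capture (e.g.,
    whether a clarifying question was appropriate)."""
    rule_score = (sum(c(episode) for c in rule_checks) / len(rule_checks)
                  if rule_checks else 1.0)
    judge_score = llm_judge(episode)  # expected in [0, 1]
    return rule_weight * rule_score + (1.0 - rule_weight) * judge_score
```

Example: with rules checking `asked_consent` and no action after rejection, and a judge score of 0.8, a fully rule-compliant episode scores 0.9.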

5) VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery

  • Frames vuln discovery as repeated Bayesian Stackelberg game; allocates tool budget via DOBSS-derived MILP + belief updates.
  • Claims large gains in severity-weighted validated findings per budget (2.7× vs coverage-only fuzzing) and reduces false positives to ~15.1%.
  • Includes a six-layer orchestration architecture and a stated online regret bound.
  • Skepticism: relies on rational-attacker assumptions and calibrated tool likelihoods; attack-path enumeration is exponential and needs heuristics.
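
The belief-update step in verifier-centered orchestration can be illustrated with a single-tool Bayesian posterior. This is standard Bayes' rule with assumed calibrated tool rates, not VCAO's full Stackelberg/MILP machinery:

```python
def update_tool_belief(prior_valid: float,
                       tpr: float, fpr: float,
                       flagged: bool) -> float:
    """Posterior probability that a candidate finding is a real
    vulnerability after one tool's report.

    prior_valid: prior P(real); tpr/fpr: the tool's calibrated true and
    false positive rates; flagged: whether the tool reported it.
    An orchestrator can use such posteriors to decide where to spend
    the next unit of tool budget."""
    if flagged:
        num = tpr * prior_valid
        den = tpr * prior_valid + fpr * (1.0 - prior_valid)
    else:
        num = (1.0 - tpr) * prior_valid
        den = (1.0 - tpr) * prior_valid + (1.0 - fpr) * (1.0 - prior_valid)
    return num / den if den > 0 else prior_valid
```

With a 10% prior and a tool at 90% TPR / 10% FPR, one positive report lifts the posterior to 0.5, which is why calibrated tool likelihoods matter so much to the budget allocation.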

5) Practical next steps

  • Adopt validity-aware metrics in your eval stack: for multi-speaker ASR, add tcpSemER + overlap decomposition; for adversarial fact verification, add atomic-structure validity checks (AtomEval-style) to avoid counting “semantic drift” as successful attacks.
  • Instrument agent systems around external actions: build an “external-action inventory” (EU-law paper’s Step 0) and map it to identity, logging, and trust boundaries (MIGT + RAG security taxonomy).
  • Harden against representation-space jailbreaks: if you operate open-weight models or internal deployments, test CRA-like activation ablations in a red-team setting to understand whether refusal relies on low-rank directions.
  • If doing RL with quantized rollouts, measure mismatch-induced ratio pathologies (token/sequence ratios, error-token frequency) and consider aligned low-bit forward passes + sequence-level clipping/masking (QaRL/TBPO).
  • For long-horizon VLM/GUI agents, track process metrics (deadlocks/ineffective moves; clarify rate; intervention/passivity) and run targeted interventions (e.g., deterministic recovery primitives) rather than only improving high-level planning.
  • For specialized tool-using models, test for “agentic collapse” after heavy SFT; try small targeted agentic trace injections (including explicit no-tool negatives) to recover tool use without destroying domain skill.
  • In security tooling, avoid naive context expansion: interprocedural context can degrade detection while doubling tokens; instead, experiment with selective retrieval of only the most relevant callers/callees and measure cost-per-validated-finding.
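
For the quantized-rollout point above, the mismatch diagnostics can be sketched as a small function comparing per-token log-probs of the same sampled tokens under the learner and the rollout engine. The threshold and naming are illustrative, not taken from any specific paper:

```python
import math

def mismatch_ratio_stats(logp_learner: list[float],
                         logp_rollout: list[float],
                         ratio_threshold: float = 2.0) -> dict:
    """Diagnostics for training-inference mismatch in RL rollouts.

    Given per-token log-probs of identical sampled tokens under the
    learner (e.g., BF16) and the rollout engine (e.g., quantized),
    report the sequence-level importance ratio and the fraction of
    'error tokens' whose per-token ratio leaves the band
    [1/threshold, threshold]."""
    token_ratios = [math.exp(a - b)
                    for a, b in zip(logp_learner, logp_rollout)]
    seq_ratio = math.exp(sum(logp_learner) - sum(logp_rollout))
    error_frac = sum(r > ratio_threshold or r < 1.0 / ratio_threshold
                     for r in token_ratios) / len(token_ratios)
    return {"seq_ratio": seq_ratio, "error_token_frac": error_frac}
```

Tracking both numbers over training distinguishes a uniform precision offset (sequence ratio drifts, few error tokens) from the localized pathologies that sequence-level clipping is meant to suppress.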

Generated from per-paper analyses; no external browsing.