AI Paper Daily (2026-04-13)

Published:

English version: /paper-news/2026-04-13/

Run statistics

  • Candidate papers: 3253
  • Selected papers: 30
  • Deep reads completed: 30
  • Time window (UTC): 2026-04-10T00:00:00Z → 2026-04-11T00:00:00Z (weekend_backlog_sat, expanded=0)
Paper list used for the summary
arXiv ID / Title (PDF) · Categories · Score · Selection rationale · Tags

2604.07835  Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation (PDF)
    cs.AI · 96 · New efficient inference-time jailbreak method via hidden-state subspace ablation; high safety relevance
    Tags: jailbreaks, inference-time attacks, representation engineering, guardrails, robustness

2604.08401  Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing (PDF)
    cs.AI, cs.CL · 93 · Self-auditing verification for LLM-agent beliefs to prevent drift in long-horizon tool use.
    Tags: llm-agents, self-auditing, faithful-reasoning, verification, agent-safety, long-horizon

2604.08291  VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery (PDF)
    cs.GT, cs.CR, cs.OS · 92 · Agentic orchestration with verifiers for OS vuln discovery; strong security/agent workflow framing
    Tags: agents, cybersecurity, vulnerability discovery, verification, tool-augmented LLMs, game theory

2604.08304  Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions (PDF)
    cs.CR, cs.AI · 92 · Clear secure-RAG framing + taxonomy across pipeline stages; useful for audits/defenses.
    Tags: RAG, security, taxonomy, prompt-injection, data-poisoning, threat-modeling, LLM-systems

2604.08455  KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation (PDF)
    cs.AI · 92 · Interactive benchmark for proactive/personalized mobile agents incl. consent/when-to-act decisions
    Tags: agents, evaluation, mobile, personalization, proactivity, human-in-the-loop, safety

2604.08388  Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover (PDF)
    cs.AI · 92 · Shows agentic tool-use can collapse after SFT and be restored with ~100 traces
    Tags: agents, tool-use, capability-recovery, fine-tuning, formal-math, function-calling

2604.07778  The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives (PDF)
    cs.AI · 90 · Formal impossibility result for accountability in human-agent collectives as autonomy grows
    Tags: agent-governance, accountability, formal-methods, causal-models, multi-agent

2604.07745  The Cartesian Cut in Agentic AI (PDF)
    cs.AI, q-bio.NC · 90 · Conceptual framework for where control lives in LLM agents; governance implications.
    Tags: agents, agent-architecture, governance, control, conceptual

2604.08326  ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection (PDF)
    cs.AI · 89 · Fine-grained medical alignment w/ explicit criteria + multidimensional reward model; useful safety pattern
    Tags: alignment, medical LLMs, reward modeling, rubrics, safety constraints, datasets

2604.07853  QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch (PDF)
    cs.LG, cs.AI · 89 · Quantization-aware RL aligns rollout precision to stabilize LLM RL and speed training
    Tags: LLM-RL, post-training, quantization, efficiency, training-inference-mismatch

2604.05795  Measuring What Matters!! Assessing Therapeutic Principles in Mental-Health Conversation (PDF)
    cs.CL · 88 · FAITH-M benchmark scores therapist responses on expert therapeutic principles; strong safety eval value.
    Tags: evaluation, mental-health, alignment, safety, benchmark, rubric, clinical

2604.08457  CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning (PDF)
    cs.CV, cs.AI, cs.RO · 88 · Safety-critical VLM benchmark for real crash videos; tests grounding + causal/mechanistic reasoning
    Tags: evaluation, vlm, autonomous-driving, safety, video, reasoning, benchmark

2604.08124  Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search (PDF)
    cs.AI · 88 · Improves RL-trained LLM search agents via hierarchical experience; targets stability/efficiency.
    Tags: agents, search, reinforcement-learning, reasoning, training-stability

2604.07264  Validated Intent Compilation for Constrained Routing in LEO Mega-Constellations (PDF)
    cs.CR, cs.AI · 86 · LLM intent compiler + verifier loop for safety-critical routing constraints; strong benchmark results
    Tags: LLM, program-synthesis, verification, tool-use, networking, constraints, reliability

2604.06693  Aegon: Auditable AI Content Access with Ledger-Bound Tokens and Hardware-Attested Mobile Receipts (PDF)
    cs.CR, cs.CY · 86 · Auditable content-access protocol with append-only ledger proofs; practical governance infra.
    Tags: AI-governance, auditing, content-licensing, cryptography, transparency-logs, attestation, JWT

2604.08340  PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models (PDF)
    cs.CV, cs.AI · 86 · Long-horizon VLM benchmark in complex 3D game with strict RGB-only isolation and evaluator
    Tags: benchmarks, embodied-agents, vision-language-models, long-horizon, evaluation

2604.04604  AI Agents Under EU Law (PDF)
    cs.CY, cs.AI, cs.CR, cs.MA · 86 · Systematic mapping of EU AI Act+GDPR etc. obligations for autonomous AI agents.
    Tags: ai-agents, governance, regulation, EU-AI-Act, GDPR, compliance, risk-management

2604.07054  Sell More, Play Less: Benchmarking LLM Realistic Selling Skill (PDF)
    cs.CL · 86 · Realistic sales benchmark + auto eval; tests goal-directed persuasion in multi-turn dialogs
    Tags: benchmark, dialogue, persuasion, evaluation, user-simulation, DPO

2604.08519  Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts (PDF)
    cs.CL, stat.ML · 86 · Data pruning to improve factual memorization; info-theoretic framing of capacity limits
    Tags: factuality, hallucinations, data-selection, memorization, scaling, information-theory

2604.07007  AgentCity: Constitutional Governance for Autonomous Agent Economies via Separation of Power (PDF)
    cs.MA, cs.AI, cs.CY · 84 · Governance architecture for open agent economies using separation-of-powers; novel but blockchain-heavy
    Tags: agent governance, multi-agent systems, auditing, mechanism design, smart contracts

2604.07967  AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification (PDF)
    cs.CL, cs.AI · 84 · AtomEval detects semantic corruption in adversarial claim rewrites; improves fact-checking robustness eval.
    Tags: fact-verification, adversarial-evaluation, metrics, robustness, nlp-eval

2604.08003  Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs (PDF)
    eess.AS, cs.CL, cs.SD · 84 · Entropy-allocation view for LLM-ASR; targets hallucinations + latency with principled training strategy
    Tags: hallucinations, ASR, LLM, reliability, training, evaluation-metrics

2604.08539  OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks (PDF)
    cs.CV, cs.AI, cs.CL · 84 · New RL objective (Gaussian GRPO) to stabilize multi-task reward topologies for multimodal reasoning
    Tags: multimodal, rl, post-training, optimization, reasoning, grpo

2603.17692  Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization (PDF)
    cs.LG, cs.AI, q-fin.CP, q-fin.PM · 84 · Anonymization-first eval for LLM trading agents to reduce memorization/survivorship bias.
    Tags: llm-agents, evaluation, data-leakage, memorization, finance, multi-agent

2604.06148  Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries (PDF)
    cs.CR, cs.AI, cs.MA · 84 · Taxonomy for machine identity governance; relevant to agent credentials, tokens, and abuse.
    Tags: agent-security, machine-identities, access-control, credentials, governance, risk-taxonomy, enterprise

2604.07892  Data Selection for Multi-turn Dialogue Instruction Tuning (PDF)
    cs.CL, cs.AI · 84 · Dialogue-level data selection for multi-turn instruction tuning; tackles noise and drift.
    Tags: instruction-tuning, data-selection, multi-turn, post-training, dataset-quality

2603.22709  Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics (PDF)
    cs.CL, eess.AS · 84 · New semantic+overlap-aware ASR metrics; probes LLM robustness in multi-speaker settings
    Tags: evaluation, speech, ASR, LLM-robustness, metrics, overlap, semantic-fidelity

2604.06814  OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale (PDF)
    cs.LG, cs.AI · 84 · 3030-dataset tabular benchmark; large-scale comparison of GBDT/NN/foundation models
    Tags: benchmark, tabular, evaluation, foundation-models, GBDT

2604.08417  Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs (PDF)
    cs.SE, cs.CR · 84 · Empirical study of LLM vuln detection with interprocedural context; cost vs accuracy
    Tags: security, vulnerability-detection, code-LLMs, evaluation, interprocedural-analysis

2604.01554  EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild (PDF)
    cs.CR, cs.LG, cs.SE · 82 · EXHIB benchmark for binary function similarity; broad, realistic security eval suite with model comparisons
    Tags: benchmarks, software security, binary analysis, evaluation, vulnerability analysis

AI Paper Insights Briefing

2026-04-13

0) Executive takeaways (read this first)

  • Evaluation is shifting from surface metrics to meaning- and structure-preserving ones: tcpSemER for conversational ASR and AtomEval for adversarial fact verification both show that once paraphrase or semantic corruption is involved, common metrics can badly misjudge progress and robustness.
  • Agent safety increasingly hinges on the interface-governance layer around the model: the EU-law mapping for agents, machine identity governance (MIGT), and the RAG security taxonomy all point to the same conclusion: external actions + toolchains + identity + auditability form the real compliance/safety boundary.
  • Inference-time vs training-time mismatch is a recurring failure mode: quantized rollouts destabilize RL (QaRL/TBPO); joint LLM-ASR training can drift into hallucination (entropy allocation + IA-SFT); heavy SFT suppresses tool use ("agentic collapse"). All of these call for explicitly aligning the optimization target with the actual deployment form.
  • Long-horizon embodied/GUI agents still fail at low-level recovery and proactivity calibration: PokeGym finds deadlock/collision recovery is the main bottleneck; KnowU-Bench shows even strong models drop sharply on personalized/proactive tasks.
  • Security research is becoming more "systems + economics": VCAO's game-theoretic orchestration raises verified-vulnerability yield within a fixed budget; EXHIB reveals BFSD generalization gaps under firmware/semantic change; adding interprocedural context to LLM vulnerability detection often hurts accuracy while doubling cost.

2) Key themes (clusters)

Theme: validity-aware evaluation (semantics/structure over surface form)

Theme: agent governance, compliance, and identity as first-class engineering problems

Theme: stabilizing agent training and deployment under mismatch and drift

Theme: long-horizon interactive agents: recovery, proactivity, and persuasion

Theme: security and robustness in the wild (benchmarks + orchestration + cost)

3) Technical synthesis

  • Multiple papers converge on pipeline-level invariances: tcpSemER preserves time collars plus permutation invariance; AtomEval enforces relational-structure consistency; RAG security characterizes threats by pipeline stage; EU agent compliance centers on an external-action inventory.
  • Decomposition is becoming the new default: overlapped vs non-overlapped error attribution (CASR), low/medium/high binary change (EXHIB), winners conditioned on meta-features (OmniTabBench), and failure taxonomies (PokeGym deadlocks; KnowU clarification/partial credit; CrashSight category gaps).
  • Mismatch fixes appear in three distinct forms:
    • Systems mismatch (quantized sampler vs BF16 learner → QaRL aligns the low-bit forward pass).
    • Representation-drift mismatch (speech encoders becoming overly semantic → CTC pretraining + IA-SFT hot swapping).
    • Capability-suppression mismatch (domain SFT suppresses tool use → a small number of agentic traces reactivates it).
  • Robustness often needs hard gates plus soft scores: AtomEval's relational hard gates with soft downgrades; SAVER's typed violations with minimal repairs; the LEO intent compiler's deterministic 8-round verifier emitting ACCEPT/REJECT/ABSTAIN.
  • Graph structure keeps appearing as a stabilizer/accelerator: SemGAT in anonymized trading, a GAT router for LEO distilled from Dijkstra, VCAO's attack graphs; semantic edges in both finance and routing propagate relational constraints.
  • Cost-aware evaluation is becoming standard: the vulnerability-detection study reports total token cost and shows added context doubles tokens; QaRL reports per-step speedups; VCAO reports MILP solve times (<5 s at ~75k variables).
  • Overlap/concurrency is the core unsolved regime: CASR shows overlapped regions dominate errors (roughly 32% overlap contributes about 90% of errors); analogous concurrency problems appear in multi-agent governance (accountability horizons) and toolchains (RAG trust boundaries).
  • Inference-time attacks are moving into representation space: CRA uses gradient-attribution-based masks to suppress the refusal subspace, suggesting defenses must consider activation integrity, not just prompt filtering.
  • Benchmarks increasingly include intervention studies (PokeGym's forced recovery raises success rates; MDS probes long-dialogue robustness; CrashSight shows fine-tuning gains but a persistent perception bottleneck).
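The "hard gate + soft score" pattern recurring across these papers can be sketched generically. This is an illustration of the pattern only, not code from any cited paper; the function names, thresholds, and toy checks (`gated_verify`, `accept_threshold`, the `route:` example) are all invented for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    ABSTAIN = "abstain"

@dataclass
class GatedResult:
    verdict: Verdict
    score: float  # soft score; only meaningful once the hard gate passes

def gated_verify(candidate: str,
                 hard_checks: list[Callable[[str], bool]],
                 soft_score: Callable[[str], float],
                 accept_threshold: float = 0.8,
                 reject_threshold: float = 0.3) -> GatedResult:
    """Hard checks gate first; the soft score only ranks survivors.
    Scores between the two thresholds map to ABSTAIN (defer to a human
    or to another round of verification)."""
    # Any failed hard check is an immediate, non-negotiable rejection.
    if not all(check(candidate) for check in hard_checks):
        return GatedResult(Verdict.REJECT, 0.0)
    s = soft_score(candidate)
    if s >= accept_threshold:
        return GatedResult(Verdict.ACCEPT, s)
    if s <= reject_threshold:
        return GatedResult(Verdict.REJECT, s)
    return GatedResult(Verdict.ABSTAIN, s)

# Toy usage: gate on well-formed output, score with a length heuristic.
result = gated_verify(
    "route: A->B->C",
    hard_checks=[lambda c: len(c) > 0, lambda c: c.startswith("route:")],
    soft_score=lambda c: min(1.0, len(c) / 20),
)  # → ABSTAIN (score 0.7 sits between the thresholds)
```

The key design choice the papers share: the gate is binary and non-negotiable, so a high soft score can never rescue a structurally invalid candidate.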

4) Top 5 论文(含“为何是现在”)

1) QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch

  • Aligns the learner's forward-pass arithmetic with the quantized rollout engine, reducing mismatch-induced PPO instability.
  • TBPO introduces sequence-level ratios plus double clipping to suppress "wrong-token" ratio blow-ups under quantized decoding.
  • Shows near-BF16 recovery (e.g., Qwen3-30B-A3B: 45.7 → 51.2, vs 52.1 for BF16) while retaining most of the throughput gains.
  • Caveats: still slower than pure quantized-rollout training (1.3× vs 1.4× on MoE) and dependent on low-bit kernel availability.
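A rough sketch of what a sequence-level ratio with two layers of clipping can look like. This does not reproduce TBPO's actual formulation; the length-normalized ratio, the cap value, and the PPO-style bounds are all assumptions made for illustration.

```python
import math

def sequence_ratio_clipped(token_logp_new: list[float],
                           token_logp_old: list[float],
                           eps: float = 0.2,
                           ratio_cap: float = 4.0) -> float:
    """Sequence-level importance ratio with two clips: a hard cap that
    tames per-sequence blow-ups caused by 'wrong tokens' under quantized
    decoding, then a PPO-style [1-eps, 1+eps] trust-region clip.
    All thresholds here are illustrative, not the paper's values."""
    n = len(token_logp_new)
    # Length-normalized (geometric-mean) ratio keeps the scale comparable
    # across sequences of different lengths.
    log_ratio = sum(a - b for a, b in zip(token_logp_new, token_logp_old)) / n
    ratio = math.exp(log_ratio)
    ratio = min(ratio, ratio_cap)             # first clip: cap blow-ups
    return max(1 - eps, min(1 + eps, ratio))  # second clip: trust region
```

The point of clipping at the sequence level is that a single low-probability "wrong token" from the quantized sampler cannot dominate the update the way it can with per-token ratios.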

2) Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation

  • Training-free, inference-time jailbreak: locates and attacks the refusal subspace via gradient attribution and masking.
  • Reports large attack-success-rate gains across several 7B aligned models (e.g., Llama-2-7B-Chat ASR-O 53.0%; RRSR 96.3% at λ≈1.0).
  • Highlights a concrete latent-space attack surface distinct from pure prompt-based jailbreaks.
  • Caveats: assumes white-box access to activations/gradients; output quality degrades at high suppression strength.

3) Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics

  • Introduces tcpSemER (time-constrained, permutation-invariant semantic error) and an overlap-aware tcpWER decomposition.
  • Shows overlap dominates errors (NSF1: roughly 32% overlap contributes about 90% of errors) and that semantic metrics reduce sensitivity to text normalization.
  • Provides a more realistic comparison of modular vs LLM-based CASR as overlap and speaker count grow.
  • Caveats: primarily an evaluation contribution; proposes no architectural fix for overlap.
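A minimal sketch of what "permutation invariance" means for speaker-attributed error, assuming a brute-force search over speaker assignments. The real tcpSemER additionally applies time collars and semantic (rather than exact-word) matching, both of which this toy omits.

```python
from itertools import permutations

def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (single-row dynamic programming)."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (wa != wb))  # substitution
    return dp[-1]

def permutation_invariant_wer(refs: dict, hyps: dict) -> float:
    """Minimum over speaker assignments of total word errors divided by
    reference word count. refs/hyps map speaker label -> list of words.
    Brute-force permutation search is fine for the handful of speakers
    typical in conversational ASR."""
    ref_words = sum(len(v) for v in refs.values()) or 1
    r_keys, h_keys = list(refs), list(hyps)
    best = min(
        sum(edit_distance(refs[r], hyps[h]) for r, h in zip(r_keys, perm))
        for perm in permutations(h_keys)
    )
    return best / ref_words
```

With this definition, a system that transcribes every word correctly but swaps the two speaker labels scores 0 error, which is exactly the invariance the metric is designed to provide.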

4) KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

  • Online Android benchmark testing preference elicitation, proactivity/consent, and restraint after refusal, going beyond pure navigation.
  • Shows strong models drop sharply on hard personalized tasks (e.g., Claude Sonnet 4.6: 60.4% overall vs 44.2% on hard personalization).
  • Hybrid evaluation (rule checks + LLM judge) tracks human ratings more closely than rules alone.
  • Caveats: dependence on an LLM user simulator, plus synthetic/curated profiles and logs, may limit ecological validity.

5) VCAO: Verifier-Centered Agentic Orchestration for Strategic OS Vulnerability Discovery

  • Models vulnerability discovery as a repeated Bayesian Stackelberg game, allocating tool budget via a DOBSS-derived MILP with belief updates.
  • Claims a large increase in severity-weighted verified findings within budget (2.7× over coverage-only fuzzing) and cuts false positives to roughly 15.1%.
  • Includes a six-layer orchestration architecture and a stated online regret bound.
  • Caveats: relies on rational-attacker assumptions and calibrated tool likelihoods; attack-path enumeration is exponential and requires heuristics.

5) Practical next steps

  • Adopt validity-aware metrics in your evaluation stack: for multi-speaker ASR, add tcpSemER plus overlap decomposition; for adversarial fact verification, add atomic structural-validity checks (AtomEval-style) so that semantic drift is not counted as a successful attack.
  • Instrument agent systems around external actions: build an external-action inventory (Step 0 of the EU-law paper) and map it to identities, logs, and trust boundaries (MIGT plus the RAG security taxonomy).
  • Harden against representation-space jailbreaks: if you operate open-weight models or internal deployments, include CRA-style activation ablation in red-teaming to learn whether refusal depends on low-rank directions.
  • If you run RL with quantized rollouts, measure mismatch-induced ratio pathologies (token/sequence ratios, wrong-token frequency) and consider an aligned low-bit forward pass with sequence-level clipping/masking (QaRL/TBPO).
  • For long-horizon VLM/GUI agents, track process metrics (deadlocks/invalid actions, clarification rates, intervention vs passivity) and apply targeted interventions (e.g., deterministic recovery primitives) rather than only improving high-level planning.
  • For specialized tool-using models, test for "agentic collapse" after heavy SFT; try injecting a small number of targeted agentic traces (including explicit no-tool negatives) to restore tool use without breaking domain skills.
  • In security toolchains, avoid naive context expansion: interprocedural context can reduce detection while doubling tokens; instead, selectively retrieve the most relevant callers/callees and measure cost per verified finding.
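Measuring mismatch-induced ratio pathologies, as suggested above, can start as simple per-sequence diagnostics over the learner's and the rollout engine's token log-probabilities. Names and the pathology threshold here are illustrative assumptions, not part of any cited method.

```python
import math

def mismatch_diagnostics(logp_train: list[float],
                         logp_rollout: list[float],
                         ratio_thresh: float = 2.0) -> dict:
    """Per-sequence diagnostics for training-inference mismatch:
    token-level importance ratios between the learner's forward pass and
    the (possibly quantized) rollout engine, plus the fraction of tokens
    whose ratio leaves [1/ratio_thresh, ratio_thresh]. The threshold is
    an illustrative default, not a recommended value."""
    ratios = [math.exp(t - r) for t, r in zip(logp_train, logp_rollout)]
    seq_ratio = math.exp(sum(t - r for t, r in zip(logp_train, logp_rollout)))
    pathological = sum(r > ratio_thresh or r < 1 / ratio_thresh
                       for r in ratios)
    return {
        "max_token_ratio": max(ratios),
        "seq_ratio": seq_ratio,
        "pathological_frac": pathological / len(ratios),
    }
```

Logging these three numbers per rollout batch is enough to see whether instability correlates with ratio blow-ups before committing to a fix such as an aligned low-bit forward pass.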

Generated from per-paper analyses; no external browsing was performed.