AI Paper Daily (2026-04-18)

Published:

English version: /paper-news/2026-04-18/

Run Statistics

  • Candidate papers: 3670
  • Selected papers: 30
  • Deep reads completed: 30
  • Time window (UTC): 2026-04-17T00:00:00Z → 2026-04-18T00:00:00Z (weekend_backlog_sun, expanded=0)
Paper list used for summarization (arXiv ID | title (PDF) | categories | score | selection rationale | tags):

  • 2604.12951 | The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime (PDF)
    cs.LG | Score 95 | Proves minimax limits for calibration auditing in the rare-error regime; big implications for AI eval & governance. | Tags: calibration, auditing, evaluation, statistical-limits, rare-errors, reliability
  • 2604.12548 | DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection (PDF)
    cs.CR | Score 92 | Black-box prompt-injection fuzzing combining semantic + char obfuscation; timely robustness eval on DeepSeek. | Tags: prompt-injection, jailbreaks, robustness-eval, fuzzing, black-box, LLM-security, Chinese-LLMs
  • 2604.12666 | From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation (PDF)
    cs.LG, cs.CL, cs.HC | Score 90 | 590k web-agent dataset + hard negatives + curriculum to improve robust web navigation generalization. | Tags: web-agents, robustness, dataset, hard-negatives, curriculum-learning, evaluation
  • 2604.12461 | CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems (PDF)
    cs.AI | Score 90 | Black-box attack infers LLM multi-agent communication topology; concrete new MAS privacy/security risk. | Tags: multi-agent, security, privacy, black-box-attack, topology-inference, LLM-agents
  • 2604.05846 | AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning (PDF)
    cs.CL | Score 90 | RL-driven LLM agent for graph-native tool use; relevant to agentic systems design & control. | Tags: LLM agents, tool use, reinforcement learning, graph learning, agentic retrieval
  • 2604.11518 | From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python (PDF)
    cs.SE, cs.AI | Score 90 | Real production coding agent port; benchmark-driven method + SWE-bench/Terminal-Bench results. | Tags: agents, coding-agents, SWE-bench, evaluation, software-engineering, LLM-assisted-development
  • 2604.12601 | LLM-Guided Prompt Evolution for Password Guessing (PDF)
    cs.CR, cs.AI | Score 90 | LLM prompt evolution boosts password cracking; important offensive-security signal for LLM misuse evals. | Tags: cybersecurity, LLM-misuse, prompt-optimization, red-teaming, password-guessing
  • 2604.12160 | PubSwap: Public-Data Off-Policy Coordination for Federated RLVR (PDF)
    cs.LG | Score 90 | Federated RLVR with public off-policy signal sharing; practical for private-data reasoning post-training. | Tags: RLVR, federated-learning, post-training, LoRA, reasoning, privacy
  • 2604.12459 | Operationalising the Right to be Forgotten in LLMs: A Lightweight Sequential Unlearning Framework for Privacy-Aligned Deployment in Politically Sensitive Environments (PDF)
    cs.AI | Score 88 | Practical sequential unlearning for Right-to-be-Forgotten; layer-restricted negative FT on benchmark. | Tags: unlearning, privacy, right-to-be-forgotten, LLMs, deployment, fine-tuning
  • 2604.06802 | Riemann-Bench: A Benchmark for Moonshot Mathematics (PDF)
    cs.AI | Score 88 | Research-level math benchmark beyond olympiad; curated hard problems for frontier reasoning eval. | Tags: evaluation, math-reasoning, benchmarks, moonshot, LLM-reasoning
  • 2604.11661 | Towards Autonomous Mechanistic Reasoning in Virtual Cells (PDF)
    cs.LG, cs.AI | Score 88 | Multi-agent verified mechanistic reasoning + new dataset for grounded scientific agents. | Tags: agents, verification, grounding, scientific-discovery, dataset, multi-agent
  • 2604.12913 | CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference (PDF)
    cs.SE, cs.AI, cs.CR | Score 86 | LLM decompiler refinement targeting hallucinations/semantic mismatch; practical security RE workflow impact. | Tags: code-LLMs, reverse-engineering, decompilation, hallucinations, rationale-guidance, robust-inference, security
  • 2604.11772 | Towards Automated Pentesting with Large Language Models (PDF)
    cs.CR | Score 86 | LLM-assisted pentesting framework; concrete offensive code generation results raise security/dual-use stakes. | Tags: cybersecurity, LLMs, pentesting, code-generation, dual-use, PowerShell
  • 2604.12867 | QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence (PDF)
    cs.AI | Score 86 | Long-horizon deep-search agent for the medical domain with data + training + benchmarks; strong agentic relevance. | Tags: agents, deep-search, tool-use, medical, benchmarks, post-training
  • 2603.24389 | When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools (PDF)
    cs.CL, cs.AI, cs.CY | Score 86 | Large real-world LLM assessment dataset for teacher-child interaction; scalable evaluation implications. | Tags: LLM, evaluation, education, dataset, human-AI collaboration, Chinese
  • 2604.05767 | Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0 (PDF)
    cs.CV, cs.CL | Score 86 | Safety-critical collision anticipation with long-tail benchmark + scalable data curation pipeline. | Tags: safety, autonomous driving, long-tail evaluation, video understanding, explainability, benchmark
  • 2604.07240 | $k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture (PDF)
    cs.MS, cs.AI, cs.LG | Score 86 | Open-ended automated discovery benchmark for the k-server conjecture; sound refutation-based eval. | Tags: automated-discovery, math, benchmarks, agents, program-synthesis, evaluation
  • 2604.12748 | Generating Effective CoT Traces for Mitigating Causal Hallucination (PDF)
    cs.CL | Score 86 | Targets causal hallucination with generated CoT traces and proposes a new hallucination metric (CHR). | Tags: hallucinations, reasoning, chain-of-thought, evaluation, dataset-generation
  • 2603.23253 | On the Vulnerability of FHE Computation to Silent Data Corruption (PDF)
    cs.CR, cs.AR | Score 86 | Reliability risk for FHE on real hardware; silent corruption is critical for privacy-preserving AI. | Tags: security, privacy, FHE, reliability, faults, robust-computation
  • 2604.12196 | Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus (PDF)
    cs.CL | Score 86 | Training-free best-of-N via embedding consensus; improves reliability beyond majority voting. | Tags: best-of-n, self-consistency, reliability, decoding, embeddings, selection
  • 2604.12737 | Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge (PDF)
    cs.CR, cs.LG | Score 84 | Real red-team setting: DP vs membership inference in federated learning; stacked black-box attack analysis. | Tags: privacy, membership-inference, federated-learning, differential-privacy, red-teaming, genomics
  • 2604.06712 | Broken Quantum: A Systematic Formal Verification Study of Security Vulnerabilities Across the Open-Source Quantum Computing Simulator Ecosystem (PDF)
    cs.CR, cs.SE, quant-ph | Score 84 | Large formal security audit (547 findings) + novel QASM injection; strong, reusable security evidence. | Tags: security, formal-verification, static-analysis, SMT, quantum, software-supply-chain
  • 2604.12446 | Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling (PDF)
    cs.CR, cs.CV | Score 84 | Practical input-level backdoor detection for T2I diffusion via cross-attention scaling probes. | Tags: backdoors, diffusion-models, text-to-image, model-security, detection, cross-attention
  • 2604.12944 | Distorted or Fabricated? A Survey on Hallucination in Video LLMs (PDF)
    cs.CV, cs.AI | Score 84 | Survey + taxonomy of hallucinations in Video-LLMs with eval/mitigation overview; reliability-relevant. | Tags: hallucinations, video-llm, evaluation, mitigation, survey, reliability
  • 2603.19169 | ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis (PDF)
    cs.CV, cs.AI | Score 84 | Uses DPO + explicit rejection in a medical VLM/RL pipeline; reliability-oriented design in a high-stakes setting. | Tags: DPO, rejection, medical AI, VLM, RL, reliability
  • 2603.23043 | Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts (PDF)
    cs.LG, cs.AI | Score 84 | OOD robustness eval for climate foundation models under true no-analog shifts; tackles contamination. | Tags: distribution shift, OOD evaluation, robustness, foundation models, climate
  • 2604.11801 | CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation (PDF)
    cs.CL | Score 84 | Dual-head tuning to get calibrated probabilities without losing LLM explanation ability. | Tags: calibration, uncertainty, probabilities, fine-tuning, explanations, reliability
  • 2604.01538 | Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging (PDF)
    cs.CL, cs.AI | Score 84 | Weight-space model merging to reduce instruction-following forgetting during domain adaptation. | Tags: model-merging, catastrophic-forgetting, instruction-following, domain-adaptation, LLMs
  • 2604.10905 | Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music (PDF)
    cs.SD, cs.AI, cs.CL, eess.AS | Score 83 | Major open audio-language model upgrade: 30-min context + timestamped reasoning (temporal CoT). | Tags: audio-language-models, long-context, multimodal, reasoning, temporal-grounding, datasets
  • 2604.11129 | DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning (PDF)
    cs.CL | Score 83 | Training-free task steering via decoding-space vectors from ICL; broadly useful for control/guardrails. | Tags: steering, task-vectors, in-context-learning, logits, LLM-control

AI Paper Insight Brief

2026-04-18

0) Key Takeaways (read this first)

  • "Active probing" is emerging as a robust security primitive: scaling cross-attention inside diffusion models can expose backdoor triggers (SET), and carefully crafted queries can elicit intermediate agent traces to infer multi-agent communication topology (CIA).
  • RL-style post-training is expanding from chat to domain agents and structured decision pipelines: PPO for clinical stenosis localization (ARIADNE), GRPO/RLVR for federated reasoning (PubSwap) and medical deep search (QuarkMedSearch), and GRPO for robust web navigation (the Triton curriculum).
  • Data/benchmark design matters as much as model scaling: long-tail mining + SSL + distillation yields real-time collision anticipation (BADAS-2.0); hard negatives + rejection samples + synthetic grounding drive web-agent generalization (Triton); a private "moonshot" math benchmark shows frontier models still score <10% (Riemann-Bench).
  • Reliability is increasingly framed as "selection + verification": best-of-N selection improves via embedding consensus (RCS); decompilation improves via dual-path generation + recompilation checks (CoDe-R); biological reasoning improves via structured DAG traces filtered by dedicated verifiers (VCR-Agent/VC-Traces).
  • Evaluation is hitting fundamental limits in the rare-error regime: below a certain verification floor, calibration auditing becomes statistically impossible without active queries, and verification cost can blow up compositionally across pipelines (the Verification Tax).
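As a back-of-the-envelope illustration of the two auditing rates mentioned here, the sketch below compares the passive floor Θ((L·ε/m)^{1/3}) against the active-query floor Θ(√(ε/m)). The hidden constants are set to 1 and the Lipschitz constant and error rate are invented example values, so only the scaling behavior is meaningful:

```python
# Illustrative only: compares the passive vs active ECE-resolution floors.
# Constants are taken as 1, so absolute values are not meaningful, only scaling.

def passive_floor(L: float, eps: float, m: int) -> float:
    """Worst-case ECE resolution achievable by passive auditing: (L*eps/m)^(1/3)."""
    return (L * eps / m) ** (1.0 / 3.0)

def active_floor(eps: float, m: int) -> float:
    """Resolution achievable with active querying: sqrt(eps/m)."""
    return (eps / m) ** 0.5

L, eps = 1.0, 1e-4  # hypothetical Lipschitz constant and rare-error rate
for m in (1_000, 100_000):
    p, a = passive_floor(L, eps, m), active_floor(eps, m)
    print(f"m={m:>7}: passive floor ≈ {p:.5f}, active floor ≈ {a:.7f}")
```

Even at these toy values, the passive floor dwarfs typical benchmark gaps at small m, which is the point of the "verification floor" framing.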

2) Key Themes (clusters)

Theme: Active probing for security and model forensics

Theme: RL/preference optimization as the "glue" for agents and pipelines

Theme: Long-tail robustness + edge deployment: data mining, SSL, and distillation

  • Why it matters: safety-critical domains tend to fail on rare scenarios; expanding data coverage and compressing models to real-time inference often matters more than architectural tweaks.
  • Representative papers
  • Shared methods
    • Targeted data acquisition (oracle mining + geospatial collection; history-only splits to avoid contamination).
    • Domain SSL to adapt representations (V-JEPA-style SSL on 2.25M unlabeled driving videos).
    • Distilling large teachers into deployable student models, measuring latency/accuracy tradeoffs.
  • Open problems / failure modes
    • Accuracy vs stability under true OOD (ClimaX has the lowest error but the largest relative degradation; precipitation is fragile).
    • Long-tail classes that stay hard (BADAS animal EWR <80%, even for the largest model).
    • Benchmark realism: untested OOD axes (more SSPs/GCMs; spatial/resolution shifts).
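The teacher-to-student distillation step this theme leans on can be sketched with the classic temperature-scaled KL objective. This is a generic textbook sketch, not any paper's actual training recipe, and the logits below are toy values:

```python
# Minimal knowledge-distillation sketch: loss = T^2 * KL(softmax(t/T) || softmax(s/T)).
import math

def softmax(logits, temp=1.0):
    exps = [math.exp(x / temp) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_loss(teacher_logits, student_logits, temp=2.0):
    """Temperature-scaled KL divergence between teacher and student distributions."""
    p = softmax(teacher_logits, temp)  # soft targets from the large teacher
    q = softmax(student_logits, temp)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temp * temp            # T^2 keeps gradient scale comparable

teacher = [4.0, 1.0, 0.5]
print(distill_loss(teacher, [4.1, 0.9, 0.6]))  # student close to teacher: small loss
print(distill_loss(teacher, [0.0, 3.0, 1.0]))  # student far from teacher: large loss
```

The deployable-student question is then a sweep over student size, trading this loss (and downstream accuracy) against measured latency.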

Theme: Verification, selection, and structured outputs for reliability

Theme: Privacy and reliability risks in infrastructure (FHE, quantum simulators, FL)

3) Technical Synthesis

  • Alignment techniques are being repurposed as constraint enforcers: DPO to prefer topologically connected vessel masks (ARIADNE), while ORPO/GRPO strengthen discrimination and long-horizon consistency in web navigation (Triton).
  • "Rejection/abstention" is becoming a first-class action: ARIADNE's MDP includes a Reject action to cut false positives; Triton adds explicit None/rejection samples; unlearning work aims to induce refusals on sensitive prompts.
  • Active vs passive evaluation is a recurring dividing line: SET and CIA succeed via active probing/elicitation; the Verification Tax formalizes why passive auditing fails when errors are rare.
  • Consensus/center-of-mass ideas recur in different guises: RCS uses a Fréchet mean in embedding space for best-of-N; SET learns a benign "center" in response-deviation space for one-class detection.
  • Verifier-gated training data is a common reliability lever: VC-Traces filters mechanistic actions with DTI/DE verifiers; Triton's synthetic DOM grounding is accepted only when two agents agree; QuarkMedSearch gates rewards on strict correctness to avoid reward hacking.
  • Distillation is paired with domain SSL to meet deployment constraints: BADAS-2.0 runs SSL on 2.25M unlabeled videos, then distills into 86M/22M student models with large latency gains.
  • OOD robustness is being measured as stability, not just error: climate emulation reports percentage degradation under scenario shift and highlights precipitation fragility.
  • System security is extending to "meta" properties: CIA treats MAS topology as sensitive IP; Broken Quantum shows ecosystem-level vulnerability patterns tied to 2^n scaling.
  • Compute/latency overheads are increasingly reported explicitly: DeCoVec reports ~1.6–1.7× overhead; SET requires multi-run probing; CoDe-R adds dual-path inference; BADAS reports end-to-end latency budgets down to tens of milliseconds.
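The embedding-consensus selection idea can be sketched as follows. Under squared Euclidean distance the Fréchet mean reduces to the plain centroid, so the sketch picks the candidate nearest the centroid of all N embeddings; the 2-D "embeddings" are invented for illustration, and RCS's actual embedding model and distance may differ:

```python
# Embedding-consensus best-of-N sketch: embed all N candidates, compute the
# centroid (Fréchet mean under squared Euclidean distance), return the
# candidate whose embedding lies closest to it.

def consensus_select(candidates, embeddings):
    """Pick the candidate whose embedding is closest to the centroid of all embeddings."""
    dim = len(embeddings[0])
    n = len(embeddings)
    centroid = [sum(e[d] for e in embeddings) / n for d in range(dim)]

    def dist2(e):
        return sum((e[d] - centroid[d]) ** 2 for d in range(dim))

    best = min(range(len(candidates)), key=lambda i: dist2(embeddings[i]))
    return candidates[best]

# Toy example: four near-duplicate answers and one semantic outlier.
answers = ["42", "42", "41", "42", "17"]
embs = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [5.0, 5.0]]
print(consensus_select(answers, embs))  # prints "42": the outlier "17" is never chosen
```

Unlike majority voting, this needs no exact string matches among candidates, which is why it helps on open-ended outputs.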

4) Top 5 Papers (with "why now")

1) The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime

  • Proves the passive ECE estimation rate is Θ((L·ε/m)^{1/3}), with a detection phase transition near m·ε ≈ 1.
  • Shows label-free self-evaluation is worst-case uninformative; active querying improves the rate to Θ(√(ε/m)).
  • Explains why many benchmark gaps are statistically indistinguishable, and why pipeline verification compounds with depth.
  • Stay skeptical: the assumptions (Lipschitz calibration, i.i.d. samples, binned ECE) and worst-case composition may overstate the difficulty of structured real deployments.

2) Scaling Exposes the Trigger: Input-Level Backdoor Detection in T2I Diffusion via Cross-Attention Scaling (SET)

  • Introduces CSRD: backdoored prompts diverge from benign prompts under cross-attention scaling trajectories.
  • Builds a one-class detector from response-deviation features; reports 95.1% average AUROC and 84.8% ACC across attacks.
  • Specifically targets stealthy implicit triggers where surface-level detectors fail.
  • Stay skeptical: requires white-box access and per-input compute (multi-scale, multi-step probing); evaluation is limited to SD v1.4 + MS-COCO prompts.
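The probing loop can be caricatured as follows. The `toy_model` below is a stand-in scalar function, not the paper's diffusion pipeline: the real method scales cross-attention inside SD and measures image-level deviations, but the detection logic (deviation trajectory vs a benign center) has the same shape:

```python
# Toy SET-style probe: record how much output deviates from the unscaled output
# at several attention scales, then flag inputs whose deviation trajectory sits
# far from a benign "center" (one-class detection).

def deviation_trajectory(probe_model, x, scales=(1.0, 2.0, 4.0)):
    base = probe_model(x, scale=1.0)
    return [abs(probe_model(x, scale=s) - base) for s in scales]

def fit_benign_center(probe_model, benign_inputs, scales=(1.0, 2.0, 4.0)):
    trajs = [deviation_trajectory(probe_model, x, scales) for x in benign_inputs]
    n = len(trajs)
    return [sum(t[i] for t in trajs) / n for i in range(len(scales))]

def is_backdoored(probe_model, x, center, threshold):
    traj = deviation_trajectory(probe_model, x)
    score = sum((a - b) ** 2 for a, b in zip(traj, center)) ** 0.5
    return score > threshold

# Toy "model": benign inputs barely react to scaling; the trigger reacts strongly.
def toy_model(x, scale):
    return x * (scale if x > 10 else 1.0)  # x > 10 plays the role of a trigger

center = fit_benign_center(toy_model, [1.0, 2.0, 3.0])
print(is_backdoored(toy_model, 2.5, center, threshold=1.0))   # False
print(is_backdoored(toy_model, 20.0, center, threshold=1.0))  # True
```

The per-input cost noted above falls out directly: each decision requires one model run per probe scale.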

3) Beyond the Beep: BADAS-2.0 collision anticipation + real-time explainability

  • Scales labeled data to 178.5k videos with a long-tail benchmark; combines domain SSL + distillation into edge models.
  • Reports Kaggle mAP 0.940 (vs 0.925) and a large latency drop (~2.5s → 35ms per window), fitting on-device budgets.
  • Adds attention heatmaps and a VLM explanation module (BADAS-Reason) for actionable output.
  • Stay skeptical: attention heatmaps are a patch-level proxy; some long-tail groups remain hard (e.g., animal EWR <80%).

4) From Imitation to Discrimination: Progressive curriculum for robust web navigation (Triton)

  • Dataset engineering (hard negatives + counterfactual rejections + dual-agent-verified synthetic grounding) + SFT→ORPO→GRPO.
  • Reports 58.7% Step SR on Mind2Web, beating GPT-4.5 (42.4%) and Claude-4.5 (41.4%) in the paper's tables.
  • Shows that training on "what not to click" (rejection) is crucial for DOM-dense pages.
  • Stay skeptical: evaluation uses static Mind2Web snapshots; text-only (no pixel cues); GRPO adds rollout cost.
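The "what not to click" construction can be sketched as preference-pair mining with an explicit None action. Field names like `gold_action` and `hard_negatives` are hypothetical illustration, not the paper's actual data schema:

```python
# Hypothetical sketch: build (chosen, rejected) preference pairs for
# ORPO-style training, where the chosen action is the ground-truth element,
# or an explicit "None" abstention on counterfactual pages with no valid target.

def build_preference_pairs(steps):
    pairs = []
    for step in steps:
        chosen = step["gold_action"] or "None"  # explicit abstention action
        for neg in step["hard_negatives"]:      # mined distractor elements
            pairs.append({
                "context": step["dom_snippet"],
                "chosen": chosen,
                "rejected": f"click({neg})",
            })
    return pairs

steps = [
    {"dom_snippet": "<button id='buy'>Buy</button><button id='ad'>Ad</button>",
     "gold_action": "click(buy)", "hard_negatives": ["ad"]},
    {"dom_snippet": "<div>No matching product</div>",
     "gold_action": None, "hard_negatives": ["back"]},  # counterfactual page
]
pairs = build_preference_pairs(steps)
print(len(pairs), pairs[1]["chosen"])  # prints: 2 None
```

The second step is the interesting one: the model is explicitly rewarded for abstaining when the page offers no correct action.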

5) ARIADNE: DPO-aligned topology-preserving angiography segmentation + RL stenosis reasoning

  • Applies DPO to preference pairs favoring connected vessel topology; improves topology-sensitive metrics (clDice 0.8378).
  • A downstream PPO agent with a Reject action cuts false positives (FPPI 0.85 vs ~1.89–2.45 baselines) while keeping recall at 0.867.
  • Demonstrates a concrete pattern: align perception to structural constraints, then apply RL with asymmetric clinical rewards at decision time.
  • Stay skeptical: single-institution training data; 2D projection ambiguity; the RL setup assumes at most one dominant stenosis per segment; DPO adds ~2.8× training time.
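The DPO objective behind such topology preference pairs can be written out directly: the loss rewards the policy for raising the chosen (connected-mask) log probability relative to the rejected one, measured against a frozen reference model. The log-probabilities below are toy scalars, not real mask likelihoods:

```python
# Standard DPO loss on one preference pair:
# -log sigmoid(beta * [(logp_c - ref_c) - (logp_r - ref_r)]).
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Policy that moved toward the connected mask relative to the reference: low loss.
print(dpo_loss(-1.0, -5.0, ref_chosen=-2.0, ref_rejected=-2.0))
# Policy that drifted toward the rejected (disconnected) mask: higher loss.
print(dpo_loss(-5.0, -1.0, ref_chosen=-2.0, ref_rejected=-2.0))
```

The ~2.8× training-time overhead noted above comes mainly from the extra reference-model forward passes this objective requires.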

5) Practical Next Steps

  • If you deploy best-of-N: prototype RCS-style embedding-consensus selection and compare its gains against self-consistency at larger N; track failure cases that are "semantically central" yet wrong.
  • For agent safety evaluation: treat the "verification floor" as a first-class metric; report confidence intervals and state whether differences exceed the (L·ε/m)^{1/3} resolution implied by your error rate and sample size.
  • For multi-agent systems: add defenses against topology leakage (e.g., prevent elicitation of intermediate traces; constrain output formats) and red-team with CIA-style elicitation prompts.
  • For diffusion-model supply-chain security: where you have white-box access and a small clean reference set, fold SET-style active probing into model acceptance testing.
  • For long-horizon web agents: add explicit reject/None training and hard-negative mining; evaluate not only success rate but also wrong-action rate on dense pages.
  • For federated RLVR: if you have a small public prompt pool, test PubSwap-style public coordination; sweep the exchange frequency to quantify the off-policy-drift vs communication-savings tradeoff.
  • For privacy-preserving compute: if you run CKKS/FHE in production, budget for checksum-style ABFT (reported ~13–16% overhead) instead of assuming ciphertext computation is fault-transparent.
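The checksum-style ABFT idea can be sketched on a plain matrix-vector product: for y = A·x, also compute the scalar c = (1ᵀA)·x and check that it equals sum(y), so a silent corruption of any y[i] breaks the equality. Real deployments would compute the checksum homomorphically; plaintext floats stand in here purely for illustration:

```python
# ABFT sketch for a linear operation y = A·x with a column-sum checksum.

def matvec_with_checksum(A, x):
    y = [sum(a * b for a, b in zip(row, x)) for row in A]
    col_sums = [sum(col) for col in zip(*A)]           # 1^T A, precomputable
    checksum = sum(c * b for c, b in zip(col_sums, x)) # (1^T A) x == sum(A x)
    return y, checksum

def verify(y, checksum, tol=1e-9):
    """Detect silent corruption: sum(y) must match the independently computed checksum."""
    return abs(sum(y) - checksum) <= tol

A = [[1.0, 2.0], [3.0, 4.0]]
x = [5.0, 6.0]
y, cs = matvec_with_checksum(A, x)
print(verify(y, cs))   # True: computation is self-consistent
y[0] += 0.5            # inject a silent data corruption
print(verify(y, cs))   # False: the checksum catches it
```

The extra work is one inner product per matvec, which is in the same ballpark as the ~13–16% overhead figure reported for the FHE setting.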

Generated from per-paper analysis; no external browsing was performed.