AI Paper Daily (2026-04-18)
Published:
English version: /paper-news/2026-04-18/
Run Statistics
- Candidate papers: 3670
- Selected papers: 30
- Deep reads completed: 30
- Time window (UTC): 2026-04-17T00:00:00Z → 2026-04-18T00:00:00Z (weekend_backlog_sun, expanded=0)
Paper list used for the summary:
| arXiv ID | Title / Link | Category | Score | Selection Rationale | Tags |
|---|---|---|---|---|---|
| 2604.12951 | The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime | cs.LG | 95 | Proves minimax limits for calibration auditing in rare-error regime; big implications for AI eval & governance. | calibration, auditing, evaluation, statistical-limits, rare-errors, reliability |
| 2604.12548 | DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection | cs.CR | 92 | Black-box prompt-injection fuzzing combining semantic + char obfuscation; timely robustness eval on DeepSeek | prompt-injection, jailbreaks, robustness-eval, fuzzing, black-box, LLM-security, Chinese-LLMs |
| 2604.12666 | From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation | cs.LG, cs.CL, cs.HC | 90 | 590k web-agent dataset + hard negatives + curriculum to improve robust web navigation generalization | web-agents, robustness, dataset, hard-negatives, curriculum-learning, evaluation |
| 2604.12461 | CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems | cs.AI | 90 | Black-box attack infers LLM multi-agent communication topology; concrete new MAS privacy/security risk. | multi-agent, security, privacy, black-box-attack, topology-inference, LLM-agents |
| 2604.05846 | AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning | cs.CL | 90 | RL-driven LLM agent for graph-native tool use; relevant to agentic systems design & control. | LLM agents, tool use, reinforcement learning, graph learning, agentic retrieval |
| 2604.11518 | From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python | cs.SE, cs.AI | 90 | Real production coding agent port; benchmark-driven method + SWE-bench/Terminal-Bench results. | agents, coding-agents, SWE-bench, evaluation, software-engineering, LLM-assisted-development |
| 2604.12601 | LLM-Guided Prompt Evolution for Password Guessing | cs.CR, cs.AI | 90 | LLM prompt evolution boosts password cracking; important offensive-security signal for LLM misuse evals | cybersecurity, LLM-misuse, prompt-optimization, red-teaming, password-guessing |
| 2604.12160 | PubSwap: Public-Data Off-Policy Coordination for Federated RLVR | cs.LG | 90 | Federated RLVR with public off-policy signal sharing; practical for private-data reasoning post-training. | RLVR, federated-learning, post-training, LoRA, reasoning, privacy |
| 2604.12459 | Operationalising the Right to be Forgotten in LLMs: A Lightweight Sequential Unlearning Framework for Privacy-Aligned Deployment in Politically Sensitive Environments | cs.AI | 88 | Practical sequential unlearning for Right-to-be-Forgotten; layer-restricted negative FT on benchmark | unlearning, privacy, right-to-be-forgotten, LLMs, deployment, fine-tuning |
| 2604.06802 | Riemann-Bench: A Benchmark for Moonshot Mathematics | cs.AI | 88 | Research-level math benchmark beyond olympiad; curated hard problems for frontier reasoning eval. | evaluation, math-reasoning, benchmarks, moonshot, LLM-reasoning |
| 2604.11661 | Towards Autonomous Mechanistic Reasoning in Virtual Cells | cs.LG, cs.AI | 88 | Multi-agent verified mechanistic reasoning + new dataset for grounded scientific agents | agents, verification, grounding, scientific-discovery, dataset, multi-agent |
| 2604.12913 | CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference | cs.SE, cs.AI, cs.CR | 86 | LLM decompiler refinement targeting hallucinations/semantic mismatch; practical security RE workflow impact | code-LLMs, reverse-engineering, decompilation, hallucinations, rationale-guidance, robust-inference, security |
| 2604.11772 | Towards Automated Pentesting with Large Language Models | cs.CR | 86 | LLM-assisted pentesting framework; concrete offensive code generation results raise security/dual-use stakes | cybersecurity, LLMs, pentesting, code-generation, dual-use, PowerShell |
| 2604.12867 | QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence | cs.AI | 86 | Long-horizon deep-search agent for medical domain with data+training+benchmarks; strong agentic relevance | agents, deep-search, tool-use, medical, benchmarks, post-training |
| 2603.24389 | When AI Meets Early Childhood Education: Large Language Models as Assessment Teammates in Chinese Preschools | cs.CL, cs.AI, cs.CY | 86 | Large real-world LLM assessment dataset for teacher-child interaction; scalable evaluation implications. | LLM, evaluation, education, dataset, human-AI collaboration, Chinese |
| 2604.05767 | Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0 | cs.CV, cs.CL | 86 | Safety-critical collision anticipation with long-tail benchmark + scalable data curation pipeline. | safety, autonomous driving, long-tail evaluation, video understanding, explainability, benchmark |
| 2604.07240 | $k$-server-bench: Automating Potential Discovery for the $k$-Server Conjecture | cs.MS, cs.AI, cs.LG | 86 | Open-ended automated discovery benchmark for k-server conjecture; sound refutation-based eval. | automated-discovery, math, benchmarks, agents, program-synthesis, evaluation |
| 2604.12748 | Generating Effective CoT Traces for Mitigating Causal Hallucination | cs.CL | 86 | Targets causal hallucination with generated CoT traces and proposes a new hallucination metric (CHR) | hallucinations, reasoning, chain-of-thought, evaluation, dataset-generation |
| 2603.23253 | On the Vulnerability of FHE Computation to Silent Data Corruption | cs.CR, cs.AR | 86 | Reliability risk for FHE on real hardware; silent corruption is critical for privacy-preserving AI. | security, privacy, FHE, reliability, faults, robust-computation |
| 2604.12196 | Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Scoring | cs.CL | 86 | Training-free best-of-N via embedding consensus; improves reliability beyond majority voting. | best-of-n, self-consistency, reliability, decoding, embeddings, selection |
| 2604.12737 | Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge | cs.CR, cs.LG | 84 | Real red-team setting: DP vs membership inference in federated learning; stacked black-box attack analysis | privacy, membership-inference, federated-learning, differential-privacy, red-teaming, genomics |
| 2604.06712 | Broken Quantum: A Systematic Formal Verification Study of Security Vulnerabilities Across the Open-Source Quantum Computing Simulator Ecosystem | cs.CR, cs.SE, quant-ph | 84 | Large formal security audit (547 findings) + novel QASM injection; strong, reusable security evidence | security, formal-verification, static-analysis, SMT, quantum, software-supply-chain |
| 2604.12446 | Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling | cs.CR, cs.CV | 84 | Practical input-level backdoor detection for T2I diffusion via cross-attention scaling probes. | backdoors, diffusion-models, text-to-image, model-security, detection, cross-attention |
| 2604.12944 | Distorted or Fabricated? A Survey on Hallucination in Video LLMs | cs.CV, cs.AI | 84 | Survey+taxonomy of hallucinations in Video-LLMs with eval/mitigation overview; reliability-relevant | hallucinations, video-llm, evaluation, mitigation, survey, reliability |
| 2603.19169 | ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis | cs.CV, cs.AI | 84 | Uses DPO + explicit rejection in medical VLM/RL pipeline; reliability-oriented design in high-stakes setting. | DPO, rejection, medical AI, VLM, RL, reliability |
| 2603.23043 | Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts | cs.LG, cs.AI | 84 | OOD robustness eval for climate foundation models under true no-analog shifts; tackles contamination. | distribution shift, OOD evaluation, robustness, foundation models, climate |
| 2604.11801 | CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation | cs.CL | 84 | Dual-head tuning to get calibrated probabilities without losing LLM explanation ability | calibration, uncertainty, probabilities, fine-tuning, explanations, reliability |
| 2604.01538 | Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging | cs.CL, cs.AI | 84 | Weight-space model merging to reduce instruction-following forgetting during domain adaptation. | model-merging, catastrophic-forgetting, instruction-following, domain-adaptation, LLMs |
| 2604.10905 | Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music | cs.SD, cs.AI, cs.CL, eess.AS | 83 | Major open audio-language model upgrade: 30-min context + timestamped reasoning (temporal CoT). | audio-language-models, long-context, multimodal, reasoning, temporal-grounding, datasets |
| 2604.11129 | DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning | cs.CL | 83 | Training-free task steering via decoding-space vectors from ICL; broadly useful for control/guardrails | steering, task-vectors, in-context-learning, logits, LLM-control |
AI Paper Insights Briefing
2026-04-18
0) Key Takeaways (read this first)
- Active probing is emerging as a robust security primitive: scaling cross-attention inside diffusion models exposes backdoor triggers (SET), while carefully crafted queries elicit intermediate agent traces that reveal multi-agent communication topology (CIA).
- RL-style post-training is expanding from chat to domain agents and structured decision pipelines: PPO for clinical stenosis localization (ARIADNE), GRPO/RLVR for federated reasoning (PubSwap) and medical deep search (QuarkMedSearch), and GRPO for robust web navigation (the Triton curriculum).
- Data/benchmark design matters as much as model scaling: long-tail mining + SSL + distillation enables real-time collision anticipation (BADAS-2.0); hard negatives + rejection samples + synthetic grounding drive web-agent generalization (Triton); a private "moonshot" math benchmark shows frontier models still score <10% (Riemann-Bench).
- Reliability is increasingly framed as "selection + verification": best-of-N selection improves via embedding consensus (RCS); decompilation improves via dual-path generation + recompilation checks (CoDe-R); biological reasoning improves via structured DAG traces filtered by dedicated verifiers (VCR-Agent/VC-Traces).
- Evaluation hits fundamental limits in the rare-error regime: below a certain verification floor, calibration auditing becomes statistically infeasible without active queries, and verification cost can blow up combinatorially across pipeline compositions (Verification Tax).
1) Key Themes (clustered)
Theme: Active probing for security and model forensics
- Why it matters: passive detectors often struggle against stealthy attacks; actively perturbing internals, or eliciting hidden traces, can surface stable signals that attackers find hard to mask.
- Representative papers:
- Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling
- CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems
- DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection
- Shared methods:
- Actively probe the system (attention scaling; adversarially constrained queries) instead of relying on static features.
- Reduce detection/inference to compact representations (response-shift vectors; debiased embeddings) plus simple decision rules (one-class boundaries; similarity thresholds); a minimal one-class sketch follows this theme's bullets.
- Evaluate across multiple attack families, with ablations showing which probe dimensions matter.
- Open questions / failure modes:
- White-box assumptions and probing cost (SET requires runs over multiple denoising steps/scales).
- Adaptive attackers: can they regularize away CSRD-like divergence, or resist elicitation of their intermediate outputs?
- Transferability: results are demonstrated on specific targets (Stable Diffusion v1.4; particular MAS generators; DeepSeek).
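To make the "compact representation + simple decision rule" recipe concrete, here is a minimal one-class sketch under stated assumptions. It is not the SET authors' implementation: `featurize` stands in for a hypothetical probe that summarizes how model outputs shift as cross-attention is scaled, and the SVM hyperparameters are illustrative defaults.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_benign_detector(benign_prompts, featurize, nu: float = 0.05):
    """Fit a one-class boundary on response-shift features of known-benign
    prompts. `featurize(prompt) -> np.ndarray` is a caller-supplied probe;
    in a SET-like setting it would encode output deltas across attention
    scales, but any fixed-length feature vector works here."""
    feats = np.stack([featurize(p) for p in benign_prompts])
    return OneClassSVM(kernel="rbf", nu=nu).fit(feats)

def flag_suspicious(detector, prompt, featurize) -> bool:
    """Prompts falling outside the benign boundary are flagged as potential
    backdoor triggers (OneClassSVM.predict returns -1 for outliers)."""
    return detector.predict(featurize(prompt).reshape(1, -1))[0] == -1
```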
Theme: RL/preference optimization as the "glue" for agents and pipelines
- Why it matters: once systems become multi-stage (retrieval → reasoning → action), supervised imitation alone undertrains rejection, efficiency, and long-horizon behavior; RL-style objectives are being used to shape these properties.
- Representative papers:
- ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis
- PubSwap: Public-Data Off-Policy Coordination for Federated RLVR
- From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation
- QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence
- Shared methods:
- Use GRPO/RLVR to optimize verifiable rewards (math/medical reasoning; correctness-gated tool use); a minimal gated-reward sketch follows this theme's bullets.
- Add explicit reject/terminate actions or reward shaping to reduce false positives and wasted tool calls.
- Combine RL with curricula (SFT → ORPO/GRPO; short → long trajectories).
- Open questions / failure modes:
- Off-policy drift and coordination stability (PubSwap's reuse of public steps; sensitivity to swap frequency).
- The tradeoff between reward hacking and strict gating (QuarkMedSearch emphasizes correctness-gated format rewards).
- Generalization beyond the evaluated environments (static Mind2Web snapshots; the scope of medical search benchmarks).
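A minimal sketch of what a correctness-gated format reward can look like, assuming hypothetical `is_correct` / `is_well_formatted` checker functions; the gating structure follows the digest's description, while the weights are invented for illustration.

```python
def gated_reward(response: str, reference: str,
                 is_correct, is_well_formatted,
                 format_bonus: float = 0.2) -> float:
    """Pay the format bonus only when the answer is correct, so the policy
    cannot hack the reward by emitting well-formatted wrong answers."""
    if not is_correct(response, reference):
        return 0.0          # wrong answers earn nothing, formatted or not
    reward = 1.0            # base reward for a verified-correct answer
    if is_well_formatted(response):
        reward += format_bonus
    return reward
```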
Theme: Long-tail robustness + edge deployment: data mining, SSL, and distillation
- Why it matters: safety-critical domains tend to fail on rare scenarios; expanding data coverage and compressing models for real-time inference is often more impactful than architectural tweaks.
- Representative papers:
- Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0
- Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts
- Shared methods:
- Targeted data acquisition (oracle mining + geospatial collection; history-only splits to avoid contamination).
- Domain SSL to adapt representations (V-JEPA-style SSL on 2.25M unlabeled driving videos).
- Distill large teachers into deployable students and measure latency/accuracy tradeoffs (see the distillation-loss sketch after this list).
- Open questions / failure modes:
- Accuracy vs. stability under true OOD (ClimaX has the lowest error but larger relative degradation; precipitation is fragile).
- Long-tail categories that remain hard (BADAS animal EWR <80%, even for the largest model).
- Benchmark realism: untested OOD axes (more SSPs/GCMs; spatial/resolution shifts).
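For the distillation step, here is a minimal sketch of the standard Hinton-style KD objective; `temperature` and `alpha` are illustrative defaults, not values from the BADAS-2.0 paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Mix hard-label cross-entropy with a temperature-softened KL term
    against the teacher's distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients match the hard-label term
    return alpha * hard + (1.0 - alpha) * soft
```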
Theme: Verification, selection, and structured outputs for reliability
- Why it matters: as raw accuracy rises, the remaining errors become rarer and harder to detect; systems increasingly need selection mechanisms, structured outputs, and verifiers to stay trustworthy.
- Representative papers:
- Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Scoring
- CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference
- Towards Autonomous Mechanistic Reasoning in Virtual Cells
- The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime
- Shared methods:
- Replace "single answer" with best-of-N selection based on semantic structure (RCS; a minimal selection sketch follows this list).
- Constrain outputs to verifiable formats (mechanistic DAG actions; recompilable code) and filter with domain verifiers.
- Explicitly model the sample complexity of auditing, preferring active testing where possible (Verification Tax).
- Open questions / failure modes:
- Embedding-based consensus can still favor "central but wrong" answers; the weighting scheme matters (the RCSfreq bias).
- Verifier coverage gaps (VC-Traces filtering mainly uses DTI/DE; other action primitives go unverified).
- Fundamental auditing limits mean many "small gains" fall below the resolution floor without active protocols.
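A minimal consensus-selection sketch under stated assumptions: it approximates the spherical Fréchet mean with a normalized average and is not necessarily the RCS authors' exact estimator; `embed` is any sentence-embedding function you supply.

```python
import numpy as np

def radial_consensus_select(candidates, embed):
    """Best-of-N by semantic consensus: embed each candidate, take the
    normalized mean as a proxy for the Fréchet mean on the unit sphere,
    and return the candidate closest to that center."""
    E = np.stack([embed(c) for c in candidates])
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalize
    center = E.mean(axis=0)
    center = center / np.linalg.norm(center)           # spherical-mean proxy
    sims = E @ center                                  # cosine similarity
    return candidates[int(np.argmax(sims))]
```

Note the known failure mode from the bullet above: a cluster of confidently wrong answers can pull the center toward itself, so a "central" pick is not automatically a correct one.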
Theme: Privacy and reliability risks in infrastructure (FHE, quantum simulators, FL)
- Why it matters: as privacy-preserving computation and scientific computing stacks become AI dependencies, their failure modes (silent corruption, memory safety, leakage under DP) become system-level risks.
- Representative papers:
- On the Vulnerability of FHE Computation to Silent Data Corruption
- Broken Quantum: A Systematic Formal Verification Study of Security Vulnerabilities Across the Open-Source Quantum Computing Simulator Ecosystem
- Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge
- Shared methods:
- Empirical fault/attack measurement (fault injection in CKKS; stacked MIAs on the NIST benchmark).
- Formal/static analysis with reachability proofs (SMT/Z3 verification of vulnerability patterns).
- Quantifying defense tradeoffs (DMR vs. checksum overhead; DP ε vs. leakage vs. utility).
- Open questions / failure modes:
- Generality across hardware/schemes (the FHE study covers CKKS/OpenFHE + Xeon; a single-bit, single-fault model).
- Ecosystem patching and supply-chain propagation (vendored vulnerabilities in quantum simulators).
- DP settings where leakage persists (ε=200 retains measurable leakage under ensemble MIAs; see the bound computation below).
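As background for why ε=200 offers no meaningful protection, here is the classic membership-advantage bound for pure ε-DP (Yeom et al., 2018). This is general DP math, not a computation from the NIST challenge paper.

```python
import math

def mia_advantage_bound(eps: float) -> float:
    """Under pure eps-DP, a membership inference adversary's advantage
    (TPR - FPR) is at most e^eps - 1; advantage is capped at 1 anyway."""
    return min(1.0, math.exp(eps) - 1.0)

for eps in (0.1, 1.0, 10.0, 200.0):
    print(f"eps={eps:>6}: worst-case MIA advantage <= {mia_advantage_bound(eps):.3f}")
# At eps=200 the bound is vacuous (1.0): the DP guarantee says nothing,
# consistent with the measurable leakage the red-team study reports there.
```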
2) Technical Synthesis
- Alignment techniques are being repurposed as constraint enforcers: DPO to prefer topologically connected vessel masks (ARIADNE), while ORPO/GRPO reinforce discrimination and long-horizon consistency in web navigation (Triton).
- "Reject/abstain" is becoming a first-class action: ARIADNE's MDP includes a Reject action to reduce false positives; Triton adds explicit None/rejection samples; the unlearning work aims to induce refusals on sensitive prompts.
- Active vs. passive evaluation is a recurring dividing line: SET and CIA succeed through active probing/elicitation; the Verification Tax formalizes why passive auditing fails when errors are rare.
- Consensus/center-of-mass ideas appear in different guises: RCS uses a Fréchet mean in embedding space for best-of-N; SET learns a benign "center" in response-shift space for one-class detection.
- Verifier-gated training data is a common reliability lever: VC-Traces filters mechanistic actions with DTI/DE verifiers; Triton's synthetic DOM grounding is accepted only when two agents agree; QuarkMedSearch avoids reward hacking with strictly correctness-gated rewards.
- Distillation pairs with domain SSL to meet deployment constraints: BADAS-2.0 runs SSL on 2.25M unlabeled videos, then distills into 86M/22M-parameter students with substantial latency gains.
- OOD robustness is being measured as stability, not just error: the climate study reports percentage degradation under scenario shift and highlights precipitation fragility.
- System security is expanding to "meta" properties: CIA treats MAS topology as sensitive IP; Broken Quantum shows ecosystem-level vulnerability patterns tied to 2^n scaling.
- Compute/latency overheads are increasingly reported explicitly: DeCoVec reports ~1.6–1.7× overhead; SET needs multi-run probing; CoDe-R adds dual-path inference; BADAS reports end-to-end latency budgets down to tens of milliseconds.
3) Top 5 Papers (with "why now")
1) The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime
- Proves the passive ECE estimation rate is Θ((L·ε/m)^{1/3}), with a detection phase transition near m·ε ≈ 1 (see the resolution-floor computation after these bullets).
- Shows label-free self-evaluation is worst-case uninformative; active queries improve the rate to Θ(√(ε/m)).
- Explains why many benchmark differences are statistically indistinguishable, and why pipeline verification compounds combinatorially with depth.
- Stay skeptical: the assumptions (Lipschitz calibration, i.i.d. samples, binned ECE) and worst-case composition may overstate the difficulty of structured real-world deployments.
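A back-of-envelope check of what these rates imply, with constants dropped; the example values of L, ε, and m are assumptions for illustration, not numbers from the paper.

```python
import math

def passive_resolution(L: float, eps: float, m: int) -> float:
    """Passive auditing resolution floor ~ (L*eps/m)^(1/3), per the stated
    rate; constants are omitted, so treat this as order-of-magnitude."""
    return (L * eps / m) ** (1.0 / 3.0)

def active_resolution(eps: float, m: int) -> float:
    """Active-query rate ~ sqrt(eps/m)."""
    return math.sqrt(eps / m)

# Illustrative setting: a 1%-error-rate system audited with 10k samples
# and Lipschitz constant 1.
L, eps, m = 1.0, 0.01, 10_000
print(f"passive floor ~ {passive_resolution(L, eps, m):.4f}")  # ~0.01
print(f"active  floor ~ {active_resolution(eps, m):.4f}")      # ~0.001
```

Under these toy numbers, calibration differences below roughly 1% are invisible to passive auditing, while active queries push the floor an order of magnitude lower.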
2) Scaling Exposes the Trigger: input-level backdoor detection in T2I diffusion via cross-attention scaling (SET)
- Introduces CSRD: backdoored and benign prompts diverge along cross-attention scaling trajectories.
- Builds a one-class detector from response-shift features; reports 95.1% average AUROC and 84.8% accuracy across attack families.
- Specifically targets stealthy implicit triggers where surface-level detectors fail.
- Stay skeptical: white-box requirements and per-input compute overhead (multi-scale, multi-step probing); evaluation is limited to SD v1.4 + MS-COCO prompts.
3) Beyond the Beep: BADAS-2.0 collision anticipation + real-time explainability
- Scales labeled data to 178.5k videos and adds a long-tail benchmark; combines domain SSL + KD into edge models.
- Reports Kaggle mAP 0.940 (vs. 0.925) and large latency reductions (~2.5 s → 35 ms per window), fitting on-device budgets.
- Adds attention heatmaps and a VLM explanation module (BADAS-Reason) to produce actionable outputs.
- Stay skeptical: attention heatmaps are a patch-level proxy; some long-tail groups remain challenging (e.g., animal EWR <80%).
4) From Imitation to Discrimination: Progressive curriculum for robust web navigation (Triton)
- Dataset engineering (hard negatives + counterfactual rejections + dual-agent-verified synthetic grounding) plus SFT → ORPO → GRPO.
- Reports 58.7% Step SR on Mind2Web, surpassing GPT-4.5 (42.4%) and Claude-4.5 (41.4%) in the paper's tables.
- Shows that training on "what not to click" (rejection) is crucial for DOM-dense pages.
- Stay skeptical: evaluation uses static Mind2Web snapshots; text-only (no pixel cues); GRPO adds rollout cost.
5) ARIADNE: DPO-aligned topology-preserving angiography segmentation + RL stenosis reasoning
- Applies DPO to preference pairs that favor connected vessel topology; improves topology-sensitive metrics (clDice 0.8378).
- A downstream PPO agent with a Reject action cuts false positives (FPPI 0.85 vs. ~1.89–2.45 for baselines) while maintaining 0.867 recall.
- Demonstrates a concrete pattern: align perception to structural constraints, then apply RL with asymmetric clinical rewards at decision time (a toy asymmetric-reward sketch follows these bullets).
- Stay skeptical: single-institution training data; 2D projection ambiguity; the RL setup assumes at most one dominant stenosis per segment; DPO adds roughly 2.8× training time.
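A toy asymmetric reward table illustrating the detect-or-reject pattern; the numeric payoffs are invented for illustration and are not ARIADNE's actual reward values.

```python
def clinical_reward(action: str, stenosis_present: bool) -> float:
    """Asymmetric clinical reward: false alarms are penalized more heavily
    than correct rejections are rewarded, while missed lesions still hurt."""
    if action == "detect":
        return 1.0 if stenosis_present else -2.0      # false alarm costs extra
    if action == "reject":
        return 0.2 if not stenosis_present else -1.0  # missed lesion penalty
    raise ValueError(f"unknown action: {action}")
```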
4) Practical Next Steps
- If you deploy best-of-N: prototype RCS-style embedding-consensus selection and compare its gains against self-consistency at higher N; track failure cases that are semantically central yet wrong.
- For agent safety evaluation: treat the verification floor as a first-class metric: report confidence intervals, and state whether differences exceed the (L·ε/m)^{1/3} resolution implied by your error rate and sample size.
- For multi-agent systems: add defenses against topology leakage (e.g., prevent elicitation of intermediate traces; constrain output formats), and red-team with CIA-style elicitation prompts.
- For diffusion-model supply-chain security: fold SET-style active probing into model acceptance testing when you have white-box access and a small clean reference set.
- For long-horizon web agents: add explicit reject/None training and hard-negative mining; evaluate not just success rate but also wrong-action rate on dense pages.
- For federated RLVR: if you have a small public prompt pool, test PubSwap-style public coordination; sweep the swap frequency to quantify the tradeoff between off-policy drift and communication savings.
- For privacy-preserving computation: if you run CKKS/FHE in production, budget for checksum-style ABFT (reported at roughly 13–16% overhead) rather than assuming ciphertext computation is fault-transparent (a toy checksummed-matmul sketch follows this list).
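A plaintext sketch of the classic checksummed-matrix-multiply form of ABFT, to show the invariant being budgeted for; the FHE paper applies such checks inside CKKS ciphertext pipelines, which this demo does not attempt.

```python
import numpy as np

def checksummed_matmul(A: np.ndarray, B: np.ndarray, tol: float = 1e-6):
    """ABFT for C = A @ B: append a row-checksum row to A and a
    column-checksum column to B, then verify that C's checksums agree.
    A single corrupted entry in C breaks one row and one column sum."""
    Ac = np.vstack([A, A.sum(axis=0)])                 # extra checksum row
    Bc = np.hstack([B, B.sum(axis=1, keepdims=True)])  # extra checksum column
    Cc = Ac @ Bc
    C = Cc[:-1, :-1]
    row_ok = np.allclose(Cc[-1, :-1], C.sum(axis=0), atol=tol)
    col_ok = np.allclose(Cc[:-1, -1], C.sum(axis=1), atol=tol)
    if not (row_ok and col_ok):
        raise RuntimeError("checksum mismatch: possible silent data corruption")
    return C
```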
Generated from per-paper analyses; no external browsing was performed.
