AI Paper Daily (2026-03-22)

Published:

English version: /paper-news/2026-03-22/

Run statistics

  • Candidate papers: 1253
  • Selected papers: 30
  • Deep reads completed: 30
  • Time window (UTC): 2026-03-20T00:00:00Z → 2026-03-21T00:00:00Z (weekend_backlog_unknown, expanded=0)
Paper list used for summarization (arXiv ID | categories | score | title [PDF]; each entry's selection reason and tags follow on the next line):

  • 2603.14987 | cs.CL, cs.DB | 93 | Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI [PDF]
    Argues for representative trustworthiness eval for agentic AI; proposes HAA framework. Tags: agent-evaluation, trustworthiness, sociotechnical, benchmarks, agents
  • 2603.19011 | cs.CR, cs.AI | 92 | Security awareness in LLM agents: the NDAI zone case [PDF]
    Measures whether LLM agents can infer secure vs insecure execution; key for TEE/tool-use safety. Tags: agent-security, TEE, situational-awareness, evaluation, tool-use
  • 2603.18577 | cs.AI | 92 | MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning [PDF]
    Large benchmark + grounded reasoning for medical deepfake detection; strong safety relevance. Tags: deepfake-detection, multimodal, benchmark, grounded-reasoning, medical-safety, localization
  • 2603.15542 | cs.CY, cs.AI | 92 | InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems [PDF]
    InterveneBench: 744 real studies to test LLM causal intervention & design reasoning; strong eval gap. Tags: benchmark, evaluation, causal-reasoning, interventions, social-science, LLM
  • 2603.14761 | cs.AI | 92 | BrainBench: Exposing the Commonsense Reasoning Gap in Large Language Models [PDF]
    New commonsense benchmark; shows big gaps on brainteasers even for frontier LLMs. Tags: evaluation, commonsense, reasoning, benchmark, robustness
  • 2603.17623 | cs.LG, cs.CR | 92 | ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery [PDF]
    Practical gradient inversion attack (no arch mods) reconstructs data from large FL batches. Tags: federated-learning, privacy, gradient-inversion, security, data-leakage, attack
  • 2603.14730 | cs.LG | 91 | GNNVerifier: Graph-based Verifier for LLM Task Planning [PDF]
    Non-LLM graph verifier for LLM plans; targets structural hallucinations & dependency errors in agents. Tags: agents, planning, verification, hallucinations, graph-methods, robustness
  • 2603.15397 | cs.CR, cs.AI | 90 | SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration [PDF]
    Monitors/calibrates unsafe intermediate CoT steps to resist jailbreaks, not just final output. Tags: jailbreaks, chain-of-thought, safety-monitoring, calibration, defense
  • 2603.15615 | cs.CL, cs.AI | 90 | Mechanistic Origin of Moral Indifference in Language Models [PDF]
    Mechanistic study of moral concept collapse + latent "moral indifference"; proposes representation fix. Tags: mechanistic-interpretability, alignment, representations, moral-reasoning, safety
  • 2603.18895 | cs.HC, cs.AI, cs.LG | 90 | From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making [PDF]
    Practical readiness metrics for human-AI teaming; targets miscalibrated reliance & safety signals. Tags: human-AI teaming, evaluation, calibration, reliance, safety-metrics, deployment
  • 2603.17948 | cs.CV, cs.AI | 90 | VideoAtlas: Navigating Long-Form Video in Logarithmic Compute [PDF]
    Hierarchical lossless video representation enabling long-video navigation with log compute. Tags: long-context, video, agents, memory, efficient-inference, multimodal
  • 2603.18767 | cs.AI | 89 | A Concept is More Than a Word: Diversified Unlearning in Text-to-Image Diffusion Models [PDF]
    Improves diffusion concept unlearning beyond keywords; reduces brittle/over-forgetting in safety edits. Tags: diffusion, unlearning, content-safety, model-editing, robustness
  • 2603.15364 | cs.AI, cs.CL | 89 | CRASH: Cognitive Reasoning Agent for Safety Hazards in Autonomous Driving [PDF]
    LLM agent for AV incident analysis + curated 2,168-case dataset; practical safety auditing. Tags: agent, autonomous-driving, safety, incident-analysis, dataset, LLM
  • 2603.15372 | cs.SE, cs.AI, cs.CR | 88 | SKILLS: Structured Knowledge Injection for LLM-Driven Telecommunications Operations [PDF]
    Tool-using LLM agent benchmark with live mock APIs + deterministic rubrics for telecom ops. Tags: agents, tool-use, benchmark, evaluation, enterprise, APIs
  • 2603.14778 | cs.CR, cs.AI | 88 | $p^2$RAG: Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval [PDF]
    Privacy-preserving RAG enabling arbitrary top-k without costly secure sorting; practical for LLM apps. Tags: RAG, privacy, secure-retrieval, cryptography, deployment
  • 2603.17759 | cs.CL, cs.AI | 88 | Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor [PDF]
    Multimodal + multilingual benchmark for harmful humor incl. covert harm; strong safety eval value. Tags: AI safety, benchmark, harmful content, multimodal, multilingual, humor, toxicity detection, Arabic
  • 2603.17683 | cs.AI, cs.LG | 88 | Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents [PDF]
    Structured test-time learning for LLM game agents; curriculum + steerable context control-plane. Tags: llm-agents, test-time-learning, curriculum-learning, agent-architecture, memory, evaluation
  • 2603.18680 | cs.LG, cs.CR | 88 | Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend [PDF]
    Reframes label inference in VFL via mutual info; explains vulnerabilities and proposes defenses. Tags: vertical-federated-learning, privacy, label-inference, mutual-information, defense
  • 2603.18793 | cs.CR, cs.AI | 86 | Functional Subspace Watermarking for Large Language Models [PDF]
    LLM watermarking robust to fine-tune/quantize/distill by anchoring signals in functional subspace. Tags: watermarking, model-ownership, robustness, LLMs, security
  • 2603.14756 | cs.CL, cs.AI | 86 | Towards Privacy-Preserving Machine Translation at the Inference Stage: A New Task and Benchmark [PDF]
    Defines inference-time privacy task + benchmark for MT; fills evaluation gap for privacy-preserving NLP. Tags: privacy, machine-translation, benchmark, inference, evaluation
  • 2603.14771 | cs.AI | 86 | OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence [PDF]
    Interactive arena to evolve/benchmark multi-agent collective intelligence; strong eval framing. Tags: agents, multi-agent, collective-intelligence, benchmark, evaluation, healthcare
  • 2603.14911 | cs.CR, cs.CL | 86 | Fine-tuning RoBERTa for CVE-to-CWE Classification: A 125M Parameter Model Competitive with LLMs [PDF]
    CVE→CWE classifier competitive with LLMs; large dataset + strong macro-F1 on rare classes. Tags: cybersecurity, vulnerability-classification, CVE, CWE, robustness, dataset
  • 2603.14855 | cs.SE, cs.AI | 86 | PCodeTrans: Translate Decompiled Pseudocode to Compilable and Executable Equivalent [PDF]
    Feedback + dynamic validation to prevent semantic hallucinations in decompiled code recovery. Tags: code, verification, hallucinations, program-synthesis, security
  • 2603.15566 | cs.SE, cs.AI, eess.SY | 86 | Lore: Repurposing Git Commit Messages as a Structured Knowledge Protocol for AI Coding Agents [PDF]
    Practical protocol to preserve agent coding rationale in git; improves auditability & safer agent workflows. Tags: coding-agents, software-engineering, auditability, agent-workflows, knowledge-management, tooling
  • 2603.09253 | cs.LG | 86 | Efficient Reasoning at Fixed Test-Time Cost via Length-Aware Attention Priors and Gain-Aware Training [PDF]
    Training-only priors for efficient reasoning at fixed test-time compute; broadly reusable. Tags: efficient-reasoning, test-time-compute, attention, training-tricks, transformers
  • 2603.18570 | cs.LG, cs.CR | 85 | Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks [PDF]
    Shows approximate unlearning can be weaponized into attacks; introduces unlearning corruption. Tags: machine-unlearning, adversarial-attacks, privacy, GNNs, security
  • 2603.17522 | cs.CL, cs.AI | 84 | Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions [PDF]
    Broad benchmark of AI-text detectors across domains/LLMs with adversarial conditions; useful for eval. Tags: evaluation, AI-generated-text, robustness, adversarial, benchmark
  • 2603.18538 | cs.LG, stat.ME | 84 | Beyond Passive Aggregation: Active Auditing and Topology-Aware Defense in Decentralized Federated Learning [PDF]
    Active auditing metrics + topology-aware defenses for decentralized FL backdoors; practical security angle. Tags: federated-learning, backdoors, auditing, anomaly-detection, security, graph-topology
  • 2603.19182 | cs.AI, cs.CL | 84 | Box Maze: A Process-Control Architecture for Reliable LLM Reasoning [PDF]
    Process-control architecture to reduce hallucination/adversarial failures; safety-oriented framing. Tags: LLM-safety, hallucination, robustness, process-supervision, architecture, adversarial
  • 2603.15421 | cs.CL, cs.AI | 84 | CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents [PDF]
    Agent memory clustering to reduce irrelevant/corrupt context; practical for small-model agents. Tags: agents, memory, retrieval, small language models, RAG, context management, robustness

AI Paper Insights Briefing

2026-03-22

0) Executive takeaways (read this first)

  • Verification is shifting from "ask another LLM" to structured, checkable signals: graph-based plan verification with node/edge risk scores (GNNVerifier) and stepwise CoT safety scoring plus intervention (SFCoT) both show clear robustness gains over prompt-only baselines.
  • Privacy/security research is getting more "system-realistic": private RAG now supports arbitrarily large top-k efficiently (p²RAG); a federated-learning attack drops the "requires architecture modification" assumption (ARES); a VFL defense exploits where label information actually concentrates (moving the cut layer).
  • Benchmarks are becoming more diagnostic (and more multidimensional): BrainBench separates accuracy from consistency (stochasticity); harmful-humor evaluation adds multimodality, Arabic, and covert harm; AI-text detection is stress-tested under length matching, domain shift, and adversarial paraphrasing.
  • The bottleneck for agent reliability increasingly lies in representation and memory organization: CLAG's intra-cluster local memory evolution improves small-model robustness and latency; the "moral indifference" work shows behavioral alignment can leave the latent geometry misaligned, and demonstrates SAE-based steering that improves adversarial safety metrics.
  • Execution-grounded feedback loops beat static checks (code/security pipelines): PCodeTrans drives LLM repair via in-place binary substitution + ASan + differential tracing, reaching near-perfect function-level equivalence on coreutils/binutils.
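The VFL takeaway above (defenses should target where label information actually concentrates) can be sketched as a per-layer mutual-information probe. This is a minimal plug-in estimator run on simulated 1-D activation summaries; the layer names and signal strengths are invented for illustration and are not the paper's estimator or data.

```python
import numpy as np

def mutual_information(feats, labels, bins=8):
    """Plug-in MI estimate (nats) between a 1-D feature summary and discrete labels."""
    edges = np.quantile(feats, np.linspace(0, 1, bins + 1))
    cells = np.clip(np.searchsorted(edges, feats, side="right") - 1, 0, bins - 1)
    joint = np.zeros((bins, int(labels.max()) + 1))
    np.add.at(joint, (cells, labels), 1)          # empirical joint histogram
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

# Toy "per-layer activations": the label signal grows with depth, mimicking
# the claim that label information concentrates in later layers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 2000)
layers = {f"layer{i}": sig * labels + rng.normal(size=2000)
          for i, sig in enumerate([0.1, 0.5, 2.0])}
mi_by_layer = {name: mutual_information(acts, labels) for name, acts in layers.items()}
# A defender would place the cut layer before the first large jump in MI.
```

On the toy data the MI estimate rises monotonically with depth, which is the signature the diagnostic looks for.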

2) Key themes (clusters)

Theme: Structured verification and process-oriented agent safety

Theme: Privacy-preserving inference and leakage-aware ML systems

Theme: Memory, long-context navigation, and fixed-compute efficiency

Theme: Benchmarks that expose reliability gaps (stochasticity, transfer, covert harm)

Theme: Security and provenance for models and ML pipelines

3) Technical synthesis

  • "Structure-first" is a recurring pattern: plan → graph (GNNVerifier), CoT → steps (SFCoT), memory → clusters (CLAG), video → recursive grid (VideoAtlas). The shared bet: explicit structure buys better diagnosis, gating, and compute control.
  • When fine-grained labels are missing, synthetic supervision is becoming the default: plan perturbations (REPLACE/DROP/COMPRESS), sandbox scenarios (HAAF), synthetic patients (OpenHospital), medical forgery generation (MedForge-90K).
  • Verification loops increasingly need acceptance criteria: GNNVerifier accepts an edit only if the graph score improves; SFCoT rewrites/truncates based on stepwise safety scores; PCodeTrans iterates until tests + ASan/BP-Diff pass.
  • Compute budgets are being formalized as first-class knobs: VideoAtlas's depth bound d; RPA's cache bias + training-only controller; CLAG's two-stage retrieval shrinks the search space and latency.
  • In privacy, "where the information lives" matters: the VFL work shows label information concentrates in deeper/later layers; defenses can be structural (cut-layer placement) rather than noise-only.
  • Attack realism is improving: ARES assumes an attacker who can set weights/biases (no architecture changes) and uses sparse recovery; unlearning corruption is triggered by legally mandated deletions; p²RAG targets arbitrary top-k (practical for long context).
  • Reliability is being measured by variance, not just means: BrainBench's accuracy-consistency gap (10.3 percentage points on average) highlights stochastic reasoning as a safety/reliability dimension.
  • "Judge models" are everywhere but play different roles: scoring (InterveneBench), disclosure scoring (the NDAI-zone study), reasoning quality (MedForge), answer adjudication (BrainBench), raising a cross-cutting concern about judge bias and reproducibility.
  • Execution-grounded evaluation is a strong differentiator: PCodeTrans uses the original binary + official test suites as the oracle; this is a template for reducing "semantic hallucinations" in code transformation.
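The "compute budget as a first-class knob" pattern can be illustrated with a coarse-to-fine navigator whose probe count is branch × depth rather than one pass over all N positions. This is a hypothetical sketch in the spirit of VideoAtlas's bounded-depth navigation; the proximity scorer stands in for a learned relevance model and is invented here.

```python
def navigate(score, start, end, branch=4, depth=3):
    """Coarse-to-fine localization over [start, end): probe `branch` segments
    per level and descend into the best one, so total probes = branch * depth
    instead of scanning every position."""
    probes = 0
    for _ in range(depth):
        if end - start <= 1:
            break
        step = max((end - start) // branch, 1)
        segs = [(s, min(s + step, end)) for s in range(start, end, step)][:branch]
        probes += len(segs)
        start, end = max(segs, key=lambda b: score(*b))   # descend into best segment
    return (start, end), probes

# Hypothetical scorer: relevance = proximity of a segment's midpoint to the
# target frame (a stand-in for a learned relevance model over video content).
TARGET = 777
span, probes = navigate(lambda s, e: -abs((s + e) // 2 - TARGET), 0, 4096)
```

With branch=4 and depth=3 the navigator narrows 4096 frames to a 64-frame window using only 12 probes; raising `depth` trades compute for precision, which is exactly the knob the synthesis points at.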

4) Top 5 papers (with "why now")

1) GNNVerifier: Graph-based Verifier for LLM Task Planning

  • Introduces a graph-structured verifier that scores whole plans and localizes high-risk nodes/edges (tool/step mismatches, dependency problems).
  • Uses synthetic perturbations to construct node/edge supervision where real labels are missing, training the diagnostic heads.
  • Demonstrates verification-guided local editing (replace/insert), accepting an edit only when the verifier score improves; reports consistent gains over VeriPlan across datasets and planners.
  • Caveats: the synthetic error distribution may not match real planning failures; no online tool-execution evaluation.
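The acceptance rule above (apply a local edit only when the verifier score improves) can be sketched as a small gated repair loop. The verifier and edit proposer below are toy stand-ins, not GNNVerifier's GNN or its edit space.

```python
def repair_plan(plan, verifier, propose_edits, max_rounds=5):
    """Verification-gated local editing: accept a candidate edit only if the
    verifier's score strictly improves, mirroring the acceptance rule the
    GNNVerifier summary describes."""
    score = verifier(plan)
    for _ in range(max_rounds):
        best = None
        for edit in propose_edits(plan):           # e.g. replace/insert a step
            cand = edit(plan)
            s = verifier(cand)
            if s > score and (best is None or s > best[0]):
                best = (s, cand)
        if best is None:                           # no improving edit: stop
            break
        score, plan = best
    return plan, score

# Toy stand-ins: the verifier rewards a satisfied dependency ("open" must
# precede "write") and mildly penalizes plan length; edits insert "open".
def toy_verifier(plan):
    ok = "open" in plan and "write" in plan and plan.index("open") < plan.index("write")
    return (1.0 if ok else 0.0) - 0.01 * len(plan)

def toy_edits(plan):
    for i in range(len(plan) + 1):
        yield lambda p, i=i: p[:i] + ["open"] + p[i:]

fixed, score = repair_plan(["write"], toy_verifier, toy_edits)
```

The length penalty is what makes the gate terminate: once the dependency is satisfied, every further insertion lowers the score and is rejected.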

2) $p^2$RAG: Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval

  • Replaces secure sorting with interactive binary search, supporting arbitrary/large k efficiently, in line with the long-context LLM trend.
  • Uses standard MPC primitives (Shamir sharing, Beaver triples, DCFs) and reports 3–300× speedups over PRAG for k = 16–1024.
  • Gives explicit leakage bounds: O(log²N) physical leakage plus k+ξ functional leakage.
  • Caveats: assumes a trusted dealer and two non-colluding semi-honest servers; PIR and the offline phase are not benchmarked.
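The core idea of replacing secure sorting with binary search can be illustrated in plaintext: each round needs only comparisons and a count, primitives that are cheap to run obliviously, and the number of rounds is logarithmic in the score range regardless of k. This sketch shows the round structure only; the real protocol operates on secret shares with the MPC primitives listed above.

```python
import random

def topk_by_threshold(scores, k, lo=0.0, hi=1.0, rounds=40):
    """Select top-k by binary-searching a score threshold in [lo, hi):
    each round compares every score to one pivot and counts how many clear it,
    instead of performing a full oblivious sort."""
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if sum(s >= mid for s in scores) >= k:
            lo = mid          # enough items above the pivot: threshold can rise
        else:
            hi = mid          # too few: threshold must fall
    return [i for i, s in enumerate(scores) if s >= lo]

rng = random.Random(1)
scores = [rng.random() for _ in range(100)]   # similarity scores in [0, 1)
selected = topk_by_threshold(scores, 5)
```

Note the round count is fixed at 40 whether k is 5 or 500, which is why this shape scales to the arbitrary/large top-k regime the paper targets.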

3) SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration

  • Moves safety from final-output filtering to stepwise CoT monitoring, with lexical/semantic/policy scoring and gray-zone calibration.
  • Reports a large jailbreak reduction: ASR 58.97% → 12.31%, while retaining about 91.2% average utility on MMLU/GSM8K/MBPP.
  • Ablations attribute the gains to the consistency verifier and the rewrite intervention.
  • Caveats: no runtime/latency overhead reported; evaluated on a single model (Qwen3-8B).
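The stepwise monitoring described above can be sketched as a three-way gate per reasoning step: keep clearly safe steps, send gray-zone steps to a calibrated rewriter, and truncate the chain on a clear violation. The thresholds, keyword scorer, and rewriter are invented placeholders for SFCoT's lexical/semantic/policy scores and interventions.

```python
def gate_cot(steps, score_fn, safe=0.3, unsafe=0.7, rewriter=lambda s: s):
    """Stepwise CoT gating: score each intermediate step, then keep,
    rewrite (gray zone), or truncate (clear violation)."""
    kept, log = [], []
    for step in steps:
        s = score_fn(step)
        if s < safe:
            kept.append(step); log.append((step, s, "keep"))
        elif s < unsafe:
            kept.append(rewriter(step)); log.append((step, s, "rewrite"))
        else:
            log.append((step, s, "truncate"))
            break                       # stop generating past an unsafe step
    return kept, log

# Hypothetical keyword scorer standing in for lexical/semantic/policy scoring.
def toy_score(step):
    if "bypass the filter" in step:
        return 0.9
    return 0.5 if "risky" in step else 0.1

kept, log = gate_cot(
    ["parse the request", "this step looks risky", "bypass the filter", "answer"],
    toy_score,
    rewriter=lambda s: "[sanitized] " + s,
)
```

Logging the per-step scores (the `log` here) is also what the "practical next steps" section recommends instrumenting when comparing ASR with and without gating.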

4) PCodeTrans: Translate Decompiled Pseudocode to Compilable and Executable Equivalent

  • Proposes in-situ substitutable execution: hot-swapping the repaired function into the original binary and using real execution as the equivalence oracle.
  • Uses ASan (on the substituted part only) plus breakpoint-matched differential tracing to produce actionable runtime diffs that drive iterative LLM repair.
  • Achieves 100% function-level compilation and ~99.6–99.9% behavioral equivalence on coreutils/binutils (unstripped).
  • Caveats: platform-specific (Linux ELF/x86_64); indirect-call signature recovery and standalone recompilation remain hard.
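The shape of the repair loop (execute, diff against reference behavior, feed the concrete divergence back to the fixer) can be sketched abstractly. The oracle and fixer below are toy stand-ins for the paper's test-suite/ASan/trace-diff toolchain and its LLM repair step; nothing here touches real binaries.

```python
def repair_until_equivalent(candidate, oracle, fixer, max_iters=10):
    """Execution-grounded repair loop: the oracle runs the candidate on
    concrete inputs and returns the first divergence (or None when behavior
    matches); the fixer consumes that divergence to produce a new candidate."""
    for i in range(max_iters):
        divergence = oracle(candidate)
        if divergence is None:
            return candidate, i         # behaviorally equivalent
        candidate = fixer(candidate, divergence)
    raise RuntimeError("no equivalent candidate within the iteration budget")

# Toy example: the "recovered function" is f(x) = mul*x + add, the reference
# behavior is f(x) = 2x, and the initial candidate has an offset bug.
run = lambda c, x: c["mul"] * x + c["add"]
oracle = lambda c: next(((x, run(c, x), 2 * x) for x in range(8)
                         if run(c, x) != 2 * x), None)
fixer = lambda c, d: {**c, "add": c["add"] - (d[1] - d[2])}   # cancel the offset

repaired, iters = repair_until_equivalent({"mul": 2, "add": 1}, oracle, fixer)
```

The design point the loop illustrates: the fixer receives a concrete (input, got, expected) triple rather than a static warning, which is what makes the feedback actionable.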

5) Mechanistic Origin of Moral Indifference in Language Models

  • Diagnoses "moral indifference" as a latent-geometry problem (category/gradient/structure/dimensionality), analyzed against prototype-based moral-vector ground truth.
  • Uses SAEs + targeted feature fine-tuning + additive steering to improve adversarial safety results on Flames (e.g., PSC1 908 → 953; peak win rate 75.4%).
  • Connects mechanistic interpretability to alignment by demonstrating causal interventions on internal features.
  • Caveats: interventions are mainly demonstrated on Qwen3-8B; only a handful of SAE features correlate with moral dimensions; steering is sensitive to α.
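Additive steering itself is a one-line intervention: shift the activation by α along a unit feature direction. A minimal sketch with a random stand-in direction (the paper derives its directions from SAE features, which this does not reproduce):

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Additive steering: move an activation by alpha along a unit feature
    direction, leaving the orthogonal components untouched."""
    v = direction / np.linalg.norm(direction)
    return hidden + alpha * v

rng = np.random.default_rng(0)
h = rng.normal(size=8)            # stand-in residual-stream activation
d = rng.normal(size=8)            # stand-in for an SAE feature direction
h_steered = steer(h, d, alpha=4.0)
v = d / np.linalg.norm(d)         # unit direction, for inspecting the shift
```

The geometry is exact (the projection onto v grows by exactly α), but the caveat above is about downstream behavior: which α helps rather than hurts is empirical, which this sketch does not model.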

5) Practical next steps

  • If you build tool-calling agents: prototype a plan-graph verifier that outputs node/edge risks, and use it to drive local edits with an acceptance test (the score must improve), following GNNVerifier.
  • For jailbreak-hardening CoT-enabled systems: compare ASR with and without stepwise CoT gating; log per-step safety scores, and quantify utility retention on core tasks (SFCoT-style).
  • For private RAG: decide whether the product needs dynamic/large top-k; if so, benchmark threshold/binary-search retrieval against sort-based secure top-k under realistic RTTs and PIR costs (p²RAG points at what to measure).
  • For federated/vertical FL deployments: run an MI-by-layer diagnostic to locate where label information concentrates, then test moving the cut layer earlier as a zero-overhead mitigation, while also measuring feature-leakage risk (the VFL paper's trade-off).
  • For long-context memory in small agents: try intra-cluster local memory evolution + two-stage retrieval, tracking both answer quality and latency; ablate local evolution vs global retrieval (CLAG).
  • For evaluation: add multi-run consistency (not just accuracy) to internal reasoning benchmarks (the BrainBench protocol), and add domain shift + adversarial paraphrasing wherever you rely on AI-text detectors.
  • For provenance/IP: if distributed models may be quantized/distilled, test subspace-watermark robustness under realistic transformation pipelines and keep payloads modest (FSW suggests ~16 bits of practical capacity).
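The multi-run consistency recommendation can be implemented in a few lines: report mean accuracy alongside the share of runs that produce each item's modal answer. This is a sketch of the protocol idea, not BrainBench's exact formulas.

```python
from collections import Counter

def accuracy_and_consistency(runs, gold):
    """Mean accuracy across repeated runs, plus per-item consistency
    (fraction of runs producing the modal answer). A gap between the two
    flags stochastic reasoning."""
    n_runs, n_items = len(runs), len(gold)
    acc = sum(a == g for run in runs for a, g in zip(run, gold)) / (n_runs * n_items)
    cons = sum(Counter(run[i] for run in runs).most_common(1)[0][1] / n_runs
               for i in range(n_items)) / n_items
    return acc, cons

# Three runs over two items: item 1's modal answer is consistently wrong,
# so consistency (5/6) exceeds accuracy (4/6).
acc, cons = accuracy_and_consistency(
    [["A", "C"], ["A", "C"], ["A", "B"]], gold=["A", "B"])
```

Reporting both numbers per benchmark item (rather than just the means) is what makes the stochastic items visible.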

Generated from per-paper analyses; no external browsing was performed.