2026.03.02 | dLLM, a unified diffusion framework; SpatialScore helps AI understand space - HuggingFace Daily AI Paper Digest

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, it recaps the previous week's major AI news.

【Contents】

The 15 papers in this episode are:

00:29 🛠 dLLM: Simple Diffusion Language Modeling

01:15 🧠 Enhancing Spatial Understanding in Image Generation via Reward Modeling

02:11 🌍 Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

03:08 ⚡ CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

03:59 🎬 Mode Seeking meets Mean Seeking for Fast Long Video Generation

04:44 🧩 Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

05:31 ⚡ LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

06:21 🔍 CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era

07:16 ⚡ Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

08:00 🧠 Memory Caching: RNNs with Growing Memory

08:38 📊 InfoNCE Induces Gaussian Distribution

09:28 🧠 Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

10:28 ⚡ SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

11:15 🎬 LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding

11:53 ⚡ Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递