2026.04.23 | LLaDA2.0 Unifies Multimodal Understanding and Generation; Near-Future Experience as an Add-on for RL - HuggingFace Daily AI Paper Digest

【Contents】
This episode covers the following 15 papers:
00:28 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
01:17 🔮 Near-Future Policy Optimization
02:07 🤖 DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
02:53 🤖 DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
03:42 🎭 Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
04:36 🧠 Exploring Spatial Intelligence from a Generative Perspective
05:21 🤖 A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
06:18 🎤 WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
07:06 🤖 SWE-chat: Coding Agent Interactions From Real Users in the Wild
07:53 🤖 Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
08:36 🧠 Convergent Evolution: How Different Language Models Learn Similar Number Representations
09:21 🤝 SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
09:57 🎬 ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
10:34 🔧 Visual Reasoning through Tool-supervised Reinforcement Learning
11:09 🤖 AI scientists produce results without reasoning scientifically

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递