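2026.03.16 | LMEB fills the blind spot in long-horizon memory evaluation; Cheers decouples semantics from detail to unify multimodal understanding and generation - HuggingFace Daily AI Paper Express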

【Sponsor】

Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news.

【Contents】

This episode covers the following 15 papers:

00:28 🧠 LMEB: Long-horizon Memory Embedding Benchmark

01:12 🔄 Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

01:59 🐳 daVinci-Env: Open SWE Environment Synthesis at Scale

02:46 🔍 Can Vision-Language Models Solve the Shell Game?

03:26 ⚡ OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

04:14 🎯 Visual-ERM: Reward Modeling for Visual Equivalence

05:11 🔍 MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

06:18 🌉 V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

07:05 🔍 Multimodal OCR: Parse Anything from Documents

07:49 🧠 Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

08:22 ⚠ HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

09:13 🔍 From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space

09:59 ⚡ HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

11:04 🧠 Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

11:54 🎬 VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递