【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the biggest AI news of the previous week.
Link: 🔗 www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode are:
00:28 🧠 LMEB: Long-horizon Memory Embedding Benchmark
01:12 🔄 Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
01:59 🐳 daVinci-Env: Open SWE Environment Synthesis at Scale
02:46 🔍 Can Vision-Language Models Solve the Shell Game?
03:26 ⚡ OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
04:14 🎯 Visual-ERM: Reward Modeling for Visual Equivalence
05:11 🔍 MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning
06:18 🌉 V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration
07:05 🔍 Multimodal OCR: Parse Anything from Documents
07:49 🧠 Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
08:22 ⚠ HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios
09:13 🔍 From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
09:59 ⚡ HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
11:04 🧠 Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation
11:54 🎬 VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递
