2025.08.19 | Ovis2.5 Advances Multimodal Models; ComoRAG Improves Long-Narrative Reasoning - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are:

00:20 ✨ Ovis2.5 Technical Report

00:51 🧠 ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning

01:14 🎥 4DNeX: Feed-Forward 4D Generative Modeling Made Easy

01:38 ✨ Next Visual Granularity Generation

01:57 ⚡ Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

02:30 🤔 Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

03:00 🎮 HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds

03:26 ❗ When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

03:56 🎮 Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

04:21 💡 Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models

04:47 🌐 G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

05:15 ✨ S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

05:49 👂 Representing Speech Through Autoregressive Prediction of Cochlear Tokens

06:09 💡 Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

06:40 🎬 Precise Action-to-Video Generation Through Visual Action Prompts

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递