2025.11.24 | Open-source 7B model sets a new mark in multimodal reasoning; GeoVista small model delivers precise geolocalization - HuggingFace Daily AI Paper Digest

The 15 papers in this episode:

00:21 🧠 OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

01:04 🌍 GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

01:41 🎯 SAM 3: Segment Anything with Concepts

02:31 📊 Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story

03:09 🧠 O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents

03:43 🦜 Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs

04:26 🧠 RynnVLA-002: A Unified Vision-Language-Action and World Model

05:19 🧠 VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

05:51 🌍 WorldGen: From Text to Traversable and Interactive 3D Worlds

06:34 🎨 Loomis Painter: Reconstructing the Painting Process

07:06 🔮 Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

07:48 🎨 InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization

08:21 🔬 OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists

09:07 🧬 MergeDNA: Context-aware Genome Modeling with Dynamic Tokenization through Token Merging

09:41 🔍 Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination

[Follow Us]

You can also find us on the following platforms for more beyond the podcast itself.

Xiaohongshu: AI速递