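2025.05.09 | A survey of multimodal reasoning models; an evaluation framework for generalist intelligence - HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)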

The 15 papers in this episode are:

00:22 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

00:57 🤖 On Path to Multimodal Generalist: General-Level and General-Bench

01:40 🤖 Flow-GRPO: Training Flow Matching Models via Online RL

02:23 🧠 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

03:05 🧠 Scalable Chain of Thoughts via Elastic Reasoning

03:41 🔍 FG-CLIP: Fine-Grained Visual and Textual Alignment

04:19 🏞 3D Scene Generation: A Survey

05:02 🧮 ICon: In-Context Contribution for Automatic Data Selection

05:39 🎬 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

06:19 🤖 LiftFeat: 3D Geometry-Aware Local Feature Matching

06:56 🧱 Generating Physically Stable and Buildable LEGO Designs from Text

07:38 🧠 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

08:22 🌐 Crosslingual Reasoning through Test-Time Scaling

09:04 🖼 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

09:42 🌐 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

[Follow us]

You can also find us on the following platforms, where we share more than what is covered in the podcast:

Xiaohongshu (小红书): AI速递