本期的 15 篇论文如下:
00:22 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models(感知、推理、思考与规划:大型多模态推理模型综述)
00:57 🤖 On Path to Multimodal Generalist: General-Level and General-Bench(迈向多模态通用智能:通用水平与通用基准)
01:40 🤖 Flow-GRPO: Training Flow Matching Models via Online RL(Flow-GRPO:通过在线强化学习训练Flow Matching模型)
02:23 🧠 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models(作为裁判的感知代理:评估大型语言模型中的高阶社会认知)
03:05 🧠 Scalable Chain of Thoughts via Elastic Reasoning(基于弹性推理的可扩展思维链)
03:41 🔍 FG-CLIP: Fine-Grained Visual and Textual Alignment(FG-CLIP:细粒度视觉与文本对齐)
04:19 🏞 3D Scene Generation: A Survey(三维场景生成:综述)
05:02 🧮 ICon: In-Context Contribution for Automatic Data Selection(ICon:用于自动数据选择的上下文贡献度学习)
05:39 🎬 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant(StreamBridge:将离线视频大语言模型转化为主动流式助手)
06:19 🤖 LiftFeat: 3D Geometry-Aware Local Feature Matching(LiftFeat: 三维几何感知局部特征匹配)
06:56 🧱 Generating Physically Stable and Buildable LEGO Designs from Text(基于文本生成物理稳定且可搭建的乐高设计)
07:38 🧠 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains(X-Reasoner:迈向跨模态和领域的通用推理)
08:22 🌐 Crosslingual Reasoning through Test-Time Scaling(基于测试时缩放的跨语言推理)
09:04 🖼 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes(PlaceIt3D:语言引导的真实3D场景物体放置)
09:42 🌐 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese(BrowseComp-ZH:中文环境下评估大型语言模型网页浏览能力的基准)

【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递