2025.05.09 | A survey on the development of multimodal reasoning models; a general-intelligence evaluation framework proposed

11 minutes

The 15 papers covered in this episode:

00:22 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

00:57 🤖 On Path to Multimodal Generalist: General-Level and General-Bench

01:40 🤖 Flow-GRPO: Training Flow Matching Models via Online RL

02:23 🧠 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

03:05 🧠 Scalable Chain of Thoughts via Elastic Reasoning

03:41 🔍 FG-CLIP: Fine-Grained Visual and Textual Alignment

04:19 🏞 3D Scene Generation: A Survey

05:02 🧮 ICon: In-Context Contribution for Automatic Data Selection

05:39 🎬 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

06:19 🤖 LiftFeat: 3D Geometry-Aware Local Feature Matching

06:56 🧱 Generating Physically Stable and Buildable LEGO Designs from Text

07:38 🧠 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

08:22 🌐 Crosslingual Reasoning through Test-Time Scaling

09:04 🖼 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

09:42 🌐 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

【Follow Us】

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (RED): AI速递

02:21 RL improves the stochastic sampling ability of 3D rendering models