2025.08.08 | Dynamic Fine-Tuning Improves Reasoning; Zero-Data Self-Evolution Strengthens Reasoning

7 minutes

The 15 papers in this episode:

00:16 ✨ On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

00:41 🌱 R-Zero: Self-Evolving Reasoning LLM from Zero Data

01:00 🤖 Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

01:27 🤔 DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

01:49 📊 Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity

02:12 🤔 Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?

02:40 🔍 Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability

03:08 💡 Are Today's LLMs Ready to Explain Well-Being Concepts?

03:30 🚀 CoAct-1: Computer-using Agents with Coding as Actions

03:57 🚀 InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

04:18 💬 Evaluating, Synthesizing, and Enhancing for Customer Support Conversation

04:41 💡 Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

05:02 🤯 MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes

05:22 🎤 Marco-Voice Technical Report

05:47 🎨 StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递