2025.09.03 | Agentic RL boosts LLM autonomy; SimpleTIR tackles multi-turn tool-integrated reasoning

7 minutes · 149 plays · 1 comment

The 15 papers in this episode are as follows:

00:19 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

00:40 🚀 SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

01:12 🤖 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

01:41 🎥 ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

02:12 🔄 LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

02:43 🔧 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

03:11 📄 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

03:33 🩺 Baichuan-M2: Scaling Medical Capability with Large Verifier System

03:57 🎥 Kwai Keye-VL 1.5 Technical Report

04:20 🤖 Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

04:45 🧠 Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

05:11 🔄 Jointly Reinforcing Diversity and Quality in Language Model Generations

05:42 🚀 DCPO: Dynamic Clipping Policy Optimization

06:04 🚀 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

06:27 🎬 GenCompositor: Generative Video Compositing with Diffusion Transformer

[Follow Us]

You can also find us on the following platform for more information beyond the podcast content

Xiaohongshu (小红书): AI速递

Elstevo
2025.9.04
How about switching to a Doubao voice for the host? This TTS sounds way too robotic, hahaha