The 15 papers in this episode are as follows:
00:22 🤖 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
00:56 🛣 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
01:40 🧠 Skywork Open Reasoner 1 Technical Report
02:20 🔍 Sherlock: Self-Correcting Reasoning in Vision-Language Models
02:55 🤖 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
03:35 🤖 SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
04:25 🚀 SageAttention2++: A More Efficient Implementation of SageAttention2
05:12 🧠 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
05:59 🎬 Fostering Video Reasoning via Next-Event Prediction
06:42 💡 RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
07:25 🔬 DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
08:16 🖼 Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
08:58 🧩 Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
09:38 🚚 SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
10:26 🌐 Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
