The 15 papers in this episode are as follows:
00:22 🤖 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
01:09 🤔 RM-R1: Reward Modeling as Reasoning
01:52 🧠 Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
02:32 🧮 FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
03:17 ✂ ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
03:59 🧠 Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
04:39 🚀 Practical Efficiency of Muon for Pretraining
05:18 ⚙ A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
06:01 🤖 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
06:44 🤔 Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents
07:24 🤖 SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
08:03 🤖 Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
08:50 🖼 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
09:30 🧮 Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
10:11 🎨 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast content:
Xiaohongshu (RED): AI速递