

2026.02.19 | Learnable routing + quantization speed up video diffusion; residual tracking gets humanoids to 90% grasp success
【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute: each week, AI每周谈 recaps the previous week's biggest AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
【Contents】 This episode covers the following 14 papers:
[00:30] ⚡ SLA2: Sparse-Linear Attention with Learnable Routing and QAT
[01:16] 🤖 Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
[02:02] 🧠 RynnBrain: Open Embodied Foundation Models
[02:46] 🔑 Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
[03:33] 🕺 SAM 3D Body: Robust Full-Body Human Mesh Recovery
[04:41] 🤝 Multi-agent cooperation through in-context co-player inference
[05:28] 📊 MAEB: Massive Audio Embedding Benchmark
[06:04] 🤖 World Action Models are Zero-shot Policies
[06:44] 🔬 Towards a Science of AI Agent Reliability
[07:20] 🧠 MMA: Multimodal Memory Agent
[08:09] 🚀 Optimizing Few-Step Generation with Adaptive Matching Distillation
[08:56] 🧭 Learning Situated Awareness in the Real World
[09:28] ⚠ Visual Memory Injection Attacks for Multi-Turn Conversations
[10:10] 🤖 BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models
【Follow Us】 You can also find us on the following platform for more beyond the podcast itself: Xiaohongshu (小红书): AI速递
2026.02.18 | GLM-5 tops agentic engineering with a score of 50; SAE interpretability is upstaged by random baselines
【Contents】 This episode covers the following 15 papers:
[00:31] 🤖 GLM-5: from Vibe Coding to Agentic Engineering
[01:11] 🔍 Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?
[01:57] 🤖 Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
[02:41] 🧪 ResearchGym: Evaluating Language Model Agents on Real-World AI Research
[03:54] 🧠 UniT: Unified Multimodal Chain-of-Thought Test-time Scaling
[04:50] ⚙ COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
[05:38] 🧠 Revisiting the Platonic Representation Hypothesis: An Aristotelian View
[06:23] ⚖ Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
[07:11] 🎭 On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
[07:56] ⚕ ClinAlign: Scaling Healthcare Alignment from Clinician Preference
[08:51] ⚖ STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
[09:37] 🔍 Visual Persuasion: What Influences Decisions of Vision-Language Models?
[10:32] ⚡ Learning Native Continuation for Action Chunking Flow Policies
[11:19] 🎥 Geometry-Aware Rotary Position Embedding for Consistent Video World Model
[12:07] 🧠 TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models
2026.02.17 | Queries as anchors for user profiling; a quantum-native database
【Contents】 This episode covers the following 15 papers:
[00:29] 🧠 Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
[01:14] ⚛ Qute: Towards Quantum-Native Database
[01:59] 🧠 InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem
[03:05] 🔍 REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
[03:56] 🚀 BitDance: Scaling Autoregressive Generative Models with Binary Tokens
[04:38] 🧠 Experiential Reinforcement Learning
[05:24] 🧠 Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings
[06:21] 🧩 UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model
[07:13] 🔍 BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
[08:18] 🧠 LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
[09:02] 🗣 Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision
[10:00] 🧠 Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
[10:49] 🎨 FireRed-Image-Edit-1.0 Technical Report
[11:26] 🧬 Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training
[12:04] 🌐 WebWorld: A Large-Scale World Model for Web Agent Training
2026.02.16 | Filling data gaps via feature activations; region distillation internalizes zooming
【Contents】 This episode covers the following 15 papers:
[00:30] 🧠 Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
[01:19] 🔍 Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
[02:03] 🏥 MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
[02:43] 🎯 OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
[03:29] 🔬 What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
[04:18] 🤖 RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models
[05:05] 🤖 ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
[05:53] 🎬 Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
[06:55] 🤝 Intelligent AI Delegation
[07:49] 📍 GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
[08:39] ⚙ BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
[09:37] ⚡ FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
[10:14] 🔍 On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
[11:03] ⚡ DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
[11:48] ⚡ CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
【Weekend Special】 Hottest AI Papers of February, Week 3 | OPUS selects data precisely; weak models give strong models a boost
【Contents】 This episode covers the following 5 papers:
[00:52] TOP1 (🔥305) | 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
[02:42] TOP2 (🔥250) | 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
[04:59] TOP3 (🔥186) | 💻 Code2World: A GUI World Model via Renderable Code Generation
[07:19] TOP4 (🔥179) | 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
[10:02] TOP5 (🔥172) | ⚡ Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
2026.02.13 | Self-evolving AI struggles to stay safe; unified tokens for large audio models
【Contents】 This episode covers the following 15 papers:
[00:31] ⚠ The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
[01:24] 🎵 MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
[02:28] 🧠 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
[03:05] 🤖 GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
[03:56] ⚖ LawThinker: A Deep Research Legal Agent in Dynamic Environments
[04:33] 🔍 Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
[05:16] 🎨 Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
[06:01] 🚀 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
[06:55] 🧩 Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
[07:38] 🧠 Thinking with Drafting: Optical Decompression via Logical Reconstruction
[08:17] 🗳 dVoting: Fast Voting for dLLMs
[09:09] 🤖 RISE: Self-Improving Robot Policy with Compositional World Model
[09:54] 🤖 χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
[10:48] 🤖 EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
[11:45] 🔍 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation
2026.02.12 | Sparse MoE rivals GPT-5; GENIUS measures fluid intelligence
【Contents】 This episode covers the following 15 papers:
[00:28] ⚡ Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
[01:06] 🧠 GENIUS: Generative Fluid Intelligence Evaluation Suite
[01:46] 🤖 PhyCritic: Multimodal Critic Models for Physical AI
[02:18] ⚙ ASA: Training-Free Representation Engineering for Tool-Calling Agents
[02:59] 🧠 When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
[03:38] 🧮 Towards Autonomous Mathematics Research
[04:15] 🎬 TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
[05:12] 🧠 G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
[06:02] ⚙ FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
[06:44] 🧑 DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
[07:28] 🚀 ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
[08:27] 📈 Online Causal Kalman Filtering for Stable and Effective Policy Optimization
[09:24] 🧠 Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
[10:06] 🗣 Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
[10:47] 🔄 Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
2026.02.11 | OPUS selects data by aligning updates; Code2World previews GUIs through code
【Contents】 This episode covers the following 15 papers:
[00:33] 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
[01:17] 💻 Code2World: A GUI World Model via Renderable Code Generation
[02:05] 🤖 UI-Venus-1.5 Technical Report
[02:58] 🧠 Chain of Mindset: Reasoning with Adaptive Cognitive Modes
[03:52] 🧠 SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
[04:29] 🔬 P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
[05:24] 🤖 Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
[05:58] 🔍 Prism: Spectral-Aware Block-Sparse Attention
[06:41] ⚡ DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
[07:23] 🎬 Olaf-World: Orienting Latent Actions for Video World Modeling
[08:18] 🎨 Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
[09:09] 🍌 Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
[09:50] 🎯 SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
[10:37] 🤖 BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
[11:31] 🎬 TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation
2026.02.10 | ReAlign bridges the image-text gap with zero training; MOVA generates synchronized video and audio
【Contents】 This episode covers the following 15 papers:
[00:34] 🔀 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
[01:23] 🎬 MOVA: Towards Scalable and Synchronized Video-Audio Generation
[02:03] 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
[02:51] 🤖 Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
[03:24] 🎯 Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
[04:22] ⚡ LLaDA2.1: Speeding Up Text Diffusion via Token Editing
[05:02] 📱 GEBench: Benchmarking Image Generation Models as GUI Environments
[05:52] 🎬 Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
[06:42] 🧠 Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
[07:20] 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
[08:12] 📊 LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
[08:59] 🔍 GISA: A Benchmark for General Information-Seeking Assistant
[09:56] 🧭 WorldCompass: Reinforcement Learning for Long-Horizon World Models
[10:35] 🧪 LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
[11:20] 🧭 Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
2026.02.09 | AI takes patient histories like a resident physician; inferring rules through interaction is true intelligence
【Contents】 This episode covers the following 15 papers:
[00:32] 🩺 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
[01:17] 🧭 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
[02:03] 📈 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
[02:47] 🎯 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
[03:48] ⚖ MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
[04:33] 🤖 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
[05:14] 🧠 Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
[06:07] 🧮 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
[06:46] 🎯 POINTS-GUI-G: GUI-Grounding Journey
[07:45] 🧠 MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
[08:29] 🧠 Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
[09:18] 🎵 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
[09:59] ⚡ Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
[11:02] 🧠 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
[11:49] 🧠 PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks
【Weekend Special】 Hottest AI Papers of February, Week 2 | Staged training unifies action spaces; ERNIE 5.0 unifies all modalities in one model
【Contents】 This episode covers the following 5 papers:
[00:48] TOP1 (🔥235) | 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
[02:54] TOP2 (🔥235) | 🧠 ERNIE 5.0 Technical Report
[05:14] TOP3 (🔥206) | 🤖 Kimi K2.5: Visual Agentic Intelligence
[07:49] TOP4 (🔥147) | 🔍 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
[10:28] TOP5 (🔥137) | 🍌 PaperBanana: Automating Academic Illustration for AI Scientists
2026.02.06 | Removing length bias from RLVR; long video takes that keep their memory
【Contents】 This episode covers the following 15 papers:
[00:29] 📊 Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
[01:20] 🎬 Context Forcing: Consistent Autoregressive Video Generation with Long Context
[02:11] 🧠 RISE-Video: Can Video Generators Decode Implicit World Rules?
[02:57] 🔮 ProAct: Agentic Lookahead in Interactive Environments
[03:47] ⚡ Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
[04:39] 🧭 Steering LLMs via Scalable Interactive Oversight
[05:27] 🧠 Grounding and Enhancing Informativeness and Utility in Dataset Distillation
[06:13] 🧪 Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
[07:07] 🔍 Semantic Search over 9 Million Mathematical Theorems
[07:57] 🕷 Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
[08:39] 🧪 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
[09:30] 🤖 InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
[10:22] 🎬 Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
[11:14] 🔄 SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs
[12:20] 🔍 SAGE: Benchmarking and Improving Retrieval for Deep Research Agents
2026.02.05 | ERNIE 5.0 unifies modalities; FASA sparse attention saves memory
【Contents】 This episode covers the following 15 papers:
[00:29] 🧠 ERNIE 5.0 Technical Report
[01:11] ⚡ FASA: Frequency-aware Sparse Attention
[02:01] 📊 Training Data Efficiency in Multimodal Process Reward Models
[02:44] 🤖 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
[03:28] ⚡ OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
[04:21] ⚡ HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
[05:02] 🤖 EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
[06:05] 🎬 Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
[06:59] 🤖 SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
[07:44] 🔍 TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
[08:21] 🧠 Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
[09:12] 🤖 Rethinking the Trust Region in LLM Reinforcement Learning
[09:54] ♻ Residual Context Diffusion Language Models
[10:40] 🧱 HY3D-Bench: Generation of 3D Assets
[11:34] 🎨 AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
2026.02.04 | Reading code as images saves tokens; ad hoc sub-agent teams cut costs
【Contents】 This episode covers the following 15 papers:
[00:32] 👁 CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
[01:18] 🤖 AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
[02:01] 🔍 No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
[02:43] 🔗 daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
[03:23] 🧠 Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
[04:06] 🎬 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
[04:56] 🤖 MARS: Modular Agent with Reflective Search for Automated AI Research
[05:41] 📊 CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
[06:25] ⚡ Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
[07:19] 🤖 SWE-World: Building Software Engineering Agents in Docker-Free Environments
[08:09] 🤖 SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
[09:14] 📊 Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation
[10:08] ⚡ Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
[10:59] 🎯 Unified Personalized Reward Model for Vision Generation
[11:47] 🔍 WideSeek: Advancing Wide Research via Multi-Agent Scaling
2026.02.03 | Staged training unifies action spaces; MoE + vision encoder for parallel agents
【Contents】 This episode covers the following 15 papers:
[00:32] 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
[01:24] 🤖 Kimi K2.5: Visual Agentic Intelligence
[02:09] 🔍 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
[03:08] 🔍 Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
[03:57] 🔄 Closing the Loop: Universal Repository Representation with RPG-Encoder
[04:39] 🧠 UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
[05:23] 📊 WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora
[06:28] 📚 FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
[07:23] 🚀 SWE-Universe: Scale Real-World Verifiable Environments to Millions
[08:13] 📚 Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles
[08:58] ⚖ SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization
[09:45] 🎨 PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
[10:38] ⚙ RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
[11:30] 🧠 Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
[12:17] 🎬 PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards