2026.04.09 | Template Sickness in RL Agents; Step-by-Step Image Generation Is More Controllable

12 minutes

【Sponsor】

Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.

Link 🔗 www.xiaoyuzhoufm.com

【Contents】

The 15 papers in this episode are:

00:31 🧠 RAGEN-2: Reasoning Collapse in Agentic RL

01:21 🎨 Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

02:00 ⚡ MARS: Enabling Autoregressive Models Multi-Token Generation

02:51 🌍 INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

03:48 🔬 SEVerA: Verified Synthesis of Self-Evolving Agents

04:41 🔍 TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

05:26 ⚡ FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

06:17 🔄 FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

07:00 🧠 Neural Computers

07:37 🎯 Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

08:22 💡 Learning to Hint for Reinforcement Learning

09:11 🧠 Fast Spatial Memory with Elastic Test-Time Training

09:44 🎬 MoRight: Motion Control Done Right

10:21 🌐 Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment

11:02 📊 Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递