2026.04.01 | FIPO Guides Deep Reasoning with KL; LongCat Unifies Multimodal Tokens

12 minutes

【Sponsor】

Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.

Link: 🔗www.xiaoyuzhoufm.com

【Contents】

The 15 papers covered in this episode are:

00:30 🧠 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

01:12 🧩 LongCat-Next: Lexicalizing Modalities as Discrete Tokens

01:48 🚁 CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

02:31 🧬 Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

03:33 🤖 GEMS: Agent-Native Multimodal Generation with Memory and Skills

04:12 🎬 VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

05:04 🤖 Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

05:45 🔬 daVinci-LLM: Towards the Science of Pretraining

06:19 🎬 CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

07:10 🔍 MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models

07:58 🧬 FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration

08:46 🏙 Extend3D: Town-Scale 3D Generation

09:28 💭 Think Anywhere in Code Generation

10:18 ⚙ OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

11:03 🎨 VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

【Follow Us】

You can also find us on the following platforms for more beyond the podcast itself:

Xiaohongshu: AI速递