【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.
Link 🔗www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode:
00:32 🧪 Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
01:13 🚀 Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
01:55 🧠 LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
02:42 🔍 VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
03:30 🧠 SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
04:10 🎯 F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting
05:03 🎬 Manifold-Aware Exploration for Reinforcement Learning in Video Generation
05:56 ⚖ mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT
06:46 🧠 Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection
07:35 🔄 Repurposing Geometric Foundation Models for Multi-view Diffusion
08:21 🤖 RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models
09:15 🔍 OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis
10:02 💭 BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs
10:54 ⚖ SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
11:43 🧭 On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

【Follow Us】
You can also find us on the following platforms for more than what the podcast covers
Xiaohongshu: AI速递
