The 15 papers in this episode:
00:22 📊 V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
01:06 🧠 Step-Audio-R1 Technical Report
01:48 🧭 Scaling Spatial Intelligence with Multimodal Foundation Models
02:18 🎬 First Frame Is the Place to Go for Video Content Customization
02:49 🎬 Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
03:29 🔮 SAM 3D: 3Dfy Anything in Images
04:03 🚀 MiMo-Embodied: X-Embodied Foundation Model Technical Report
04:38 🧠 Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation
05:10 🏆 TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
05:53 🌀 Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
06:26 🚀 SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models
07:09 🎬 TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
07:46 🔬 SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
08:23 🎨 NaTex: Seamless Texture Generation as Latent Color Diffusion
08:58 📐 PartUV: Part-Based UV Unwrapping of 3D Meshes

[Follow Us]
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu (小红书): AI速递
