2025.11.07 | A New Paradigm for Video Reasoning; Interacting with Images to Aid Thinking - HuggingFace Daily AI Paper Digest

The 12 papers in this episode are:

00:21 🎬 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

00:58 🧠 V-Thinker: Interactive Thinking with Images

01:39 🧠 Scaling Agent Learning via Experience Synthesis (scaling agent reinforcement learning through experience synthesis)

02:23 🧠 Cambrian-S: Towards Spatial Supersensing in Video

03:06 🖥 GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents (GUI-360°, a large-scale dataset and benchmark)

03:51 📄 NVIDIA Nemotron Nano V2 VL (an efficient vision-language model for document and long-video understanding)

04:28 🎟 The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

05:12 🕵 Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

05:48 ⚽ Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots

06:18 🔍 Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

06:53 🎧 How to Evaluate Speech Translation with Source-Aware Neural MT Metrics

07:32 🚀 RDMA Point-to-Point Communication for LLM Systems

【Follow Us】

You can also find us on the following platforms for more than what the podcast covers:

Xiaohongshu (小红书): AI速递