The 14 papers in this episode are:
00:20 🧠 Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
01:11 ⚡ TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
01:52 🧭 T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
02:38 🎬 DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
03:21 🔍 Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
04:07 🎬 HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
04:52 🚀 Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
05:38 🔍 TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
06:12 🚀 NVIDIA Nemotron 3: Efficient and Open Intelligence
06:57 🎬 Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
07:27 🎬 Streaming Video Instruction Tuning
08:02 🧠 Multi-hop Reasoning via Early Knowledge Alignment
08:43 📊 SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
09:24 🏆 LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

[Follow Us]
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
