The 15 papers covered in this episode:
00:22 🔍 Multimodal Evaluation of Russian-language Architectures
01:15 🧠 Latent Collaboration in Multi-Agent Systems
01:47 🌍 Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
02:18 🎭 Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
03:10 📄 NVIDIA Nemotron Parse 1.1
03:46 🧠 Monet: Reasoning in Latent Visual Space Beyond Images and Language
04:25 ⚡ Terminal Velocity Matching
05:03 📊 Revisiting Generalization Across Difficulty Levels: It's Not So Easy
05:42 🤖 MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
06:25 ⚡ Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs
06:59 🎮 UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
07:47 🧩 SPHINX: A Synthetic Environment for Visual Perception and Reasoning
08:33 ⚡ Block Cascading: Training Free Acceleration of Block-Causal Video Models
09:12 🏙 RAISECity: A Multimodal Agent Framework for Reality-Aligned 3D World Generation at City-Scale
09:58 📊 I-GLIDE: Input Groups for Latent Health Indicators in Degradation Estimation

【Follow Us】
You can also find us on the following platform for more beyond the podcast itself
Xiaohongshu (小红书): AI速递
