2025.12.11 | StereoWorld Turns Monocular Video into Stereo Blockbusters in Seconds; BiCo Introduces Cross-Domain Concept Collage - HuggingFace Daily AI Paper Digest

The 15 papers in this episode are:

00:22 🎥 StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

00:59 🎨 Composing Concepts from Images and Videos via Concept-prompt Binding

01:43 🧠 BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

02:20 🎨 OmniPSD: Layered PSD Generation with Diffusion Transformer

03:05 🚀 InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

03:47 ⚡ Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

04:31 🚗 UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

05:06 🧠 EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

05:56 🤖 HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

06:46 🔍 WonderZoom: Multi-Scale 3D World Generation

07:23 🤖 Learning Unmasking Policies for Diffusion Language Models

07:53 🔭 IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

08:51 ⚡ Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS

09:31 🎬 VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

10:16 🛡 Pay Less Attention to Function Words for Free Robustness of Vision-Language Models

[Follow Us]

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (RedNote): AI速递