2025.12.11 | StereoWorld turns monocular video into stereo blockbusters in seconds; BiCo collages new concepts across domains

11 minutes

The 15 papers in this episode are:

00:22 🎥 StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

00:59 🎨 Composing Concepts from Images and Videos via Concept-prompt Binding

01:43 🧠 BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

02:20 🎨 OmniPSD: Layered PSD Generation with Diffusion Transformer

03:05 🚀 InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

03:47 ⚡ Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

04:31 🚗 UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

05:06 🧠 EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

05:56 🤖 HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

06:46 🔍 WonderZoom: Multi-Scale 3D World Generation

07:23 🤖 Learning Unmasking Policies for Diffusion Language Models

07:53 🔭 IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

08:51 ⚡ Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS

09:31 🎬 VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

10:16 🛡 Pay Less Attention to Function Words for Free Robustness of Vision-Language Models

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递