2025.12.15 | A Small Dental Model Punches Above Its Weight; Diffusion Models Drop the VAE - HuggingFace Daily AI Paper Digest

The 14 papers in this episode:

00:22 🦷 DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

00:53 🎨 SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

01:41 🎥 EgoX: Egocentric Video Generation from a Single Exocentric Video

02:26 🎬 V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

03:03 🔍 Sliding Window Attention Adaptation

03:43 🎬 PersonaLive! Expressive Portrait Image Animation for Live Streaming

04:10 🎬 Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

04:41 🎨 Exploring MLLM-Diffusion Information Transfer with MetaCanvas

05:18 🔄 MeshSplatting: Differentiable Rendering with Opaque Meshes

06:02 🤖 LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

06:39 ⚡ The N-Body Problem: Parallel Execution from Single-Person Egocentric Video

07:11 🧬 CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images

07:52 🏆 Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

08:32 🚀 Sharp Monocular View Synthesis in Less Than a Second

[Follow Us]

You can also find us on the following platform for more beyond the podcast:

Xiaohongshu (小红书): AI速递