2025.12.15 | Small Dental Model Punches Above Its Weight; Diffusion Models Drop the VAE

9 minutes

The 14 papers in this episode are:

00:22 🦷 DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

00:53 🎨 SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

01:41 🎥 EgoX: Egocentric Video Generation from a Single Exocentric Video

02:26 🎬 V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

03:03 🔍 Sliding Window Attention Adaptation

03:43 🎬 PersonaLive! Expressive Portrait Image Animation for Live Streaming

04:10 🎬 Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

04:41 🎨 Exploring MLLM-Diffusion Information Transfer with MetaCanvas

05:18 🔄 MeshSplatting: Differentiable Rendering with Opaque Meshes

06:02 🤖 LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

06:39 ⚡ The N-Body Problem: Parallel Execution from Single-Person Egocentric Video(N体问题:从单人第一人称视频中实现并行执行)

07:11 🧬 CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images

07:52 🏆 Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

08:32 🚀 Sharp Monocular View Synthesis in Less Than a Second

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast.

Xiaohongshu: AI速递