2025.01.16 | MMDocIR advances standardized multimodal retrieval; CityDreamer4D introduces a generative model for 4D cities. - HuggingFace Daily AI Paper Digest

The 9 papers in this episode:

00:25 📊 MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

01:06 🏙 CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities

01:49 🎥 RepVideo: Rethinking Cross-Layer Representation for Video Generation

02:30 📚 Towards Best Practices for Open Datasets for LLM Training

03:11 🎵 XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

03:46 🔒 Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography

04:23 🔍 Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

05:03 🎨 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

05:39 🎥 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (RedNote): AI速递