2024.12.19 Daily AI Papers | AI agents show limited task performance; animation production gets more efficient. - HuggingFace Daily AI Paper Digest

The 18 papers in this episode are:

00:24 🤖 TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

01:06 🎥 AniDoc: Animation Creation Made Easier

01:44 👗 FashionComposer: Compositional Fashion Image Generation

02:28 🤖 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

03:05 🌐 Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

03:42 🔄 Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

04:26 🤖 GUI Agents: A Survey

05:12 🌍 AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities

05:51 📊 RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment

06:40 🧠 LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

07:30 🤖 Learning from Massive Human Videos for Universal Humanoid Pose Control

08:05 🤖 ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

08:49 🎥 VidTok: A Versatile and Open-Source Video Tokenizer

09:28 🧠 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

10:13 🔄 CAD-Recode: Reverse Engineering CAD Code from Point Clouds

10:54 🤖 AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

11:39 🤖 Alignment faking in large language models

12:19 ⚡ FastVLM: Efficient Vision Encoding for Vision Language Models

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递