The 18 papers in this episode are:
00:24 🤖 TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
01:06 🎥 AniDoc: Animation Creation Made Easier
01:44 👗 FashionComposer: Compositional Fashion Image Generation
02:28 🤖 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
03:05 🌐 Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
03:42 🔄 Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
04:26 🤖 GUI Agents: A Survey
05:12 🌍 AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
05:51 📊 RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
06:40 🧠 LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
07:30 🤖 Learning from Massive Human Videos for Universal Humanoid Pose Control
08:05 🤖 ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
08:49 🎥 VidTok: A Versatile and Open-Source Video Tokenizer
09:28 🧠 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
10:13 🔄 CAD-Recode: Reverse Engineering CAD Code from Point Clouds
10:54 🤖 AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
11:39 🤖 Alignment faking in large language models
12:19 ⚡ FastVLM: Efficient Vision Encoding for Vision Language Models

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
