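2025.01.03 Daily AI Papers | A multimodal textbook improves vision-language model performance; VideoAnydoor achieves high-fidelity video object insertion - HuggingFace Daily AI Paper Digest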

The 17 papers in this episode are as follows:

00:24 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

01:02 🎥 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

01:39 🎥 VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

02:13 🏆 CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

02:52 🎨 Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

03:29 🤖 ProgCo: Program Helps Self-Correction of Large Language Models

04:03 🗺 MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

04:41 🤖 A3: Android Agent Arena for Mobile GUI Agents

05:21 🧪 Dynamic Scaling of Unit Tests for Code Reward Modeling

05:57 🛡 MLLM-as-a-Judge for Image Safety without Human Labeling

06:40 🎥 LTX-Video: Realtime Video Latent Diffusion

07:15 🗺 MapQaTor: A System for Efficient Annotation of Map Query Datasets

07:51 🔍 Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

08:29 🎥 SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

09:13 🤖 SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

09:50 🧠 Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding

10:27 📊 Population Aware Diffusion for Time Series Generation

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递