The 17 papers in this episode are:
00:24 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
01:02 🎥 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
01:39 🎥 VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
02:13 🏆 CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
02:52 🎨 Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
03:29 🤖 ProgCo: Program Helps Self-Correction of Large Language Models
04:03 🗺 MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
04:41 🤖 A3: Android Agent Arena for Mobile GUI Agents
05:21 🧪 Dynamic Scaling of Unit Tests for Code Reward Modeling
05:57 🛡 MLLM-as-a-Judge for Image Safety without Human Labeling
06:40 🎥 LTX-Video: Realtime Video Latent Diffusion
07:15 🗺 MapQaTor: A System for Efficient Annotation of Map Query Datasets
07:51 🔍 Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
08:29 🎥 SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
09:13 🤖 SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
09:50 🧠 Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
10:27 📊 Population Aware Diffusion for Time Series Generation

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
