2026.05.21 | Mega-ASR reduces noise and hallucinations; Video2GUI data improves pretraining efficiency - HuggingFace Daily AI Paper Digest

【Contents】
The 15 papers in this episode are:
[00:23] 🎤 Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
[01:22] 🎬 Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
[02:11] 🎬 Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
[03:04] 🚀 You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
[03:50] 🗜 OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
[04:39] 🔧 IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools
[05:36] 🔊 A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
[06:35] 🤝 It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
[07:26] 📈 Toto 2.0: Time Series Forecasting Enters the Scaling Era
[08:20] ⚡ Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs
[09:25] 🧠 Generative Recursive Reasoning
[10:29] 🎬 CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
[11:22] 🖼 Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
[12:08] 🧠 LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
[13:07] ⚡ HRM-Text: Efficient Pretraining Beyond Scaling

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast content
Xiaohongshu: AI速递