2026.05.21 | Mega-ASR cuts noise and hallucinations; Video2GUI data makes pretraining more efficient

14 minutes

【Contents】
The 15 papers in this episode:
[00:23] 🎤 Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
[01:22] 🎬 Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
[02:11] 🎬 Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
[03:04] 🚀 You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
[03:50] 🗜 OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
[04:39] 🔧 IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools
[05:36] 🔊 A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
[06:35] 🤝 It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
[07:26] 📈 Toto 2.0: Time Series Forecasting Enters the Scaling Era
[08:20] ⚡ Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs
[09:25] 🧠 Generative Recursive Reasoning
[10:29] 🎬 CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing
[11:22] 🖼 Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
[12:08] 🧠 LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
[13:07] ⚡ HRM-Text: Efficient Pretraining Beyond Scaling

【Follow Us】
You can also find us on the following platforms for more than what the podcast covers:
Xiaohongshu: AI速递