2026.01.27 | Agent-native training sets a new SWE-Bench record; LLMs reshape the data-cleaning pipeline

13 minutes

【Sponsor】

Listen to AI每周谈 on your commute. Every week, AI每周谈 walks you through the previous week's biggest AI news.

Link: 🔗 www.xiaoyuzhoufm.com

【Contents】

This episode covers the following 15 papers:

00:33 🤖 daVinci-Dev: Agent-native Mid-training for Software Engineering

01:21 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs

02:21 🎬 The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation

03:08 🔬 Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

04:00 🔬 iFSQ: Improving FSQ for Image Generation with 1 Line of Code

04:42 ⚡ Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

05:36 🎬 Self-Refining Video Sampling

06:31 🧠 Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

07:23 🎤 VIBEVOICE-ASR Technical Report

08:06 📊 CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

09:04 📊 STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

09:51 🧠 Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

10:26 🚀 AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

11:15 🔍 SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

12:04 🤖 Agentic Very Long Video Understanding

【Follow Us】

You can also find us on the following platforms for more content beyond the podcast:

Xiaohongshu: AI速递