【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 walks you through the previous week's major AI news.
Link: 🔗 www.xiaoyuzhoufm.com
【Contents】
The 15 papers in this episode:
00:34 🎯 Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
01:19 🔬 Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
02:06 🤖 Learning to Retrieve from Agent Trajectories
02:53 🧪 ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
03:42 👗 Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
04:31 ⏱ Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
05:23 🧠 ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
06:03 🔍 Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
06:52 🔍 How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
07:33 🚀 MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
08:11 🛠 DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
08:54 🧠 In-Place Test-Time Training
09:39 🎬 Watch Before You Answer: Learning from Visually Grounded Post-Training
10:13 🔍 Demystifying When Pruning Works via Representation Hierarchies
10:59 🤖 Action Images: End-to-End Policy Learning via Multiview Video Generation

【Follow Us】
You can also find us on the following platforms for more beyond the podcast:
Xiaohongshu: AI速递
