2026.01.08 | Entropy-weighted fine-tuning preserves prior knowledge; evolving skill networks keep advancing

Duration: 11 minutes

The 15 papers in this episode:

00:21 ⚖ Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting (an illustrative sketch of the entropy-weighting idea follows the list)

01:15 🧠 Evolving Programmatic Skill Networks

01:51 🧠 Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

02:31 📊 Benchmark^2: Systematic Evaluation of LLM Benchmarks

03:12 🎬 Klear: Unified Multi-Task Audio-Video Joint Generation

03:53 🎬 Choreographing a World of Dynamic Objects (a general-purpose generative pipeline)

04:36 ✅ Agentic Rubrics as Contextual Verifiers for SWE Agents

05:11 ⚗ MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

05:55 🚀 E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

06:53 🛡 RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

07:36 📊 EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning

08:15 🧠 Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

08:48 🔬 Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts

09:25 🤖 ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing

10:17 🧠 MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents
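
As a rough illustration of the entropy-weighting idea in the episode title (not the actual method of the Entropy-Adaptive Fine-Tuning paper), here is a minimal PyTorch sketch that damps the fine-tuning loss on tokens where the model is already confident, so established predictions are disturbed less. The function name, the normalization by log(vocab), and the weighting rule are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def entropy_weighted_loss(logits, labels, ignore_index=-100):
    # Hypothetical sketch: re-weight per-token cross-entropy by the model's
    # predictive entropy, so low-entropy (confident) tokens receive a smaller
    # gradient during fine-tuning and previously learned behavior is kept.
    vocab = logits.size(-1)
    logits_flat = logits.reshape(-1, vocab)
    labels_flat = labels.reshape(-1)

    # Per-token cross-entropy, without reduction.
    ce = F.cross_entropy(logits_flat, labels_flat,
                         ignore_index=ignore_index, reduction="none")

    # Predictive entropy at each position, normalized to [0, 1] by log(vocab).
    log_probs = F.log_softmax(logits_flat, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    weights = entropy / torch.log(torch.tensor(float(vocab), device=logits.device))

    # Average only over non-padding positions.
    mask = (labels_flat != ignore_index).float()
    return (weights * ce * mask).sum() / mask.sum().clamp(min=1.0)

Swapping this in for the standard cross-entropy during fine-tuning is one simple way to trade plasticity on uncertain tokens against stability on confident ones; the paper's own formulation may differ.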

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (小红书): AI速递