2026.05.25 | SkillOpt Enables Self-Evolving Skills; Lens Improves Text-to-Image Training Efficiency

15 minutes

【Contents】
The 15 papers in this episode are:
[00:25] 🧠 SkillOpt: Executive Strategy for Self-Evolving Agent Skills
[01:16] 🔍 Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
[02:20] 🔀 Rethinking Cross-Layer Information Routing in Diffusion Transformers
[03:01] 🧠 SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
[03:56] 🎙 StepAudio 2.5 Technical Report
[04:51] 👁 See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
[05:50] 📸 PhotoFlow: Agentic 3D Virtual Photography Missions
[06:29] 🧠 From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
[07:28] 🎥 VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
[08:29] ⚡ PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
[09:35] 🎨 RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution
[10:30] ✂ ETCHR: Editing To Clarify and Harness Reasoning
[11:26] 🎮 SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
[12:14] 📡 LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
[13:03] 🎥 Geo-Align: Video Generation Alignment via Metric Geometry Reward

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast itself:
Xiaohongshu: AI速递