2025.10.27 | DeepAgent: One-Step Reasoning + ToolPO; Video-As-Prompt DiT Controls Hundreds of Semantics in Seconds

10 minutes

The 15 papers in this episode are:

00:27 🧠 DeepAgent: A General Reasoning Agent with Scalable Toolsets

01:01 🎬 Video-As-Prompt: Unified Semantic Control for Video Generation

01:35 🔧 From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

02:14 🧩 Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

02:51 🧠 A Definition of AGI

03:23 🧩 Sparser Block-Sparse Attention via Token Permutation

04:14 🧭 UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

04:57 🧠 Reasoning with Sampling: Your Base Model is Smarter Than You Think

05:30 🧠 RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

06:08 📐 Visual Diffusion Models are Geometric Solvers

06:56 🌍 WorldGrow: Generating Infinite 3D World

07:35 🎬 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling

08:14 🔗 Model Merging with Functional Dual Anchors

08:49 🧭 Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs

09:34 📊 Document Understanding, Measurement, and Manipulation Using Category Theory

[Follow Us]

You can also find us on the following platform for more information beyond the podcast content:

Xiaohongshu: AI速递