2026.03.30 | ShotStream streams multi-shot video generation; PackForcing trains on short videos to produce long ones

9 minutes

【Sponsor】

Listen to AI每周谈 on your commute. Each week, AI每周谈 recaps the previous week's major AI news.

Link: 🔗www.xiaoyuzhoufm.com

【Contents】

The 10 papers in this episode:

00:28 🎬 ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

01:07 🎬 PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

01:54 🧠 Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

02:43 📊 RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

03:53 🚗 LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

04:42 🧠 Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models

05:25 🛠 Natural-Language Agent Harnesses

06:10 🎤 Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

06:59 🔬 MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies

07:46 🚀 Diffutron: A Masked Diffusion Language Model for Turkish Language

【Follow Us】

You can also find us on the following platforms for more beyond the podcast:

Xiaohongshu (小红书): AI速递