The 15 papers in this episode are:
00:23 🧠 MMGR: Multi-Modal Generative Reasoning
01:14 🎮 WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
01:47 🤖 Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
02:46 🎨 Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
03:29 🤖 RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
04:13 📊 OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value
04:50 🎨 Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure
05:36 🧊 Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views
06:14 🧠 RecGPT-V2 Technical Report
07:04 📊 ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
07:43 🎬 MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
08:22 🧠 VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
09:04 🎨 Feedforward 3D Editing via Text-Steerable Image-to-3D
09:52 🤖 A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
10:26 🎬 SS4D: Native 4D Generative Model via Structured Spacetime Latents

[Follow Us]
You can also find us on the following platforms for more information beyond the podcast content
Xiaohongshu (小红书): AI速递
