The 15 papers in this episode:
00:22 🖼 TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
00:59 🎬 MoCha: Towards Movie-Grade Talking Character Synthesis
01:39 🔍 What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
02:16 🤖 Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
03:05 🧠 RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
03:48 🧠 Effectively Controlling Reasoning Models through Thinking Intervention
04:32 💡 Query and Conquer: Execution-Guided SQL Generation
05:15 ✍ SketchVideo: Sketch-based Video Generation and Editing
06:04 🚨 TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection
06:57 💡 Efficient Inference for Large Reasoning Models: A Survey
07:40 🤖 Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
08:29 🧪 Expanding RL with Verifiable Rewards Across Diverse Domains
09:11 ✨ Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
09:50 🤖 TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization
10:30 🇰🇷 KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language

[Follow us]
You can also find us on the following platforms for more beyond the podcast:
Xiaohongshu: AI速递