2025.04.01 | A New Method for Multi-Text Rendering, Movie-Grade Talking Character Synthesis - HuggingFace Daily AI Paper Digest

This episode covers the following 15 papers:

00:22 🖼 TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

00:59 🎬 MoCha: Towards Movie-Grade Talking Character Synthesis

01:39 🔍 What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

02:16 🤖 Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

03:05 🧠 RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

03:48 🧠 Effectively Controlling Reasoning Models through Thinking Intervention

04:32 💡 Query and Conquer: Execution-Guided SQL Generation

05:15 ✍ SketchVideo: Sketch-based Video Generation and Editing

06:04 🚨 TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection

06:57 💡 Efficient Inference for Large Reasoning Models: A Survey

07:40 🤖 Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code

08:29 🧪 Expanding RL with Verifiable Rewards Across Diverse Domains

09:11 ✨ Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

09:50 🤖 TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

10:30 🇰🇷 KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu (小红书): AI速递