The 15 papers in this episode are:
00:24 🚀 InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
00:52 🧠 Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation
01:19 🎨 MV-RAG: Retrieval Augmented Multiview Diffusion
01:45 🧠 T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
02:10 🤔 Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling
02:41 🚀 Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
03:04 🎨 PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs
03:25 🤔 UQ: Assessing Language Models on Unsolved Questions
03:54 📚 MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment
04:25 🗺 Explain Before You Answer: A Survey on Compositional Visual Reasoning
04:47 📊 ST-Raptor: LLM-Powered Semi-Structured Table Question Answering
05:15 🔍 SpotEdit: Evaluating Visually-Guided Image Editing Methods
05:39 📖 German4All - A Dataset and Model for Readability-Controlled Paraphrasing in German
06:06 📉 Limitations of Normalization in Attention Mechanism
06:33 🌐 MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

[Follow Us]
You can also find us on the platform below for more information beyond the podcast:
Xiaohongshu: AI速递
