HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

7,377 subscribers

拨号上网

2025.05.22 | Improved web navigation efficiency; optimizing quantization error.
This episode covers the following 15 papers:
[00:25] 🤖 Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
[01:13] 🧮 Scaling Law for Quantization-Aware Training
[01:53] 🤖 UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
[02:28] 🎨 MMaDA: Multimodal Large Diffusion Language Models
[03:04] 🔄 Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
[03:44] 💻 Efficient Agent Training for Computer Use
[04:26] 🧠 Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
[05:08] 💡 When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
[05:39] 🤖 Vid2World: Crafting Video Diffusion Models to Interactive World Models
[06:16] 🖼 IA-T2I: Internet-Augmented Text-to-Image Generation
[06:49] 🧠 Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
[07:31] 🎮 lmgame-Bench: How Good are LLMs at Playing Games?
[08:18] 🏙 Constructing a 3D Town from a Single Image
[08:58] 🚀 dKV-Cache: The Cache for Diffusion Language Models
[09:40] 🛡 How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
Follow us: you can also find us on the following platform for more than the podcast itself. Xiaohongshu: AI速递
11 min · 21 hours ago
2025.05.21 | Multimodal pretraining improves complex-task capability; attention mechanisms optimize inference and training efficiency.
This episode covers the following 15 papers:
[00:22] 💡 Emerging Properties in Unified Multimodal Pretraining
[01:03] 🚀 SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
[01:42] 🖼 VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
[02:23] 🤖 Visual Agentic Reinforcement Fine-Tuning
[03:01] 🧪 The Aloe Family Recipe for Open and Specialized Healthcare LLMs
[03:40] 🧮 Optimizing Anytime Reasoning via Budget Relative Policy Optimization
[04:25] 🧠 Neurosymbolic Diffusion Models
[05:02] 🌊 Latent Flow Transformer
[05:40] 🧑 Exploring Federated Pruning for Large Language Models
[06:23] 👁 Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
[07:05] 🧠 General-Reasoner: Advancing LLM Reasoning Across All Domains
[07:45] 🤔 Reasoning Models Better Express Their Confidence
[08:20] 🚀 Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
[09:07] 🖼 Training-Free Watermarking for Autoregressive Image Generation
[09:48] 🤔 VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
11 min · 2 days ago
2025.05.20 | Chain-of-Model learning improves efficiency; AdaptThink speeds up reasoning.
This episode covers the following 15 papers:
[00:23] 🔗 Chain-of-Model Learning for Language Model
[00:58] 🤔 AdaptThink: Reasoning Models Can Learn When to Think
[01:45] 🧠 AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
[02:21] 🚀 Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
[03:04] 🖥 Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
[03:43] 🤔 Thinkless: LLM Learns When to Think
[04:23] 💡 Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
[05:00] 🧮 MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
[05:39] ✨ Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
[06:15] 🛡 FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
[07:00] 🧩 Model Merging in Pre-training of Large Language Models
[07:53] 🤖 CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
[08:36] 🎬 Faster Video Diffusion with Trainable Sparse Attention
[09:23] 🧠 Fractured Chain-of-Thought Reasoning
[10:03] 🧠 Neuro-Symbolic Query Compiler
11 min · 3 days ago
2025.05.19 | Qwen3 improves LLM performance; GuardReasoner-VL strengthens VLM safety.
This episode covers the following 15 papers:
[00:24] 🤖 Qwen3 Technical Report
[01:14] 🛡 GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
[02:01] 🖼 MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
[02:40] 🖼 Visual Planning: Let's Think Only with Images
[03:25] 💡 Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
[04:09] 🧠 Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
[04:53] 🧬 Mergenetic: a Simple Evolutionary Model Merging Library
[05:35] 💡 MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
[06:14] 🧮 Multi-Token Prediction Needs Registers
[06:48] 🤔 Scaling Reasoning can Improve Factuality in Large Language Models
[07:25] 🧪 MatTools: Benchmarking Large Language Models for Materials Science Tools
[08:04] 🤔 Humans expect rationality and cooperation from LLM opponents in strategic games
[08:45] 🤝 Learning Dense Hand Contact Estimation from Imbalanced Data
[09:26] 🩻 CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs
[10:11] 🤝 From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models
11 min · 4 days ago
[Weekend Special] Hottest AI papers of the third week of May | Seed1.5-VL leads in multimodal reasoning; MiniMax-Speech zero-shot voice cloning
This episode covers the following 5 papers:
[00:38] TOP1 (🔥126) | 💡 Seed1.5-VL Technical Report
[03:11] TOP2 (🔥109) | 🗣 MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
[05:23] TOP3 (🔥86) | 💡 Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
[07:25] TOP4 (🔥73) | 🧠 MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
[10:04] TOP5 (🔥67) | 🖼 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
13 min · 6 days ago
2025.05.16 | Improving the meta-abilities of reasoning models; system prompt optimization and improved robustness
This episode covers the following 15 papers:
[00:24] 💡 Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
[01:02] 🤖 System Prompt Optimization with Meta-Learning
[01:47] 🤖 EnerVerse-AC: Envisioning Embodied Environments with Action Condition
[02:29] 🧠 The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
[03:17] 🤖 EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
[03:57] 🖼 End-to-End Vision Tokenizer Tuning
[04:34] 📈 WorldPM: Scaling Human Preference Modeling
[05:13] 🤖 MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
[06:01] 🧩 Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
[06:43] 🎨 Style Customization of Text-to-Vector Generation with Image Diffusion Priors
[07:25] 🧠 J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
[08:07] 👉 PointArena: Probing Multimodal Grounding Through Language-Guided Pointing
[08:47] 🖼 Depth Anything with Any Prior
[09:29] 🖼 OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
[10:14] 🚀 Parallel Scaling Law for Language Models
11 min · 7 days ago
2025.05.15 | Decoupled learning improves perception performance; multimodal models improve image generation.
This episode covers the following 11 papers:
[00:23] 🖼 DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
[01:02] 🖼 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
[01:41] 💡 Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
[02:24] 🎨 Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
[03:00] 🤖 UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
[03:42] 🐛 SweRank: Software Issue Localization with Code Ranking
[04:23] 🤔 VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
[05:14] 🖼 CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
[05:49] 🤔 Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
[06:27] 🤔 Visually Interpretable Subtask Reasoning for Visual Question Answering
[06:59] 🚁 DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition
8 min · 8 days ago
2025.05.14 | A new zero-shot speech synthesis model; multi-dimensional evaluation of LLM instruction following
This episode covers the following 8 papers:
[00:25] 🗣 MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
[01:00] 🤖 A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
[01:47] 🎮 Measuring General Intelligence with Generated Games
[02:29] 🎦 SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation
[03:14] 🤖 NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
[03:51] 🔍 Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency
[04:28] 🇻🇳 ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation
[05:04] 📖 Advancing Arabic Reverse Dictionary Systems: A Transformer-Based Approach with Dataset Construction Guidelines
6 min · 9 days ago
2025.05.13 | Vision-language models improve multimodal capability; optimized training strategies boost reasoning potential.
This episode covers the following 15 papers:
[00:24] 💡 Seed1.5-VL Technical Report
[01:04] 🧠 MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
[01:48] 🖼 Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
[02:29] 🤝 Learning from Peers in Reasoning Models
[03:08] 🎨 Unified Continuous Generative Models
[03:49] 🤖 REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
[04:44] 💃 DanceGRPO: Unleashing GRPO on Visual Generation
[05:25] 🧠 AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
[06:10] 🌐 WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
[06:53] 📈 Learning Dynamics in Continual Pre-Training for Large Language Models
[07:28] 🏆 Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
[08:11] 🧠 Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
[08:50] 🤖 H³DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
[09:36] 🎨 Continuous Visual Autoregressive Generation via Score Maximization
[10:26] 🧠 Overflow Prevention Enhances Long-Context Recurrent LLMs
12 min · 10 days ago
2025.05.12 | Polish language model optimization; efficient parameter use
This episode covers the following 7 papers:
[00:23] 🇵🇱 Bielik v3 Small: Technical Report
[01:07] 🇵🇱 Bielik 11B v2 Technical Report
[01:42] 🤖 UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
[02:30] 🎨 G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
[03:16] ⭐ Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
[03:55] ⚕ Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
[04:37] 🖼 A Preliminary Study for GPT-4o on Image Restoration
6 min · 11 days ago
[Weekend Special] Hottest AI papers of the second week of May | Zero-data self-play reasoning; a survey of large multimodal reasoning models
This episode covers the following 5 papers:
[00:42] TOP1 (🔥93) | 🚀 Absolute Zero: Reinforced Self-play Reasoning with Zero Data
[02:38] TOP2 (🔥91) | 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
[04:44] TOP3 (🔥83) | 🧠 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
[06:35] TOP4 (🔥77) | 🤖 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
[08:52] TOP5 (🔥77) | 🧠 Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
11 min · 13 days ago
2025.05.09 | A survey of multimodal reasoning model development; a general-intelligence evaluation framework proposed
This episode covers the following 15 papers:
[00:22] 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
[00:57] 🤖 On Path to Multimodal Generalist: General-Level and General-Bench
[01:40] 🤖 Flow-GRPO: Training Flow Matching Models via Online RL
[02:23] 🧠 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
[03:05] 🧠 Scalable Chain of Thoughts via Elastic Reasoning
[03:41] 🔍 FG-CLIP: Fine-Grained Visual and Textual Alignment
[04:19] 🏞 3D Scene Generation: A Survey
[05:02] 🧮 ICon: In-Context Contribution for Automatic Data Selection
[05:39] 🎬 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
[06:19] 🤖 LiftFeat: 3D Geometry-Aware Local Feature Matching
[06:56] 🧱 Generating Physically Stable and Buildable LEGO Designs from Text
[07:38] 🧠 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
[08:22] 🌐 Crosslingual Reasoning through Test-Time Scaling
[09:04] 🖼 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
[09:42] 🌐 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
11 min · 14 days ago
2025.05.08 | Unified multimodal models show strong potential; ZeroSearch improves LLM efficiency.
This episode covers the following 14 papers:
[00:21] 💡 Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
[01:02] 🤖 ZeroSearch: Incentivize the Search Capability of LLMs without Searching
[01:50] 🤔 Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
[02:31] 🎬 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
[03:15] 🧩 PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
[04:04] 🤖 Benchmarking LLMs' Swarm intelligence
[04:49] 🤔 Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
[05:26] 🤖 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
[05:58] 🌐 OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution
[06:36] 🖥 OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
[07:19] 🧠 Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
[08:04] 🎛 R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
[08:48] 🤝 Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation
[09:26] 📹 Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
11 min · 15 days ago
2025.05.07 | Multimodal chain-of-thought improves model performance; zero-data self-play strengthens reasoning.
This episode covers the following 14 papers:
[00:24] 🧠 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
[01:10] 🤖 Absolute Zero: Reinforced Self-play Reasoning with Zero Data
[01:52] 🤸 FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
[02:33] 🚀 RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
[03:07] 🚀 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
[03:45] 👁 Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
[04:30] 🗜 An Empirical Study of Qwen3 Quantization
[05:09] ⚽ Multi-Agent System for Comprehensive Soccer Understanding
[05:52] 🗣 VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
[06:36] 🗺 Geospatial Mechanistic Interpretability of Large Language Models
[07:12] 🧑 InfoVids: Reimagining the Viewer Experience with Alternative Visualization-Presenter Relationships
[07:54] 🤖 Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
[08:32] 🥽 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
[09:18] 🤖 Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
10 min · 16 days ago
2025.05.06 | Voila achieves low-latency full-duplex conversation; RM-R1 improves reasoning-based reward modeling for large models.
This episode covers the following 15 papers:
[00:22] 🤖 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
[01:09] 🤔 RM-R1: Reward Modeling as Reasoning
[01:52] 🧠 Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
[02:32] 🧮 FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
[03:17] ✂ ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
[03:59] 🧠 Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
[04:39] 🚀 Practical Efficiency of Muon for Pretraining
[05:18] ⚙ A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
[06:01] 🤖 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
[06:44] 🤔 Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents
[07:24] 🤖 SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
[08:03] 🤖 Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
[08:50] 🖼 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
[09:30] 🧮 Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
[10:11] 🎨 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
11 min · 17 days ago

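Ten minutes a day to get up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu.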
