2025.02.25 | Innovations in long-context optimization; efficient, general-purpose visual diffusion.

15 minutes

The 20 papers in this episode are as follows:

00:27 📖 Thus Spake Long-Context Large Language Model

01:09 🌈 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

01:48 🚀 Slamming: Training a Speech Language Model on One GPU in a Day

02:32 🎥 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing

03:11 🎧 Audio-FLAN: A Preliminary Release

03:43 🧠 CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

04:28 🎨 GCC: Generative Color Constancy via Diffusing a Color Checker

05:11 📊 Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning

05:57 🚀 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

06:38 🧠 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

07:23 🎥 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

08:01 📱 Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration

08:45 ⏳ Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

09:31 🤖 Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

10:02 🔄 Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

10:43 📝 Can Community Notes Replace Professional Fact-Checkers?

11:24 📈 Forecasting Open-Weight AI Model Growth on Hugging Face

12:08 🔑 Beyond Release: Access Considerations for Generative AI Systems

12:49 🌐 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning

13:30 💃 X-Dancer: Expressive Music to Human Dance Video Generation

[Follow Us]

You can also find us on the following platform for more information beyond the podcast content.

Xiaohongshu (小红书): AI速递