本期的 14 篇论文如下:
00:25 🔧 OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models(开放编码器:顶级代码大语言模型的开放食谱)
01:03 🎥 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning(ReCapture:使用掩码视频微调生成用户提供视频的生成性摄像机控制)
01:46 ⚡ BitNet a4.8: 4-bit Activations for 1-bit LLMs(BitNet a4.8:1位大语言模型的4位激活)
02:25 🎥 DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion(DimensionX:从单张图像生成可控视频扩散的3D和4D场景)
03:04 🤖 Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models(混合变压器:多模态基础模型的稀疏与可扩展架构)
03:39 🧠 Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model(灭霸:通过融入心灵技能增强对话代理的大型语言模型)
04:21 🎥 TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation(TIP-I2V:百万级真实文本与图像提示数据集用于图像到视频生成)
05:05 🤖 DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation(DynaMem:开放世界移动操作的在线动态时空语义记忆)
05:40 🧵 Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?(针穿线:LLMs能否在近百万规模的文本中追踪线索?)
06:22 👀 GazeGen: Gaze-Driven User Interaction for Visual Content Generation(GazeGen:基于注视驱动的用户交互视觉内容生成)
07:03 🌐 RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval(RetrieveGPT:融合提示与数学模型以增强代码混合信息检索)
07:49 🎥 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation(SG-I2V:图像到视频生成中的自引导轨迹控制)
08:29 🎥 VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos(视频GLaMM:一种用于视频中像素级视觉定位的大型多模态模型)
09:03 ⚡ SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models(SVDQuant:通过低秩成分吸收异常值的4比特扩散模型)

【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
