2025.02.19 | Data-efficient speech processing and innovations in embedding-space compression

15 minutes

The 20 papers covered in this episode:

00:25 🎙 Soundwave: Less is More for Speech-Text Alignment in LLMs

01:05 🔍 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

01:48 🌊 Continuous Diffusion Model for Language Modeling

02:30 🎥 Phantom: Subject-consistent video generation via cross-modal alignment

03:12 🧠 Rethinking Diverse Human Preference Learning through Principal Component Analysis

04:00 🤖 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

04:36 🛡 SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

05:25 🐍 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

06:08 📚 You Do Not Fully Utilize Transformer's Representation Capacity

06:50 🤖 Magma: A Foundation Model for Multimodal AI Agents

07:23 💹 FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

08:08 📄 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm

08:49 🧠 PAFT: Prompt-Agnostic Fine-Tuning

09:27 🛠 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

10:13 📊 Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

11:00 🔄 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

11:37 🩺 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

12:12 🧠 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

12:51 🌍 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation

13:32 🧠 Atom of Thoughts for Markov LLM Test-Time Scaling

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递