2025.06.19 | The Sekai dataset improves video generation; prototype-based reasoning strengthens LLM generalization.

11 minutes

The 15 papers covered in this episode are:

00:22 🌍 Sekai: A Video Dataset towards World Exploration

01:02 💡 ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

01:43 💡 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

02:24 🗣 BUT System for the MLC-SLM Challenge

03:10 🤖 Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

03:57 💡 Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation

04:43 🔬 SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

05:26 🚀 Truncated Proximal Policy Optimization

06:04 🖼 PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers

06:37 🖼 CoMemo: LVLMs Need Image Context with Image Memory

07:21 🤖 SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

08:01 🧠 MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models

08:45 🛡 OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

09:34 🏞 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies

10:09 🤝 FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models

【Follow us】

You can also find us on the following platform for more information beyond the podcast episodes.

Xiaohongshu (小红书): AI速递