2025.03.14 | CoSTA* improves multi-turn editing efficiency; Silent Branding Attack exposes the vulnerability of diffusion models.

11 minutes

The 15 papers in this episode:

00:25 🖼 CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

01:03 🎭 Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

01:45 🌍 World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

02:30 🗺 Charting and Navigating Hugging Face's Model Atlas

03:14 🧠 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

03:48 🎨 CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

04:29 🧠 Transformers without Normalization

05:06 🌐 GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

05:50 🤖 New Trends for Modern Machine Translation with Large Reasoning Models

06:32 📝 Shifting Long-Context LLMs Research from Input to Output

07:09 🌐 VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

07:54 🧠 DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

08:35 🐱 Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark

09:20 🎥 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

10:01 🎥 Long Context Tuning for Video Generation

[Follow us]

You can also find us on the following platforms for more than what the podcast covers:

Xiaohongshu: AI速递