EP75 · Video Agents Are Next, Voice Agent · 06-02
BestBlogs Podcast

12 min

Deep Dive 1: Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead

From Latent.Space: Ethan He (xAI Grok Imagine lead) argues that video models get their intelligence from LLMs, not from video data, so their quality ceiling tracks LLM progress. The next Sora won't be a better video model but a video agent that plans, generates, edits, critiques, and iterates, mirroring how coding shifted from one-shot output to agentic workflows. Grok Imagine Agent Mode is the first real proof of this thesis.
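The plan, generate, edit, critique, iterate loop described above can be sketched as a small control loop. This is a minimal illustration only, not how Grok Imagine Agent Mode is built: `run_video_agent`, `Critique`, the four callables, and the `threshold` and `max_rounds` parameters are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    score: float  # 0.0 to 1.0, higher is better (assumed scale)
    notes: str    # what the next edit pass should fix


def run_video_agent(
    prompt: str,
    plan: Callable[[str], str],
    generate: Callable[[str], str],
    edit: Callable[[str, str], str],
    critique: Callable[[str, str], Critique],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Plan once, generate a first clip, then critique and edit until good enough."""
    shot_plan = plan(prompt)    # an LLM would turn the prompt into a shot plan
    clip = generate(shot_plan)  # a video model would produce the first draft
    for _ in range(max_rounds):
        review = critique(clip, shot_plan)
        if review.score >= threshold:
            break               # the critic is satisfied, stop iterating
        clip = edit(clip, review.notes)
    return clip
```

The point of the shape is that the planner and the critic do the reasoning, while the video model is only one tool inside the loop.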

Deep Dive 2: Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI [Video]

From AI Engineer: Bhargava (Together AI) maps the constraints on production voice agents: humans expect responses in 300ms, and anything over 500ms kills engagement. The optimal pipeline chains streaming STT → an 8B–30B LLM (200–300ms TTFT) → TTS with a real-time factor (RTF) below 1.0. Infrastructure colocation alone cuts latency by 30%. The Thinker-Talker pattern sends immediate filler audio while a heavier, guarded model processes the actual logic asynchronously, which is the trick that makes safety checks affordable at conversational speed.
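The Thinker-Talker pattern can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not Together AI's implementation: `talker`, `thinker`, `respond`, `run_turn`, the filler phrase, and the simulated delays are all hypothetical, and `play` stands in for whatever sends audio to the caller.

```python
import asyncio
from typing import Callable, List

FILLER = "One moment while I check that."  # hypothetical filler phrase


async def talker(play: Callable[[str], None]) -> None:
    """Fast path: speak filler right away (stand-in for a small, quick TTS call)."""
    play(FILLER)
    await asyncio.sleep(0.01)  # simulated playback time of the filler audio


async def thinker(user_text: str) -> str:
    """Slow path: stand-in for a heavier model plus safety checks."""
    await asyncio.sleep(0.05)  # simulated model and guardrail latency
    return f"Answer to: {user_text}"


async def respond(user_text: str, play: Callable[[str], None]) -> None:
    # Schedule the heavy model first so it runs while the filler is playing.
    thinker_task = asyncio.create_task(thinker(user_text))
    await talker(play)
    play(await thinker_task)  # speak the real answer once it is ready


def run_turn(user_text: str) -> List[str]:
    """Run one conversational turn and return what was spoken, in order."""
    spoken: List[str] = []
    asyncio.run(respond(user_text, spoken.append))
    return spoken
```

The caller hears the filler almost immediately, so the perceived response time stays low even though the guarded answer arrives later.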

Deep Dive 3: RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem

From Towards Data Science: A team ran five Optuna sweeps and fine-tuned embeddings for six months, yet production accuracy never moved. The bug was in the parser. RAG is a search and engineering problem, not an ML problem: wrong answers are individually traceable failures, not statistical noise. Chunk size is a config choice, not a hyperparameter, so you need to read your documents rather than run grid searches. Fix RAG by engineering the structure better, not by training the model more.
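One way to treat a wrong answer as a traceable failure is to check, stage by stage, where the needed fact disappeared. The sketch below is an illustration only and is not taken from the article: `trace_failure` and its arguments are hypothetical, and plain substring matching is a deliberate simplification of what a real check would do.

```python
from typing import List


def trace_failure(
    expected_fact: str,
    parsed_text: str,
    chunks: List[str],
    retrieved: List[str],
) -> str:
    """Return the first pipeline stage at which the expected fact is missing."""
    if expected_fact not in parsed_text:
        return "parser"      # the fact never survived document parsing
    if not any(expected_fact in chunk for chunk in chunks):
        return "chunking"    # the fact was split across chunk boundaries
    if not any(expected_fact in chunk for chunk in retrieved):
        return "retrieval"   # the right chunk exists but was not fetched
    return "generation"      # the model saw the fact and still got it wrong
```

A usage example: if a table cell containing "Net 45" is dropped during parsing, `trace_failure("Net 45", parsed_text="Payment terms:", chunks=["Payment terms:"], retrieved=["Payment terms:"])` returns `"parser"`, which no amount of embedding fine-tuning would fix.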

Quick Takes

More stories worth your attention

· Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 — NVIDIA Technical Blog

· How Rippling built production AI in 6 months with Deep Agents and LangSmith — LangChain Blog

· Anthropic Confidentially Files Draft S-1 for IPO — Anthropic(@AnthropicAI)

· How to Build an AI Support Agent That Knows When NOT to Answer Tickets — freeCodeCamp

· The Rise of AI Forward Deployed Engineers and the Future of AI Engineering Roles — Andrew Ng(@AndrewYNg)

· How we reduced core unit boot time from hours to minutes — The Cloudflare Blog

· Shopify Reports 15X Faster GraphQL Execution with Breadth First Engine — InfoQ

Related Links

· Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead: www.bestblogs.dev

· Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI [Video]: www.bestblogs.dev

· RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem: www.bestblogs.dev

· Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3: www.bestblogs.dev

· How Rippling built production AI in 6 months with Deep Agents and LangSmith: www.bestblogs.dev

· Anthropic Confidentially Files Draft S-1 for IPO: www.bestblogs.dev

· How to Build an AI Support Agent That Knows When NOT to Answer Tickets: www.bestblogs.dev

· The Rise of AI Forward Deployed Engineers and the Future of AI Engineering Roles: www.bestblogs.dev

· How we reduced core unit boot time from hours to minutes: www.bestblogs.dev

· Shopify Reports 15X Faster GraphQL Execution with Breadth First Engine: www.bestblogs.dev

About BestBlogs

BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader across technology, AI, product, business, research, design, investing, culture and personal growth.

BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: bestblogs.dev

BestBlogs.dev · Discover high-quality content that truly fits you