Deep Dive 1: Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead
From Latent.Space: Ethan He (xAI Grok Imagine lead) argues that video models get their intelligence from LLMs, not from video data, so their quality ceiling tracks LLM progress. In his view, the next Sora won't be a better video model but a video agent that plans, generates, edits, critiques, and iterates, mirroring how coding shifted from one-shot output to agentic workflows. He points to Grok Imagine Agent Mode as the first real proof of this thesis.
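The plan, generate, critique, edit loop He describes can be sketched as a plain control loop. This is a minimal illustration under stated assumptions, not how Grok Imagine Agent Mode works: `plan`, `generate`, `critique`, and `edit` are hypothetical stubs standing in for an LLM planner, a video model, and a vision-language judge, and the quality threshold and iteration cap are illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    prompt: str
    revision: int = 0
    notes: list[str] = field(default_factory=list)


def plan(brief: str) -> list[str]:
    # Hypothetical planner: split a brief into shot prompts (an LLM would do this).
    return [s.strip() for s in brief.split(";") if s.strip()]


def generate(shot: str) -> Clip:
    # Hypothetical generator: stands in for a call to a video model.
    return Clip(prompt=shot)


def critique(clip: Clip) -> float:
    # Hypothetical critic: score improves with each revision (a VLM judge in practice).
    return min(1.0, 0.4 + 0.25 * clip.revision)


def edit(clip: Clip, score: float) -> Clip:
    # Hypothetical editor: records why the clip was revised.
    clip.revision += 1
    clip.notes.append(f"revised after score {score:.2f}")
    return clip


def video_agent(brief: str, threshold: float = 0.8, max_iters: int = 5) -> list[Clip]:
    final = []
    for shot in plan(brief):
        clip = generate(shot)
        # Critique and edit until the clip passes or the iteration budget runs out.
        for _ in range(max_iters):
            score = critique(clip)
            if score >= threshold:
                break
            clip = edit(clip, score)
        final.append(clip)
    return final


for c in video_agent("a cat walks in; the cat jumps onto a table"):
    print(c.prompt, c.revision, c.notes)
```

The point of the structure is that quality comes from the outer loop (planning and critique, both LLM-driven in He's framing) rather than from a single generation call.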
Deep Dive 2: Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI [Video]
From AI Engineer: Bhargava (Together AI) maps the constraints on production voice agents: humans expect responses in about 300 ms, and anything above 500 ms kills engagement. His optimal pipeline chains streaming STT into an 8B–30B LLM (200–300 ms time to first token, TTFT) and then TTS with a real-time factor (RTF) below 1.0. Infrastructure co-location alone cuts latency by 30%. The Thinker-Talker pattern sends immediate filler audio while a heavier, guarded model processes the actual logic asynchronously; this is the trick that makes safety checks affordable at conversational speed.
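The latency arithmetic can be made concrete with a small budget check. This sketch rests on a simplifying assumption: in a streaming pipeline, time to first audio is roughly the sum of each stage's time to first output, ignoring network hops. Only the 300 ms and 500 ms thresholds and the 200–300 ms TTFT range come from the talk summary; the stage labels and the STT and TTS figures are hypothetical.

```python
from dataclasses import dataclass

# Thresholds quoted in the talk summary.
TARGET_MS = 300   # response time humans expect
CEILING_MS = 500  # beyond this, engagement drops off


@dataclass
class Stage:
    name: str
    first_output_ms: float  # time until this stage emits its first usable output


def time_to_first_audio(stages: list[Stage]) -> float:
    # Simplified streaming model: each stage starts as soon as the previous one
    # emits its first output, so first-output latencies add up.
    return sum(s.first_output_ms for s in stages)


def real_time_factor(synthesis_s: float, audio_s: float) -> float:
    # RTF = synthesis time / audio duration; below 1.0, TTS outpaces playback.
    return synthesis_s / audio_s


def classify(total_ms: float) -> str:
    if total_ms <= TARGET_MS:
        return "conversational"
    if total_ms <= CEILING_MS:
        return "noticeable lag"
    return "engagement risk"


# Illustrative numbers only: the LLM figure sits in the quoted TTFT range,
# the STT and TTS figures are hypothetical.
pipeline = [
    Stage("streaming STT (end of speech to final transcript)", 100),
    Stage("LLM time to first token", 250),
    Stage("TTS time to first audio chunk", 80),
]
total = time_to_first_audio(pipeline)
print(total, classify(total))            # 430 noticeable lag
print(real_time_factor(1.2, 4.0) < 1.0)  # True
```

With these numbers the pipeline lands between the two thresholds, which is exactly the range where a fixed saving such as the co-location gain matters.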
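The Thinker-Talker idea can be illustrated with `asyncio`: the slow, guarded path starts as a background task while filler goes out immediately. This is a minimal sketch, not Together AI's implementation; `talker`, `thinker`, and `guard` are hypothetical placeholders, the sleep stands in for model latency, and strings stand in for audio.

```python
import asyncio

FILLER = "One moment, let me check that."
REFUSAL = "Sorry, I can't help with that."


async def talker(say) -> None:
    # Talker: respond instantly with low-stakes filler so the caller hears something.
    say(FILLER)


async def thinker(query: str) -> str:
    # Thinker: the heavier model; the sleep stands in for its latency.
    await asyncio.sleep(0.05)
    return f"Here is what I found about {query}."


def guard(text: str) -> bool:
    # Hypothetical safety check; a real system might call a moderation model here.
    return "password" not in text.lower()


async def respond(query: str) -> list[str]:
    transcript: list[str] = []
    thinking = asyncio.create_task(thinker(query))  # start the slow path first
    await talker(transcript.append)                 # filler goes out right away
    answer = await thinking                         # then wait for the real answer
    transcript.append(answer if guard(answer) else REFUSAL)
    return transcript


print(asyncio.run(respond("your order status")))
```

Because the filler does not depend on the guarded model, the guard's cost is hidden behind audio the user is already hearing instead of adding to time to first audio.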
Deep Dive 3: RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem
From Towards Data Science: A team ran five Optuna sweeps and fine-tuned embeddings for 6 months, yet production accuracy never moved, because the bug was in the parser. The argument: RAG is a search and engineering problem, not an ML problem. Wrong answers are individually traceable failures, not statistical noise. Chunk size is a configuration choice, not a hyperparameter; you need to read your documents, not run grid searches. Fix RAG by engineering its structure (parsing, chunking, retrieval), not by training the model more.
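Treating wrong answers as individually traceable failures suggests checking each failing question stage by stage instead of averaging scores. The sketch below assumes you log, per evaluation case, the parser output, the chunks, the retrieved chunk indices, and the generated answer; the `Case` fields and stage labels are hypothetical, and exact substring matching is a deliberately crude stand-in for whatever answer matching you actually use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Case:
    question: str
    expected: str         # text the correct answer depends on
    parsed_text: str      # parser output for the source document
    chunks: list[str]     # chunks built from parsed_text
    retrieved: list[int]  # indices of chunks returned by the retriever
    answer: str           # generator output


def locate_failure(c: Case) -> str:
    # Walk the pipeline in order; the first stage that loses the fact is the culprit.
    if c.expected not in c.parsed_text:
        return "parser"
    if not any(c.expected in ch for ch in c.chunks):
        return "chunking"
    if not any(c.expected in c.chunks[i] for i in c.retrieved):
        return "retrieval"
    if c.expected not in c.answer:
        return "generation"
    return "ok"


cases = [
    # Hypothetical failure: the parser dropped a table cell, so "$49" never reached the index.
    Case("Pro plan price?", "$49", "Plan Pro Price", ["Plan Pro Price"], [0], "Unknown"),
    Case("Refund window?", "30 days", "Refunds within 30 days.",
         ["Refunds within 30 days."], [0], "You have 30 days."),
]
print(Counter(locate_failure(c) for c in cases))
```

A tally like this points at a specific stage to fix; in the article's case, no amount of embedding fine-tuning could recover a fact the parser never emitted.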
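Treating chunk size as a configuration value rather than a tuned hyperparameter might look like this: split on the document's own structure (here, Markdown headings) and use a size cap only as a fallback for oversized sections. This is an illustrative sketch that assumes Markdown input; `chunk_by_headings` and its `max_chars` default are hypothetical, not the article's code.

```python
import re


def chunk_by_headings(markdown: str, max_chars: int = 2000) -> list[str]:
    # Split at the start of every Markdown heading line so each chunk is one section.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        # max_chars is a plain config cap: oversized sections are cut at the last
        # paragraph break that fits, or hard-cut if there is none.
        while len(sec) > max_chars:
            cut = sec.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            chunks.append(sec[:cut].strip())
            sec = sec[cut:].strip()
        if sec:
            chunks.append(sec)
    return chunks


doc = "# Intro\nHello.\n\n## Pricing\nPlan A costs $10.\n"
for chunk in chunk_by_headings(doc):
    print(repr(chunk))
```

The cap exists only so no chunk grows unbounded; which boundaries are sensible is decided by reading the documents, not by a grid search over sizes.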
Quick Takes
More stories worth your attention
· Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 — NVIDIA Technical Blog
· How Rippling built production AI in 6 months with Deep Agents and LangSmith — LangChain Blog
· Anthropic Confidentially Files Draft S-1 for IPO — Anthropic (@AnthropicAI)
· How to Build an AI Support Agent That Knows When NOT to Answer Tickets — freeCodeCamp
· The Rise of AI Forward Deployed Engineers and the Future of AI Engineering Roles — Andrew Ng (@AndrewYNg)
· How we reduced core unit boot time from hours to minutes — The Cloudflare Blog
· Shopify Reports 15X Faster GraphQL Execution with Breadth-First Engine — InfoQ
Related Links
· Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead: www.bestblogs.dev
· Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI [Video]: www.bestblogs.dev
· RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem: www.bestblogs.dev
· Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3: www.bestblogs.dev
· How Rippling built production AI in 6 months with Deep Agents and LangSmith: www.bestblogs.dev
· Anthropic Confidentially Files Draft S-1 for IPO: www.bestblogs.dev
· How to Build an AI Support Agent That Knows When NOT to Answer Tickets: www.bestblogs.dev
· The Rise of AI Forward Deployed Engineers and the Future of AI Engineering Roles: www.bestblogs.dev
· How we reduced core unit boot time from hours to minutes: www.bestblogs.dev
· Shopify Reports 15X Faster GraphQL Execution with Breadth-First Engine: www.bestblogs.dev
About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth.
BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: bestblogs.dev
BestBlogs.dev · Discover high-quality content that truly fits you

