

EP80 · Agent Automation, Token-Aware MCP Interfaces · 06-07Emergent: How Six Months of Tinkering Led To A $100M ARR Company [Video] From Y Combinator Mukun (ex-Dunzo) hit $100M ARR in 9 months on one thesis: AI progress is exponential — automate all of software engineering at once, not piecemeal. The stack: multi-agent orchestration with custom container state snapshots and local RL pipelines, rebuilt 3× in 9 months to track frontier model upgrades. Before any users, the team spent 3 months reaching #1 on a coding benchmark — technical proof-of-concept before fundraising. Now 8.5M users across 190 countries. Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google [Video] From AI Engineer Chrome DevTools built MCP tools and hit the first trap: raw perf traces (50K+ lines JSON) exhaust agent context — the 'dump zone.' Four pillars: track tokens-per-successful-outcome as fuel efficiency; embed context in errors for self-healing; write API schemas as the LLM's UI with explicit activation criteria; tier trust across dev/CI/internet. Key reframe: removing friction is UX best practice for humans, but a security hole for agents. Every AI Agent Feature Is a Cache Invalidation Surface From Hacker News - Newest: "AI Agent" OpenClacky tried two architectures first: RAG (vector lag, <97% recall not viable) and multi-agent orchestration (4-min task → 14 min, 6× cost, debugging nightmare). The lesson: every agent feature is a cache invalidation surface. Seven decisions now yield 90%+ cache hit rates: double cache markers, frozen system prompt, one meta-tool routing all skills, exactly 16 tools, and Insert-then-Compress — taking compression cache hit rate from 0% to 95%. Quick Takes More stories worth your attention · How a reasoning model cracked an 80-year-old math problem — the OpenAI Podcast Ep. 20 [Video] — OpenAI · BREAKING: Agentic Traffic Surpasses Human Traffic on the Web — SemiAnalysis(@SemiAnalysis_) · SpaceX Signs $920M/Month Cloud Deal with Google for ~110,000 NVIDIA GPUs — Wall St Engine(@wallstengine) · The ABF Substrate Crunch: Hidden Monopolies and a Second-Order Crisis — Teng Yan(@0xPrismatic) · The Next Stage of AI: World Models - A Comprehensive Analysis — Mert · AI Architect(@MertLovesAI) · Against Corrigibility — LessWrong — LessWrong · Trees to Flows and Back: Unifying Decision Trees and Diffusion Models — Hacker News More Reads Extra reads worth a look today · Max Junestrand, CEO of Legora [Video] — Y Combinator · Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI [Video] — AI Engineer · AVGO Post-Call: $30B AI Bookings, 3x Backlog, and a Belief Shift — Teng Yan(@0xPrismatic) · Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw [Video] — AI Engineer · Five labs, five minds: building a multi-model finance drama on small models — Hugging Face - Blog · Intel 18A Yield Issues: A Critical Analysis — Omer Cheema(@OmerCheeema) · Why Software Automation Is Hard — LessWrong — LessWrong · OpenAI Help: Lockdown Mode — Simon Willison's Weblog · Nemotron 3 Ultra Matches GPT 5.5 at 10x Lower Cost — atomic.chat(@atomic_chat_hq) Related Links · Emergent: How Six Months of Tinkering Led To A $100M ARR Company [Video]: https://www.bestblogs.dev/video/c0c555c · Building Agent Interfaces: Lessons from Chrome DevTools (MCP) for Agents — Michael Hablich, Google [Video]: https://www.bestblogs.dev/video/5579aa4 · Every AI Agent Feature Is a Cache Invalidation Surface: https://www.bestblogs.dev/article/663dd48c · How a reasoning model cracked an 80-year-old math problem — the OpenAI Podcast Ep. 20 [Video]: https://www.bestblogs.dev/video/5654ce9 · BREAKING: Agentic Traffic Surpasses Human Traffic on the Web: https://www.bestblogs.dev/status/2062580333770408231 · SpaceX Signs $920M/Month Cloud Deal with Google for ~110,000 NVIDIA GPUs: https://www.bestblogs.dev/status/2062970468077068389 · The ABF Substrate Crunch: Hidden Monopolies and a Second-Order Crisis: https://www.bestblogs.dev/status/2062336583324553654 · The Next Stage of AI: World Models - A Comprehensive Analysis: https://www.bestblogs.dev/status/2062506580881322288 · Against Corrigibility — LessWrong: https://www.bestblogs.dev/article/0978efda · Trees to Flows and Back: Unifying Decision Trees and Diffusion Models: https://www.bestblogs.dev/article/72139bc9 · Max Junestrand, CEO of Legora [Video]: https://www.bestblogs.dev/video/fc6907e · Beyond Transcription: Building Voice AI That Understands Conversations — Hervé Bredin, pyannoteAI [Video]: https://www.bestblogs.dev/video/5cdbbba · AVGO Post-Call: $30B AI Bookings, 3x Backlog, and a Belief Shift: https://www.bestblogs.dev/status/2062360188557123868 · Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, OpenClaw [Video]: https://www.bestblogs.dev/video/06cdbdc · Five labs, five minds: building a multi-model finance drama on small models: https://www.bestblogs.dev/article/15dcc397 · Intel 18A Yield Issues: A Critical Analysis: https://www.bestblogs.dev/status/2062448028925980819 · Why Software Automation Is Hard — LessWrong: https://www.bestblogs.dev/article/b15d12f6 · OpenAI Help: Lockdown Mode: https://www.bestblogs.dev/article/9642b109 · Nemotron 3 Ultra Matches GPT 5.5 at 10x Lower Cost: https://www.bestblogs.dev/status/2062676779362357398 About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you
EP79 · Text Diffusion Models, AI Agent Sandboxes · 06-06Deep Dive 1: Text Diffusion — Brendon Dillon, Google DeepMind [Video] From AI Engineer Google DeepMind researcher Brendon Dillon (AI Engineer) demystifies Text Diffusion: instead of token-by-token generation, the model refines a full block over parallel denoising passes — 256 tokens may require just 24 passes, a tenfold speed gain. Bidirectional attention enables mid-run self-correction. A precise guide to diffusion's trade-offs versus autoregressive models. Deep Dive 2: Give your AI agent its own computer From LangChain Blog LangChain makes the case that containers are insufficient for AI agents: shared kernels leave code execution exposed to kernel-level exploits. LangSmith Sandboxes are hardware-virtualized microVMs with isolated kernels, persistent state, and snapshot/fork primitives. A production-ready answer for any team running untrusted, model-generated code safely at scale. Deep Dive 3: From account executive to product manager: how one Anthropic seller rebuilt his team's workflows with Claude Code | Claude From Claude Blog Anthropic's Claude Blog profiles Jared Sires, an AE with no coding background who built CLAFTS — a Gmail drafting tool of ~4,300 Claude Code-written lines — saving 10–15 hours per week. Within months, ~80% of Anthropic's sales org adopted his plugin. A concrete playbook for GTM teams on building shared AI tools without dedicated engineering. Quick Takes More stories worth your attention · Microsoft Build Keynote: Agentic Engineering Replaced Programming — Cory House(@housecor) · Thousand Token Wood: shipping a multi-agent economy on a 3B model — Hugging Face - Blog · How to Stop Shipping Low-Quality RL Environments (with Examples) — Latent.Space · NVIDIA Releases Nemotron 3 Ultra: Most Intelligent US Open Weights Model — Artificial Analysis(@ArtificialAnlys) · Platform Teams Enabling AI - MCP/Multi-Agentic Tools Across Linkedin — InfoQ · Qwen3.7-Max Challenges Google for Third Place, AI Saves Whales, Fine-Tuning Breaks Copyright Alignment — The Batch | DeepLearning.AI · Did Claude Increase Bugs in rsync? — Hacker News More Reads Extra reads worth a look today · Google Releases Gemma 4 12B: Open Model with Advanced Reasoning — Google(@Google) · The Case for a Screen-Free Childhood | Jonathan Haidt [Video] — TED · How Conductor CEO Charlie Holtz Sets Up His Team Of AI Agents [Video] — Y Combinator · AI Engineer Melbourne 2026 Keynote Livestream | Day 2 [Video] — AI Engineer · First Commercial Non-Light-Water Reactor Goes Critical in U.S. in Over 40 Years — Director Michael Kratsios(@mkratsios47) · Progressive Disclosure in Skills: The Most Powerful Pattern for Large Agent Processes — Daniel San(@dani_avila7) Related Links · Text Diffusion — Brendon Dillon, Google DeepMind [Video]: https://www.bestblogs.dev/video/93a33f8 · Give your AI agent its own computer: https://www.bestblogs.dev/article/dc9482cb · From account executive to product manager: how one Anthropic seller rebuilt his team's workflows with Claude Code | Claude: https://www.bestblogs.dev/article/8af798c7 · Microsoft Build Keynote: Agentic Engineering Replaced Programming: https://www.bestblogs.dev/status/2061953686847557962 · Thousand Token Wood: shipping a multi-agent economy on a 3B model: https://www.bestblogs.dev/article/d15e5749 · How to Stop Shipping Low-Quality RL Environments (with Examples): https://www.bestblogs.dev/article/cdd6597f · NVIDIA Releases Nemotron 3 Ultra: Most Intelligent US Open Weights Model: https://www.bestblogs.dev/status/2062527871529439438 · Platform Teams Enabling AI - MCP/Multi-Agentic Tools Across Linkedin: https://www.bestblogs.dev/article/1ea2338d · Qwen3.7-Max Challenges Google for Third Place, AI Saves Whales, Fine-Tuning Breaks Copyright Alignment: https://www.bestblogs.dev/article/2baac995 · Did Claude Increase Bugs in rsync?: https://www.bestblogs.dev/article/4661d220 · Google Releases Gemma 4 12B: Open Model with Advanced Reasoning: https://www.bestblogs.dev/status/2062203526588088452 · The Case for a Screen-Free Childhood | Jonathan Haidt [Video]: https://www.bestblogs.dev/video/a39ce6a · How Conductor CEO Charlie Holtz Sets Up His Team Of AI Agents [Video]: https://www.bestblogs.dev/video/07f0e5c · AI Engineer Melbourne 2026 Keynote Livestream | Day 2 [Video]: https://www.bestblogs.dev/video/85648f5 · First Commercial Non-Light-Water Reactor Goes Critical in U.S. in Over 40 Years: https://www.bestblogs.dev/status/2062681078721020250 · Progressive Disclosure in Skills: The Most Powerful Pattern for Large Agent Processes: https://www.bestblogs.dev/status/2062529678590513475 About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you
EP78 · ChatGPT Dreaming Memory, Generative UI · 06-05Deep Dive 1: Dreaming: Better memory for a more helpful ChatGPT From OpenAI News OpenAI announces Dreaming V3, a background system synthesizing ChatGPT memory across chat history without explicit prompts. Unlike stale "saved memories," dreaming auto-revises entries over time. Approximately 5x compute savings enable Free-tier rollout. Demos show responses shifting from generic to tailored — matched to a user's camera, travel preferences, and home location. Deep Dive 2: Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman [Video] From AI Engineer Ruben Casas (Staff Engineer, Postman) outlines three AI UI tiers at AI Engineer — static components, declarative JSON/YAML/Python engines, and generative runtimes where LLMs emit HTML, CSS, and JavaScript on demand. MCP enables secure delivery via sandboxing and iframe isolation. He urges developers beyond chat panels into human-agent canvases that reshape content in real time. Deep Dive 3: How to Build an AI-Native Services Company [Video] From Y Combinator Y Combinator presents a playbook for AI-native service firms selling outcomes, not SaaS tools. Four pillars: low trust, low judgment per task, a high intelligence threshold, and regulatory moats. Variance in output quality is the biggest existential threat — clients tolerate slow turnarounds before volatile quality. AI operating leverage unlocks gross margins of 50% or greater. Quick Takes More stories worth your attention · Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs — Latent.Space · Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI — Hugging Face - Blog · Microsoft Unveils MAI-Thinking-1, Its First In-House Reasoning Model — Microsoft AI(@MicrosoftAI) · EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios — Hugging Face - Blog · [AINews] Reve 2 and Ideogram 4: Layouts in Imagegen — Latent.Space · VoidZero is joining Cloudflare — The Cloudflare Blog · Naval Podcast: The AI Industrial Revolution with Rauch, Hodak, Scholl — Naval(@naval) More Reads Extra reads worth a look today · How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent — Hugging Face - Blog · Claude Mythos Preview Achieves ~52x AI Training Speedup — Anthropic(@AnthropicAI) · Claude Shows Signs of Accelerating AI Development Toward Recursive Self-Improvement — Anthropic(@AnthropicAI) · OpenAI Rolls Out More Capable Memory System for ChatGPT — OpenAI(@OpenAI) · TL;DR: Every Agentic Engineering Hack I Know — Matt Van Horn(@mvanhorn) · Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining — Hugging Face - Blog Related Links · Dreaming: Better memory for a more helpful ChatGPT: https://www.bestblogs.dev/article/bd3109dd · Beyond Components: Designing Generative UI for MCP Apps — Ruben Casas, Postman [Video]: https://www.bestblogs.dev/video/0fcc48a · How to Build an AI-Native Services Company [Video]: https://www.bestblogs.dev/video/80421d9 · Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs: https://www.bestblogs.dev/article/ffda12ac · Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI: https://www.bestblogs.dev/article/bb6294b3 · Microsoft Unveils MAI-Thinking-1, Its First In-House Reasoning Model: https://www.bestblogs.dev/status/2062263049596067864 · EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios: https://www.bestblogs.dev/article/f4bf8cb2 · [AINews] Reve 2 and Ideogram 4: Layouts in Imagegen: https://www.bestblogs.dev/article/5da7bfa9 · VoidZero is joining Cloudflare: https://www.bestblogs.dev/article/73900a47 · Naval Podcast: The AI Industrial Revolution with Rauch, Hodak, Scholl: https://www.bestblogs.dev/status/2062632380234641611 · How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent: https://www.bestblogs.dev/article/328e8914 · Claude Mythos Preview Achieves ~52x AI Training Speedup: https://www.bestblogs.dev/status/2062568869240476050 · Claude Shows Signs of Accelerating AI Development Toward Recursive Self-Improvement: https://www.bestblogs.dev/status/2062568862479208923 · OpenAI Rolls Out More Capable Memory System for ChatGPT: https://www.bestblogs.dev/status/2062567556524003631 · TL;DR: Every Agentic Engineering Hack I Know: https://www.bestblogs.dev/status/2061978364391592110 · Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining: https://www.bestblogs.dev/article/eb39d789 About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you
EP77 · Microsoft Build, Alphabet $85B Raise · 06-04Deep Dive 1: ⚡️Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build From Latent.Space Satya Nadella appeared at Build 2026 on the No Priors × Latent Space crossover podcast, making three bets: Microsoft is becoming a 'Frontier Intelligence Platform'; private evals — not headcount — are the new IP asset; and the Azure network team's Miles agent distills 500+ fiber engineers' expertise. A rare unscripted take on SaaS unbundling, open multi-model harnesses, and why making the impossible possible is the real ambition. Deep Dive 2: Bringing Gemma 4 12B to your Laptop: Unlocking Local, Agentic Workflows with Google AI Edge From Google Developers Blog Google's Gemma 4 12B now runs on ordinary laptops via the AI Edge stack. The post covers three entry points: AI Edge Gallery (macOS) for local code generation and data analysis, Eloquent's new Voice Edit mode claiming a 60%+ quality jump over prior models, and the LiteRT-LM CLI's 'serve' command exposing an OpenAI-compatible local endpoint. Essential for developers building offline, privacy-preserving agentic workflows without cloud dependencies. Deep Dive 3: Alphabet Raises ~85BinOversubscribedEquityOffering,Including85BinOversubscribedEquityOffering,Including10B from Berkshire Hathaway From Sundar Pichai(@sundarpichai) Alphabet CEO Sundar Pichai announced that the company's equity offering was oversubscribed, raising approximately 45B,withanadditional45B,withanadditional40B via an 'at the market' program in Q3 — totaling ~85B.Theraiseincludesa85B.Theraiseincludesa10B investment from Berkshire Hathaway and funds a multi-year AI infrastructure buildout. The scale signals how firmly Wall Street is backing Alphabet's AI capital strategy. Quick Takes More stories worth your attention · How OpenAI Built Its Data Agent — ByteByteGo Newsletter · Building Frontier CX Agents | Interrupt 26 [Video] — LangChain · How Harmonic Rebuilt Scout on Deep Agents and 4x'd Retention with LangSmith — LangChain Blog · Anyone can build and share apps in Codex [Video] — OpenAI · I Spent May Evaluating Different Engines for OCR — Towards Data Science · Direct Preference Optimization Beyond Chatbots — Hugging Face - Blog · How to Build a Custom Agent Harness — LangChain Blog More Reads Extra reads worth a look today · Best practices for getting started with Claude Cowork | Claude — Claude Blog · What we learned mapping a year’s worth of AI-enabled cyber threats — Anthropic News · Two Misconfigurations That Caused Spark OOM Failures on Kubernetes — InfoQ · Google Launches Gemma 4 12B Model for Local Laptop Use — Sundar Pichai(@sundarpichai) · LangSmith Sandboxes: Stateful Execution Environments for AI Agents — LangChain(@LangChainAI) · The Real Value of AI: Seeing the Whole System, Not Just Speed — Garry Tan(@garrytan) Related Links · ⚡️Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build: https://www.bestblogs.dev/article/7ffd8109 · Bringing Gemma 4 12B to your Laptop: Unlocking Local, Agentic Workflows with Google AI Edge: https://www.bestblogs.dev/article/cb28b58a · Alphabet Raises ~85BinOversubscribedEquityOffering,Including85BinOversubscribedEquityOffering,Including10B from Berkshire Hathaway: https://www.bestblogs.dev/status/2062203848673161267 · How OpenAI Built Its Data Agent: https://www.bestblogs.dev/article/0e3c898d · Building Frontier CX Agents | Interrupt 26 [Video]: https://www.bestblogs.dev/video/db0be12 · How Harmonic Rebuilt Scout on Deep Agents and 4x'd Retention with LangSmith: https://www.bestblogs.dev/article/eca5ff15 · Anyone can build and share apps in Codex [Video]: https://www.bestblogs.dev/video/c489a82 · I Spent May Evaluating Different Engines for OCR: https://www.bestblogs.dev/article/aba895ac · Direct Preference Optimization Beyond Chatbots: https://www.bestblogs.dev/article/a46ae128 · How to Build a Custom Agent Harness: https://www.bestblogs.dev/article/7090bd13 · Best practices for getting started with Claude Cowork | Claude: https://www.bestblogs.dev/article/b99d7cb3 · What we learned mapping a year’s worth of AI-enabled cyber threats: https://www.bestblogs.dev/article/d632aa80 · Two Misconfigurations That Caused Spark OOM Failures on Kubernetes: https://www.bestblogs.dev/article/72544a86 · Google Launches Gemma 4 12B Model for Local Laptop Use: https://www.bestblogs.dev/status/2062257242645393889 · LangSmith Sandboxes: Stateful Execution Environments for AI Agents: https://www.bestblogs.dev/status/2062172904150761935 · The Real Value of AI: Seeing the Whole System, Not Just Speed: https://www.bestblogs.dev/status/2062165715638227141 About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you
EP76 · Dynamic Workflows, GitHub Copilot App · 06-03Deep Dive 1: A harness for every task: dynamic workflows in Claude Code | Claude From Claude Blog Anthropic just shipped dynamic workflows in Claude Code — Claude now writes its own JS orchestration harness on the fly, spawning subagents with isolated worktrees and model-level control. Designed to combat agentic laziness, goal drift, and self-preferential bias. Trigger with ultracode. Best for complex, high-value multi-step tasks. Deep Dive 2: GitHub Copilot app: The agent-native desktop experience From The GitHub Blog GitHub launched a desktop control center for agent-native development at Microsoft Build 2026. Each session gets its own isolated git worktree. My Work panel tracks parallel agents, issues, and PRs. Canvas surfaces make agent intent inspectable and steerable. Agent Merge handles CI and review automation. Monthly commits crossed 1.4B, up 2x YoY. Deep Dive 3: Task Fidelity Scaling Laws — Kobie Crawdord, Snorkel [Video] From AI Engineer Snorkel's empirical finding: same base model, same compute — high-fidelity training tasks yield 6% RL performance gain vs. 1% for noisy tasks, a 5x gap. Task quality requires containerization, achievability, functional correctness, and environmental reliability. Clean failure signals let models hill-climb effectively during agentic RL training. Quick Takes More stories worth your attention · Running an AI-native engineering org | Claude — Claude Blog · MiniMax M3: First Open-Weights Model with Frontier Coding, 1M Context, and Native Multimodality — MiniMax (official)(@MiniMax__AI) · NVIDIA Introduces Cosmos 3: Fully Open Omnimodel for Physical AI — NVIDIA AI(@NVIDIAAI) · GitHub's plan for Agents — Kyle Daigle, GitHub — Latent.Space · Expanding Project Glasswing — Anthropic News · Why Vector Search Alone Isn't Enough: Hybrid Retrieval for RAG — InfoQ · Holo3.1: Fast & Local Computer Use Agents — Hugging Face - Blog Related Links · A harness for every task: dynamic workflows in Claude Code | Claude: https://www.bestblogs.dev/article/d9ee6dfe · GitHub Copilot app: The agent-native desktop experience: https://www.bestblogs.dev/article/66bbe9b9 · Task Fidelity Scaling Laws — Kobie Crawdord, Snorkel [Video]: https://www.bestblogs.dev/video/4b1bf8c · Running an AI-native engineering org | Claude: https://www.bestblogs.dev/article/f781c46a · MiniMax M3: First Open-Weights Model with Frontier Coding, 1M Context, and Native Multimodality: https://www.bestblogs.dev/status/2061266317815296322 · NVIDIA Introduces Cosmos 3: Fully Open Omnimodel for Physical AI: https://www.bestblogs.dev/status/2061308434629132553 · GitHub's plan for Agents — Kyle Daigle, GitHub: https://www.bestblogs.dev/article/dff0ae3c · Expanding Project Glasswing: https://www.bestblogs.dev/article/dd299026 · Why Vector Search Alone Isn't Enough: Hybrid Retrieval for RAG: https://www.bestblogs.dev/article/1986e257 · Holo3.1: Fast & Local Computer Use Agents: https://www.bestblogs.dev/article/0aaaa20f About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you
EP75 · Video Agents Are Next, Voice Agent · 06-02Deep Dive 1: Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead From Latent.Space Ethan He (xAI Grok Imagine lead) argues video models get their intelligence from LLMs, not video data — so the quality ceiling tracks LLM progress. The next Sora won't be a better video model but a video agent: plan, generate, edit, critique, iterate — mirroring how coding shifted from one-shot output to agentic workflows. Grok Imagine Agent Mode is the first real proof of this thesis. Deep Dive 2: Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI [Video] From AI Engineer Bhargava (Together AI) maps production voice agent constraints: humans expect 300ms responses, >500ms kills engagement. Optimal pipeline chains streaming STT → 8B–30B LLM (200–300ms TTFT) → TTS with RTF <1.0. Infrastructure collocation alone cuts latency 30%. The Thinker-Talker pattern sends immediate filler audio while a heavier guarded model processes actual logic asynchronously — the trick that makes safety checks affordable at conversational speed. Deep Dive 3: RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem From Towards Data Science A team ran five Optuna sweeps and fine-tuned embeddings for 6 months — production accuracy never moved. The bug was in the parser. RAG is a search and engineering problem, not ML: wrong answers are individually traceable failures, not statistical noise. Chunk size is a config choice, not a hyperparameter — you need to read your documents, not run grid searches. Fix RAG by engineering the structure better, not by training the model more. Quick Takes More stories worth your attention · Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 — NVIDIA Technical Blog · How Rippling built production AI in 6 months with Deep Agents and LangSmith — LangChain Blog · Anthropic Confidentially Files Draft S-1 for IPO — Anthropic(@AnthropicAI) · How to Build an AI Support Agent That Knows When NOT to Answer Tickets — freeCodeCamp · The Rise of AI Forward Deployed Engineers and the Future of AI Engineering Roles — Andrew Ng(@AndrewYNg) · How we reduced core unit boot time from hours to minutes — The Cloudflare Blog · Shopify Reports 15X Faster Graphql Execution with Breadth First Engine — InfoQ Related Links · Why Video Agent models are next — Ethan He, xAI Grok Imagine Lead: https://www.bestblogs.dev/article/794772a8 · Engineering voice agents: Latency, quality, and scale — Rishabh Bhargava, Together AI [Video]: https://www.bestblogs.dev/video/5dd32cf · RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem: https://www.bestblogs.dev/article/5265f8ad · Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3: https://www.bestblogs.dev/article/3209827a · How Rippling built production AI in 6 months with Deep Agents and LangSmith: https://www.bestblogs.dev/article/d0be0b5d · Anthropic Confidentially Files Draft S-1 for IPO: https://www.bestblogs.dev/status/2061478052257841495 · How to Build an AI Support Agent That Knows When NOT to Answer Tickets: https://www.bestblogs.dev/article/f3be1fc9 · The Rise of AI Forward Deployed Engineers and the Future of AI Engineering Roles: https://www.bestblogs.dev/status/2061477558693384395 · How we reduced core unit boot time from hours to minutes: https://www.bestblogs.dev/article/60953010 · Shopify Reports 15X Faster Graphql Execution with Breadth First Engine: https://www.bestblogs.dev/article/901fdc83 About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you
EP74 · AI's Deployment Dawn, Agents Over Pipelines · 06-01Deep Dive 1: A rational conversation on where AI is actually going | Benedict Evans [Video] From Lenny's Podcast Former a16z analyst Benedict Evans maps AI to its 1997 internet moment: deployment is still early, and labs now hire McKinsey-style services teams because enterprises cannot self-restructure. Jevons paradox — spreadsheets grew accountants, not eliminated them — refutes job apocalypse fears. Core thesis: foundation models will commoditize like telcos; real value concentrates in distribution and application layers above the model, not in the labs themselves. Contrarian, historically grounded. Deep Dive 2: How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS [Video] From AI Engineer WorkOS engineer Nick Nisi, 8 months no-keyboard, found a striking inverse: 10k-line skill files bloated eval cycles to 68 min at 77% accuracy. After deleting 95% — keeping only a 553-line gotcha-focused file — cycles dropped to 6 min at 97%. SHA-256 hashes on test logs prevent agents from faking pass status. Three rules: enforce with code gates not prompts, guide around pitfalls not prescribe, measure real pass rates not claimed ones. Deep Dive 3: Build agents, not pipelines From Sean Goedecke Sean Goedecke draws a clean map of the LLM architecture fork: pipeline (code controls flow) vs agent (LLM controls its own flow). Agents win on flexibility and context-gathering — they retrieve what they need dynamically, sidestepping the unsolved RAG retrieval problem. Pipelines win on predictability and cost control. Practical heuristic: if the task is hard enough to require a reasoning model, the added flexibility of an agent is worth the unpredictability. Quick Takes More stories worth your attention · How's it going? Reinforcement learning in language models recruits a functional welfare axis — LessWrong — LessWrong · How I Bootstrapped a SaaS to $10M ARR With Zero Funding (15 Q&A) | Chatbase, Yasser Elsaid [Video] — EO · The solution might be cancelling my AI subscription — Simon Willison's Weblog · The 7-Year Horizon Moat: Why Patience is Your Competitive Advantage — Garry Tan(@garrytan) · Safer Than YOLO: Auto Mode for Exec Approvals — OpenClaw Blog · DuckDB Quack: Client/Server Protocol over HTTP for Multi-User Analytics — InfoQ · OpenAI's Harness System: PMs Ship 100k+ Lines of Code Without Engineers Typing Production Code — Aakash Gupta(@aakashg0) More Reads Extra reads worth a look today · Platform Openness and the Risk of AI Sharecropping — Garry Tan(@garrytan) · OpenAI Robotics Rapid Progress and Hiring Push — Greg Brockman(@gdb) · Marc Andreessen Endorses CEO Coding Agent Trend — Marc Andreessen 🇺🇸(@pmarca) · GPT Realtime 2 Unlocks Voice-Controlled Computer Interaction — Greg Brockman(@gdb) · OpenAI Robotics is Hiring, Focused on Physical World AI — Sam Altman(@sama) Related Links · A rational conversation on where AI is actually going | Benedict Evans [Video]: https://www.bestblogs.dev/video/ed8426c · How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS [Video]: https://www.bestblogs.dev/video/f95e394 · Build agents, not pipelines: https://www.bestblogs.dev/article/572b4e71 · How's it going? Reinforcement learning in language models recruits a functional welfare axis — LessWrong: https://www.bestblogs.dev/article/cc07b331 · How I Bootstrapped a SaaS to $10M ARR With Zero Funding (15 Q&A) | Chatbase, Yasser Elsaid [Video]: https://www.bestblogs.dev/video/e8221bf · The solution might be cancelling my AI subscription: https://www.bestblogs.dev/article/9d6b3025 · The 7-Year Horizon Moat: Why Patience is Your Competitive Advantage: https://www.bestblogs.dev/status/2061080196229525808 · Safer Than YOLO: Auto Mode for Exec Approvals: https://www.bestblogs.dev/article/98816042 · DuckDB Quack: Client/Server Protocol over HTTP for Multi-User Analytics: https://www.bestblogs.dev/article/66368033 · OpenAI's Harness System: PMs Ship 100k+ Lines of Code Without Engineers Typing Production Code: https://www.bestblogs.dev/status/2061176400611320290 · Platform Openness and the Risk of AI Sharecropping: https://www.bestblogs.dev/status/2061176075288453333 · OpenAI Robotics Rapid Progress and Hiring Push: https://www.bestblogs.dev/status/2061145994121871656 · Marc Andreessen Endorses CEO Coding Agent Trend: https://www.bestblogs.dev/status/2061138031621616077 · GPT Realtime 2 Unlocks Voice-Controlled Computer Interaction: https://www.bestblogs.dev/status/2060955146952077653 · OpenAI Robotics is Hiring, Focused on Physical World AI: https://www.bestblogs.dev/status/2061117302528188712 About BestBlogs BestBlogs.dev is an AI-powered personal reading assistant. It curates high-quality content from RSS, newsletters, Twitter, YouTube, podcasts and more, organizing a daily reading flow tailored to each reader — across technology, AI, product, business, research, design, investing, culture and personal growth. BestBlogs Pro early-bird beta is open: follow the sources you care about, set interest tags, and get your own personalized brief every day. Try it and share your feedback: https://bestblogs.dev BestBlogs.dev · Discover high-quality content that truly fits you