GLM-5.2: The Frontier Coding Model You Can Actually Download
A deep, builder-focused breakdown of Z.ai's GLM-5.2: a roughly 744B mixture-of-experts model with a 1M-token context, released open-weights under MIT. What it is, what the benchmarks say (and the asterisk nobody mentions), what it costs on an API versus your own hardware, and when to reach for the open model nobody can ban.
Why LLMs Hallucinate (and What Actually Reduces It)
A technical, no-hand-waving explanation of why large language models make things up: how next-token prediction works, why confidence is not correctness, and the techniques that genuinely reduce hallucination in production.
Why Your AI App Feels Slow (and the Latency Budget That Fixes It)
A deeply technical walkthrough of AI app latency: the real cost of TLS handshakes, cross-region round trips, cold starts, vector search, and LLM time-to-first-token, plus the budget framework that makes a product feel instant.
More writing
Running Claude Agents in Parallel: A Practical Guide to Doing More Than One Thing at Once Without Creating Chaos
A detailed, practical guide to parallel agent work in Claude Code: when to use Agent View, Agent Teams, and Dynamic Workflows, how worktrees and /batch keep parallel edits from colliding, how to monitor and control running agents, and the operating rules that keep it all sane.
Prompting Is the Interface, Not the Job: How to Become a Full-Stack AI Engineer
Prompt engineering is not dead, but prompt-only thinking is. The real craft is the system around the prompt: context, retrieval, tools, workflows, evals, guardrails, logging, and improvement loops. Here is the full stack and the order to build it in.
Midjourney Medical: A 60-Second Body Scan, a Spa, and the Real Junction in Healthcare
Midjourney is building a 500,000-transducer full-body ultrasound scanner and putting it in a spa. Past the spectacle, it sits on the real inflection in healthcare: the shift from scarce, reactive imaging to cheap, continuous body data, and the hard problems that decide whether that helps.
Reinforcement Fine-Tuning in 2026: Train a Small Model to Beat a Giant One (GRPO, RULER, ART)
A technical guide to reinforcement fine-tuning in 2026: why a fine-tuned small open model beats a giant one, how GRPO and RULER let agents learn from experience with no reward functions or labels, and the open-source stack (ART, Unsloth, Tinker) to do it.
Your Moat Is Fifteen Years
Paul Graham's essays say the best startup ideas come from living in the future. In 2026, deep domain expertise is the only founder moat that AI can't compress.
The Claude Features Almost Nobody Turns On, Part 2: Let It Act, Automate, and Scale
Part 2 of the guide to getting more from Claude: the tools that leave the chat window. Reading your files, acting in your browser, running on a schedule, installable skills, CLAUDE.md, Claude Code, visual work, and prompt caching that makes API calls up to 90% cheaper.
Stop Building ChatGPT Wrappers. Build Vertical AI.
Elad Gil's test for real AI startups vs thin wrappers is the most useful filter in venture right now. Here's the operator playbook for boring-industry vertical AI.
The Claude Features Almost Nobody Turns On, Part 1: Memory, Real Builds, Better Thinking, and Honest Pushback
Part 1 of a practical guide to getting far more out of Claude: how to give it memory, make it build working things instead of describing them, make it reason on the decisions that matter, and turn it from a flatterer into something that tells you the truth.