GLM-5.2: The Frontier Coding Model You Can Actually Download
A deep, builder-focused breakdown of Z.ai's GLM-5.2: a roughly 744B mixture-of-experts model with a 1M-token context, released as open weights under the MIT license. What it is, what the benchmarks say (and the asterisk nobody mentions), what it costs on an API versus your own hardware, and when to reach for the open model nobody can ban.
Why LLMs Hallucinate (and What Actually Reduces It)
A technical, no-hand-waving explanation of why large language models make things up: how next-token prediction works, why confidence is not correctness, and the techniques that genuinely reduce hallucination in production.
Why Your AI App Feels Slow (and the Latency Budget That Fixes It)
A deeply technical walkthrough of AI app latency: the real cost of TLS handshakes, cross-region round trips, cold starts, vector search, and LLM time-to-first-token, plus the budget framework that makes a product feel instant.
More writing
Learn Anything Faster: The Science of Rapid Skill Acquisition
A practical playbook for rapid skill acquisition: deconstruct any skill, drill the high-leverage 20 percent, and reach useful proficiency in weeks.
Loop Engineering in 2026, Part 1: What an Agent Loop Is and How to Build One
The practical version of loop engineering: what an agent loop actually is, the six parts every loop is assembled from, how Claude Code's /goal works, a PR-babysitter you can build today, why iterations, not tokens, are the real cost, and when not to loop at all.
The EU AI Act Is a Startup Catalyst, Not a Startup Killer
The EU AI Act entered into force in August 2024, with obligations phasing in through 2027. Here's why the compliance burden is also the biggest go-to-market opening in European AI.
Human Capital, Token Capital, and the Climbing Machine: Why the Next Moat Is Owning Your Learning Loop
Frontier AI models are becoming a commodity. The durable advantage is owning the learning loop that turns your workflows and judgment into AI that compounds over time.
Running Claude Agents in Parallel: A Practical Guide to Doing More Than One Thing at Once Without Creating Chaos
A detailed, practical guide to parallel agent work in Claude Code: when to use Agent View, Agent Teams, and Dynamic Workflows, how worktrees and /batch keep parallel edits from colliding, how to monitor and control running agents, and the operating rules that keep it all sane.
Prompting Is the Interface, Not the Job: How to Become a Full-Stack AI Engineer
Prompt engineering is not dead, but prompt-only thinking is. The real craft is the system around the prompt: context, retrieval, tools, workflows, evals, guardrails, logging, and improvement loops. Here is the full stack and the order to build it in.
Midjourney Medical: A 60-Second Body Scan, a Spa, and the Real Junction in Healthcare
Midjourney is building a 500,000-transducer full-body ultrasound scanner and putting it in a spa. Past the spectacle, it sits on the real inflection in healthcare: the shift from scarce, reactive imaging to cheap, continuous body data, and the hard problems that decide whether that helps.
Reinforcement Fine-Tuning in 2026: Train a Small Model to Beat a Giant One (GRPO, RULER, ART)
A technical guide to reinforcement fine-tuning in 2026: why a fine-tuned small open model beats a giant one, how GRPO and RULER let agents learn from experience with no reward functions or labels, and the open-source stack (ART, Unsloth, Tinker) to do it.