Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 14, 2026
The fast model just got smart
For two years you made a trade every time you picked a model: fast and cheap, or smart and slow. Gemini 3.5 Flash just broke that trade. The 'Flash' tier (the cheap, quick one) now scores 55 on the Artificial Analysis Intelligence Index, ahead of Grok 4.3 and Claude Sonnet 4.6, while running at over 280 tokens a second. The fast model is no longer the dumb model. That should make you re-open a decision most teams quietly froze a year ago: which model is your default, and is it still the right one? Here's how to think about it, including the catch.
- ai-native
- business
June 13, 2026
Agents are becoming a feature, not a product
Gartner expects 40% of enterprise applications to embed task-specific AI agents by the end of 2026, up from under 5% a year ago. Agentic AI is the fastest-growing enterprise priority, up 31.5% year over year. Read together, those numbers say something uncomfortable for a lot of startups: the agent is turning into a feature inside the software people already use, not a standalone product they switch to. If 'we built an agent that does X' is your whole pitch, the app that owns X is about to build it too. Here's what that means for what you build.
- business
- ai-native
June 13, 2026
Reach for the small model first
The reflex is to send every task to the biggest, smartest model. The numbers say that's usually the wrong default. A 7-billion-parameter model costs 10–30× less to run than a 70–175B one, Microsoft's Phi matches GPT-3.5-class quality with 98% less compute, and over two billion phones already run capable models locally with no cloud at all. Gartner expects task-specific small models to be used three times more than general LLMs by 2027. Here's why 'small first' is becoming the smart default, and when to still reach for the big one.
- ai-native
- business
June 13, 2026
Write it down for the machine
There's now a plain text file every serious coding agent reads before it touches your repo: AGENTS.md. As of early 2026 it's read natively by Claude Code, OpenAI's Codex CLI, Cursor, Aider, Devin, GitHub Copilot, Gemini CLI, Windsurf, and Amazon Q, which makes it the closest thing the industry has to a universal instruction format for agents. Writing one is the highest-leverage hour you can spend on AI coding right now, and almost nobody does it. Here's what goes in it and why it works.
- methodology
- ai-native
June 12, 2026
ChatGPT is no longer the default
A year ago, 'AI' basically meant ChatGPT — it had about three-quarters of all chatbot traffic and the model layer was a near-monopoly. As of June 2026 it's down to 54.7%, Gemini has surged to 27.4% (up about 104% in six months), and Claude, Grok and a long tail split the rest. The monoculture is over, and that changes how you should pick — and how you should build. The 'best AI' is now a per-task question, and betting your product on a single provider just got riskier.
- ai-native
- business
June 11, 2026
Agents that remember
The big agent unlock of 2026 isn't a smarter model — it's memory. Google's ReasoningBank lets an agent learn from its own successes and failures, store the reasoning, and get measurably better over time. That's the leap from a tool that resets every morning to a colleague who compounds. But memory has a second edge: it turns every mistake into a persistent one. A wrong fact, a poisoned instruction, or a belief that quietly went stale now survives across sessions and acts on you later. Memory isn't a feature you switch on. It's a corpus you have to govern.
- ai-native
- agents