Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 14, 2026
You have an agent. You don't have AI.
80% of enterprise apps shipped or updated in early 2026 embed at least one AI agent, up from 33% in 2024. That sounds like everyone has 'done AI.' But embedding an agent and getting value from it are different things: the median agent takes 5.1 months to pay back, and most deployments are still stuck in pilot and have never scaled. Having an agent is now table stakes, like having a website. The gap that actually separates companies is whether the agent reached production, earned its keep, and got trusted to run. Here's the difference that matters.
- business
- agents
June 14, 2026
Now your AI content has to say it's AI
On June 10, 2026, the European Commission published its Code of Practice on marking and labelling AI-generated content — the practical playbook for transparency rules that become enforceable under the EU AI Act on August 2. Deepfakes and AI-written text on matters of public interest must be clearly labelled, and people must be told when they're talking to a chatbot. The Code is voluntary; the obligation behind it isn't. Disclosure is becoming the default, and that's not just a compliance chore — it's a trust decision. Here's what it means for anyone shipping AI content.
- business
- security
June 13, 2026
A green checkmark can hide a broken middle
Here's the failure mode that eats AI agents in production: an agent runs a multi-step task, makes a wrong turn somewhere in the middle, and still produces a final answer that passes your check. The output looks clean. The reasoning was broken. Researchers found this is exactly how multi-step agents fail — a step-three mistake propagates invisibly into a step-ten summary that reads fine and is wrong. If you only grade the final answer, you're blind to most of how agents actually break. Here's why, and what to check instead.
- methodology
- agents
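To make the "broken middle" concrete, here is a minimal sketch of grading intermediate steps alongside the final answer. Everything in it is an assumption for illustration: the `Step` record, the step names, and the checks are hypothetical, not taken from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    output: object


# A check takes a step's output and returns True if it looks valid.
Check = Callable[[object], bool]


def first_broken_step(trace: list[Step], checks: dict[str, Check]) -> Optional[str]:
    """Return the name of the first step whose check fails, or None."""
    for step in trace:
        check = checks.get(step.name)
        if check is not None and not check(step.output):
            return step.name
    return None


def grade(trace: list[Step], checks: dict[str, Check], final_check: Check) -> dict:
    """Grade the final answer and the intermediate steps separately."""
    return {
        "final_ok": final_check(trace[-1].output),
        "broken_step": first_broken_step(trace, checks),
    }


# Hypothetical trace: the lookup step returns the wrong customer (17, not 42),
# but the summary still reads like a well-formed answer.
trace = [
    Step("lookup_customer", {"customer_id": 17}),
    Step("fetch_invoices", [{"customer_id": 17, "total": 120.0}]),
    Step("summarize", "Customer has 1 invoice totalling 120.00."),
]
checks = {
    "lookup_customer": lambda out: out["customer_id"] == 42,
    "fetch_invoices": lambda out: len(out) > 0,
}
result = grade(trace, checks, final_check=lambda out: out.startswith("Customer has"))
```

In this toy trace the final check passes while the first step is flagged, which is the case a final-answer-only eval would miss. Real step checks would likely be richer (schema validation, tool-call assertions, or a model-based judge), but the shape is the same: score the trace, not just its last line.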
June 13, 2026
Agents are becoming a feature, not a product
Gartner expects 40% of enterprise applications to embed task-specific AI agents by the end of 2026, up from under 5% a year ago. Agentic AI is the fastest-growing enterprise priority, up 31.5% year over year. Read together, those numbers say something uncomfortable for a lot of startups: the agent is turning into a feature inside the software people already use, not a standalone product they switch to. If 'we built an agent that does X' is your whole pitch, the app that owns X is about to build it too. Here's what that means for what you build.
- business
- ai-native
June 13, 2026
Reach for the small model first
The reflex is to send every task to the biggest, smartest model. The numbers say that's usually the wrong default. A 7-billion-parameter small model runs 10–30× cheaper than a 70–175B one, Microsoft's Phi matches GPT-3.5-class quality at 98% less compute, and over two billion phones already run capable models locally with no cloud at all. Gartner expects task-specific small models to be used three times more than general LLMs by 2027. Here's why 'small first' is becoming the smart default — and when to still reach for the big one.
- ai-native
- business
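One way to read "small first" is as a routing rule: try the small model, and escalate only when its answer fails a cheap acceptance check. The sketch below assumes a hypothetical `Model` interface (prompt in, text out) and a caller-supplied `accept` function; neither corresponds to a real SDK. The cost helper is back-of-envelope arithmetic under the stated assumptions, not a measured result.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, answer out


def small_first(prompt: str, small: Model, large: Model,
                accept: Callable[[str], bool]) -> tuple[str, str]:
    """Try the small model; escalate only if its answer fails `accept`."""
    answer = small(prompt)
    if accept(answer):
        return "small", answer
    return "large", large(prompt)


def blended_cost(escalation_rate: float, cost_ratio: float) -> float:
    """Average cost per request, relative to always calling the large model.

    cost_ratio is how many times cheaper the small model is (10 for 10x).
    Every request pays for the small model; escalated ones also pay for
    the large model.
    """
    return 1.0 / cost_ratio + escalation_rate


# Stub models for illustration only: the small one gives up on "hard" prompts.
small = lambda p: "" if "hard" in p else "short answer"
large = lambda p: "detailed answer"
accept = lambda a: bool(a.strip())

easy = small_first("easy question", small, large, accept)
hard = small_first("hard question", small, large, accept)
```

With a small model at the low end of the 10–30× range and an assumed 20% escalation rate, `blended_cost(0.2, 10)` comes out to roughly 0.3, meaning about 30% of the always-large bill. The escalation rate is the number to measure on your own traffic; if most requests escalate, the small call becomes pure overhead.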
June 13, 2026
The AI Act's real deadline is August
On August 2, 2026, the EU AI Act's obligations for high-risk AI systems take effect. This is the part with real teeth: documentation, oversight, risk management, and fines of up to €15M or 3% of global turnover for breaching those obligations (the Act's headline ceiling of €35M or 7% applies to prohibited practices). Two things make this messy. As of March, only 8 of 27 member states had even set up their enforcement contact point. And nobody has a clean answer for who's liable when an autonomous agent acts on its own. If your software touches EU users, here's what's actually changing and the gap you need to close.
- business
- security