Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 5, 2026
The agent that "closes sales" — the part the demo hides
Meta just shipped an agent that doesn't only chat: it books appointments, qualifies leads, closes sales, and takes payments, 24/7, in any language, wired into Shopify and Zendesk. A million businesses are already on it. The demo is magic. What it hides: an autonomous system acting on behalf of your business, at machine speed, on messages from strangers, just as the law closed the 'the AI did it' escape hatch. Here's the honest version.
- security
- business
- agents
June 5, 2026
The AI just started profiling you in the background
Until this week, ChatGPT only remembered what you told it to. As of June 4 it 'dreams' — a background process reads across all your past chats and quietly builds a model of you, keeping it current on its own. That's a genuinely useful upgrade and the moment a chatbot became a profiler. The EU's data regulator said exactly that, today. Here's what actually changed, in plain terms — and why it's the grounding problem aimed at you.
- ai-native
- security
June 5, 2026
Four flagships in four weeks — "which model wins" is a design smell
This month a wave of flagship models is landing almost on top of each other — Gemini 3.5 Pro, a new Claude, Grok 5, with Opus 4.8 already out. Everyone's refreshing leaderboards. If that wave makes you anxious — are we on the best one, should we switch — the anxiety is telling you something about your architecture, not the models. Here's the honest read, and what 'stay swappable' actually takes.
- ai-native
- architecture
June 5, 2026
"Which part do we agentize first?" is the wrong first question
The whole market has moved from 'are agents real?' to 'which part of my company gets agentized first?' — IT support, sales, reconciliations. It feels like the smart strategic question. It's the wrong one. Asking where to point the agent skips the two questions that actually decide whether any of it works: what does the agent stand on, and who answers when it's wrong. Here's the order that matters.
- methodology
- business
- agents
June 4, 2026
A token paywall is not SaaS
Founders are pricing AI products with SaaS instincts (flat monthly, per-seat) and quietly bleeding, because the thing that made SaaS magical is gone. Near-zero marginal cost is dead: every user burns tokens, forever, and cost scales with use. GitHub Copilot reportedly cost up to $80 a month to serve a heavy user on a flat $10 plan. AI products aren't software with great margins; they're closer to a utility with real cost of goods. Price like it.
- business
- ai-native
June 4, 2026
87% on the benchmark, and it still can't evolve your codebase
The headline says AI 'solves 87% of SWE-bench,' and everyone reads it as 'AI can do software engineering now.' Two problems. The small one: a third of those passes leaked the answer or had weak tests. The fatal one: the benchmark measures one isolated bug fix, not the actual job — evolving a living codebase over weeks. Measure that, and the same models fall from ~73% to ~25%. The benchmark is the demo. Your codebase is production.
- eval
- agents
- methodology