Right now I'm leading AI agent architecture for a US-based biotech client, building multi-agent systems that automate scientific research workflows. The day-to-day: writing specifications, reviewing agent output, designing evaluations, and building custom MCP servers.
I direct coding agents (Claude Code on Opus) as the implementation layer; I own architecture, methodology, and quality. The specification design draws on twenty years of production engineering judgment.
Building
- Multi-agent orchestration patterns for research workflows — planner with typed sub-task decomposition, executor agents handling tool calls via MCP.
- An evaluation framework — public agent benchmarks plus a held-out scenario suite the agents never see during development.
- This site itself, in the open. Source on GitHub. Built by directing coding agents: every line of code reviewed by me, very few lines typed by me.
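For readers unfamiliar with the planner/executor split mentioned in the first item, here is a minimal sketch of what typed sub-task decomposition can look like. Everything in it is hypothetical and for illustration only: the `SubTask` type, the task kinds, and the `TOOLS` registry are invented here, and the lambdas stand in for tool calls that a real system would route through an MCP client. It is not the client system.

```python
from dataclasses import dataclass
from typing import Callable, Literal

# Hypothetical task kinds; a real system would define its own.
TaskKind = Literal["search", "summarize"]


@dataclass(frozen=True)
class SubTask:
    """One typed unit of work handed from the planner to an executor."""
    kind: TaskKind
    payload: str


def plan(goal: str) -> list[SubTask]:
    """Toy planner: decompose a goal into a fixed list of typed sub-tasks."""
    return [
        SubTask(kind="search", payload=goal),
        SubTask(kind="summarize", payload=goal),
    ]


# Stand-ins for tool calls (assumption: a real executor would call MCP tools).
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q!r}",
    "summarize": lambda q: f"summary of {q!r}",
}


def execute(task: SubTask) -> str:
    """Toy executor: dispatch a sub-task to the tool registered for its kind."""
    return TOOLS[task.kind](task.payload)


def run(goal: str) -> list[str]:
    """Plan the goal, then execute each sub-task in order."""
    return [execute(task) for task in plan(goal)]
```

The point of the typed `SubTask` is that the planner's output can be checked before any executor runs, which is one possible place to attach guardrails.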
Reading
- Anthropic's recent papers on agentic capabilities and evaluations.
- Designing Data-Intensive Applications (Kleppmann) — re-reading the consistency / consensus chapters with multi-agent state in mind.
Thinking about
- When specification-driven development scales beyond one agent — the coordination cost between specs, the deduplication of guardrails, spec inheritance patterns.
- Whether the right unit of test for an agent is the scenario suite or the property test, or both at different layers.
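To make the scenario-versus-property question concrete, here is a small sketch of the two layers side by side. The `choose_tool` function is a hypothetical stand-in for one agent step, invented for this example; the property check uses only the standard library's `random` module rather than a dedicated property-testing library.

```python
import random
import string

ALLOWED_TOOLS = {"search", "calculator", "none"}


def choose_tool(request: str) -> str:
    """Hypothetical stand-in for an agent's tool-routing step."""
    text = request.lower()
    if any(ch.isdigit() for ch in text):
        return "calculator"
    if "find" in text:
        return "search"
    return "none"


def scenario_test() -> None:
    # Scenario layer: specific inputs paired with specific expected behaviour.
    assert choose_tool("Find papers on CRISPR") == "search"
    assert choose_tool("What is 2 + 2?") == "calculator"


def property_test(trials: int = 200, seed: int = 0) -> None:
    # Property layer: an invariant that must hold for arbitrary inputs.
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 40)
        request = "".join(rng.choices(string.printable, k=length))
        assert choose_tool(request) in ALLOWED_TOOLS


scenario_test()
property_test()
```

The scenario checks pin down intended behaviour on known cases; the property check says nothing about which tool is right, only that the output never leaves the allowed set.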
Available for
- Conversations with founders or CTOs putting AI into production who want a second pair of eyes on architecture.
- Long-form consulting engagements where the work is to design what should be built, not to type lines into a file.
- Senior AI architecture roles at companies that take engineering quality seriously and put "ship it" as the second sentence, not the first.