JPMarket — AI-Powered Proxy Marketplace (24-Hour Build)
A production-shaped AI-first cross-border shopping platform, built end-to-end in a single 24-hour sprint by orchestrating Claude Code. Two ReAct agents over Rakuten Ichiba + multi-LLM router with provider fallback. Proof point for "AI Architect as velocity multiplier" — strict typing, 97% coverage, ADRs, runbooks, deployed and serving traffic.
- Role: AI Architect · Claude Code orchestration
- Stack: Python 3.13 · FastAPI · Next.js 16 · React 19 · TypeScript · Postgres · Redis · OpenRouter · Rakuten · Railway · Vercel
- Period: 2026, single 24h sprint

The brief
US collectors of Japanese anime merchandise, trading cards, figures, and manga want the stuff that never makes it out of Japan. They get it today through proxy-buying services (Buyee, ZenMarket, FromJapan) which are functional but punishing — Japanese-only catalogues that don't translate properly, opaque fees that only crystallise at checkout, no curation, no recommendations, and a support flow that ends at an FAQ page. The buyer's job becomes: read katakana, eyeball weight, guess shipping, hope duty doesn't sting.
The idea
A niche-first proxy marketplace where every price is the all-in USD landed cost, and an AI Personal Shopper finds drops you'd never discover yourself. The catalogue is a live filter on top of Rakuten Ichiba constrained to five collector niches (anime figures, TCG, manga, plush, hobby goods). Translations, USD pricing, weight-derived shipping quotes, US Section-321 duty logic, and state sales tax all happen on our side — the user just sees a card.
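To make "all-in USD landed cost" concrete, here is a minimal arithmetic sketch. Everything specific in it is an assumption for illustration, not the project's actual pricing logic: the function name, the flat duty rate, the choice to tax goods only, and the 800 USD threshold (the historical Section-321 de minimis figure, passed in as a parameter because the rule can change).

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def landed_cost_usd(
    price_jpy: int,
    usd_per_jpy: Decimal,
    shipping_usd: Decimal,
    duty_rate: Decimal,
    de_minimis_usd: Decimal,
    sales_tax_rate: Decimal,
) -> Decimal:
    """Hypothetical all-in price: goods + shipping + duty + sales tax."""
    goods = Decimal(price_jpy) * usd_per_jpy
    # Assumption: duty applies only when the goods value exceeds the threshold.
    duty = goods * duty_rate if goods > de_minimis_usd else Decimal("0")
    # Assumption: sales tax is charged on the goods value only.
    tax = goods * sales_tax_rate
    total = goods + shipping_usd + duty + tax
    return total.quantize(CENT, rounding=ROUND_HALF_UP)


# A 12,000 JPY figure at 0.0065 USD/JPY with 18.50 USD shipping and 7% tax:
# 78.00 goods + 18.50 shipping + 0 duty + 5.46 tax
quote = landed_cost_usd(
    12000, Decimal("0.0065"), Decimal("18.50"),
    Decimal("0.05"), Decimal("800"), Decimal("0.07"),
)
print(quote)  # 101.96
```

Using `Decimal` rather than floats keeps the displayed price and the charged price identical to the cent.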
Two ReAct agents share one chat surface ("AI Sensei"):
- Concierge — reactive support: order status, FAQ, niche-term explanations, landed-cost calculations.
- Personal Shopper — proactive sales: pulls collector interests + purchase history, runs a multi-step catalogue search, prices the top picks against the user's budget, returns a curated grid.


What "24 hours with Claude Code" actually produced
This is not "vibe coding". It's a codebase with strict typing on both sides, a coverage gate that fails CI under threshold, a no-legacy-tokens invariant enforced by test, ADRs that justify the architectural seams, runbooks that document operational edges, and a deployment that actually serves traffic. The acceleration came from using Claude Code the way an architect uses senior engineers — delegating implementation in concrete tasks, reviewing diffs, calling out the architectural invariant when output drifted, and keeping a written context (CLAUDE.md + persistent memory) so the agent understood the project's principles before touching a line.
Architectural decisions
Clean / hexagonal core, DI for everything
The domain layer (app/domain/*) speaks only in Product, Order, Cart, User — zero knowledge of Rakuten, OpenRouter, or SQL. Every external boundary is a Protocol (RakutenGatewayProtocol, FxRateProtocol, ProductTranslationRepositoryProtocol, …) wired by a single dependency-injector container.
Why: tests substitute fakes that implement the same Protocol; mypy-strict catches fake/real drift at type-check time, before anything runs; swapping vendors is a one-line change in the container. The pattern compresses to its essence under time pressure; there is no version of the right answer that skips it.
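A minimal sketch of one such seam, assuming a `search` method on the gateway and a hypothetical `CatalogueService` (neither signature comes from the codebase). The fake satisfies the Protocol structurally, which is what lets a strict type checker flag drift between fake and real adapter.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Product:
    """Domain entity: carries no knowledge of Rakuten or SQL."""
    item_code: str
    title: str
    price_jpy: int


class RakutenGatewayProtocol(Protocol):
    """Port the domain depends on. The method name is an assumption."""

    async def search(self, keyword: str) -> list[Product]: ...


class FakeRakutenGateway:
    """Test double: matches the Protocol by shape, no inheritance needed."""

    def __init__(self, products: list[Product]) -> None:
        self._products = products

    async def search(self, keyword: str) -> list[Product]:
        return [p for p in self._products if keyword.lower() in p.title.lower()]


class CatalogueService:
    """Hypothetical domain service that receives its gateway by injection."""

    def __init__(self, gateway: RakutenGatewayProtocol) -> None:
        self._gateway = gateway

    async def cheapest(self, keyword: str) -> Optional[Product]:
        results = await self._gateway.search(keyword)
        return min(results, key=lambda p: p.price_jpy, default=None)
```

In a test, `CatalogueService(FakeRakutenGateway([...]))` exercises the domain logic without any network call; in production the container would pass the real adapter to the same constructor.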
Multi-LLM router with provider-fallback chain
A single LLMRouter fronts OpenRouter (primary) plus four vendor SDKs (Gemini, DeepSeek, Anthropic, Groq). Per-task model selection goes through a ModelSelectionPolicy; a provider outage retries the same task on the next provider before bubbling.
Why: vendor churn in foundation-model pricing/availability is a fact of life. Application code stays vendor-agnostic; switching primary providers is a config change. ADR-0004 documents the choice.
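The fallback chain can be sketched roughly as follows. `LLMRouter` and `ModelSelectionPolicy` are names from the text, but their methods, the `ProviderError` type, and the `StaticProvider` stand-in are assumptions made up for this illustration.

```python
import asyncio
from typing import Optional, Protocol


class ProviderError(Exception):
    """Raised by a provider adapter on outage or unusable response."""


class LLMProvider(Protocol):
    name: str

    async def complete(self, model: str, prompt: str) -> str: ...


class StaticProvider:
    """Stand-in adapter for the sketch; a real one would call a vendor SDK."""

    def __init__(self, name: str, reply: Optional[str]) -> None:
        self.name = name
        self._reply = reply  # None simulates an outage

    async def complete(self, model: str, prompt: str) -> str:
        if self._reply is None:
            raise ProviderError(f"{self.name} unavailable")
        return f"{self._reply} [{model}]"


class ModelSelectionPolicy:
    """Maps (task, provider) to a model id; the table contents are illustrative."""

    def __init__(self, table: dict[str, dict[str, str]]) -> None:
        self._table = table

    def model_for(self, task: str, provider: str) -> Optional[str]:
        return self._table.get(task, {}).get(provider)


class LLMRouter:
    def __init__(self, providers: list[LLMProvider], policy: ModelSelectionPolicy) -> None:
        self._providers = providers  # ordered, primary first
        self._policy = policy

    async def complete(self, task: str, prompt: str) -> str:
        errors: list[str] = []
        for provider in self._providers:
            model = self._policy.model_for(task, provider.name)
            if model is None:
                continue  # no model configured for this task on this provider
            try:
                return await provider.complete(model, prompt)
            except ProviderError as exc:
                errors.append(str(exc))  # record the reason, try the next one
        raise ProviderError(f"all providers failed for {task!r}: {errors}")
```

Callers only ever see `router.complete(task, prompt)`; reordering the provider list or editing the policy table changes which vendor answers without touching application code.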
Background catalog warmer + bounded translation concurrency
A lifespan task on the API process walks every niche category every 15 minutes, hitting Rakuten through the gateway and translating each product (title + caption) into English. Translations land in a Postgres product_translations table; the user-facing search reads cache-only and never blocks on an LLM round-trip. A semaphore (asyncio.Semaphore(4)) caps in-flight LLM calls so the provider isn't throttled into 30% empty responses.
Why: synchronous translation on the hot path was producing 30-second category searches and getting empty responses from OpenRouter under burst load. The warmer trades cold-start latency for steady-state instant grids.
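The concurrency cap itself is small. The sketch below assumes a hypothetical `translate_all` helper and a fake translator that records peak concurrency; only `asyncio.Semaphore(4)` is taken from the text.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def translate_all(
    titles: list[str],
    translate: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> list[str]:
    """Translate every title with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(title: str) -> str:
        async with semaphore:  # waits here while `limit` calls are running
            return await translate(title)

    # gather preserves input order, so results line up with `titles`
    return list(await asyncio.gather(*(bounded(t) for t in titles)))


class CountingTranslator:
    """Fake LLM call that records the peak number of concurrent invocations."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, title: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        self.in_flight -= 1
        return f"EN:{title}"
```

Running twenty titles through `translate_all` with a `CountingTranslator` shows the peak never exceeding the limit, which is the property that keeps the provider from being hit with a burst.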
Outbox + Redis Streams for cross-boundary events
Order-lifecycle events (order.placed, order.status_changed) go through a transactional outbox written in the same DB transaction as the order itself; a relayer drains the outbox into a Redis Stream; handlers (email today, webhooks later) consume from a consumer group with XPENDING redelivery + an idempotency guard.
Why: classic solution to "I wrote the row, then died before sending the email." Domain events fan out exactly-once-enough without dual-write hazards.
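A minimal sketch of the outbox idea, using the standard-library `sqlite3` module as a stand-in for Postgres and a plain callback as a stand-in for the Redis Stream. Table names, columns, and function names are assumptions for illustration.

```python
import json
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, status TEXT NOT NULL
        );
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def place_order(conn: sqlite3.Connection, order_id: int, user_id: str) -> None:
    """Write the order and its event in one transaction: both or neither."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "INSERT INTO orders (id, user_id, status) VALUES (?, ?, 'placed')",
            (order_id, user_id),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Drain unpublished rows; `publish` stands in for an XADD to a stream."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:  # mark as sent only after publish returned
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relayer dies between `publish` and the `UPDATE`, the row is published again on the next pass. That at-least-once behaviour is why the handlers need the idempotency guard mentioned above.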
ReAct agent runtime with SSE streaming
Each agent runs a bounded ReAct loop (max 8 steps, 25s/step) over a JSON-schema-validated tool registry. Steps stream to the browser as Server-Sent Events. A single env flag (AGENT_DEBUG_STREAM) toggles between the user-facing stream (comment_to_user, proposal, final_answer only) and the developer trace (full situation / thoughts / tool-call / raw-result for debugging).
Why: users want a friendly assistant, not a tool-dump; developers want to see why the agent picked what it picked. One runtime, two views.
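The loop bounds can be sketched as below. The limits (8 steps, 25 s per step) come from the text; the `Step` shape, the `decide` callback standing in for the LLM, and the fallback messages are assumptions. JSON-schema validation of tool arguments and SSE streaming are left out to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

MAX_STEPS = 8           # loop bound quoted in the text
STEP_TIMEOUT_S = 25.0   # per-step budget quoted in the text
BUDGET_EXHAUSTED = "I could not finish within the step budget."
STEP_TIMED_OUT = "A step took too long, please try again."


@dataclass
class Step:
    """One model decision: call a tool, or finish with an answer."""
    kind: str                                  # "tool_call" or "final_answer"
    tool: str = ""
    args: dict[str, Any] = field(default_factory=dict)
    text: str = ""


async def run_agent(
    decide: Callable[[str, list], Awaitable[Step]],
    tools: dict[str, Callable[..., Awaitable[Any]]],
    question: str,
    max_steps: int = MAX_STEPS,
    step_timeout: float = STEP_TIMEOUT_S,
) -> str:
    history: list[tuple[str, Any]] = []  # (tool name, observation) pairs

    async def one_step() -> Step:
        step = await decide(question, history)
        if step.kind == "tool_call":
            observation = await tools[step.tool](**step.args)
            history.append((step.tool, observation))
        return step

    for _ in range(max_steps):
        try:
            # Assumption: the budget covers the decision plus the tool call.
            step = await asyncio.wait_for(one_step(), timeout=step_timeout)
        except asyncio.TimeoutError:
            return STEP_TIMED_OUT
        if step.kind == "final_answer":
            return step.text
    return BUDGET_EXHAUSTED
```

Both bounds turn a potentially runaway agent into a request with a known worst-case cost: at most `max_steps * step_timeout` seconds before the user gets some answer.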
Runtime theming with three-layer tokens
[data-theme="night"] / [data-theme="day"] carry raw primitives; Tailwind's @theme inline semantic layer (bg-canvas, text-content, border-border, …) emits var(--color-*) so a single attribute swap on <html> re-skins the live page with no JS reload. A CI test (tests/theming/no-legacy-tokens.test.ts) fails the build on any reappearance of the deprecated palette.
Why: a "theme" should be a complete design system, not a recolour — and the no-old-tokens invariant has to be permanent, not a one-time cleanup. ADR-0008.
Observability surfaced in the product
The footer carries a live catalog warmer status indicator (Catalog: N items loaded · refreshed Xm ago · in Y) and the current JPY→USD rate with fetched-at UTC timestamp. Every product card shows Data as of YYYY-MM-DD HH:MM UTC.
Why: operators see freshness at a glance; users get explicit honesty about live-data staleness instead of "trust us, it's fresh".



Engineering decisions
Backend — async everywhere, no synchronous hot paths
Python 3.13 + FastAPI + SQLAlchemy 2 async + asyncpg, uv for deterministic dep management. An
AUTOCOMMIT sessionmaker for read-dominated repos drops a measured ~3× round-trip cost on the
landing-page DB reads. A per-App-ID Redis token bucket paces every replica at Rakuten's documented
1 req/sec ceiling without client-side coordination. A circuit breaker wrapping the Rakuten gateway
returns 503 in microseconds when the vendor IP allowlist trips, instead of 30-second timeouts that
pile up worker threads.
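A minimal in-memory circuit breaker illustrates the fail-fast behaviour described above. The class name, thresholds, and the injectable clock are assumptions for this sketch; the real breaker may track state differently.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class CircuitOpenError(Exception):
    """Raised immediately while the breaker is open; map to HTTP 503 at the edge."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    async def call(self, fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("upstream unavailable")  # no network call
            # Cool-down elapsed: half-open, let one trial call through.
            self._opened_at = None
        try:
            result = await fn(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the breaker
            raise
        self._failures = 0  # a success closes the breaker fully
        return result
```

While open, `call` raises without touching the vendor, so a request costs a comparison instead of a 30-second timeout. After the cool-down a single trial call decides whether the breaker closes or re-opens.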
Cache hygiene — don't-cache-fallback and augment-don't-overwrite
Two specific lessons from the build:
- Don't-cache-fallback for empty LLM responses. Empty bodies return None from the translation field resolver so the next warm pass retries; the cache converges to ~100% English titles across 2-3 cycles instead of pinning a quarter of the catalogue to its Japanese source forever.
- Augment-don't-overwrite for cached translations. Repo lookup falls back to the product's pre-cached name_translated rather than blanking it on a cache miss. Earlier bug: seeded demo products with English names rendered as Japanese after the cache_only flag was introduced.
Why: AI fallbacks compound. The wrong default at the cache layer turns into a permanent UX regression three layers up.
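Both rules fit in a few lines. The sketch below uses a plain dict as the cache and hypothetical function names; only the `None`-on-empty behaviour and the `name_translated` fallback are taken from the text.

```python
from typing import Callable, Optional


def resolve_translation(llm_response: Optional[str]) -> Optional[str]:
    """Return a usable translation, or None so nothing gets cached."""
    text = (llm_response or "").strip()
    return text or None


def warm_pass(
    cache: dict[str, str],
    products: dict[str, str],
    translate: Callable[[str], Optional[str]],
) -> None:
    """One warmer cycle: store real translations, leave failures uncached."""
    for item_code, source_title in products.items():
        if item_code in cache:
            continue  # already translated on an earlier pass
        translated = resolve_translation(translate(source_title))
        if translated is not None:
            cache[item_code] = translated
        # else: no entry is written, so the next pass retries this product


def display_title(
    cache: dict[str, str],
    item_code: str,
    name_translated: Optional[str],
    source_title: str,
) -> str:
    """Augment, don't overwrite: cache hit, then pre-cached name, then source."""
    return cache.get(item_code) or name_translated or source_title
```

With a translator that returns an empty string on its first attempt, the first `warm_pass` leaves the cache empty and the second fills it. Had the empty string (or the Japanese source) been cached as a fallback, the first failure would have been permanent.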
Frontend — Next.js 16 App Router, mostly RSC, minimal client islands
Server components fetch curated collections + the warmer-status indicator with next: { revalidate: 30 }; client islands only where they buy interactivity (ThemeSwitcher, ChatPanel). Tailwind 4 CSS-first config with @theme inline — token changes ship as one CSS file, no rebuild of every component. TypeScript strict + noUncheckedIndexedAccess (array access returns T | undefined). Auth.js v5 + RS256-signed backend JWT; access TTL aligned with the Auth.js cookie so users don't get silently 401'd while the browser still believes it's logged in.
Operations — pragmatic for the scale that actually exists
Railway for the backend + managed Postgres + Redis; Vercel for the frontend. The catalogue warmer
is an in-process asyncio task in the API lifespan, not a separate worker process — pragmatic
for the current scale, easy to extract later if traffic justifies. Static outbound IP on Railway
satisfies Rakuten's per-App-ID allowlist. Dockerfile.prod with repo-root build context + alembic upgrade head baked into the start command — migrations apply on every deploy, idempotent. CI
gates: ruff + mypy-strict + pytest + coverage threshold (97.78% backend, 93% frontend); thresholds
ratchet forward, never backward.
The point
The "AI Engineer / Architect" skill on display here is not "I can prompt Claude to write code." It is:
- Decomposing a product brief into ports and adapters, identifying the right Protocol seams before any code exists.
- Picking patterns under pressure — outbox vs. dual-write, cache-only search vs. live translation, in-process warmer vs. separate worker — and being able to justify them in writing the same day.
- Designing for AI in production — bounded concurrency on the LLM provider, structured-output schemas, fallback chains, debug-vs-user stream separation, ReAct loop limits.
- Treating AI as a build accelerator, not the architect — every architectural decision is mine; Claude Code drives the mechanics. The output is meant to be picked up, extended, and operated by a human team without ever knowing AI was involved in the build.
That, compressed into one calendar day, is the demo.
Stack snapshot
| Layer | Choices |
|---|---|
| Backend | Python 3.13, FastAPI, SQLAlchemy 2 async, asyncpg, Pydantic v2, uv |
| Frontend | Next.js 16 (App Router + RSC), React 19, TypeScript strict + noUncheckedIndexedAccess, Tailwind v4 with @theme inline |
| AI | OpenRouter primary; Gemini / DeepSeek / Anthropic / Groq fallbacks via LLMRouter + ModelSelectionPolicy |
| Data | PostgreSQL (transactional + translation cache), Redis (Streams + token bucket), Frankfurter ECB FX rates |
| Integrations | Rakuten Ichiba (catalogue), Auth.js v5 + RS256-signed backend JWT, weight-derived shipping quotes, US Section-321 duty + state sales tax |
| Infra | Railway (backend + Postgres + Redis) + Vercel (frontend), Docker multi-stage, Alembic migrations on start |
| Quality | ruff + mypy-strict + pytest 97.78% cov · tsc-strict + vitest 93% cov · no-legacy-tokens CI guard · 9 ADRs · 9 runbooks |