In production · 2026, single 24h sprint

JPMarket — AI-Powered Proxy Marketplace (24-Hour Build)

A production-shaped AI-first cross-border shopping platform, built end-to-end in a single 24-hour sprint by orchestrating Claude Code. Two ReAct agents over Rakuten Ichiba + multi-LLM router with provider fallback. Proof point for "AI Architect as velocity multiplier" — strict typing, 97% coverage, ADRs, runbooks, deployed and serving traffic.

Role: AI Architect · Claude Code orchestration
Stack: Python 3.13 · FastAPI · Next.js 16 · React 19 · TypeScript · Postgres · Redis · OpenRouter · Rakuten · Railway · Vercel
Period: 2026, single 24h sprint

Figure: The shop grid. Every Japanese title is translated to English and every price is shown as all-in USD landed cost (item + shipping + Section-321 duty logic + state sales tax).

The brief

US collectors of Japanese anime merchandise, trading cards, figures, and manga want the stuff that never makes it out of Japan. They get it today through proxy-buying services (Buyee, ZenMarket, FromJapan) which are functional but punishing — Japanese-only catalogues that don't translate properly, opaque fees that only crystallise at checkout, no curation, no recommendations, and a support flow that ends at an FAQ page. The buyer's job becomes: read katakana, eyeball weight, guess shipping, hope duty doesn't sting.

The idea

A niche-first proxy marketplace where every price is the all-in USD landed cost, and an AI Personal Shopper finds drops you'd never discover yourself. The catalogue is a live filter on top of Rakuten Ichiba constrained to five collector niches (anime figures, TCG, manga, plush, hobby goods). Translations, USD pricing, weight-derived shipping quotes, US Section-321 duty logic, and state sales tax all happen on our side — the user just sees a card.
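The all-in figure described above can be sketched as a pure function. This is a minimal illustration, not the project's pricing code: `LandedCost`, `landed_cost`, and every rate and threshold parameter are hypothetical inputs, and charging sales tax on item plus shipping is an assumption that varies by state.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


@dataclass(frozen=True)
class LandedCost:
    item_usd: Decimal
    shipping_usd: Decimal
    duty_usd: Decimal
    sales_tax_usd: Decimal

    @property
    def total_usd(self) -> Decimal:
        return self.item_usd + self.shipping_usd + self.duty_usd + self.sales_tax_usd


def landed_cost(
    price_jpy: int,
    jpy_to_usd: Decimal,
    shipping_usd: Decimal,
    duty_free_threshold_usd: Decimal,  # hypothetical policy input, not a legal constant
    duty_rate: Decimal,                # hypothetical flat rate
    state_tax_rate: Decimal,           # hypothetical per-state rate
) -> LandedCost:
    item = (Decimal(price_jpy) * jpy_to_usd).quantize(CENT, rounding=ROUND_HALF_UP)
    # Duty applies only when the item value exceeds the duty-free threshold.
    duty = Decimal("0.00")
    if item > duty_free_threshold_usd:
        duty = (item * duty_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    # Assumption: sales tax is charged on item + shipping.
    tax = ((item + shipping_usd) * state_tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return LandedCost(item, shipping_usd, duty, tax)
```

Using `Decimal` rather than `float` keeps every component rounded to the cent, so the card total always equals the sum of the parts shown at checkout.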

Two ReAct agents share one chat surface ("AI Sensei"):

Concierge — reactive support: order status, FAQ, niche-term explanations, landed-cost calculations.
Personal Shopper — proactive sales: pulls collector interests + purchase history, runs a multi-step catalogue search, prices the top picks against the user's budget, returns a curated grid.

Figure: Concierge agent. Reactive support with SSE-streamed ReAct steps, so users see the assistant thinking instead of a frozen spinner.

Figure: Personal Shopper agent. Proactive curation that returns a budget-priced grid, based on stated interests and budget, pulled live from the warmed catalogue.

What "24 hours with Claude Code" actually produced

git commits across one calendar day

9 ADRs: written Architecture Decision Records, every boundary justified
14 specs: phase plans driving implementation, reviewed before code
231 files: 146 backend Python · 85 frontend TypeScript · all strict-typed
97.78%: backend test coverage; 93% frontend; thresholds ratchet forward
9 runbooks: operational docs for the edges that production exposes

This is not "vibe coding". It's a codebase with strict typing on both sides, a coverage gate that fails CI under threshold, a no-legacy-tokens invariant enforced by test, ADRs that justify the architectural seams, runbooks that document operational edges, and a deployment that actually serves traffic. The acceleration came from using Claude Code the way an architect uses senior engineers — delegating implementation in concrete tasks, reviewing diffs, calling out the architectural invariant when output drifted, and keeping a written context (CLAUDE.md + persistent memory) so the agent understood the project's principles before touching a line.

Architectural decisions

Clean / hexagonal core, DI for everything

The domain layer (app/domain/*) speaks only in Product, Order, Cart, User — zero knowledge of Rakuten, OpenRouter, or SQL. Every external boundary is a Protocol (RakutenGatewayProtocol, FxRateProtocol, ProductTranslationRepositoryProtocol, …) wired by a single dependency-injector container.

Why: tests substitute fakes that implement the same Protocol; mypy-strict catches fake/real drift at type-check time, before anything runs; swapping vendors is a one-line change in the container. The pattern compresses to its essence under time pressure; there is no version of the right answer that skips it.
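A minimal sketch of one such seam, assuming a single-method port. Only the name `FxRateProtocol` comes from the text above; the `jpy_to_usd` method, `FakeFxRate`, `PricingService`, and `build_pricing_service` are hypothetical, and a plain factory function stands in for the dependency-injector container.

```python
import asyncio
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


class FxRateProtocol(Protocol):
    """Port: the domain only knows that something can quote JPY->USD."""

    async def jpy_to_usd(self) -> Decimal: ...


@dataclass
class FakeFxRate:
    """Test double; satisfies the Protocol structurally, with no inheritance."""

    rate: Decimal

    async def jpy_to_usd(self) -> Decimal:
        return self.rate


class PricingService:
    """Domain-side service; depends on the port, never on a vendor client."""

    def __init__(self, fx: FxRateProtocol) -> None:
        self._fx = fx

    async def price_in_usd(self, price_jpy: int) -> Decimal:
        rate = await self._fx.jpy_to_usd()
        return (Decimal(price_jpy) * rate).quantize(Decimal("0.01"))


def build_pricing_service(fx: FxRateProtocol) -> PricingService:
    # Stand-in for the container: the one place where an adapter is chosen.
    return PricingService(fx)


service = build_pricing_service(FakeFxRate(Decimal("0.0065")))
print(asyncio.run(service.price_in_usd(12000)))  # prints 78.00
```

If `FakeFxRate` drifted from the port (say, a renamed method or a different return type), a strict type checker would flag the `build_pricing_service(FakeFxRate(...))` call rather than letting the test pass against a stale fake.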

Multi-LLM router with provider-fallback chain

A single LLMRouter fronts OpenRouter (primary) plus four vendor SDKs (Gemini, DeepSeek, Anthropic, Groq). Per-task model selection goes through a ModelSelectionPolicy; a provider outage retries the same task on the next provider before bubbling.

Why: vendor churn in foundation-model pricing/availability is a fact of life. Application code stays vendor-agnostic; switching primary providers is a config change. ADR-0004 documents the choice.
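The fallback behaviour can be illustrated with a small sketch. `LLMRouter` and `ModelSelectionPolicy` are named in the text, but their method signatures here, the `ProviderError` exception, the two stub providers, and the model ids are all assumptions made for the example.

```python
import asyncio
from typing import Protocol


class ProviderError(Exception):
    """Raised by an adapter on outage, timeout, or unusable output."""


class LLMProvider(Protocol):
    async def complete(self, model: str, prompt: str) -> str: ...


class ModelSelectionPolicy:
    """Maps a task name to an ordered list of (provider name, model id)."""

    def __init__(self, routes: dict[str, list[tuple[str, str]]]) -> None:
        self._routes = routes

    def candidates(self, task: str) -> list[tuple[str, str]]:
        return self._routes[task]


class LLMRouter:
    def __init__(self, providers: dict[str, LLMProvider], policy: ModelSelectionPolicy) -> None:
        self._providers = providers
        self._policy = policy

    async def complete(self, task: str, prompt: str) -> str:
        errors: list[str] = []
        # Try each candidate in order; only bubble up once the chain is exhausted.
        for provider_name, model in self._policy.candidates(task):
            try:
                return await self._providers[provider_name].complete(model, prompt)
            except ProviderError as exc:
                errors.append(f"{provider_name}: {exc}")
        raise ProviderError("all providers failed: " + "; ".join(errors))


class DownProvider:
    """Stub simulating an outage."""

    async def complete(self, model: str, prompt: str) -> str:
        raise ProviderError("503 from upstream")


class EchoProvider:
    """Stub that answers successfully."""

    async def complete(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"


router = LLMRouter(
    providers={"primary": DownProvider(), "fallback": EchoProvider()},
    policy=ModelSelectionPolicy({"translate": [("primary", "model-a"), ("fallback", "model-b")]}),
)
print(asyncio.run(router.complete("translate", "hello")))  # prints [model-b] hello
```

Because the chain lives in the policy data rather than in the calling code, promoting a different provider to primary amounts to reordering one list.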

Background catalog warmer + bounded translation concurrency

A lifespan task on the API process walks every niche category every 15 minutes, hitting Rakuten through the gateway and translating each product (title + caption) into English. Translations land in a Postgres product_translations table; the user-facing search reads cache-only and never blocks on an LLM round-trip. A semaphore (asyncio.Semaphore(4)) caps in-flight LLM calls so the provider isn't throttled into 30% empty responses.

Why: synchronous translation on the hot path was producing 30-second category searches and getting empty responses from OpenRouter under burst load. The warmer trades cold-start latency for steady-state instant grids.

Outbox + Redis Streams for cross-boundary events

Order-lifecycle events (order.placed, order.status_changed) go through a transactional outbox written in the same DB transaction as the order itself; a relayer drains the outbox into a Redis Stream; handlers (email today, webhooks later) consume from a consumer group with XPENDING redelivery + an idempotency guard.

Why: it is the classic solution to "I wrote the row, then died before sending the email." Delivery is at-least-once; the idempotency guard makes handling effectively exactly-once, with no dual-write hazards.
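The shape of the pattern can be shown with stand-ins: in this sketch `sqlite3` replaces Postgres and a plain list replaces the Redis Stream, so it illustrates the transaction boundary and the idempotency guard rather than the real relayer. Table names, column names, and `EmailHandler` are assumptions.

```python
import json
import sqlite3


def init_schema(db: sqlite3.Connection) -> None:
    db.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            relayed INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def place_order(db: sqlite3.Connection, order_id: str) -> None:
    # One transaction: the order row and its event commit together or not at all.
    with db:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )


def relay(db: sqlite3.Connection, stream: list[dict]) -> int:
    """Drain unrelayed outbox rows into the stream; returns how many were moved."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox WHERE relayed = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        stream.append({"event_id": row_id, "type": event_type, **json.loads(payload)})
        # A crash between append and update causes a re-send, hence the guard below.
        with db:
            db.execute("UPDATE outbox SET relayed = 1 WHERE id = ?", (row_id,))
    return len(rows)


class EmailHandler:
    """Consumer with an idempotency guard keyed on the event id."""

    def __init__(self) -> None:
        self._seen: set[int] = set()
        self.sent: list[str] = []

    def handle(self, event: dict) -> None:
        if event["event_id"] in self._seen:
            return  # redelivery: already handled
        self._seen.add(event["event_id"])
        self.sent.append(event["order_id"])
```

In a real deployment the seen-set would need to be durable (for example a unique-keyed table), since an in-memory set forgets everything on restart.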

ReAct agent runtime with SSE streaming

Each agent runs a bounded ReAct loop (max 8 steps, 25s/step) over a JSON-schema-validated tool registry. Steps stream to the browser as Server-Sent Events. A single env flag (AGENT_DEBUG_STREAM) toggles between the user-facing stream (comment_to_user, proposal, final_answer only) and the developer trace (full situation / thoughts / tool-call / raw-result for debugging).

Why: users want a friendly assistant, not a tool-dump; developers want to see why the agent picked what it picked. One runtime, two views.
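A minimal sketch of the bounded loop and the two stream views. The limits (8 steps, 25s/step) and the user-visible step kinds come from the text; the `Step` type, the `NextStep` planner signature, `run_agent`, `to_sse`, and the fallback messages are hypothetical. The `debug` argument stands in for whatever reads `AGENT_DEBUG_STREAM`.

```python
import asyncio
import json
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass

MAX_STEPS = 8
STEP_TIMEOUT_S = 25.0
USER_VISIBLE = {"comment_to_user", "proposal", "final_answer"}


@dataclass(frozen=True)
class Step:
    kind: str  # "thought", "tool_call", "comment_to_user", "final_answer", ...
    text: str


# Hypothetical planner signature: given the history so far, produce the next step.
NextStep = Callable[[list[Step]], Awaitable[Step]]


async def run_agent(
    next_step: NextStep,
    debug: bool = False,
    max_steps: int = MAX_STEPS,
    step_timeout_s: float = STEP_TIMEOUT_S,
) -> AsyncIterator[Step]:
    history: list[Step] = []
    for _ in range(max_steps):
        try:
            step = await asyncio.wait_for(next_step(history), timeout=step_timeout_s)
        except asyncio.TimeoutError:
            step = Step("final_answer", "Sorry, that step took too long.")
        history.append(step)
        # Debug mode streams everything; users only see the friendly kinds.
        if debug or step.kind in USER_VISIBLE:
            yield step
        if step.kind == "final_answer":
            return
    yield Step("final_answer", "Step limit reached; stopping here.")


def to_sse(step: Step) -> str:
    """Serialize one step as a Server-Sent Events frame."""
    payload = json.dumps({"text": step.text})
    return f"event: {step.kind}\ndata: {payload}\n\n"
```

Both views run the same loop and keep the same history; the flag only decides which steps are yielded, so a debug trace reproduces exactly what the user-facing run did.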

Runtime theming with three-layer tokens

[data-theme="night"] / [data-theme="day"] carry raw primitives; Tailwind's @theme inline semantic layer (bg-canvas, text-content, border-border, …) emits var(--color-*) so a single attribute swap on <html> re-skins the live page with no JS reload. A CI test (tests/theming/no-legacy-tokens.test.ts) fails the build on any reappearance of the deprecated palette.

Why: a "theme" should be a complete design system, not a recolour — and the no-old-tokens invariant has to be permanent, not a one-time cleanup. ADR-0008.

Observability surfaced in the product

The footer carries a live catalog warmer status indicator (Catalog: N items loaded · refreshed Xm ago · in Y) and the current JPY→USD rate with fetched-at UTC timestamp. Every product card shows Data as of YYYY-MM-DD HH:MM UTC.

Why: operators see freshness at a glance; users get explicit honesty about live-data staleness instead of "trust us, it's fresh".

Figure: Two complete themes, day and night, swapped at runtime by toggling one attribute on <html>. Three-layer tokens (primitives → semantic → component) prevent palette drift.

Figure: Observability surfaced in the product. Catalog warmer status and JPY/USD rate freshness are visible to every visitor, not buried in a /status page.

Figure: All-in USD pricing at the cart line item. Item + shipping + Section-321 duty + state sales tax are computed server-side.

Engineering decisions

Backend — async everywhere, no synchronous hot paths

Python 3.13 + FastAPI + SQLAlchemy 2 async + asyncpg, with uv for deterministic dependency management. An AUTOCOMMIT sessionmaker for read-dominated repos removes a measured ~3× round-trip overhead from the landing-page DB reads. A per-App-ID Redis token bucket, shared by all replicas, paces them collectively at Rakuten's documented 1 req/sec ceiling without client-side coordination. A circuit breaker wrapping the Rakuten gateway returns 503 in microseconds when the vendor IP allowlist trips, instead of 30-second timeouts that pile up worker threads.
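The fail-fast behaviour of the breaker can be sketched as follows. This is an illustrative implementation under assumptions, not the project's: the class name, thresholds, and cool-down are hypothetical, and an HTTP layer would be responsible for mapping `CircuitOpenError` to a 503 response.

```python
import time
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised immediately while the breaker is open; no upstream call is made."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    async def call(self, fn: Callable[[], Awaitable[T]]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("upstream unavailable")  # fail fast
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = await fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, a rejected call costs one clock read and one comparison, which is what keeps a blocked vendor from tying up requests for the full timeout.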

Cache hygiene — don't-cache-fallback and augment-don't-overwrite

Two specific lessons from the build:

Don't-cache-fallback for empty LLM responses. Empty bodies return None from the translation field resolver so the next warm pass retries; the cache converges to ~100% English titles across 2-3 cycles instead of pinning a quarter of the catalogue to its Japanese source forever.
Augment-don't-overwrite for cached translations. Repo lookup falls back to the product's pre-cached name_translated rather than blanking it on a cache miss. Earlier bug: seeded demo products with English names rendered as Japanese after the cache_only flag was introduced.

Why: AI fallbacks compound. The wrong default at the cache layer turns into a permanent UX regression three layers up.

Frontend — Next.js 16 App Router, mostly RSC, minimal client islands

Server components fetch curated collections + the warmer-status indicator with next: { revalidate: 30 }; client islands only where they buy interactivity (ThemeSwitcher, ChatPanel). Tailwind 4 CSS-first config with @theme inline — token changes ship as one CSS file, no rebuild of every component. TypeScript strict + noUncheckedIndexedAccess (array access returns T | undefined). Auth.js v5 + RS256-signed backend JWT; access TTL aligned with the Auth.js cookie so users don't get silently 401'd while the browser still believes it's logged in.

Operations — pragmatic for the scale that actually exists

Railway for the backend + managed Postgres + Redis; Vercel for the frontend. The catalogue warmer is an in-process asyncio task in the API lifespan, not a separate worker process: pragmatic for the current scale, and easy to extract later if traffic justifies it. A static outbound IP on Railway satisfies Rakuten's per-App-ID allowlist. Dockerfile.prod uses a repo-root build context with alembic upgrade head baked into the start command, so migrations apply on every deploy and the step is idempotent. CI gates: ruff + mypy-strict + pytest + coverage threshold (97.78% backend, 93% frontend); thresholds ratchet forward, never backward.

The point

The "AI Engineer / Architect" skill on display here is not "I can prompt Claude to write code." It is:

Decomposing a product brief into ports and adapters, identifying the right Protocol seams before any code exists.
Picking patterns under pressure — outbox vs. dual-write, cache-only search vs. live translation, in-process warmer vs. separate worker — and being able to justify them in writing the same day.
Designing for AI in production — bounded concurrency on the LLM provider, structured-output schemas, fallback chains, debug-vs-user stream separation, ReAct loop limits.
Treating AI as a build accelerator, not the architect — every architectural decision is mine; Claude Code drives the mechanics. The output is meant to be picked up, extended, and operated by a human team without ever knowing AI was involved in the build.

That, compressed into one calendar day, is the demo.

Stack snapshot

| Layer | Choices |
| --- | --- |
| Backend | Python 3.13, FastAPI, SQLAlchemy 2 async, asyncpg, Pydantic v2, `uv` |
| Frontend | Next.js 16 (App Router + RSC), React 19, TypeScript strict + `noUncheckedIndexedAccess`, Tailwind v4 with `@theme inline` |
| AI | OpenRouter primary; Gemini / DeepSeek / Anthropic / Groq fallbacks via `LLMRouter` + `ModelSelectionPolicy` |
| Data | PostgreSQL (transactional + translation cache), Redis (Streams + token bucket), Frankfurter ECB FX rates |
| Integrations | Rakuten Ichiba (catalogue), Auth.js v5 + RS256-signed backend JWT, weight-derived shipping quotes, US Section-321 duty + state sales tax |
| Infra | Railway (backend + Postgres + Redis) + Vercel (frontend), Docker multi-stage, Alembic migrations on start |
| Quality | ruff + mypy-strict + pytest 97.78% cov · tsc-strict + vitest 93% cov · no-legacy-tokens CI guard · 9 ADRs · 9 runbooks |