Production · 2026 to present

Algodesks — Multi-Tenant SaaS for Algorithmic Trading

Production multi-tenant SaaS that unifies the algorithmic trading loop from research to live execution. Users discover or auto-generate strategies, walk-forward-validate them, and promote winning portfolios to a real exchange with one click — same domain code path in research and production, no silent re-implementation seam.

Role: AI Architect & Lead Engineer
Tech stack: Python 3.13 · FastAPI · Next.js 15 · PostgreSQL · Redis · Docker · Railway
Period: 2026 to present

Algodesks dashboard
The main dashboard — portfolios, recent runs, and the entry point into AutoBuild.

The problem

Retail and prosumer traders who want to research, validate, and run algorithmic strategies face a fragmented toolchain: backtesting lives in one notebook, optimization in another, and live execution somewhere else entirely. Every step re-implements the same primitives (risk sizing, signal logic, fee gates, trailing stops), and the gap between "looks good in backtest" and "survives live trading" is where most strategies silently die. Lookahead bias, overfitting, and execution drift turn promising research into live losses.

The idea

Algodesks unifies the full research-to-production loop in a single product. A user discovers strategies (or has the system generate them autonomously via AutoBuild — a discovery → data-coverage check → walk-forward optimization → portfolio-assembly pipeline), validates them on historical OHLCV data, and promotes a winning portfolio to live trading on a real exchange with one click. The same domain code path runs inside the backtest engine, the optimizer, and the live runner — so what wins in research is what executes in production. No re-implementation seam, no silent divergence.

AutoBuild configuration screen
AutoBuild — set the constraints (exchange, contract type, target legs, optimisation tier, acceptance policy) and the pipeline assembles a viable portfolio autonomously.

Architectural decisions

Clean / hexagonal architecture

The domain layer holds immutable value objects (TrendFilterConfig, TrailingStopConfig, EntryQualityPreset, BodyRatioConfig, …) and entities (Backtest, Portfolio, LiveSession, AutoBuildJob). Every external integration sits behind a narrow Protocol: IUserRepository, IOptimizationRunner, IDataAutoFetcher, IPortfolioBuilder, IEventRepository. Production wiring uses Postgres / Redis / subprocess adapters; tests inject in-memory fakes.

Why: swapping the backtest engine from StubEngine to the legacy subprocess engine was a one-line DI change. Multi-tenancy was retrofitted by threading user_id through one constructor, not by rewriting business logic. The cost of strict layering pays back the first time you need to change anything.
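A minimal sketch of the port-and-adapter idea, assuming a hypothetical `IBacktestEngine` port and `RunBacktest` use case (neither name appears above; only `StubEngine` does). It illustrates why an engine swap can be confined to the composition root, and is not the project's actual code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class BacktestResult:
    """Hypothetical immutable value object returned by any engine."""
    symbol: str
    net_pnl: float


class IBacktestEngine(Protocol):
    """Hypothetical port: the domain depends on this shape, not on an adapter."""

    def run(self, symbol: str) -> BacktestResult: ...


class StubEngine:
    """In-memory fake for tests: returns a fixed result without any I/O."""

    def run(self, symbol: str) -> BacktestResult:
        return BacktestResult(symbol=symbol, net_pnl=0.0)


class RunBacktest:
    """Use case that only knows about the port."""

    def __init__(self, engine: IBacktestEngine) -> None:
        self._engine = engine

    def execute(self, symbol: str) -> BacktestResult:
        return self._engine.run(symbol)


# Composition root: swapping engines means changing this one line.
use_case = RunBacktest(engine=StubEngine())
result = use_case.execute("BTCUSDT")
```

Because `Protocol` uses structural typing, a subprocess-backed adapter would only need a matching `run` method; it would not have to inherit from anything in the domain layer.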

Multi-tenancy enforced at the repository boundary

Every database row carries user_id; every Redis key is namespaced {resource}:{user_id}:{id}. Tenant scoping is enforced at the repository boundary, not in the routes — so a future endpoint that forgets to pass user_id won't leak across tenants because the repo simply returns nothing. Per-tenant Fernet encryption protects exchange API credentials. Cross-tenant isolation has its own e2e test suite.

Why: retrofitting tenancy at the route layer is how data-leak CVEs get written. Putting the filter one layer deeper makes the default behaviour secure.

Autonomous-pipeline orchestration (AutoBuild)

AutoBuild is the AI/agentic core of the system. Given a constraint set, it autonomously:

  1. Discovers candidate instruments and ranks them by Bybit liquidity.
  2. Preflights data coverage on disk; auto-fetches missing ranges from upstream with timeout + classified failure modes (timeout, no_data, unsupported_symbol, exception).
  3. Optimises each viable symbol via parameter-grid search with walk-forward fit/test split (70/30 default) — only candidates whose fit-window winners survive the held-out test window get accepted.
  4. Assembles the accepted legs into a balanced portfolio, ready to promote to live trading.
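Step 3 can be illustrated with a deliberately small sketch. It assumes a single chronological 70/30 split and a toy momentum scoring rule; `toy_score`, `select_and_validate`, and the acceptance threshold are hypothetical and stand in for whatever the real optimiser uses.

```python
from typing import Sequence


def walk_forward_split(bars: Sequence[float], fit_fraction: float = 0.7):
    """Split chronologically: the earlier part fits, the later part is held out."""
    if not 0.0 < fit_fraction < 1.0:
        raise ValueError("fit_fraction must be strictly between 0 and 1")
    cut = round(len(bars) * fit_fraction)
    return bars[:cut], bars[cut:]


def toy_score(window: Sequence[float], lookback: int) -> float:
    """Toy rule: hold the next bar whenever price is above its level `lookback` bars ago."""
    pnl = 0.0
    for i in range(lookback, len(window) - 1):
        if window[i] > window[i - lookback]:
            pnl += window[i + 1] - window[i]
    return pnl


def select_and_validate(bars, grid, score=toy_score, min_test_score: float = 0.0):
    """Pick the best parameter on the fit window; accept it only if it also
    clears the threshold on the held-out test window."""
    fit, test = walk_forward_split(bars)
    best = max(grid, key=lambda params: score(fit, params))
    test_score = score(test, best)
    if test_score > min_test_score:
        return best, test_score
    return None  # the fit-window winner did not survive out of sample


rising = [float(i) for i in range(100)]
falling = [float(100 - i) for i in range(100)]
```

The parameter search never sees the test window, so a candidate that only looks good on the fit window is rejected instead of being promoted.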

The whole pipeline is cooperatively cancellable: a single flag, checked at every loop iteration, propagates down to a parallel SIGTERM of in-flight subprocesses with a 10-second grace window.
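One way to sketch this cancellation pattern with `asyncio` subprocesses is shown below. The function names and the polling interval are assumptions; the 10-second grace window and the per-iteration flag check follow the description above. `terminate()` sends SIGTERM on POSIX systems.

```python
import asyncio


async def terminate_all(procs, grace_seconds: float = 10.0) -> None:
    """SIGTERM every live child at once, then kill any that outlive the grace window."""
    live = [p for p in procs if p.returncode is None]
    for proc in live:
        try:
            proc.terminate()
        except ProcessLookupError:
            pass  # the child exited between the check and the signal

    async def reap(proc) -> None:
        try:
            await asyncio.wait_for(proc.wait(), timeout=grace_seconds)
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()

    # All children share the same grace window instead of waiting in sequence.
    await asyncio.gather(*(reap(p) for p in live))


async def run_jobs(commands, cancel: asyncio.Event, poll_seconds: float = 0.1):
    """Run commands in parallel and check the cancel flag on every loop iteration."""
    procs = [await asyncio.create_subprocess_exec(*cmd) for cmd in commands]
    while any(p.returncode is None for p in procs):
        if cancel.is_set():
            await terminate_all(procs)
            break
        await asyncio.sleep(poll_seconds)
    return [p.returncode for p in procs]
```

A caller would set the shared `asyncio.Event` from a cancel endpoint; the run loop notices it at its next iteration and every child receives its signal in the same pass.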

Why: lookahead bias and overfitting are structural failure modes in this domain. The validation has to be the architecture, not an afterthought.

AutoBuild progress view
A running AutoBuild job. Rich diagnostic events stream over WebSocket — bar counts, fetch durations, classified errors — so the user sees what happened and why, not just a frozen spinner.

Event-driven progress UX

Long-running jobs emit typed events over WebSocket (/ws/autobuild/{job_id}, /ws/events) with HTTP polling as a fallback for dropped connections. Event payloads carry rich diagnostic data — bar counts, on-disk size, wall-clock duration, classified error kinds — so the user sees what happened and why, not just a green check / red X.

Why: users watching a 30-minute optimization need something on screen. A frozen modal kills trust faster than a slow job.

Engineering decisions

Type discipline at every seam

  • Pydantic v2 schemas at the HTTP edge, dataclasses + value objects in the domain, SQLAlchemy 2 models in persistence. No dict[str, Any] travels between layers.
  • TypeScript on the frontend with strict discriminated unions for WebSocket event variants.
  • Schema and entity converters live in dedicated modules, so wire-format changes don't ripple into the domain.

Defensive runtime

Every external call (exchange API, backtest subprocess, upstream OHLCV fetcher, OAuth provider) is wrapped with:

  • Per-call timeout (asyncio.wait_for) so a hung upstream can't anchor the entire job.
  • Classified failure modes with upstream-message capture, so the UI can group failures into actionable buckets.
  • Defense-in-depth catches at orchestrator level — a misbehaving implementer can't crash the run loop.

A stuck-job sweeper requeues orchestrator runs after pod restarts. Quota gates, token-bucket rate limiting, and Sentry on every uncaught exception round out the production hardening.
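Token-bucket rate limiting, mentioned above, is a standard algorithm; a minimal single-process sketch follows. The class and parameter names are assumptions, and a multi-pod deployment would need shared state (for example in Redis) rather than an in-memory counter.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled continuously at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock          # injectable so tests can control time
        self._tokens = capacity      # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Credit tokens for the time that has passed, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Injecting the clock keeps the limiter deterministic under test, which fits the in-memory-fakes approach described earlier.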

Security posture

  • Exchange credentials are Fernet-encrypted at rest with a SECRET_KEY provisioned per environment. Rotation is manual-ops by design: rotating mid-trade would silently brick live sessions, which is a far worse outcome than the rotation friction.
  • Google OIDC for auth, short-lived JWT sessions, HTTP-only secure cookies.
  • Audit log on every admin write; security headers via middleware; CORS / CSP correctly scoped.

Test pyramid that actually pyramids

  • 2,000+ unit tests on pure domain logic — no I/O, sub-second runtime, fast feedback on every commit.
  • Integration tests against real Postgres + Redis in CI.
  • Programmatic e2e tests driving the live API exactly as a real user would.

Every cross-tenant bug, every lookahead-bias regression, every silent numeric overflow was caught by a test before it shipped to users.

Pragmatic over dogmatic

  • Bundled the legacy backtest engine into the production Docker image rather than splitting it into a microservice. Why: the legacy engine has years of validation behind its trading math; service-boundary purity is a worse trade than correctness.
  • Chose Railway over Kubernetes for a team of one. Time-to-deploy beats theoretical scale that doesn't exist yet, and Railway's primitives (volumes, environments, autodeploy) cover everything I actually need today.
  • Bypassed Cloudflare's 100 MB body limit by routing large CSV uploads to the Railway-direct backend URL, a pragmatic workaround documented in the ops runbook.

Backtest result view
A completed backtest: equity curve, drawdown, and classification of every trade by the structural axes (entry-quality preset, trend filter, trailing stop config).

Portfolio builder
The portfolio builder: drag accepted legs together and see correlation and combined risk at a glance before promoting to live.

Live trading session
A live session running on a real exchange: the same domain code path that ran in the backtest, with a real-time event feed via WebSocket.

Outcomes

  • +$103k vs −$150k baseline, 53 crypto pairs × 3 years, walk-forward-verified
  • 50 legs: autonomous portfolio assembly across 500+ instruments in tens of minutes
  • 2,011 tests passing: full domain + integration + e2e pyramid

Production multi-tenant SaaS live at algodesks.com and api.algodesks.com. AutoBuild can autonomously construct a 50-leg portfolio across 500+ instruments in tens of minutes — a workflow that would take a quant analyst days by hand. Validated research artefacts (dynamic-whitelist hybrid strategy, entry-quality filters, trend filters, trailing stops) are baked into the product as configurable, sweepable axes.

What I personally owned

Domain modelling, architectural decisions, infra topology, security review, the multi-tenant retrofit, the AutoBuild design + implementation, walk-forward optimiser, live-trading runner, frontend architecture, and the production deploy pipeline. End-to-end ownership from problem framing to a paying-customer-ready product — making the technical decisions, building the code, validating with real strategies, and shipping it.

Stack snapshot

Layer | Choices
Backend | Python 3.13, FastAPI, SQLAlchemy 2 + Alembic, Pydantic v2, asyncio, ccxt, yfinance
Frontend | Next.js 15 (App Router, route groups, AuthGuard), React 18, Tailwind CSS, shadcn/ui, TanStack Query
Data | PostgreSQL (transactional), Redis (Streams + KV), file-based OHLCV cache on Railway Volume
Auth & Security | Google OIDC, JWT sessions, Fernet symmetric encryption, RBAC permissions, audit log
Infra | Docker multi-stage builds with uv, Railway, Cloudflare proxy + DNS
Observability | Structured logging, Sentry, custom event store with /ws/events activity feed