PenFreely — write whole books with an LLM
A writing studio for book-length work. LLMs write a great page but lose the plot across a whole book; PenFreely keeps it coherent with a top-down plan cascade (book → part → chapter → section → page) plus bottom-up summaries, generating prose only at the page level. Model-agnostic, with a cross-platform bridge to run your own local model. Rust backend, SvelteKit, PostgreSQL + pgvector.
- Role: Solo founder-engineer
- Stack: Rust · SvelteKit · TypeScript · PostgreSQL · pgvector · Ollama · WebSocket · Tailwind
- Period: 2025 to present
The problem
An LLM writes a great paragraph and a solid page. It falls apart at book length. By chapter nine the protagonist's eyes have changed colour, a subplot has quietly vanished, and the ending contradicts the setup. The model has no memory of the whole — only the window in front of it. So "write me a novel" gives you either incoherent drift or a vague synopsis, never a real, consistent book.
PenFreely is built for exactly that gap: keeping a whole book coherent while still using a model that can only ever see a small piece of it at a time.
What the product does
You describe the book; PenFreely builds a structured plan from the top down — book → part → chapter → section → page — and only ever generates prose at the page level, with each page conditioned on the plan plus rolling summaries of everything that came before. The structure holds the story together; the model just writes one page at a time, well.
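As a rough illustration of the cascade, here is a minimal Rust sketch of a plan tree and of how one page's prompt could be assembled from its ancestor plan entries plus the rolling summaries. Every name in it (`PlanNode`, `intent`, `chain_to`, `page_prompt`, the prompt layout) is a hypothetical chosen for the example, not PenFreely's actual schema or prompt format.

```rust
/// The five levels of the plan cascade named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Book,
    Part,
    Chapter,
    Section,
    Page,
}

/// One plan entry: what this unit of the book has to accomplish.
/// Field names are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct PlanNode {
    pub level: Level,
    pub title: String,
    pub intent: String,
    pub children: Vec<PlanNode>,
}

impl PlanNode {
    pub fn new(level: Level, title: &str, intent: &str) -> Self {
        PlanNode {
            level,
            title: title.to_string(),
            intent: intent.to_string(),
            children: Vec::new(),
        }
    }

    /// Builder-style helper for nesting plan entries.
    pub fn with_child(mut self, child: PlanNode) -> Self {
        self.children.push(child);
        self
    }
}

/// Follow `path` (child indices) down from the root and return every node
/// on the way, root first. Returns `None` if an index is out of range.
pub fn chain_to<'a>(root: &'a PlanNode, path: &[usize]) -> Option<Vec<&'a PlanNode>> {
    let mut chain = vec![root];
    let mut node = root;
    for &i in path {
        node = node.children.get(i)?;
        chain.push(node);
    }
    Some(chain)
}

/// Build the prompt for one page: one plan line per ancestor level, then the
/// summaries of what came before. Only `Page` nodes are accepted, which
/// mirrors the rule that prose is generated at the page level only.
pub fn page_prompt(root: &PlanNode, path: &[usize], summaries: &[&str]) -> Option<String> {
    let chain = chain_to(root, path)?;
    if chain.last()?.level != Level::Page {
        return None;
    }
    let mut out = String::from("PLAN\n");
    for node in &chain {
        out.push_str(&format!("{:?}: {} ({})\n", node.level, node.title, node.intent));
    }
    out.push_str("STORY SO FAR\n");
    for s in summaries {
        out.push_str(s);
        out.push('\n');
    }
    out.push_str("Write the next page.");
    Some(out)
}
```

The prompt grows with the depth of the tree and the number of summaries passed in, not with the length of the manuscript, which is the property the cascade relies on.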
It's model-agnostic by design: you pick the provider and model and switch whenever you like — cloud (OpenAI, Anthropic, OpenRouter) or your own local model. Every generation reports its cost, speed, and token use, so you stay in control of the bill. And it's your book on your model — no forced censorship, no platform deciding what you may write.
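The per-generation report comes down to simple arithmetic. The sketch below shows one way to derive cost, speed, and token totals from a generation's raw counts. The struct and field names are assumptions made for the example, and prices are supplied by the caller; none of the numbers here are real provider prices.

```rust
use std::time::Duration;

/// Price per million tokens in US dollars. Hypothetical shape; a local
/// model would simply use 0.0 for both fields.
pub struct Pricing {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Raw counts measured for one generation.
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub elapsed: Duration,
}

/// What the user sees after the generation.
pub struct Telemetry {
    pub cost_usd: f64,
    pub tokens_per_sec: f64,
    pub total_tokens: u64,
}

pub fn telemetry(usage: &Usage, pricing: &Pricing) -> Telemetry {
    // Cost: each token count times its own per-million price.
    let cost_usd = (usage.input_tokens as f64 * pricing.input_per_mtok
        + usage.output_tokens as f64 * pricing.output_per_mtok)
        / 1_000_000.0;

    // Speed: generated tokens per second, guarding against a zero duration.
    let secs = usage.elapsed.as_secs_f64();
    let tokens_per_sec = if secs > 0.0 {
        usage.output_tokens as f64 / secs
    } else {
        0.0
    };

    Telemetry {
        cost_usd,
        tokens_per_sec,
        total_tokens: usage.input_tokens + usage.output_tokens,
    }
}
```

For example, 2,000 input tokens and 500 output tokens at illustrative prices of 3.00 and 15.00 dollars per million work out to (2000 × 3 + 500 × 15) / 1,000,000 = 0.0135 dollars for the page.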
What I built
The coherence engine. The core idea is the top-down plan cascade plus bottom-up auto-summaries. Because prose is only ever generated at the page level, conditioned on the plan and short summaries rather than the whole manuscript, each request's context stays small and its cost stays bounded, while the book stays consistent at any length.
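One way to picture the bottom-up half is a roll-up in which finished pages collapse into a section summary and finished sections collapse into a chapter summary. The sketch below assumes that shape; `Rollup` and its methods are hypothetical names, and the `summarize` argument stands in for what would presumably be a model call.

```rust
/// Rolling summaries kept at three granularities. The context handed to the
/// next page is: finished chapters, then finished sections of the current
/// chapter, then finished pages of the current section.
pub struct Rollup {
    chapters: Vec<String>,
    sections: Vec<String>,
    pages: Vec<String>,
}

impl Rollup {
    pub fn new() -> Self {
        Rollup {
            chapters: Vec::new(),
            sections: Vec::new(),
            pages: Vec::new(),
        }
    }

    /// Record the short summary of a page that was just written.
    pub fn page_done(&mut self, summary: &str) {
        self.pages.push(summary.to_string());
    }

    /// Collapse the current section's page summaries into one section summary.
    pub fn section_done<F: Fn(&[String]) -> String>(&mut self, summarize: F) {
        let summary = summarize(&self.pages);
        self.sections.push(summary);
        self.pages.clear();
    }

    /// Collapse the current chapter's section summaries into one chapter summary.
    pub fn chapter_done<F: Fn(&[String]) -> String>(&mut self, summarize: F) {
        let summary = summarize(&self.sections);
        self.chapters.push(summary);
        self.sections.clear();
    }

    /// Summaries to condition the next page on, oldest and coarsest first.
    pub fn context(&self) -> Vec<&str> {
        self.chapters
            .iter()
            .chain(&self.sections)
            .chain(&self.pages)
            .map(|s| s.as_str())
            .collect()
    }
}
```

With this layout the number of summaries in context is roughly chapters so far plus sections in the current chapter plus pages in the current section, rather than the total page count. A longer book would extend the same idea up to the part level.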
A local-model bridge, free and private. A small cross-platform binary (penfreely-bridge) lets a user run their own model via Ollama and use it inside the studio. It makes only an outbound secure WebSocket connection to the backend, so it works behind routers and firewalls with no ports to open, and the text never leaves the user's machine. Binaries for macOS (Apple Silicon and Intel), Windows, and Linux.
A real product surface. A SvelteKit studio for planning and writing, provider/model selection, per-generation telemetry, and the plan-to-page workflow — backed by a Rust API.
Architectural choices
Plans are the artifact; prose is derived. The source of truth is the structured plan, not the generated text. That's the same spec-driven discipline I apply everywhere: own the plan, and any page can be regenerated without breaking the book.
Generate small, stay coherent. Only page-level prose is generated, always conditioned on the plan plus summaries — never the whole book in context. That's what makes book-length output both coherent and affordable.
LLM-agnostic to the core. The provider and model sit behind a port; local and cloud are the same interface to the rest of the system. Switching models is a choice, not a migration — the same argument I keep making about never marrying one model.
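In Rust terms, a port like this is naturally a trait. The following is a minimal sketch under that assumption: `TextModel`, `ModelError`, the two stub adapters, and `write_page` are all hypothetical names, and the stubs echo the prompt instead of calling a real provider or the bridge.

```rust
/// Errors the port can surface. Hypothetical; a real port would likely
/// distinguish more cases.
#[derive(Debug, PartialEq)]
pub enum ModelError {
    Unavailable(String),
}

/// The port: everything the rest of the system knows about a model.
pub trait TextModel {
    fn name(&self) -> &str;
    fn generate(&self, prompt: &str) -> Result<String, ModelError>;
}

/// Stand-in for a cloud adapter. A real one would call a provider's HTTP API.
pub struct CloudStub;

impl TextModel for CloudStub {
    fn name(&self) -> &str {
        "cloud-stub"
    }
    fn generate(&self, prompt: &str) -> Result<String, ModelError> {
        Ok(format!("echo: {prompt}"))
    }
}

/// Stand-in for a local adapter. A real one would forward to the bridge.
pub struct LocalStub {
    pub bridge_online: bool,
}

impl TextModel for LocalStub {
    fn name(&self) -> &str {
        "local-stub"
    }
    fn generate(&self, prompt: &str) -> Result<String, ModelError> {
        if self.bridge_online {
            Ok(format!("echo: {prompt}"))
        } else {
            Err(ModelError::Unavailable("bridge offline".to_string()))
        }
    }
}

/// A use-case written against the port only. It cannot tell cloud from local.
pub fn write_page(model: &dyn TextModel, prompt: &str) -> Result<String, ModelError> {
    let text = model.generate(prompt)?;
    Ok(format!("[{}] {}", model.name(), text))
}
```

Because `write_page` takes `&dyn TextModel`, swapping `CloudStub` for `LocalStub` changes one argument at the call site and nothing in the use-case itself.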
Clean architecture, strict. The Rust backend is a workspace of layers — a pure domain crate with no IO, application use-cases and ports, infrastructure adapters, and the HTTP/WS interface — with sqlx migrations and a typed wire protocol shared with the bridge. The SvelteKit frontend is tested with vitest and Playwright. PostgreSQL + pgvector from day one, so retrieval and scale aren't bolted on later.
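To make "typed wire protocol" concrete, here is a small standard-library-only sketch of a message enum that both ends of a connection could share. The variants, field names, and the tab-separated text encoding are assumptions invented for the example; a real shared protocol crate would more plausibly derive its encoding with a serialization library.

```rust
/// Messages exchanged between backend and bridge. Hypothetical variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    /// Backend to bridge: run this prompt on the named local model.
    Generate { id: u64, model: String, prompt: String },
    /// Bridge to backend: a piece of generated text.
    Chunk { id: u64, text: String },
    /// Bridge to backend: generation finished.
    Done { id: u64, output_tokens: u64 },
}

impl Frame {
    /// Encode as `TAG<tab>id<tab>fields...`. The free-text field always comes
    /// last so it may itself contain tabs or newlines. Assumes the model name
    /// contains no tab.
    pub fn encode(&self) -> String {
        match self {
            Frame::Generate { id, model, prompt } => format!("GEN\t{id}\t{model}\t{prompt}"),
            Frame::Chunk { id, text } => format!("CHK\t{id}\t{text}"),
            Frame::Done { id, output_tokens } => format!("DON\t{id}\t{output_tokens}"),
        }
    }

    /// Decode one message, returning `None` for anything malformed so a bad
    /// frame never turns into a half-filled value.
    pub fn decode(s: &str) -> Option<Frame> {
        let (tag, rest) = s.split_once('\t')?;
        let (id, rest) = rest.split_once('\t')?;
        let id: u64 = id.parse().ok()?;
        match tag {
            "GEN" => {
                let (model, prompt) = rest.split_once('\t')?;
                Some(Frame::Generate {
                    id,
                    model: model.to_string(),
                    prompt: prompt.to_string(),
                })
            }
            "CHK" => Some(Frame::Chunk { id, text: rest.to_string() }),
            "DON" => Some(Frame::Done { id, output_tokens: rest.parse().ok()? }),
            _ => None,
        }
    }
}
```

The point of sharing one enum is that both sides must handle every variant exhaustively in a `match`, so adding a message type becomes a compile error on whichever side has not caught up.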
Current status
PenFreely is live and in active development at penfreely.com. I built it solo — product framing, architecture, and review are mine — directing AI agents (Claude Code on Opus, Codex) against a staged spec, each stage built end to end with tests. It's another working example of the thing I keep arguing for: pick the right structure, ground the model in it, and let it do the one job it's actually good at — writing the next page.