How WonderCast is assembled.
The runtime topology, domain services, data model, realtime channels, and the critical end-to-end flows that power personalized stories, videos, and audiobooks for families.
Quick reference
The six things you need in your head to read the rest of this doc.
Port 3000 · PM2 wonder-frontend · hash routing
Port 3001 · PM2 wonder-server · /api/v1/*
~55 models · ioredis for session + rate
Signed URLs · genai-*-worker PM2 forks
Shared HTTP listener · 2-min reconnect
Google + Apple OAuth · LoginEvent audit
System context
WonderCast is an AI-driven personalized children's media platform. A parent onboards characters (family members, pets, toys), and the system generates personalized stories as image storybooks, videos, and audiobooks.
Billing is based on monthly generation caps per modality, not raw credits (index.ts:1).
| Persona | Entry point | Primary UI |
|---|---|---|
| End user (parent) | www.wondercast.app | SPA under /app/* |
| Admin / staff | Same SPA | Gated by user.isAdmin → 11 tabs |
| Marketing visitor | Same host | /pricing, /blog, /corp/* |
| Native shell | iOS / tvOS | Capacitor WebView wrapping the same web bundle |
The native layer is thin (StoreKit 2, Sign-in-with-Apple); all business logic stays on the web.
Runtime topology
A single EC2 host, nine PM2 processes, everything in one VPC. Stateless app tier; all ephemeral state lives in Redis.
PM2 reports nine live processes: wonder-server, wonder-frontend, four genai-image-worker forks, three genai-video-worker forks, and pm2-logrotate. The Fortify workers are run by a sister service (controller-api) and communicate with wonder-server over HTTP webhooks and Socket.IO.
The Express process boots in server/src/index.ts.
HTTP API surface
All REST endpoints live under /api/v1/*. Everything below the maintenance middleware short-circuits to 503 for non-admins during outages.
| Prefix | File / scope | Access |
|---|---|---|
| /email | email.routes.ts | token |
| /features | features.routes.ts | optionalAuth |
| /public/* | 4 routes (assets, costs, founding-family, contact) | public |
| /health | deep health | public |
| /webhooks | Stripe / Fortify / Apple | signed |
| /auth | login, signup, OAuth, refresh | public |
| maintenanceMiddleware | gates all routes below | n/a |
| /users | profile, account members | auth |
| /characters, /voices, /stories | core domain CRUD | auth |
| /media/img, /media | S3 presigned URL proxy | auth |
| /billing | Stripe + IAP glue | auth |
| /admin | admin panel APIs | admin |
| /ai | safety, quiz, grammar | auth |
| /generate | heavy generation · 14 endpoints | auth + rate |
| /chat | dedicated chat (split out of /generate) | auth |
Two cross-cutting middlewares worth calling out
- authenticate + lastSeen: the auth chain on /users stamps User.lastSeenAt in a throttled, fire-and-forget Redis + Postgres sequence (lastSeen.middleware.ts:28-75), with at most one DB write per user per 60s, never awaited.
- maintenanceMiddleware: every route registered above it remains reachable in maintenance mode; every route below it short-circuits to 503 unless the caller is an admin.
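The lastSeen throttle can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in lastSeen.middleware.ts: `ThrottleStore`, `InMemoryThrottleStore`, and `touchLastSeen` are hypothetical names, the calls are synchronous stand-ins for what would be async I/O, and a production store would presumably implement `acquire` with an atomic Redis `SET key value NX EX 60`.

```typescript
// Hypothetical sketch of "at most one DB write per user per 60 s".
// The in-memory store stands in for Redis so the example is self-contained.

interface ThrottleStore {
  /** Returns true only for the first caller inside the window. */
  acquire(key: string, windowMs: number, nowMs: number): boolean;
}

class InMemoryThrottleStore implements ThrottleStore {
  private expiresAt: Record<string, number> = {};

  acquire(key: string, windowMs: number, nowMs: number): boolean {
    const expiry = this.expiresAt[key];
    if (expiry !== undefined && expiry > nowMs) return false; // still throttled
    this.expiresAt[key] = nowMs + windowMs;
    return true;
  }
}

const LAST_SEEN_WINDOW_MS = 60_000;

/**
 * `persist` stands in for the Postgres update of User.lastSeenAt.
 * In a real middleware it would return a promise that is never awaited;
 * here any error is swallowed so the request itself cannot fail.
 * Returns true when a write was attempted, false when throttled.
 */
function touchLastSeen(
  store: ThrottleStore,
  userId: string,
  nowMs: number,
  persist: (userId: string, at: Date) => void,
): boolean {
  if (!store.acquire(`lastSeen:${userId}`, LAST_SEEN_WINDOW_MS, nowMs)) {
    return false; // a write already happened in this window
  }
  try {
    persist(userId, new Date(nowMs));
  } catch {
    // fire-and-forget: real code would log here and never propagate
  }
  return true;
}
```

Taking the throttle key before touching the database means a burst of requests from one user costs one cheap cache operation each and at most one row update per window.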
Backend services (domains)
Each server/src/services/<domain>/ folder owns a bounded context. The most load-bearing piece is the auth login pipeline.
auth
3 files · Password login, OAuth linking, refresh tokens, account-deletion grace, LoginEvent audit
admin/
2 files · 30+ service files covering accounts, billing stats, campaigns, marketing, reports, team, webhooks, feature flags
admin-settings/
2 files · System-wide config: AI model selection, prompts, limits, maintenance flag
generation/
2 files · Gemini-backed text + speech: story, scene-regen, safety, chat, per-item TTS
fortify/
2 files · Image / video / batch-audio pipeline: batch-builder → HTTP client → WebSocket tracker
ai/
2 files · Low-level Gemini wrappers: retry, JSON safety, text completion, structured schemas
billing/
2 files · Stripe + App Store IAP dual-path subscription management
voice + training/
2 files · ElevenLabs voice cloning, local bookkeeping, voice-training worker
The login pipeline writes a LoginEvent audit row at every decision point (rate-limit hit, bad password, suspended account, success), each guarded so that no audit failure can break a real login (L191-238). On success it also bumps the denormalized Account.lastActivityAt (L361), the sort key used by the admin Accounts tab.
Data model highlights
The Prisma schema is 2,559 lines of largely normalized PostgreSQL spanning ~55 models, with a few deliberate denormalizations such as Account.lastActivityAt. The core cluster every request touches:
| Model | Role | Notable fields |
|---|---|---|
| Account | Billing unit + tenant boundary | plan, usage counters, stripeCustomerId, lastActivityAt |
| User | Family members within an account | accountId, googleId/appleId, isAdmin, lastActiveAt + lastSeenAt |
| Character | Child / parent / pet / toy with reference images | links to CharacterToy, VoiceProfile |
| VoiceProfile + VoiceSample | ElevenLabs clone + raw audio | n/a |
| GeneratedStory | One story generation event | has many StoryScene |
| StoryScene + SceneDialogue | Scene text, image URL, dialogue lines | image + audio URLs point at S3 keys |
| Subscription + PricingPlan | Current plan state (Stripe or IAP) | n/a |
| SystemConfig | Generic key/value config | individual feature.* keys live here |
Recently-landed additions
- LoginEvent (L1878-1899): append-only auth audit. Indexed by userId, accountId, createdAt, email. onDelete: SetNull so the audit survives user deletion.
- Account.lastActivityAt + its descending index: indexed ORDER BY in the admin list without joining User.
- User.lastSeenAt vs lastActiveAt: two columns by design. lastActiveAt = successful login; lastSeenAt = any authenticated request, throttled. DAU / "online now" uses the latter.
- AdminAuditLog: admin mutations, separate from user auth events.
Money-adjacent: CreditTransaction, StripeTransaction, AppStoreTransaction, Invoice, and RefundRequest are first-class; Stripe and App Store stay on separate tables so reconciliation runs independently.
Frontend architecture
Two top-level React Router trees; hash-based routing; lazy-loaded admin panel with 11 tabs.
Routing & auth hydration
Two top-level router trees, split in App.tsx:
- Website + marketing tree: /, /pricing, /blog, /corp/*, /landing/*. Uses WebsiteLayout, no auth context.
- Application tree: /app/*, /login, /signup, /oauth-callback, /shared/:token, /onboarding. Wrapped in QueryClientProvider + AppProvider, lazy-imports every page.
Hash-based routing is used (#/app/...) because the iOS shell and email-link handling depend on it.
ProtectedRoute does three things on mount
- Waits for AuthProvider.hydrate (isLoading).
- Fetches /api/v1/maintenance/status, only for non-admin users; admins bypass.
- Fetches /api/v1/legal/terms to check whether the user needs to accept a new terms version.
Only after all three complete does it render <Layout>. Admins skip the maintenance check, so a global outage doesn't lock staff out.
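The gating order above can be expressed as a small pure function. This is a sketch under assumptions rather than the actual ProtectedRoute code: `GateInputs`, `GateState`, and `resolveGate` are hypothetical names, and the "maintenance" and "accept-terms" outcomes are assumed placeholders for whatever the component really renders in those cases.

```typescript
// Hypothetical decision function mirroring the three mount-time checks.

type GateState = "loading" | "maintenance" | "accept-terms" | "ready";

interface GateInputs {
  authLoading: boolean;                 // AuthProvider is still hydrating
  isAdmin: boolean;
  maintenanceActive: boolean | null;    // null = status fetch not finished
  needsTermsAcceptance: boolean | null; // null = terms fetch not finished
}

function resolveGate(i: GateInputs): GateState {
  // 1. Nothing can be decided until auth hydration completes.
  if (i.authLoading) return "loading";

  // 2. The maintenance check applies to non-admins only; admins skip it.
  if (!i.isAdmin) {
    if (i.maintenanceActive === null) return "loading";
    if (i.maintenanceActive) return "maintenance";
  }

  // 3. The terms check applies to everyone.
  if (i.needsTermsAcceptance === null) return "loading";
  return i.needsTermsAcceptance ? "accept-terms" : "ready";
}
```

Keeping the decision pure makes the admin bypass easy to unit-test without mounting the router.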
Self-healing media: SignedImage
Signed S3 URLs expire, but cached URLs float through WebSocket payloads and localStorage. Every generated <img> uses SignedImage.tsx with two layers:
- Parse X-Amz-Date / X-Amz-Expires and refresh once the URL is within a 5-minute default buffer of expiry.
- On <img onError>, make a one-shot refresh via refreshSignedUrl(storageKey).
The app tolerates clock skew, long-lived pages, and stale payloads with no broken images visible to users.
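The proactive layer depends on reading the expiry out of the presigned URL itself. The sketch below shows one way to do that, assuming SigV4 query parameters (`X-Amz-Date` in `YYYYMMDDTHHMMSSZ` form, `X-Amz-Expires` in seconds). `signedUrlExpiryMs` and `needsRefresh` are hypothetical helper names, not functions taken from SignedImage.tsx.

```typescript
// Hypothetical helpers: compute when a SigV4 presigned URL expires and
// whether it should be refreshed ahead of time.

const DEFAULT_REFRESH_BUFFER_MS = 5 * 60 * 1000;

/** Returns the expiry as epoch milliseconds, or null if it cannot be determined. */
function signedUrlExpiryMs(url: string): number | null {
  let params: URLSearchParams;
  try {
    params = new URL(url).searchParams;
  } catch {
    return null; // relative or malformed URL
  }
  const date = params.get("X-Amz-Date");
  const expires = params.get("X-Amz-Expires");
  if (!date || !expires) return null; // unsigned URL

  const m = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(date);
  const ttlSeconds = Number(expires);
  if (!m || !isFinite(ttlSeconds)) return null;

  const signedAt = Date.UTC(
    Number(m[1]), Number(m[2]) - 1, Number(m[3]),
    Number(m[4]), Number(m[5]), Number(m[6]),
  );
  return signedAt + ttlSeconds * 1000;
}

/** True once the URL is inside the refresh buffer or already expired. */
function needsRefresh(
  url: string,
  nowMs: number,
  bufferMs: number = DEFAULT_REFRESH_BUFFER_MS,
): boolean {
  const expiry = signedUrlExpiryMs(url);
  if (expiry === null) return false; // leave it to the onError fallback
  return nowMs >= expiry - bufferMs;
}
```

Because the check compares against the client clock, a skewed device can still misjudge expiry, which is presumably why the onError refresh exists as a second layer.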
Realtime (Socket.IO)
Single Socket.IO server attached to the same HTTP listener as Express. Redis-adapter-backed for horizontal scaling.
Configuration
- Redis adapter: @socket.io/redis-adapter with dual pub/sub clients against REDIS_URL. On Redis failure the server logs and downgrades to single-node mode.
- Connection-state recovery: 2-minute window for reconnecting clients to resume without missing events.
- Per-socket middleware: authenticateSocket → rateLimitSocket → logSocketEvents.
- Transports: ['websocket', 'polling'] with upgrades allowed.
Emit pattern
Server-to-client emitters in emitters.ts are called from services — e.g. emitUserActivity({ action: 'login' }) from auth.service.ts:373 — so the admin dashboard sees events in real time.
// server side
emitUserActivity({ userId, action: 'login' });
// admin dashboard: check the action so only logins trigger this toast
socket.on('user:activity', (evt) => {
  if (evt.action === 'login') toast.show(`${evt.userId} just signed in`);
});

Two settings systems
A leaky abstraction that causes real bugs. Writes fan out to both; reads merge the JSON blob with overrides from individual keys.
Individual SystemConfig rows
Keyed feature.share_story, etc. Written by admin panel toggles.
JSON blob in SystemConfig.feature_flags
Read by public /api/v1/features via features.routes.ts.
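The read-side merge can be sketched as below. This is an illustration under assumptions: `OVERRIDE_KEYS` is a hypothetical stand-in for ADMIN_FEATURE_OVERRIDES, the flag names inside the blob are invented for the example, and individual rows are assumed to store the strings "true" / "false".

```typescript
// Hypothetical merge of the feature_flags JSON blob with per-key overrides.

type FeatureFlags = Record<string, boolean>;

// Assumed mapping from an individual SystemConfig key to a flag in the blob.
const OVERRIDE_KEYS: Record<string, string> = {
  "feature.share_story": "share_story",
};

function mergeFeatureFlags(
  blob: FeatureFlags,
  rows: Record<string, string>, // individual SystemConfig rows: key -> value
): FeatureFlags {
  const merged: FeatureFlags = { ...blob };
  for (const key of Object.keys(OVERRIDE_KEYS)) {
    const raw = rows[key];
    if (raw === undefined) continue; // no override row: the blob value stands
    merged[OVERRIDE_KEYS[key]] = raw === "true";
  }
  return merged;
}
```

In this sketch, a row whose key is missing from the mapping is silently ignored, which is one way a toggle can be flipped in the admin panel while the public endpoint keeps returning the old blob value.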
/features merges the JSON blob with overrides from the feature.* keys; see ADMIN_FEATURE_OVERRIDES at features.routes.ts:30-38. Consolidating these two systems would eliminate recurring "toggle flipped but feature still off" bugs.
External integrations
Seven vendors, one .env, every key indirected through admin settings.
| Service | Use | Entry point |
|---|---|---|
| Google Gemini (text) | Story, scene, chat, safety, quiz, grammar. Model indirected through admin settings; default gemini-3-flash-preview | story.service.ts:7 |
| Fortify Media (image, video, audio) | Batch submit → worker process → POST back via webhook + Socket.IO push | client.ts, websocket-handler.ts |
| ElevenLabs (audio) | TTS narration, voice cloning | voice.service.ts |
| Google Cloud TTS / OpenAI TTS | Alternative TTS providers, model-routed | speech.service.ts |
| AWS S3 | All generated media + uploaded assets. Website illustrations served unsigned; user media signed | s3.ts |
| AWS SES | Transactional email from website@wondercast.app (us-west-1) | services/email/ |
| Stripe | Web subscription billing + customer portal | stripe.payment.service.ts |
| Apple StoreKit 2 | iOS + tvOS IAP. Dual-bundle verifier registry — one app server handles both apps | appStore.service.ts:127 |
| Google OAuth (passport), Apple Sign In | Social login; Apple sub lookup works even without email thanks to provider-id lookup | oauth.service.ts:40-42 |
Secret configuration is loaded from server/.env via env.ts. This doc never hardcodes values.
Critical end-to-end flows
Three flows worth knowing by heart. The OAuth signup is the onboarding rung; story generation is the product; the audit pipeline is how staff sees the world.
10.1 · Google OAuth signup → first login
1. "Continue with Google" → AuthContext.socialLogin('google') splits native vs web on Capacitor.isNativePlatform().
2. Web: redirect to /api/v1/auth/google → Passport → callback hits oauth.service.ts#findOrCreateOAuthUser, which does a provider-id lookup first and falls back to email. On first sign-in, an Account + owner User are created in one transaction.
3. OAuthCallback.tsx stores tokens and re-hydrates.
4. /app → ProtectedRoute sees user.onboardingComplete === false → redirects to /onboarding.
5. First authenticated API call fires the lastSeen middleware (writes lastSeenAt); the login path already wrote lastActiveAt, LoginEvent, and Account.lastActivityAt.
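The lookup order in step 2 (provider id first, email second, create last) can be sketched against an in-memory store. This is an illustration only: `InMemoryUsers`, `OAuthProfile`, and `UserRecord` are hypothetical, and the real findOrCreateOAuthUser runs against Prisma and creates the Account and owner User in a single transaction.

```typescript
// Hypothetical in-memory model of the find-or-create order.

type Provider = "google" | "apple";

interface UserRecord {
  id: number;
  accountId: number;
  email: string | null;
  googleId: string | null;
  appleId: string | null;
}

interface OAuthProfile {
  provider: Provider;
  providerId: string;   // the provider's stable subject identifier
  email: string | null; // Apple may omit this after the first sign-in
}

class InMemoryUsers {
  users: UserRecord[] = [];
  private nextId = 1;

  findOrCreate(p: OAuthProfile): { user: UserRecord; created: boolean } {
    const idField = p.provider === "google" ? "googleId" : "appleId";

    // 1. Provider-id lookup: works even when no email is supplied.
    let user = this.users.find((u) => u[idField] === p.providerId);
    if (user) return { user, created: false };

    // 2. Email fallback: link the provider id to an existing user.
    if (p.email !== null) {
      user = this.users.find((u) => u.email === p.email);
      if (user) {
        user[idField] = p.providerId;
        return { user, created: false };
      }
    }

    // 3. First sign-in: stands in for creating Account + owner User together.
    const id = this.nextId++;
    user = { id, accountId: id, email: p.email, googleId: null, appleId: null };
    user[idField] = p.providerId;
    this.users.push(user);
    return { user, created: true };
  }
}
```

Checking the provider id before the email is what lets a returning Apple user sign in when the provider no longer sends an email address.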
10.2 · Story generation (Create Magic)
10.3 · Admin login-audit trail
Every login decision is recorded as a LoginEvent. The admin dashboard sorts accounts by the denormalized Account.lastActivityAt.
Each successful password login runs auth.service.ts:348-378:
1. prisma.user.update → bumps lastActiveAt.
2. recordLoginEvent({ success: true, ... }) → LoginEvent row with method, provider, IP, user agent.
3. touchAccountActivity(accountId) → bumps Account.lastActivityAt.
4. emitUserActivity({ action: 'login' }) → Socket.IO broadcast to admin dashboard.
The Accounts tab sorts on the indexed lastActivityAt and aggregates loginCount via _count.loginEvents.
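The success path and its fail-open guards can be sketched as follows. The dependency names mirror the calls listed above, but `LoginDeps`, `failOpen`, and `onLoginSuccess` are hypothetical, and the functions are synchronous stand-ins for what would be async database and socket calls in auth.service.ts.

```typescript
// Hypothetical sketch: audit steps are wrapped so they can fail without
// failing the login itself.

interface LoginDeps {
  updateLastActive(userId: string): void;
  recordLoginEvent(evt: { userId: string; success: boolean }): void;
  touchAccountActivity(accountId: string): void;
  emitUserActivity(evt: { userId: string; action: "login" }): void;
  logError(step: string, err: unknown): void;
}

/** Runs fn; on error, logs and continues instead of propagating. */
function failOpen(step: string, deps: LoginDeps, fn: () => void): void {
  try {
    fn();
  } catch (err) {
    deps.logError(step, err);
  }
}

function onLoginSuccess(deps: LoginDeps, userId: string, accountId: string): void {
  deps.updateLastActive(userId); // bumps lastActiveAt
  failOpen("recordLoginEvent", deps, () =>
    deps.recordLoginEvent({ userId, success: true }),
  );
  failOpen("touchAccountActivity", deps, () =>
    deps.touchAccountActivity(accountId),
  );
  deps.emitUserActivity({ userId, action: "login" }); // admin dashboard push
}
```

Only the two audit writes are guarded here, matching the statement that recordLoginEvent and touchAccountActivity catch and log; whether the other steps are guarded in the real service is not something this sketch asserts.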
Operational posture
Single-host, PM2-managed, Redis-backed. Built for beta; scaling out is a configuration change, not a rewrite.
Single-host deployment
All processes on one EC2 instance under PM2.
Reload cost
pm2 restart wonder-server produces ~300 ms of connection refusals. Acceptable at beta; revisit for GA.
Horizontal scaling readiness
Socket.IO Redis adapter wired; all ephemeral state (rate limits, throttles, pubsub) in Redis — app processes are stateless.
Observability
Structured logging via utils/logger with per-module children. PM2 captures stdout into /home/ubuntu/.pm2/logs/. No APM yet — known gap.
Rate limiting
Global rateLimiter; narrower aiRateLimiter, newsletterRateLimiter, publicChatRateLimiter on high-abuse surfaces. Per-email login throttle: Redis INCR + EXPIRE.
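The per-email login throttle follows the fixed-window counter pattern that Redis INCR + EXPIRE gives you. The sketch below uses an in-memory stand-in so it runs without Redis. `CounterStore`, `InMemoryCounterStore`, and `isLoginAllowed` are hypothetical names, and the limit of 5 attempts per 15 minutes is an illustrative assumption, not the configured value.

```typescript
// Hypothetical fixed-window login throttle modelled on INCR + EXPIRE.

interface CounterStore {
  incr(key: string, nowMs: number): number;                 // like Redis INCR
  expire(key: string, ttlMs: number, nowMs: number): void;  // like Redis EXPIRE
}

class InMemoryCounterStore implements CounterStore {
  private entries: Record<string, { count: number; expiresAt: number | null }> = {};

  incr(key: string, nowMs: number): number {
    const entry = this.entries[key];
    if (entry === undefined || (entry.expiresAt !== null && entry.expiresAt <= nowMs)) {
      this.entries[key] = { count: 1, expiresAt: null }; // missing or expired key
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }

  expire(key: string, ttlMs: number, nowMs: number): void {
    const entry = this.entries[key];
    if (entry !== undefined) entry.expiresAt = nowMs + ttlMs;
  }
}

function isLoginAllowed(
  store: CounterStore,
  email: string,
  nowMs: number,
  maxAttempts: number = 5,              // assumed limit
  windowMs: number = 15 * 60 * 1000,    // assumed window
): boolean {
  const key = `login:${email.toLowerCase()}`;
  const attempts = store.incr(key, nowMs);
  if (attempts === 1) store.expire(key, windowMs, nowMs); // window starts on first hit
  return attempts <= maxAttempts;
}
```

With real Redis the INCR and EXPIRE are two separate commands, so a crash between them can leave a counter without a TTL; wrapping both in a transaction or script is a common way to close that gap.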
Fail-open audit
Both recordLoginEvent and touchAccountActivity catch and log. Audit infrastructure must never lock a legitimate user out.
Known architectural debt
The things we know about and haven't fixed yet. If you touch one, update this list.
- Vite in prod: Frontend served by vite dev mode under PM2 instead of a built, CDN-fronted bundle.
- Legacy credits field: On Account (schema.prisma:344), deprecated but still read in edge paths.
- Two settings systems: §8 is a leaky abstraction; consolidating would eliminate recurring "toggle flipped but feature still off" bugs.
- 30+ files in services/admin/: Becoming monolithic. Sub-folders (team/, email/, reports/) exist; extending that pattern would help.
- No type-safe API client: Hand-written wrappers in services/api/*.ts; OpenAPI or tRPC would remove a class of type-drift bugs.
- Single-region: SES us-west-1; S3 and Postgres likewise single-region. DR posture is basic.
