DevOps · 11 min read · July 5, 2025

From Prototype to Production: Deploying Your AI SaaS

A practical roadmap for taking an AI SaaS from prototype to production. Covers infrastructure decisions, auth, billing, observability, and the common failure modes teams hit.

Tags: SaaS, Production, AI, Deployment, Scaling

Azam

DevOps & AI Consultant

The Gap Between Demo and Production

An AI SaaS that works in a demo and one that runs reliably in production are separated by a long list of unglamorous engineering tasks. The AI part (the prompts, the RAG pipeline, the agent logic) is usually the easiest part to get right. The hard parts are auth, billing, rate limiting, error handling, observability, and infrastructure that holds up under concurrent users.

This guide is a practical roadmap through that gap, focused on what actually needs to happen and in what order.

Phase 1: Foundation (Week 1-2)

Authentication and Authorization

Use Auth.js (formerly NextAuth) or Clerk rather than rolling your own auth. The implementation cost of getting auth right — email verification, password reset, OAuth, session management, CSRF protection — is much higher than it looks. Ship auth in a day with a library; ship it in two weeks if you build it yourself.

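// With Clerk in Next.js
// middleware.ts: run Clerk's middleware so auth state is available to your routes
import { clerkMiddleware } from '@clerk/nextjs/server'
export default clerkMiddleware()

// In a route handler file (separate from middleware.ts): protect API routes
import { auth } from '@clerk/nextjs/server'
export async function POST(request: Request) {
  // auth() resolves the current session; userId is null when nobody is signed in
  const { userId } = await auth()
  if (!userId) return new Response('Unauthorized', { status: 401 })
  // ... your AI logic
}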

Database and ORM

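Use PostgreSQL with Prisma or Drizzle ORM. Set up connection pooling from the start, with PgBouncer or a managed pooler such as Prisma Accelerate (the successor to Prisma Data Proxy). Requests that wait on an LLM API call take 2-10 seconds or longer, and any request that keeps a database connection open during that wait ties it up for the whole call. Without connection pooling you will hit connection limits quickly under concurrent load.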
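
A rough estimate shows why the limits arrive so quickly. By Little's law, the average number of connections in use is the request rate multiplied by how long each request holds a connection. The sketch below is illustrative only: `connectionsInUse` is a hypothetical helper, and the traffic figures (20 requests per second, a 10-second LLM wait, 50 ms of actual query time) are assumptions, not measurements.

```typescript
// Little's law: average connections in use = arrival rate x hold time.
// Hold time is passed in milliseconds to keep the arithmetic in integers.
function connectionsInUse(requestsPerSecond: number, holdMs: number): number {
  return Math.ceil((requestsPerSecond * holdMs) / 1000)
}

// PostgreSQL's default max_connections is typically 100.
const POSTGRES_DEFAULT_MAX_CONNECTIONS = 100

// Assumed traffic: 20 requests/second, connection held for the full 10 s LLM wait.
const heldAcrossLlmCall = connectionsInUse(20, 10_000)

// Same traffic, but the connection is held only for about 50 ms of real queries.
const heldOnlyForQueries = connectionsInUse(20, 50)

console.log({
  heldAcrossLlmCall,
  heldOnlyForQueries,
  limit: POSTGRES_DEFAULT_MAX_CONNECTIONS,
})
```

Under these assumptions, holding the connection across the LLM call needs about twice the default limit, while releasing it between queries needs almost nothing. A pooler raises the ceiling, but keeping the hold time short is what makes the difference.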

Phase 2: AI Infrastructure (Week 2-3)

LLM Client Abstraction

Build a thin wrapper around your LLM provider from day one. This enables switching providers without touching application code, adding retry logic in one place, and logging all calls for cost tracking.

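// Minimal provider contract (hypothetical names); swap implementations without touching callers
interface CompletionOptions { model: string; maxTokens?: number }
interface LLMProvider {
  complete(prompt: string, options: CompletionOptions): Promise<string>
}

class LLMClient {
  constructor(private readonly provider: LLMProvider) {}

  async complete(prompt: string, options: CompletionOptions) {
    const start = Date.now()
    try {
      const response = await this.provider.complete(prompt, options)
      await this.logUsage({ prompt, response, duration: Date.now() - start })
      return response
    } catch (error) {
      await this.logError({ prompt, error, duration: Date.now() - start })
      throw error
    }
  }

  // Placeholder sinks; replace with your own logging or telemetry backend
  private async logUsage(entry: Record<string, unknown>) { console.log('llm.usage', entry) }
  private async logError(entry: Record<string, unknown>) { console.error('llm.error', entry) }
}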
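
The "retry logic in one place" part could look something like the sketch below. It is one possible shape, not a definitive implementation: `withRetry` and `statusOf` are hypothetical helpers, and treating HTTP 429 and 5xx as the only retryable failures is an assumption that you should check against your provider's error model.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Read a numeric `status` field off an unknown error, if it has one.
// (Assumption: the provider SDK attaches the HTTP status to the errors it throws.)
function statusOf(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'status' in error) {
    const status = (error as { status: unknown }).status
    return typeof status === 'number' ? status : undefined
  }
  return undefined
}

// Retry only failures that are plausibly transient: rate limits and server errors.
function isRetryable(error: unknown): boolean {
  const status = statusOf(error)
  return status !== undefined && (status === 429 || status >= 500)
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call()
    } catch (error) {
      // Give up on the last attempt or on errors a retry cannot fix (e.g. 400, 401).
      if (attempt >= maxAttempts || !isRetryable(error)) throw error
      // Exponential backoff with full jitter: wait 0..baseDelayMs * 2^(attempt - 1).
      await sleep(Math.random() * baseDelayMs * Math.pow(2, attempt - 1))
    }
  }
}
```

Inside the wrapper, the provider call would then become something like `withRetry(() => this.provider.complete(prompt, options))`, so every caller gets the same retry behavior.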

Per-User Rate Limiting

Implement rate limiting before you have real users, not after. Use Redis with a sliding window algorithm. Expose limits to users in the UI so they can see how many requests remain.
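
To make the algorithm concrete, here is a minimal in-memory sketch of a sliding window limiter. It is an illustration of the technique rather than production code: `SlidingWindowLimiter` and its fields are hypothetical names, and a real deployment would keep the timestamps in Redis (for example in a sorted set per user) so that every server instance sees the same counts.

```typescript
interface RateLimitResult {
  allowed: boolean
  remaining: number     // requests left in the current window, suitable for the UI
  retryAfterMs: number  // 0 when the request is allowed
}

class SlidingWindowLimiter {
  // Per-user list of request timestamps, oldest first.
  private readonly hits = new Map<string, number[]>()

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  check(userId: string, now: number = Date.now()): RateLimitResult {
    const windowStart = now - this.windowMs
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > windowStart)

    if (recent.length >= this.limit) {
      this.hits.set(userId, recent)
      // The oldest remaining hit leaves the window at oldest + windowMs.
      const oldest = recent[0] ?? now
      return { allowed: false, remaining: 0, retryAfterMs: oldest + this.windowMs - now }
    }

    recent.push(now)
    this.hits.set(userId, recent)
    return { allowed: true, remaining: this.limit - recent.length, retryAfterMs: 0 }
  }
}
```

The `remaining` and `retryAfterMs` fields are what you would surface in the UI or in response headers, so users can see how close they are to the limit.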

Phase 3: Billing (Week 3-4)

Use Stripe. Do not build billing yourself. Integrate Stripe Billing with subscription plans that map to usage tiers. The critical pieces: webhook handling for subscription events, usage metering if you bill per-token or per-query, and dunning management for failed payments.

  • Listen to customer.subscription.updated to update user permissions in your DB
  • Listen to invoice.payment_failed to downgrade accounts gracefully
  • Store stripe_customer_id on your user model from the first checkout
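
The webhook side of this can be sketched as a small state update keyed by stripe_customer_id. The code below is a simplified illustration and makes several assumptions: the event types are trimmed-down local shapes rather than the Stripe SDK's types, the plan names and the `paymentFailed` flag are hypothetical, and the in-memory maps stand in for your database. In a real handler you would verify the webhook signature before trusting the payload.

```typescript
// Simplified, hypothetical event shapes; real Stripe events carry many more fields.
type BillingEvent =
  | { id: string; type: 'customer.subscription.updated'; data: { object: { customer: string; status: string } } }
  | { id: string; type: 'invoice.payment_failed'; data: { object: { customer: string } } }

interface UserRecord {
  stripeCustomerId: string
  plan: 'free' | 'pro'
  paymentFailed: boolean
}

// In-memory stand-ins for a users table and a processed-events table.
const usersByCustomerId = new Map<string, UserRecord>()
const processedEventIds = new Set<string>()

function applyBillingEvent(event: BillingEvent): void {
  // Webhooks can be delivered more than once, so the handler must be idempotent.
  if (processedEventIds.has(event.id)) return
  processedEventIds.add(event.id)

  const user = usersByCustomerId.get(event.data.object.customer)
  if (!user) return // unknown customer: nothing to update

  switch (event.type) {
    case 'customer.subscription.updated': {
      const { status } = event.data.object
      if (status === 'active' || status === 'trialing') {
        user.plan = 'pro'
        user.paymentFailed = false
      } else if (status === 'canceled' || status === 'unpaid') {
        user.plan = 'free'
      }
      // Other statuses (for example past_due) leave the plan unchanged for now.
      break
    }
    case 'invoice.payment_failed':
      // Flag the account so the UI can ask for a new card; keep access during retries.
      user.paymentFailed = true
      break
  }
}
```

Keeping access while a failed payment is being retried, and downgrading only once the subscription is canceled or unpaid, is one way to read "downgrade gracefully". Your own policy may differ.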

Phase 4: Observability (Week 4)

You cannot fix what you cannot see. Set up three layers of observability before launch:

  • Application errors: Sentry for error tracking and stack traces
  • Infrastructure metrics: Vercel Analytics or Datadog for request volume, latency, error rates
  • AI-specific telemetry: Langfuse for LLM request tracing, token usage, and quality metrics

Phase 5: Deployment and Scaling

For a Next.js AI SaaS, Vercel is the path of least resistance for the web tier. For any background processing (document ingestion, batch embedding, scheduled tasks), run separate workers on Railway, Render, or EC2.
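
The split between web tier and workers usually comes down to one pattern: the request handler records a job and returns immediately, a worker runs it, and the client polls for status. Here is a minimal sketch of that flow. `InMemoryJobQueue` is a hypothetical class for illustration only; because the web tier and the worker are separate processes, a real queue has to live in shared, durable storage such as a database table or a Redis-backed queue.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed'

interface Job<T> {
  id: string
  status: JobStatus
  run: () => Promise<T>
  result?: T
  error?: string
}

class InMemoryJobQueue<T> {
  private readonly jobs = new Map<string, Job<T>>()
  private readonly pending: string[] = []
  private nextId = 1

  // Web tier: record the job and return its id right away.
  enqueue(run: () => Promise<T>): string {
    const id = `job_${this.nextId++}`
    this.jobs.set(id, { id, status: 'queued', run })
    this.pending.push(id)
    return id
  }

  // Status endpoint: what the client sees when it polls.
  status(id: string): { status: JobStatus; result: T | undefined; error: string | undefined } | undefined {
    const job = this.jobs.get(id)
    if (!job) return undefined
    return { status: job.status, result: job.result, error: job.error }
  }

  // Worker: take the oldest queued job and run it to completion.
  // Returns false when there is nothing to do.
  async processNext(): Promise<boolean> {
    const id = this.pending.shift()
    const job = id === undefined ? undefined : this.jobs.get(id)
    if (!job) return false

    job.status = 'running'
    try {
      job.result = await job.run()
      job.status = 'done'
    } catch (error) {
      job.error = error instanceof Error ? error.message : String(error)
      job.status = 'failed'
    }
    return true
  }
}
```

With this shape, the web request never waits on document ingestion or batch embedding, so its duration stays well inside the platform's function limits.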

Common Failure Modes

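  • Timeout errors at scale: LLM calls can take 5-30s, and Vercel functions are cut off at a maximum duration that depends on your plan and configuration. Check the current limit for your plan, and use background jobs for anything that could exceed it.
  • Database connection exhaustion: A request that keeps a database connection open while it waits on a 10-second LLM call ties up that connection for the whole call. Release connections before calling the LLM, and add connection pooling before you launch.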
  • Cost explosions: A single user running a tight loop against your API can generate thousands of dollars in LLM costs. Rate limiting and per-user budgets are not optional.
  • Provider outages: Build fallback providers before you have users depending on the system.
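
For the provider outage case, a fallback chain can be quite small. The sketch below assumes a hypothetical `Provider` interface that each vendor adapter would implement, and `completeWithFallback` is an illustrative helper, not part of any SDK. It tries providers in priority order and returns the first success.

```typescript
interface Provider {
  name: string
  complete(prompt: string): Promise<string>
}

// Try each provider in order. Return the first successful completion along with
// the provider that produced it; if all fail, throw with every failure listed.
async function completeWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = []

  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt)
      return { provider: provider.name, text }
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error)
      failures.push(`${provider.name}: ${message}`)
    }
  }

  throw new Error(`All providers failed (${failures.join('; ')})`)
}
```

Returning the provider name alongside the text lets you log which provider served each request, which is useful because prompts tuned for one model may behave differently on the fallback.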

The teams that fail at this transition are almost always under-invested in the boring parts: monitoring, rate limiting, error handling, and billing edge cases. The AI part works. The plumbing is what breaks in production.
