How We Built RagLeap — Technical Architecture of a Self-Hosted AI Platform
A deep dive into the real technical decisions behind RagLeap — from Django and Neo4j to RAG pipelines, Celery workers, and multi-channel AI deployment on a $20/month VPS.
TC Antony
Founder, RagLeap · May 2026
When we started building RagLeap, the hardest question was not which AI model to use. It was how to build something that runs reliably on a $20/month VPS, handles WhatsApp messages, voice calls, email, and database queries simultaneously — while keeping customer data completely on the user's own server.
This post is a transparent look at how we solved that. Every architectural decision, every tradeoff, and every lesson learned from building a production self-hosted AI platform from scratch.
Why Self-Hosted First
Most AI platforms are cloud-first. You sign up, connect your data, and it works. The problem is your data leaves your server the moment you do that.
For the businesses we built RagLeap for — law firms, healthcare providers, financial services companies, Indian SMBs with 10 years of operational data — this is a dealbreaker. Their data cannot leave their server. So we designed RagLeap to run entirely on the user's own infrastructure from day one.
The Core Stack
Backend: Django 4.2 + Django REST Framework
Database: PostgreSQL 14
Knowledge Graph: Neo4j 5
Task Queue: Celery + Redis
Serving: Gunicorn + Nginx
AI Providers: OpenAI, Gemini, Claude, Mistral (user's own keys)
Voice: ElevenLabs + custom GSM integration
Channels: WhatsApp Business API, Telegram Bot API, Discord
Everything runs on a standard Ubuntu 22.04 VPS. Minimum 4GB RAM. One install script.
Why Django
Django's batteries-included philosophy is perfect for building a product that needs to move fast. We get authentication, admin, ORM, migrations, and a mature ecosystem out of the box. We built a custom TenantMiddleware that resolves the workspace from the request context on every API call — every database query, every RAG retrieval, every AI response is automatically scoped to the correct workspace.
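In spirit, the middleware looks something like the sketch below. This is illustrative, not our actual code: the class and the API-key lookup are hypothetical stand-ins, and the real version resolves the workspace from the authenticated session as well.

```python
from types import SimpleNamespace

class TenantMiddleware:
    """Resolve the workspace once per request so every downstream
    query, retrieval, and AI response can be scoped to it."""

    def __init__(self, get_response, workspace_for_key):
        self.get_response = get_response
        # e.g. an API-key -> workspace lookup backed by the database
        self.workspace_for_key = workspace_for_key

    def __call__(self, request):
        key = request.headers.get("X-Api-Key")
        request.workspace = self.workspace_for_key(key)
        return self.get_response(request)

# Usage with a fake request object and an in-memory lookup:
lookup = {"key-123": "acme-workspace"}.get
handler = TenantMiddleware(lambda req: req.workspace, lookup)
request = SimpleNamespace(headers={"X-Api-Key": "key-123"})
print(handler(request))  # prints "acme-workspace"
```

Because the workspace is attached before any view runs, no individual endpoint has to remember to filter by tenant.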
The Knowledge Graph — Why Neo4j
Standard RAG with vector search works fine for simple Q&A. But it breaks down for complex business queries like "Which customers complained about delivery last month and what products did they order?" That is a graph traversal problem — exactly what Neo4j solves.
In RagLeap, we build a knowledge graph from uploaded documents. Entities become nodes. Relationships become edges. We combine vector similarity search with graph traversal to retrieve contextually relevant information — answers that actually make sense for business queries, not just keyword matches.
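To make the hybrid idea concrete, here is a toy sketch: vector similarity picks seed entities, then a one-hop graph expansion pulls in related context. In production this is a Cypher query against Neo4j; the function names, entity labels, and two-dimensional embeddings below are purely illustrative.

```python
def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def hybrid_retrieve(query_vec, embeddings, edges, top_k=1):
    # 1. Vector search: rank entity nodes by similarity to the query.
    seeds = sorted(embeddings,
                   key=lambda n: cosine(query_vec, embeddings[n]),
                   reverse=True)[:top_k]
    # 2. Graph traversal: expand each seed to its neighbours for context.
    context = set(seeds)
    for node in seeds:
        context |= {dst for src, dst in edges if src == node}
    return context

emb = {"Customer:Asha": [1.0, 0.0], "Product:Lamp": [0.0, 1.0]}
edges = [("Customer:Asha", "Complaint:Delivery"),
         ("Customer:Asha", "Order:1042")]
print(hybrid_retrieve([0.9, 0.1], emb, edges))
```

The traversal step is what lets a "customers who complained about delivery" query surface the related orders, which pure vector search would miss.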
The Database AI
RagLeap connects directly to the user's existing database — MySQL, PostgreSQL, MongoDB — and lets the AI query it in natural language. We built a schema intelligence layer that introspects the database structure and generates safe, read-only parameterised queries. The LLM never writes raw SQL — it selects from pre-validated query templates. This lets a business owner message their AI on Telegram: "How many orders came from Chennai this week?" and get a real answer from their actual database.
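The template mechanism can be sketched as follows. The template registry, parameter sets, and schema are hypothetical examples; the point is that the LLM only ever chooses a template ID and fills declared parameters, and the driver binds them, so no model output is ever concatenated into SQL.

```python
# Pre-validated, read-only query templates. The LLM selects a template_id
# and supplies parameters; it never emits raw SQL.
TEMPLATES = {
    "orders_by_city_period": (
        "SELECT COUNT(*) FROM orders "
        "WHERE city = %(city)s AND created_at >= %(since)s",
        {"city", "since"},
    ),
}

def build_query(template_id, params):
    sql, allowed = TEMPLATES[template_id]
    # Reject anything outside the template's declared parameter set.
    if set(params) != allowed:
        raise ValueError("unexpected parameters")
    # Executed under a read-only DB role; parameters are bound by the driver.
    return sql, params

sql, params = build_query("orders_by_city_period",
                          {"city": "Chennai", "since": "2026-05-18"})
print(sql)
```

A separate read-only database role is the second layer of defence: even a bad template cannot mutate data.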
Celery — The Backbone of Async Operations
Almost everything in RagLeap that is not a synchronous API response goes through Celery — document ingestion, email monitoring, scheduled voice calls, lead follow-up, WhatsApp message processing. We run the Celery beat scheduler and the Celery workers as separate processes so a crash in one does not affect the other. Redis is the message broker, and it stays well under 100MB even under heavy load.
Multi-Channel Architecture
The challenge was one AI brain serving WhatsApp, Telegram, Discord, voice calls, email, and web chat simultaneously, with consistent context across all of them. Our solution: a unified message-processing pipeline with channel adapters. Every incoming message is normalised into the same internal format before it hits the AI; channel-specific formatting happens on the way out. Same AI, same knowledge, different output formats.
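The adapter idea can be sketched in a few lines. The dataclass fields and payload shapes below are simplified illustrations (the real WhatsApp and Telegram payloads carry far more), but they show how two very different channels collapse into one internal shape.

```python
from dataclasses import dataclass

@dataclass
class InboundMessage:
    """The single internal format every channel adapter produces."""
    channel: str
    sender: str
    text: str

def from_whatsapp(payload: dict) -> InboundMessage:
    # Simplified WhatsApp webhook payload -> internal format.
    return InboundMessage("whatsapp", payload["from"], payload["body"])

def from_telegram(update: dict) -> InboundMessage:
    # Simplified Telegram Bot API update -> internal format.
    msg = update["message"]
    return InboundMessage("telegram", str(msg["from"]["id"]), msg["text"])

# Both channels now feed the same downstream pipeline:
m = from_whatsapp({"from": "+919000000000", "body": "order status?"})
```

Outbound formatting is the mirror image: the AI produces one response, and each adapter renders it for its channel's constraints.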
Running on a $20/month VPS
Key optimisations that made the entire stack run on 4GB RAM:
- Gunicorn with 2-3 workers
- Neo4j heap size limited to 512MB via JVM settings
- PostgreSQL shared_buffers set to 512MB
- Redis maxmemory set to 256MB with LRU eviction
- Celery concurrency set to 2 workers
- Static files served directly by Nginx
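The list above maps to a handful of config fragments. These are illustrative excerpts, and exact file paths and option names vary by distro and version, so treat them as a starting point rather than a drop-in config:

```
# gunicorn — a small fixed worker pool
#   gunicorn ragleap.wsgi --workers 3

# /etc/neo4j/neo4j.conf — cap the JVM heap
#   server.memory.heap.max_size=512m

# postgresql.conf
#   shared_buffers = 512MB

# /etc/redis/redis.conf — bounded memory, evict least-recently-used keys
#   maxmemory 256mb
#   maxmemory-policy allkeys-lru

# Celery — limit concurrent task processes
#   celery -A ragleap worker --concurrency=2
```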
What We Would Do Differently
We would separate the voice system earlier — it has very different latency requirements from text channels. We would invest in observability from day one. And we would design the database schema for multi-tenancy from the very first migration.
Try RagLeap
RagLeap has a free self-hosted tier — single workspace, web embed chatbot, and AI Manager included. No credit card required. Cloud plans start at $29/month.