How We Built RagLeap — Technical Architecture of a Self-Hosted AI Platform
A deep dive into the real technical decisions behind RagLeap — from Django and Neo4j to RAG pipelines, Celery workers, and multi-channel AI deployment on a $20/month VPS.
TC Antony
Founder, RagLeap · May 2026
When we started building RagLeap, the hardest question was not which AI model to use. It was how to build something that runs reliably on a $20/month VPS, handles WhatsApp messages, voice calls, email, and database queries simultaneously — while keeping customer data completely on the user's own server.
This post is a transparent look at how we solved that. Every architectural decision, every tradeoff, and every lesson learned from building a production self-hosted AI platform from scratch.
Why Self-Hosted First
Most AI platforms are cloud-first. You sign up, connect your data, and it works. The problem is your data leaves your server the moment you do that.
For the businesses we built RagLeap for — law firms, healthcare providers, financial services companies, Indian SMBs with 10 years of operational data — this is a dealbreaker. Their data cannot leave their server. So we designed RagLeap to run entirely on the user's own infrastructure from day one.
The Core Stack
Backend: Django 4.2 + Django REST Framework
Database: PostgreSQL 14
Knowledge Graph: Neo4j 5
Task Queue: Celery + Redis
Serving: Gunicorn + Nginx
AI Providers: OpenAI, Gemini, Claude, Mistral (user's own keys)
Voice: ElevenLabs + custom GSM integration
Channels: WhatsApp Business API, Telegram Bot API, Discord
Everything runs on a standard Ubuntu 22.04 VPS. Minimum 4GB RAM. One install script.
Why Django
Django's batteries-included philosophy is perfect for building a product that needs to move fast. We get authentication, admin, ORM, migrations, and a mature ecosystem out of the box. We built a custom TenantMiddleware that resolves the workspace from the request context on every API call — every database query, every RAG retrieval, every AI response is automatically scoped to the correct workspace.
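In spirit, the middleware looks something like the sketch below. This is illustrative, not our actual code: the class and the API-key lookup are hypothetical stand-ins, and the real version resolves the workspace from the authenticated session as well.

```python
from types import SimpleNamespace

class TenantMiddleware:
    """Resolve the workspace once per request so every downstream
    query, retrieval, and AI response can be scoped to it."""

    def __init__(self, get_response, workspace_for_key):
        self.get_response = get_response
        # e.g. an API-key -> workspace lookup backed by the database
        self.workspace_for_key = workspace_for_key

    def __call__(self, request):
        key = request.headers.get("X-Api-Key")
        request.workspace = self.workspace_for_key(key)
        return self.get_response(request)

# Usage with a fake request object and an in-memory lookup:
lookup = {"key-123": "acme-workspace"}.get
handler = TenantMiddleware(lambda req: req.workspace, lookup)
request = SimpleNamespace(headers={"X-Api-Key": "key-123"})
print(handler(request))  # prints "acme-workspace"
```

Because the workspace is attached before any view runs, no individual endpoint has to remember to filter by tenant.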
The Knowledge Graph — Why Neo4j
Standard RAG with vector search works fine for simple Q&A. But it breaks down for complex business queries like "Which customers complained about delivery last month and what products did they order?" That is a graph traversal problem — exactly what Neo4j solves.
In RagLeap, we build a knowledge graph from uploaded documents. Entities become nodes. Relationships become edges. We combine vector similarity search with graph traversal to retrieve contextually relevant information — answers that actually make sense for business queries, not just keyword matches.
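To make the hybrid idea concrete, here is a toy sketch: vector similarity picks seed entities, then a one-hop graph expansion pulls in related context. In production this is a Cypher query against Neo4j; the function names, entity labels, and two-dimensional embeddings below are purely illustrative.

```python
def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def hybrid_retrieve(query_vec, embeddings, edges, top_k=1):
    # 1. Vector search: rank entity nodes by similarity to the query.
    seeds = sorted(embeddings,
                   key=lambda n: cosine(query_vec, embeddings[n]),
                   reverse=True)[:top_k]
    # 2. Graph traversal: expand each seed to its neighbours for context.
    context = set(seeds)
    for node in seeds:
        context |= {dst for src, dst in edges if src == node}
    return context

emb = {"Customer:Asha": [1.0, 0.0], "Product:Lamp": [0.0, 1.0]}
edges = [("Customer:Asha", "Complaint:Delivery"),
         ("Customer:Asha", "Order:1042")]
print(hybrid_retrieve([0.9, 0.1], emb, edges))
```

The traversal step is what lets a "customers who complained about delivery" query surface the related orders, which pure vector search would miss.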
The Database AI
RagLeap connects directly to the user's existing database — MySQL, PostgreSQL, MongoDB — and lets the AI query it in natural language. We built a schema intelligence layer that introspects the database structure and generates safe, read-only parameterised queries. The LLM never writes raw SQL — it selects from pre-validated query templates. This lets a business owner message their AI on Telegram: "How many orders came from Chennai this week?" and get a real answer from their actual database.
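The template mechanism can be sketched as follows. The template registry, parameter sets, and schema are hypothetical examples; the point is that the LLM only ever chooses a template ID and fills declared parameters, and the driver binds them, so no model output is ever concatenated into SQL.

```python
# Pre-validated, read-only query templates. The LLM selects a template_id
# and supplies parameters; it never emits raw SQL.
TEMPLATES = {
    "orders_by_city_period": (
        "SELECT COUNT(*) FROM orders "
        "WHERE city = %(city)s AND created_at >= %(since)s",
        {"city", "since"},
    ),
}

def build_query(template_id, params):
    sql, allowed = TEMPLATES[template_id]
    # Reject anything outside the template's declared parameter set.
    if set(params) != allowed:
        raise ValueError("unexpected parameters")
    # Executed under a read-only DB role; parameters are bound by the driver.
    return sql, params

sql, params = build_query("orders_by_city_period",
                          {"city": "Chennai", "since": "2026-05-18"})
print(sql)
```

A separate read-only database role is the second layer of defence: even a bad template cannot mutate data.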
Celery — The Backbone of Async Operations
Almost everything in RagLeap that is not a synchronous API response goes through Celery — document ingestion, email monitoring, scheduled voice calls, lead follow-up, WhatsApp message processing. We run the Celery beat scheduler and the Celery workers as separate processes so a crash in one does not affect the other. Redis is the message broker, and it stays well under 100MB even under heavy load.
Multi-Channel Architecture
The challenge was one AI brain serving WhatsApp, Telegram, Discord, voice calls, email, and web chat simultaneously, with consistent context across all of them. Our solution: a unified message-processing pipeline with channel adapters. Every incoming message is normalised into the same internal format before it hits the AI; channel-specific formatting happens on the way out. Same AI, same knowledge, different output formats.
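The adapter idea can be sketched in a few lines. The dataclass fields and payload shapes below are simplified illustrations (the real WhatsApp and Telegram payloads carry far more), but they show how two very different channels collapse into one internal shape.

```python
from dataclasses import dataclass

@dataclass
class InboundMessage:
    """The single internal format every channel adapter produces."""
    channel: str
    sender: str
    text: str

def from_whatsapp(payload: dict) -> InboundMessage:
    # Simplified WhatsApp webhook payload -> internal format.
    return InboundMessage("whatsapp", payload["from"], payload["body"])

def from_telegram(update: dict) -> InboundMessage:
    # Simplified Telegram Bot API update -> internal format.
    msg = update["message"]
    return InboundMessage("telegram", str(msg["from"]["id"]), msg["text"])

# Both channels now feed the same downstream pipeline:
m = from_whatsapp({"from": "+919000000000", "body": "order status?"})
```

Outbound formatting is the mirror image: the AI produces one response, and each adapter renders it for its channel's constraints.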
Running on a $20/month VPS
Key optimisations that made the entire stack run on 4GB RAM:
- Gunicorn with 2-3 workers
- Neo4j heap size limited to 512MB via JVM settings
- PostgreSQL shared_buffers set to 512MB
- Redis maxmemory set to 256MB with LRU eviction
- Celery concurrency set to 2 workers
- Static files served directly by Nginx
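The list above maps to a handful of config fragments. These are illustrative excerpts, and exact file paths and option names vary by distro and version, so treat them as a starting point rather than a drop-in config:

```
# gunicorn — a small fixed worker pool
#   gunicorn ragleap.wsgi --workers 3

# /etc/neo4j/neo4j.conf — cap the JVM heap
#   server.memory.heap.max_size=512m

# postgresql.conf
#   shared_buffers = 512MB

# /etc/redis/redis.conf — bounded memory, evict least-recently-used keys
#   maxmemory 256mb
#   maxmemory-policy allkeys-lru

# Celery — limit concurrent task processes
#   celery -A ragleap worker --concurrency=2
```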
What We Would Do Differently
We would separate the voice system earlier — it has very different latency requirements from text channels. We would invest in observability from day one. And we would design the database schema for multi-tenancy from the very first migration.
Try RagLeap
RagLeap has a free self-hosted tier — single workspace, web embed chatbot, and AI Manager included. No credit card required. Cloud plans start at $29/month.