How to Self-Host Your AI on a $6/Month VPS


You’re paying $20/month for ChatGPT. Your data is being stored on someone else’s servers. You have zero control over what happens to your conversations.

What if you could run the same quality AI — privately, on your own server — for $6/month?

That’s less than a Netflix subscription. Less than a single latte per week.

I’ve been running my own AI stack for 6 months. Here’s exactly how I did it, how much it costs, and why I’ll never go back to cloud AI.

The Problem with Cloud AI

Let’s be honest about what happens when you use ChatGPT, Claude, or any cloud AI service:

  • Your conversations are stored on their servers. Read their terms of service.
  • Your data is used for training (opt-out exists for some, but you have to find it).
  • You’re rate-limited — “You’ve reached your message limit.”
  • Content filters decide what you can and can’t discuss.
  • Prices change without notice. ChatGPT Plus is $20/month today; nothing stops it from climbing.
  • Zero customization — you get the model they give you, configured the way they want.

If you’re a developer, founder, or anyone who values privacy and control, this is a problem.

What You Need

A VPS (Virtual Private Server) with:

  • 2 vCPU cores
  • 8GB RAM (minimum for 7B-13B models)
  • Ubuntu 22.04 or 24.04
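Once you're logged into a candidate server, you can confirm it actually meets these specs. Nothing here is specific to this guide; these are standard Linux utilities:

```shell
# CPU cores -- you want 2 or more
nproc

# Total RAM -- you want ~8GB for 7B-13B models
free -h | awk '/^Mem:/ {print $2}'

# OS version -- this guide assumes Ubuntu 22.04 or 24.04
. /etc/os-release && echo "$PRETTY_NAME"
```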

My Recommendation: Hostinger KVM2

After testing half a dozen providers, here’s what I use and recommend:

Provider        Plan             RAM   Price       Affiliate Link
Hostinger       KVM2, 2 vCPU     8GB   ~€6/month   Get Hostinger
Hetzner Cloud   CPX21, 3 vCPU    8GB   ~€7/month   Get Hetzner
DigitalOcean    Basic, 2 vCPU    4GB   $24/month   Get DigitalOcean

Disclosure: The links above are affiliate links. I get a small commission at no extra cost to you. I only recommend providers I actually use.

For most people, the Hostinger KVM2 at ~€6/month is the sweet spot. 8GB RAM lets you run Llama 3.1 8B comfortably, and 2 vCPUs handle the inference load fine.

If you want more headroom (14B models, multiple users), go with Hetzner CPX21 — 3 vCPUs for just €1 more per month.

Step 1: Set Up Your Server

Once you have a VPS, SSH in:

ssh root@YOUR_SERVER_IP

Update everything and create a user:

apt update && apt upgrade -y
adduser aiadmin
usermod -aG sudo aiadmin
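Before you harden SSH in the next step, copy your public key over for the new user. Do this from your local machine, not the server; once password auth is disabled, a key is your only way in:

```shell
# Run on your LOCAL machine.
# Copies your public key into aiadmin's ~/.ssh/authorized_keys.
ssh-copy-id aiadmin@YOUR_SERVER_IP

# Confirm key-based login works BEFORE disabling passwords
ssh aiadmin@YOUR_SERVER_IP 'echo key login OK'
```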

Harden SSH

# Disable root login and password auth.
# IMPORTANT: make sure key-based login works for aiadmin first,
# or you will lock yourself out.
rm -f /etc/ssh/sshd_config.d/50-cloud-init.conf   # cloud images often override settings here
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl restart ssh

Set Up Firewall

ufw default deny incoming
ufw default allow outgoing
ufw allow ssh
ufw allow 443
ufw allow 80
ufw enable

Install Docker

curl -fsSL https://get.docker.com | sh
systemctl enable docker
usermod -aG docker aiadmin

This takes 2 minutes. Your server is now locked down and ready.
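Before moving on, it's worth a quick sanity check that Docker works under the new user (the group change only applies to fresh sessions):

```shell
# Log in as the new user so the docker group membership takes effect
su - aiadmin

# Version check, then a throwaway container run
docker --version
docker run --rm hello-world
```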

Step 2: Deploy Ollama

Ollama is the engine that runs AI models on your server. It’s like Docker for LLMs.

mkdir -p /opt/ai-stack && cd /opt/ai-stack

cat > docker-compose.yml << 'EOF'
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

volumes:
  ollama_data:
EOF

docker compose up -d

Pull Your First Model

# Fast and capable (4GB RAM needed)
docker exec ollama ollama pull llama3.1:8b

# Smaller and faster (2GB RAM needed)
docker exec ollama ollama pull phi3:mini

# Best quality (16GB+ RAM needed)
docker exec ollama ollama pull qwen2.5:14b

Llama 3.1 8B is my daily driver. It handles 90% of tasks — writing, coding, research, analysis. On a €6/month VPS, it runs at about 10-15 tokens per second. Not blazing fast, but perfectly usable.
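A rough rule of thumb I use (an approximation, not an official figure): with Ollama's default 4-bit quantization, a model needs about half a gigabyte of RAM per billion parameters, plus roughly a gigabyte of overhead for context and the runtime:

```shell
# Approximate RAM for a 4-bit quantized model:
# params_in_billions * 0.5 + ~1GB overhead
estimate_ram() {
  awk -v p="$1" 'BEGIN { printf "%.1f GB\n", p * 0.5 + 1 }'
}

estimate_ram 8    # llama3.1:8b  -> prints "5.0 GB"
estimate_ram 14   # qwen2.5:14b  -> prints "8.0 GB"
```

This lines up with the comments above: an 8B model fits comfortably in 8GB, while a 14B model wants more headroom.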

Test it:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Hello! Tell me about self-hosting."}]
}'

You should see a streaming response. Your AI is alive.
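The streaming output is a series of JSON chunks, which is awkward to read in a terminal. If you just want the final text, disable streaming with the API's stream parameter and extract the reply with jq (assumes jq is installed, e.g. via apt install jq):

```shell
# Ask for one complete response instead of a stream,
# then pull out just the assistant's text
curl -s http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "stream": false,
  "messages": [{"role": "user", "content": "Hello! Tell me about self-hosting."}]
}' | jq -r '.message.content'
```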

Step 3: Add a Chat Interface

Command-line is fine for testing, but you want a proper interface. Enter Open WebUI — a ChatGPT-like interface that connects to Ollama.

Add this under the services: section of your docker-compose.yml, and replace the volumes block at the bottom with the one shown here:

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    volumes:
      - openwebui_data:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  openwebui_data:

Then apply the changes:

docker compose up -d

Visit http://YOUR_SERVER_IP:3000, create an admin account, and start chatting.

Add HTTPS with Traefik

You don’t want to access your AI over plain HTTP. Here’s how to add free HTTPS with Traefik:

  traefik:
    image: traefik:v3.0
    container_name: traefik
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.email=YOUR_EMAIL"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt_data:/letsencrypt
    restart: unless-stopped

volumes:
  ollama_data:
  openwebui_data:
  letsencrypt_data:

Then add labels to your open-webui service:

    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.openwebui.rule=Host(`ai.YOURDOMAIN.com`)"
      - "traefik.http.routers.openwebui.entrypoints=websecure"
      - "traefik.http.routers.openwebui.tls.certresolver=letsencrypt"

You’ll need a domain name for this. I use Namecheap for domains — cheap, reliable, and good privacy features.

Step 4: Chat from Your Phone

This is where it gets good. You don’t want to open a browser every time you want to talk to your AI. You want to chat from your phone, like you chat with a friend.

Matrix is the answer. It’s an open messaging protocol (like WhatsApp, but open source and federated). You can run your own server and chat from any device.

I cover the full Matrix + E2E encryption setup in the complete guide, but here’s the short version:

  1. Deploy a Matrix server (Synapse) on your VPS
  2. Install the Element app on your phone
  3. Bridge your AI to Matrix
  4. All conversations are end-to-end encrypted

This means you can chat with your AI from your phone, your laptop, your tablet — all encrypted, all private, all on your server.
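For reference, here's what step 1 looks like in practice. This is a minimal sketch using the official matrixdotorg/synapse image, with YOUR_DOMAIN as a placeholder; the real configuration (federation, registration, the AI bridge) is beyond this post:

```shell
# Generate a starter homeserver.yaml into the synapse_data volume.
# SYNAPSE_SERVER_NAME is the domain your Matrix IDs will use.
docker run --rm \
  -v synapse_data:/data \
  -e SYNAPSE_SERVER_NAME=matrix.YOUR_DOMAIN \
  -e SYNAPSE_REPORT_STATS=no \
  matrixdotorg/synapse:latest generate

# Then run the server itself
docker run -d --name synapse \
  -v synapse_data:/data \
  -p 8008:8008 \
  --restart unless-stopped \
  matrixdotorg/synapse:latest
```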

The Real Cost Breakdown

Let’s compare what you’d pay over a year:

Setup             Monthly    Yearly        Privacy   Control
ChatGPT Plus      $20        $240          ✗         ✗
Claude Pro        $20        $240          ✗         ✗
Both              $40        $480          ✗         ✗
API (heavy use)   $50-200    $600-2,400    ⚠️        ✗
This setup        $6         $72           ✓         ✓

The savings are real. And they compound — every year you save $168-2,328.
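The range above is simple arithmetic, shown here so you can plug in your own numbers:

```shell
# Yearly cost of one $20/month subscription vs this setup
cloud=$((20 * 12))       # $240/year
selfhost=$((6 * 12))     # $72/year
echo "low end: \$$((cloud - selfhost))"    # prints "low end: $168"

# Versus heavy API use at $200/month
api=$((200 * 12))        # $2400/year
echo "high end: \$$((api - selfhost))"     # prints "high end: $2328"
```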

What This Setup Doesn’t Do (Yet)

Honest limitations:

  • No GPU acceleration — CPU inference is slower than cloud APIs. 10-15 tokens/sec for 8B models.
  • Complex reasoning — For the hardest problems, you might still want GPT-4/Claude as a fallback. But with a privacy proxy, you can use them safely.
  • Initial setup time — 1-2 hours to get everything running. But you only do it once.

Want the Full Guide?

This post covers the basics. The complete guide includes:

  • ✅ 8 chapters with copy-paste configs
  • ✅ Matrix bridge setup (chat from any device)
  • ✅ E2E encryption (Pantalaimon)
  • ✅ Privacy proxy (redact sensitive data)
  • ✅ Persistent memory system
  • ✅ Monitoring & backup automation
  • ✅ Troubleshooting for every common issue

Get the Complete Guide — €49 →

Next Steps

  1. Get a VPS: Hostinger KVM2 (€6/month) or Hetzner CPX21 (€7/month)
  2. Follow the steps above: 30 minutes to a working AI
  3. Read the full guide: complete setup with encryption, privacy proxy, and more

Questions? Drop a comment or reach out on X.

Last updated: April 2026. Tested on Ubuntu 24.04 LTS.

Want to self-host your own AI?

Get the complete guide with copy-paste configs, step-by-step instructions, and 30-day support.

Get the Guide — €49 →