ChatGPT is powerful, but it comes with a cost – not just the $20/month subscription, but the knowledge that every prompt you write, every code snippet you share, every personal question you ask is stored on OpenAI's servers. For privacy-conscious users, developers, and businesses handling sensitive data, there's a better way.
In this comprehensive guide, we'll build a private AI assistant using Ollama and Open-WebUI – two open-source tools that together deliver a ChatGPT-like experience running entirely on your own hardware. No subscriptions. No data leaving your network. Complete control.
By the end of this tutorial, you'll have a fully functional AI chat interface capable of running models like Llama 3.3, DeepSeek-R1, Mistral, and dozens more – all for free.
What We're Building
Our stack consists of two components:
- Ollama – A local LLM server that downloads, manages, and runs AI models. Think of it as "Docker for AI models."
- Open-WebUI – A beautiful web interface that provides the ChatGPT-like chat experience, user management, and features like RAG (chat with your documents).
Together, they create a complete private AI assistant that's:
- ✅ Completely private – No data leaves your machine
- ✅ Free forever – No API costs or subscriptions
- ✅ Works offline – No internet required after initial setup
- ✅ Multi-model – Switch between dozens of models instantly
- ✅ Multi-user – Share with family or team members
Prerequisites
Before we start, make sure you have:
| Requirement | Minimum | Recommended |
|---|---|---|
| RAM | 8GB | 16GB+ (32GB for 70B models) |
| Storage | 20GB free | 100GB+ SSD |
| OS | Linux, macOS, Windows | Linux or macOS |
| Docker | Docker Desktop or Docker Engine | Latest version |
| GPU (optional) | Not required | NVIDIA GPU with 8GB+ VRAM |
💡 Apple Silicon Users
M1/M2/M3/M4 Macs are excellent for local AI. The unified memory architecture means Ollama can use nearly all your RAM for model inference. A MacBook Pro with 48GB or more of unified memory can run 70B parameter models that would require multiple high-end GPUs in a PC.
Step 1: Install Ollama
Ollama installation is remarkably simple โ a single command for any platform.
Linux / macOS / WSL
curl -fsSL https://ollama.com/install.sh | sh
This downloads and installs Ollama, sets up the service, and starts it automatically.
Windows
Download the installer from ollama.com/download and run it. The setup wizard handles everything.
Verify Installation
After installation, verify Ollama is running:
# Check Ollama version
ollama --version
# Check the service is responding
curl http://localhost:11434/api/tags
You should see version information and an empty models list (since we haven't pulled any models yet).
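Once you've pulled models, `/api/tags` returns a JSON object listing them. A minimal sketch of parsing that response in Python (the sample payload below is illustrative, not real output – a clean install returns `{"models": []}`):

```python
import json

# Illustrative payload shaped like Ollama's /api/tags response
sample = '''
{
  "models": [
    {"name": "llama3.2:latest", "size": 2019393189},
    {"name": "phi4:latest", "size": 9053116391}
  ]
}
'''

def list_models(payload: str) -> list[str]:
    """Return installed model names from an /api/tags JSON payload."""
    return [m["name"] for m in json.loads(payload).get("models", [])]

print(list_models(sample))  # model names, or [] on a clean install
```

The same parsing works against the live endpoint once Ollama is running.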
Step 2: Download Your First Model
Ollama makes downloading models as easy as pulling Docker images. Let's start with a versatile, high-quality model.
# Pull DeepSeek-R1 14B – excellent for reasoning and coding
ollama pull deepseek-r1:14b
# Or try Llama 3.2 for general conversation
ollama pull llama3.2
# Or Phi-4 for fast responses on limited hardware
ollama pull phi4
The download may take several minutes depending on your connection. Model sizes range from 2GB (small models) to 40GB+ (70B parameter models).
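As a rough rule of thumb (an approximation, not an official formula): a Q4-quantized model needs about half a byte per parameter, plus roughly 20% overhead for the KV cache and runtime. A quick estimator:

```python
def estimate_ram_gb(params_billion: float, bytes_per_param: float = 0.5,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate for a Q4-quantized model (~0.5 bytes/param)."""
    return params_billion * bytes_per_param * overhead

for size in (3, 14, 70):
    print(f"{size}B -> ~{estimate_ram_gb(size):.0f} GB")  # e.g. 70B -> ~42 GB
```

This lines up with the sizes above: a 70B model lands around 42GB, which is why 48GB+ of RAM is recommended for that class.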
Test the Model
Before adding the web UI, verify your model works:
ollama run deepseek-r1:14b
You'll get an interactive prompt. Try asking a question:
>>> What are the benefits of self-hosting AI models?
Self-hosting AI models provides several key advantages:
1. **Privacy**: Your data never leaves your hardware...
(response continues)
Press Ctrl+D or type /bye to exit.
Step 3: Deploy Open-WebUI
Now let's add the beautiful web interface. Open-WebUI is distributed as a Docker container, making deployment straightforward.
Option A: Quick Start (Single Command)
docker run -d \
--name open-webui \
--restart unless-stopped \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
This command:
- Runs Open-WebUI on port 3000
- Connects to Ollama on the host machine
- Persists data (chat history, settings) in a Docker volume
- Auto-restarts if it crashes or the server reboots
Option B: Docker Compose (Recommended for Production)
For a more maintainable setup, use Docker Compose. Create a file called docker-compose.yml:
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
ports:
- "3000:8080"
environment:
- OLLAMA_BASE_URL=http://host.docker.internal:11434
- WEBUI_AUTH=true
- WEBUI_SECRET_KEY=your-secret-key-change-this
volumes:
- open-webui-data:/app/backend/data
extra_hosts:
- "host.docker.internal:host-gateway"
volumes:
open-webui-data:
Start it with:
docker compose up -d
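Before starting, replace the placeholder `WEBUI_SECRET_KEY` in the compose file. One way to generate a random value (assuming `openssl` is installed):

```shell
# Generate a 64-character hex secret for WEBUI_SECRET_KEY
KEY=$(openssl rand -hex 32)
echo "$KEY"
```

Paste the printed value into `docker-compose.yml`; this key signs session tokens, so keep it out of version control.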
Option C: All-in-One (Ollama + Open-WebUI Together)
If you haven't installed Ollama separately, you can run everything in Docker:
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
volumes:
- ollama-data:/root/.ollama
# Uncomment for NVIDIA GPU support:
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: all
# capabilities: [gpu]
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
ports:
- "3000:8080"
environment:
- OLLAMA_BASE_URL=http://ollama:11434
volumes:
- open-webui-data:/app/backend/data
depends_on:
- ollama
volumes:
ollama-data:
open-webui-data:
Step 4: Access and Configure Open-WebUI
Open your browser and navigate to:
http://localhost:3000
First-Time Setup
- Create an admin account – The first user to sign up becomes the administrator
- Select your model – Choose from the dropdown (you'll see the models you pulled with Ollama)
- Start chatting!
Essential Configuration
Navigate to Settings (gear icon) and configure:
- Default Model – Set which model loads by default
- System Prompt – Customize the AI's personality and instructions
- User Management – Enable or disable user registration
- RAG Settings – Configure document upload and retrieval
Step 5: Add More Models
Open-WebUI supports switching between models on the fly. Let's add a few more:
# For coding assistance
ollama pull codellama:34b
# For creative writing
ollama pull mistral:7b
# For fast, lightweight responses
ollama pull phi4
# For maximum quality (requires 48GB+ RAM)
ollama pull llama3.3:70b
After pulling, refresh Open-WebUI and the new models appear in the dropdown.
Model Recommendations by Use Case
| Use Case | Recommended Model | RAM Needed | Why |
|---|---|---|---|
| General assistant | llama3.3:70b or qwen2.5:72b | 48GB+ | Best overall quality |
| Coding | deepseek-coder-v2 or codellama:34b | 24GB | Trained on code |
| Reasoning/Math | deepseek-r1:14b or deepseek-r1:70b | 16GB/48GB | Chain-of-thought built-in |
| Fast responses | phi4 or gemma3:4b | 8GB | Quick inference |
| Limited hardware | llama3.2:3b or phi4 | 6-8GB | Runs anywhere |
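The table above can be wrapped in a small shell helper (a hypothetical convenience script, not part of Ollama) so you can pull the suggested model for a use case by name:

```shell
# Hypothetical helper: map a use case to the model suggested above
recommend_model() {
  case "$1" in
    general)   echo "llama3.3:70b" ;;
    coding)    echo "deepseek-coder-v2" ;;
    reasoning) echo "deepseek-r1:14b" ;;
    fast)      echo "phi4" ;;
    *)         echo "llama3.2:3b" ;;
  esac
}

recommend_model coding   # prints deepseek-coder-v2
# Then: ollama pull "$(recommend_model coding)"
```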
Step 6: Enable GPU Acceleration (Optional)
If you have an NVIDIA GPU, Ollama can use it for significantly faster inference.
Check GPU Support
# See if Ollama detected your GPU
ollama run phi4
# While running, check NVIDIA GPU usage:
nvidia-smi
If GPU memory is being used, you're all set. Ollama auto-detects NVIDIA GPUs.
For Docker Installations
Add the NVIDIA runtime to your docker-compose.yml:
services:
ollama:
image: ollama/ollama:latest
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
Step 7: Set Up RAG (Chat with Your Documents)
One of Open-WebUI's killer features is RAG (Retrieval-Augmented Generation) – upload documents and ask questions about them.
Upload Documents
- Click the + button in the chat
- Select documents (PDF, TXT, Markdown, etc.)
- Wait for processing
- Ask questions about your documents
Create Knowledge Bases
For frequently used documents:
- Go to Workspace → Documents
- Create a new collection
- Upload multiple documents
- Reference the collection in chats with `#collection-name`
Step 8: Expose to Your Network (Optional)
To access Open-WebUI from other devices on your network:
Simple: Verify the Port Binding
Docker's default mapping ("3000:8080") already listens on all interfaces. To make that explicit in docker-compose.yml:
ports:
- "0.0.0.0:3000:8080" # Listen on all interfaces (Docker's default)
Then allow port 3000 through your server's firewall and access via your server's IP: http://192.168.1.100:3000
Better: Use a Reverse Proxy with HTTPS
For production use, add Nginx Proxy Manager or Caddy:
# Caddyfile example
ai.yourdomain.com {
reverse_proxy localhost:3000
}
This gives you automatic HTTPS via Let's Encrypt.
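If you prefer Nginx over Caddy, an equivalent server block might look like this (a sketch that assumes certificates are already provisioned, e.g. with certbot – adjust the domain and paths to yours):

```nginx
server {
    listen 443 ssl;
    server_name ai.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        # Open-WebUI streams responses over websockets
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Unlike Caddy, Nginx won't renew certificates itself, so pair it with certbot or a similar tool.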
Step 9: Backup Your Data
Your chat history and settings are stored in the Docker volume (named open-webui-data in the compose setups above, or open-webui if you used the quick-start command). Back it up regularly:
# Find the volume location
docker volume inspect open-webui-data
# Create a backup
docker run --rm -v open-webui-data:/data -v $(pwd):/backup alpine \
tar czf /backup/open-webui-backup-$(date +%Y%m%d).tar.gz /data
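To automate this, a crontab entry can run the same backup weekly (a sketch assuming backups go to /srv/backups, which must already exist; note the escaped % signs, which cron would otherwise treat as newlines):

```cron
# Every Sunday at 03:00: archive the open-webui-data volume
0 3 * * 0 docker run --rm -v open-webui-data:/data -v /srv/backups:/backup alpine tar czf /backup/open-webui-backup-$(date +\%Y\%m\%d).tar.gz /data
```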
Troubleshooting
"Connection refused" when accessing Open-WebUI
- Check if the container is running: `docker ps`
- Check logs: `docker logs open-webui`
- Verify port 3000 isn't blocked by firewall
"Cannot connect to Ollama"
- Verify Ollama is running: `ollama --version`
- Check if the API responds: `curl http://localhost:11434`
- If using Docker, ensure `host.docker.internal` resolves correctly
Slow performance
- Use smaller models (phi4, gemma3:4b) if RAM is limited
- Enable GPU acceleration if available
- Close other memory-intensive applications
Model download fails
- Check disk space: models can be 4-40GB
- Try a smaller model first: `ollama pull phi4`
- Check your internet connection
Security Best Practices
If exposing to the internet:
- Enable authentication – Set `WEBUI_AUTH=true`
- Use HTTPS – Always use a reverse proxy with SSL
- Strong passwords – Enforce strong admin credentials
- Limit registration – Disable public signups if not needed
- Keep updated – Regularly pull the latest images
What's Next?
You now have a fully functional private AI assistant. Here's where to go from here:
- Explore more models – Browse Ollama's model library for specialized models
- Try different workflows – Code generation, writing assistance, document analysis
- Set up integrations – Open-WebUI supports webhooks and APIs for automation
- Add monitoring – Use Uptime Kuma to track service availability
- Compare alternatives – Check our 5 Self-Hosted ChatGPT Alternatives guide
Frequently Asked Questions
How does this compare to ChatGPT?
For many tasks, modern open-source models like Llama 3.3 70B and DeepSeek-R1 match or come close to GPT-4's quality. You might notice a difference in very niche knowledge, but for coding, writing, analysis, and general assistance, local models are excellent.
Can I use this for work?
Yes! Both Ollama and Open-WebUI are open-source. Most models (Llama, Mistral, Qwen) have permissive licenses that allow commercial use. Always check the specific model's license.
How much does it cost to run?
Just electricity. A MacBook Pro M3 running inference draws about 30-50W. At $0.15/kWh, that's well under a cent per hour of active use. Compare that to $20/month for ChatGPT Plus.
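The arithmetic, as a quick sanity check:

```python
def cost_per_hour(watts: float, price_per_kwh: float) -> float:
    """Electricity cost of one hour of inference, in dollars."""
    return watts / 1000 * price_per_kwh

print(cost_per_hour(50, 0.15))  # 0.0075 -> less than a cent per hour
```

Even at eight hours of active use a day, that's only a couple of dollars per month.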
Can I run this on a Raspberry Pi?
Technically yes, but performance will be very limited. The Raspberry Pi 5 (8GB) can run small models like Phi-4, but responses will be slow. For a good experience, use a more powerful machine.
Is my data really private?
Yes. With this setup, your conversations never leave your machine. There's no telemetry, no cloud connection, no data collection. The models run entirely locally.
Conclusion
You've just built a private AI assistant that rivals commercial offerings – for free, forever, with complete privacy. The setup takes under 15 minutes, requires no AI expertise, and works on any modern computer.
The age of AI locked behind corporate APIs is ending. Your hardware is powerful enough. The models are good enough. The tools are ready. Take control of your AI.
Explore more self-hosted AI tools in our AI category, or browse our complete app directory at hostly.sh.