How to Self-Host Ollama + Open-WebUI: Your Private AI Assistant
Tutorials • March 4, 2026 • 8 min read


Hostly Team

Self-Hosting Enthusiast

Build your own ChatGPT alternative in under 15 minutes. This complete guide shows you how to deploy Ollama and Open-WebUI for a private, powerful AI assistant running entirely on your own hardware.

ChatGPT is powerful, but it comes with a cost: not just the $20/month subscription, but the knowledge that every prompt you write, every code snippet you share, and every personal question you ask is stored on OpenAI's servers. For privacy-conscious users, developers, and businesses handling sensitive data, there's a better way.

In this comprehensive guide, we'll build a private AI assistant using Ollama and Open-WebUI, two open-source tools that together deliver a ChatGPT-like experience running entirely on your own hardware. No subscriptions. No data leaving your network. Complete control.

By the end of this tutorial, you'll have a fully functional AI chat interface capable of running models like Llama 3.3, DeepSeek-R1, Mistral, and dozens more, all for free.

What We're Building

Our stack consists of two components:

  • Ollama: a local LLM server that downloads, manages, and runs AI models. Think of it as "Docker for AI models."
  • Open-WebUI: a beautiful web interface that provides the ChatGPT-like chat experience, user management, and features like RAG (chat with your documents).

Together, they create a complete private AI assistant that's:

  • ✅ Completely private: no data leaves your machine
  • ✅ Free forever: no API costs or subscriptions
  • ✅ Works offline: no internet required after initial setup
  • ✅ Multi-model: switch between dozens of models instantly
  • ✅ Multi-user: share with family or team members

Prerequisites

Before we start, make sure you have:

| Requirement    | Minimum                         | Recommended                  |
| -------------- | ------------------------------- | ---------------------------- |
| RAM            | 8GB                             | 16GB+ (48GB+ for 70B models) |
| Storage        | 20GB free                       | 100GB+ SSD                   |
| OS             | Linux, macOS, Windows           | Linux or macOS               |
| Docker         | Docker Desktop or Docker Engine | Latest version               |
| GPU (optional) | Not required                    | NVIDIA GPU with 8GB+ VRAM    |

💡 Apple Silicon Users

M1/M2/M3/M4 Macs are excellent for local AI. The unified memory architecture means Ollama can use most of your RAM for model inference. A MacBook Pro with 48GB or more of unified memory can run 70B-parameter models that would require a far more expensive multi-GPU setup on a PC.

Step 1: Install Ollama

Ollama installation is remarkably simple: a single command on any platform.

Linux / macOS / WSL

curl -fsSL https://ollama.com/install.sh | sh

This downloads and installs Ollama, sets up the service, and starts it automatically.

Windows

Download the installer from ollama.com/download and run it. The setup wizard handles everything.

Verify Installation

After installation, verify Ollama is running:

# Check Ollama version
ollama --version

# Check the service is responding
curl http://localhost:11434/api/tags

You should see version information and an empty models list (since we haven't pulled any models yet).
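With no models pulled yet, the tags endpoint should return an empty list, along the lines of:

```json
{"models":[]}
```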

Step 2: Download Your First Model

Ollama makes downloading models as easy as pulling Docker images. Let's start with a versatile, high-quality model.

# Pull DeepSeek-R1 14B: excellent for reasoning and coding
ollama pull deepseek-r1:14b

# Or try Llama 3.2 for general conversation
ollama pull llama3.2

# Or Phi-4 for fast responses on limited hardware
ollama pull phi4

The download may take several minutes depending on your connection. Model sizes range from 2GB (small models) to 40GB+ (70B parameter models).
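Before pulling a large model, it's worth confirming you have room for it. On Linux and macOS, Ollama stores models under ~/.ollama/models, so check the drive holding your home directory:

```shell
# Show free disk space on the filesystem holding your home directory
df -h ~
```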

Test the Model

Before adding the web UI, verify your model works:

ollama run deepseek-r1:14b

You'll get an interactive prompt. Try asking a question:

>>> What are the benefits of self-hosting AI models?

Self-hosting AI models provides several key advantages:

1. **Privacy**: Your data never leaves your hardware...
(response continues)

Press Ctrl+D or type /bye to exit.

Step 3: Deploy Open-WebUI

Now let's add the beautiful web interface. Open-WebUI is distributed as a Docker container, making deployment straightforward.

Option A: Quick Start (Single Command)

docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

This command:

  • Runs Open-WebUI on port 3000
  • Connects to Ollama on the host machine
  • Persists data (chat history, settings) in a Docker volume
  • Auto-restarts if it crashes or the server reboots

Option B: Docker Compose (Recommended for Production)

For a more maintainable setup, use Docker Compose. Create a file called docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_AUTH=true
      - WEBUI_SECRET_KEY=your-secret-key-change-this
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui-data:

Start it with:

docker compose up -d
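The WEBUI_SECRET_KEY value above is a placeholder (Open-WebUI uses it to sign session tokens), so replace it with a random string. One way to generate one, assuming openssl is installed:

```shell
# Print a random 64-character hex string suitable for WEBUI_SECRET_KEY
openssl rand -hex 32
```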

Option C: All-in-One (Ollama + Open-WebUI Together)

If you haven't installed Ollama separately, you can run everything in Docker:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    # Uncomment for NVIDIA GPU support:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui-data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama-data:
  open-webui-data:

Note that with this all-in-one setup the ollama CLI lives inside the container, so pull models with docker exec -it ollama ollama pull llama3.2 rather than from the host shell.

Step 4: Access and Configure Open-WebUI

Open your browser and navigate to:

http://localhost:3000

First-Time Setup

  1. Create an admin account: the first user to sign up becomes the administrator
  2. Select your model: choose from the dropdown (you'll see the models you pulled with Ollama)
  3. Start chatting!

Essential Configuration

Navigate to Settings (gear icon) and configure:

  • Default Model: set which model loads by default
  • System Prompt: customize the AI's personality and instructions
  • User Management: enable or disable user registration
  • RAG Settings: configure document upload and retrieval

Step 5: Add More Models

Open-WebUI supports switching between models on the fly. Let's add a few more:

# For coding assistance
ollama pull codellama:34b

# For creative writing
ollama pull mistral:7b

# For fast, lightweight responses
ollama pull phi4

# For maximum quality (requires 48GB+ RAM)
ollama pull llama3.3:70b

After pulling, refresh Open-WebUI and the new models appear in the dropdown.

Model Recommendations by Use Case

| Use Case          | Recommended Model                  | RAM Needed | Why                         |
| ----------------- | ---------------------------------- | ---------- | --------------------------- |
| General assistant | llama3.3:70b or qwen2.5:72b        | 48GB+      | Best overall quality        |
| Coding            | deepseek-coder-v2 or codellama:34b | 24GB       | Trained on code             |
| Reasoning/Math    | deepseek-r1:14b or deepseek-r1:70b | 16GB/48GB  | Chain-of-thought built-in   |
| Fast responses    | phi4 or gemma3:4b                  | 8GB        | Quick inference             |
| Limited hardware  | llama3.2:3b or phi4                | 6-8GB      | Runs anywhere               |

Step 6: Enable GPU Acceleration (Optional)

If you have an NVIDIA GPU, Ollama can use it for significantly faster inference.

Check GPU Support

# See if Ollama detected your GPU
ollama run phi4
# While running, check NVIDIA GPU usage:
nvidia-smi

If GPU memory is being used, you're all set. Ollama auto-detects NVIDIA GPUs.

For Docker Installations

Add the NVIDIA runtime to your docker-compose.yml:

services:
  ollama:
    image: ollama/ollama:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Step 7: Set Up RAG (Chat with Your Documents)

One of Open-WebUI's killer features is RAG (Retrieval-Augmented Generation): upload documents and ask questions about them.

Upload Documents

  1. Click the + button in the chat
  2. Select documents (PDF, TXT, Markdown, etc.)
  3. Wait for processing
  4. Ask questions about your documents

Create Knowledge Bases

For frequently used documents:

  1. Go to Workspace → Documents
  2. Create a new collection
  3. Upload multiple documents
  4. Reference the collection in chats with #collection-name

Step 8: Expose to Your Network (Optional)

To access Open-WebUI from other devices on your network:

Simple: Change the Port Binding

Modify docker-compose.yml to make the all-interfaces binding explicit (Docker's short syntax "3000:8080" already publishes on every interface, so if other devices can't connect, check your firewall first):

ports:
  - "0.0.0.0:3000:8080"  # Listen on all interfaces

Access via your server's IP: http://192.168.1.100:3000
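If you're not sure of your server's LAN address, on Linux you can list it like this (a sketch; your interface layout and address will differ):

```shell
# Print the machine's first non-loopback IP address (Linux)
hostname -I | awk '{print $1}'
```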

Better: Use a Reverse Proxy with HTTPS

For production use, add Nginx Proxy Manager or Caddy:

# Caddyfile example
ai.yourdomain.com {
    reverse_proxy localhost:3000
}

This gives you automatic HTTPS via Let's Encrypt.
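If you'd rather run plain Nginx than Caddy or Nginx Proxy Manager, a minimal sketch of an equivalent server block looks like this (assuming you've already obtained certificates, e.g. with certbot; the domain and certificate paths are placeholders):

```
server {
    listen 443 ssl;
    server_name ai.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        # WebSocket upgrade headers so streaming chat responses work
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```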

Step 9: Backup Your Data

Your chat history and settings are stored in the Docker volume. Back it up regularly:

# Find the volume location
docker volume inspect open-webui-data

# Create a backup
docker run --rm -v open-webui-data:/data -v $(pwd):/backup alpine \
  tar czf /backup/open-webui-backup-$(date +%Y%m%d).tar.gz /data
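Restoring is the same tar invocation in reverse. Here's a minimal sketch of the round-trip on a scratch directory, so it runs without Docker; for a real restore, stop the container and mount the open-webui-data volume in place of the scratch path:

```shell
# Simulate the volume contents with a scratch directory
mkdir -p /tmp/webui-demo/data
echo "chat history" > /tmp/webui-demo/data/webui.db

# Backup: archive the data directory (same tar flags as the Docker one-liner)
tar czf /tmp/webui-demo/backup.tar.gz -C /tmp/webui-demo data

# Simulate data loss, then restore from the archive
rm -rf /tmp/webui-demo/data
tar xzf /tmp/webui-demo/backup.tar.gz -C /tmp/webui-demo

cat /tmp/webui-demo/data/webui.db   # the restored file is intact
```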

Troubleshooting

"Connection refused" when accessing Open-WebUI

  • Check if the container is running: docker ps
  • Check logs: docker logs open-webui
  • Verify port 3000 isn't blocked by firewall

"Cannot connect to Ollama"

  • Verify Ollama is running: ollama --version
  • Check if the API responds: curl http://localhost:11434 (it should reply with "Ollama is running")
  • If using Docker, ensure host.docker.internal resolves correctly

Slow performance

  • Use smaller models (phi4, gemma3:4b) if RAM is limited
  • Enable GPU acceleration if available
  • Close other memory-intensive applications

Model download fails

  • Check disk space: models can be 4-40GB
  • Try a smaller model first: ollama pull phi4
  • Check your internet connection

Security Best Practices

If exposing to the internet:

  • Enable authentication: set WEBUI_AUTH=true
  • Use HTTPS: always put a reverse proxy with SSL in front
  • Strong passwords: enforce strong admin credentials
  • Limit registration: disable public signups if not needed
  • Keep updated: regularly pull the latest images

What's Next?

You now have a fully functional private AI assistant. Here's where to go from here:

  • Explore more models: browse Ollama's model library for specialized models
  • Try different workflows: code generation, writing assistance, document analysis
  • Set up integrations: Open-WebUI supports webhooks and APIs for automation
  • Add monitoring: use Uptime Kuma to track service availability
  • Compare alternatives: check our 5 Self-Hosted ChatGPT Alternatives guide

Frequently Asked Questions

How does this compare to ChatGPT?

For most tasks, modern open-source models like Llama 3.3 70B and DeepSeek-R1 match or exceed GPT-4's quality. You might notice a difference in very niche knowledge, but for coding, writing, analysis, and general assistance โ€” local models are excellent.

Can I use this for work?

Yes! Both Ollama and Open-WebUI are open-source. Most models (Llama, Mistral, Qwen) have permissive licenses that allow commercial use. Always check the specific model's license.

How much does it cost to run?

Just electricity. A MacBook Pro M3 running inference draws about 30-50W. At $0.15/kWh, 50W works out to roughly $0.0075 per hour, well under a cent per hour of active use. Compare that to $20/month for ChatGPT Plus.

Can I run this on a Raspberry Pi?

Technically yes, but performance will be very limited. The Raspberry Pi 5 (8GB) can run small models in the 1-3B range, such as llama3.2:3b, but responses will be slow. For a good experience, use a more powerful machine.

Is my data really private?

Yes. With this setup, your conversations never leave your machine. There's no telemetry, no cloud connection, no data collection. The models run entirely locally.

Conclusion

You've just built a private AI assistant that rivals commercial offerings, for free, forever, with complete privacy. The setup takes under 15 minutes, requires no AI expertise, and works on any modern computer.

The age of AI locked behind corporate APIs is ending. Your hardware is powerful enough. The models are good enough. The tools are ready. Take control of your AI.

Explore more self-hosted AI tools in our AI category, or browse our complete app directory at hostly.sh.