In my previous post, we explored how to leverage OpenRouter to access multiple LLMs and utilize their free models for cost-effective development. But what if you want to combine the power of OpenRouter's cloud models with the absolute privacy of running local models natively on your own VPS?
Enter Ollama.
In this hands-on tutorial, we will securely set up Ollama directly on a Linux server and write a Python script that bridges your local instance with OpenRouter.
Step 1: Installing Ollama (Sudo vs. User)
When installing software on a production VPS, permissions matter. You might wonder: Should I install Ollama as a standard user or with sudo?
The best practice is to use the official installation script with sudo privileges. Why? Because the script doesn't just download the binary into your personal folder; it automatically creates a dedicated, restricted ollama user. Running the service under this isolated Linux user is significantly more secure than running it under your personal login account, as it limits the blast radius if the service is ever compromised.
Run the official script:
sudo curl -fsSL https://ollama.com/install.sh | sh
Step 2: Systemd Service & Startup Security
The installation script automatically configures Ollama as a systemd background service. This means it will automatically start when your server boots up and gracefully restart if it crashes.
You can verify the service status with:
sudo systemctl status ollama
Crucial Security Check: By default, the Ollama systemd service binds strictly to 127.0.0.1:11434 (localhost). Do not change this to 0.0.0.0. Exposing the Ollama port directly to the public internet will allow anyone to run heavy LLMs on your server, rapidly draining your compute resources or executing arbitrary commands via model files. If you must expose it, use a reverse proxy (like Nginx) with strict authentication.
Now, pull a lightweight model to test your local setup:
ollama run llama3
(Type /bye to exit the chat prompt once it finishes).
Step 3: Bridging Ollama and OpenRouter (Python)
The real magic is that Ollama natively supports the OpenAI API format. This means we can use the exact same Python code to talk to our secure local server and OpenRouter!
First, install the Python SDK:
pip install openai
Here is a Python script that acts as an intelligent router. It sends sensitive queries to your private local server, and complex coding queries to OpenRouter's free models:
import os
from openai import OpenAI
# Client 1: Local Ollama Server (Zero Cost, 100% Private)
local_client = OpenAI(
base_url='http://127.0.0.1:11434/v1',
api_key='ollama', # API key is required by the SDK but ignored by Ollama
)
# Client 2: OpenRouter (Cloud Models)
cloud_client = OpenAI(
base_url='https://openrouter.ai/api/v1',
api_key=os.environ.get('OPENROUTER_API_KEY'),
)
def ask_ai(prompt, is_sensitive=False):
if is_sensitive:
print('--> Routing to Local Server (Privacy First)...')
response = local_client.chat.completions.create(
model='llama3',
messages=[{'role': 'user', 'content': prompt}]
)
else:
print('--> Routing to OpenRouter (Cloud Power)...')
response = cloud_client.chat.completions.create(
model='meta-llama/llama-3-8b-instruct:free',
messages=[{'role': 'user', 'content': prompt}]
)
return response.choices[0].message.content
# Test the hybrid approach
print(ask_ai('Summarize this public article...', is_sensitive=False))
print(ask_ai('Analyze these private Linux server logs...', is_sensitive=True))
By leveraging the standardized API format, you can build a unified backend that intelligently routes traffic—keeping costs low with OpenRouter's free tier while maintaining strict data privacy with Ollama.
Step 4: Keeping Ollama Updated
Ollama is under active development, and new updates (often containing performance improvements or support for new model architectures) are released frequently. Since we installed Ollama directly on Linux without a package manager, we need to manage updates manually.
To automate this process, you can use a simple bash script. This script fetches your currently installed version, compares it against the latest release via the GitHub API, and automatically safely runs the official installer again if a new version is detected:
#!/bin/bash
# Get current version (removes 'v' prefix if present)
current_version=$(ollama -v | awk '{print $NF}' | sed 's/^v//')
# Fetch latest version from GitHub
latest_version=$(curl -s https://api.github.com/repos/ollama/ollama/releases/latest | grep -Po '"tag_name": "v\K[^"]*')
if [ "$current_version" != "$latest_version" ]; then
echo "Update available: $current_version -> $latest_version"
# Automatically update by re-running the install script
sudo curl -fsSL https://ollama.com/install.sh | sh
else
echo "Ollama is up to date ($current_version)."
fi
Tip: Save this script as update_ollama.sh, make it executable (chmod +x update_ollama.sh), and schedule it via a weekly cron job to ensure your local AI server is always running the latest engine without manual intervention.
Resources
- Ollama Linux Documentation: The official guide covering installation, manual setup, and service management for Ollama on Linux.
- Ollama OpenAI Compatibility Layer: Detailed documentation on how Ollama handles OpenAI-formatted endpoints locally.
- OpenRouter Documentation: The official docs for connecting to OpenRouter, fetching models, and setting up API keys.
- OpenAI Python SDK (GitHub): The standard Python library utilized to interface seamlessly with both platforms.