
A Practical Guide to OpenRouter: Unified LLM APIs with Free Models

While exploring ways to improve function- and tool-calling workflows, I looked into OpenRouter, but its value was not immediately clear to me. I understood that it offered access to multiple LLMs, but the technical benefits only became apparent after I read the documentation and ran it in production. In particular, its free models make experimentation cheap and lower the cost of scaling.

This guide covers what I've learned building production systems on OpenRouter, with a focus on currently free models (verified April 2026) and Python integration.

What is OpenRouter?

OpenRouter is not a model or generation engine. It is a unified interface and API layer providing access to hundreds of LLMs from dozens of providers through a single OpenAI-compatible API.

At its core, OpenRouter gives you a single endpoint (https://openrouter.ai/api/v1) and API key to access 500+ models from 60+ providers. No more juggling separate authentication, billing, or SDKs for every provider.
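To make the single-endpoint idea concrete, here is a minimal sketch that builds a chat request using only the Python standard library. It assumes the OpenAI-style /chat/completions path under the base URL above and Bearer-token authentication. The helper name build_chat_request is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "https://openrouter.ai/api/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Targeting a different model only changes the model string.
req = build_chat_request("YOUR_API_KEY", "openrouter/free", "Hello")
# urllib.request.urlopen(req) would send it; omitted here to avoid a network call.
```

The same endpoint and key serve every model, so the model string is the only thing that changes between providers.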

Think of OpenRouter as the "Stripe for LLMs" — just as Stripe unified payment processors, OpenRouter unifies LLM providers.

The Problem OpenRouter Solves

Before OpenRouter, using multiple LLM providers meant:

  1. Multiple API Integrations: Each provider has its own API format and SDK
  2. Inconsistent Response Formats: Tool calling works differently across models
  3. Provider-Specific Failures: When a provider goes down, your app breaks
  4. Billing Complexity: Managing separate billing across multiple providers

OpenRouter addresses all four with a single abstraction layer.

Current Free Models on OpenRouter (April 2026)

Free model availability changes frequently. Below are verified free models (pricing: $0/M tokens) as of late April 2026:

Model ID                            | Provider | Context | Notes
tencent/hy3-preview:free            | Tencent  | 262K    | Good general purpose, going away May 8, 2026
nvidia/nemotron-3-super:free        | NVIDIA   | 262K    | Strong reasoning, stable
openai/gpt-oss-20b:free             | OpenAI   | 131K    | Fast, good for coding
openai/gpt-oss-120b:free            | OpenAI   | 131K    | More capable, slower
minimax/minimax-m2.5:free           | MiniMax  | 197K    | Multimodal support
google/gemma-4-31b:free             | Google   | 131K    | Efficient, good for summarization
nvidia/nemotron-3-nano-30b-a3b:free | NVIDIA   | 256K    | Large context window
openrouter/free                     | Auto     | Varies  | Recommended: auto-selects best available free model

Important: Old free models like google/gemini-flash-1.5:free and meta-llama/llama-3.1-8b-instruct:free are no longer free. Always check OpenRouter's free model list for current options.
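Checking the list can also be scripted. The sketch below assumes that OpenRouter's model listing endpoint (GET /api/v1/models) returns a JSON object with a "data" array whose entries carry an "id" and a "pricing" object with string-valued "prompt" and "completion" prices. Treat that response shape as an assumption to verify against the API reference. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

MODELS_URL = "https://openrouter.ai/api/v1/models"  # assumed listing endpoint


def is_free(model: Dict[str, Any]) -> bool:
    """True when both the prompt and completion prices parse to zero."""
    pricing = model.get("pricing") or {}
    try:
        prompt_price = float(pricing.get("prompt", 1))
        completion_price = float(pricing.get("completion", 1))
    except (TypeError, ValueError):
        return False  # missing or malformed prices are treated as not free
    return prompt_price == 0 and completion_price == 0


def free_model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract the sorted IDs of zero-priced models from a listing payload."""
    return sorted(
        m["id"] for m in payload.get("data", []) if "id" in m and is_free(m)
    )


def fetch_free_model_ids(timeout: float = 30.0) -> List[str]:
    """Download the listing and filter it (performs a network call)."""
    with urllib.request.urlopen(MODELS_URL, timeout=timeout) as resp:
        return free_model_ids(json.load(resp))
```

Running fetch_free_model_ids() at startup, or in a scheduled job, is one way to notice when a hardcoded ID has left the free tier.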

Why Use openrouter/free?

The openrouter/free model is a special router that automatically selects from available free models based on your request. Benefits:

  • No need to hardcode model IDs that may go paid
  • Automatically switches if a free model is deprecated
  • Balances load across free model pool

Complete Python Example: Multi-Model System with Free Models

Here's a production-ready Python example using the OpenAI SDK with OpenRouter:

example/openrouter_agent.py
import os
import json
import time
from openai import OpenAI
from typing import Iterator, Optional, Dict, Any

# Initialize OpenRouter client
# Get API key from https://openrouter.ai/keys (free to sign up)
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
    default_headers={
        "HTTP-Referer": "https://your-app.com",  # Optional: identifies your app to OpenRouter
        "X-Title": "Your App Name",  # Optional: display name for your app on OpenRouter
    }
)

# Define model routing strategy with current free models
MODELS = {
    "reasoning": "nvidia/nemotron-3-super:free",
    "coding": "openai/gpt-oss-20b:free",
    "summarization": "google/gemma-4-31b:free",
    "long_context": "nvidia/nemotron-3-nano-30b-a3b:free",
    "multimodal": "minimax/minimax-m2.5:free",
    "fallback": "openrouter/free",
}

class FreeModelAgent:
    def __init__(self):
        self.usage_stats = {"total_tokens": 0, "requests": 0}
    
    def chat(self, task: str, message: str, **kwargs) -> Dict[str, Any]:
        model = MODELS.get(task, MODELS["fallback"])
        
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": message}],
                temperature=kwargs.get("temperature", 0.7),
                max_tokens=kwargs.get("max_tokens", 2048),
            )
            
            # Usage data can be missing on some responses, so guard against None
            tokens = response.usage.total_tokens if response.usage else 0
            self.usage_stats["total_tokens"] += tokens
            self.usage_stats["requests"] += 1
            
            return {
                "content": response.choices[0].message.content,
                "model": response.model,
            }
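        except Exception as e:
            print(f"Failed with {model}: {e}")
            if model != MODELS["fallback"]:
                # Retry once on the fallback router. Re-sending the same task
                # would select the same model and recurse without end.
                return self.chat("fallback", message, **kwargs)
            raise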

if __name__ == "__main__":
    agent = FreeModelAgent()
    result = agent.chat("coding", "Write a Python function for fibonacci.")
    print(result["content"])

Streaming Example for Better UX

OpenRouter supports streaming across all providers. Here's a Python streaming example:

example/streaming.py
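from typing import Optional

# Assumes openrouter_agent.py from the previous example is importable
from openrouter_agent import client

def stream_response(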
    prompt: str, 
    model: str = "openrouter/free"
) -> Optional[str]:
    """Stream a response token by token."""
    try:
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        
        full_response = ""
        for chunk in stream:
            # Some chunks (for example a final usage chunk) carry no choices
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content
            if content:
                full_response += content
                print(content, end="", flush=True)
        
        print()
        return full_response
    except Exception as e:
        print(f"Stream failed: {e}")
        return None

Tool Calling with Free Models

Not all free models support tool calling. openai/gpt-oss-20b:free and nvidia/nemotron-3-super:free have good tool support. Here's an example:

example/tool_calling.py
import json

# Assumes openrouter_agent.py from the first example is importable
from openrouter_agent import client

def get_weather(location: str) -> str:
    """Mock weather function."""
    return f"The weather in {location} is sunny, 22°C."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="openai/gpt-oss-20b:free",
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        if tool_call.function.name == "get_weather":
            # Arguments arrive as a JSON string and must be parsed
            args = json.loads(tool_call.function.arguments)
            weather = get_weather(args["location"])
            print(f"Weather: {weather}")
else:
    # The model may answer directly instead of calling the tool
    print(message.content)
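
The example above stops once the tool has run locally. In the OpenAI-style tool protocol, the result is normally sent back as a message with role "tool" so the model can write the final answer. Here is a minimal sketch of that step. The helper build_tool_messages and its registry argument are hypothetical names, and the follow-up request is shown only in comments because it needs a live API key.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def build_tool_messages(
    tool_calls: Iterable[Any],
    registry: Dict[str, Callable[..., str]],
) -> List[Dict[str, str]]:
    """Run each requested tool and wrap its result as a `tool` message."""
    messages = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            result = f"Unknown tool: {call.function.name}"
        else:
            try:
                result = func(**json.loads(call.function.arguments))
            except (json.JSONDecodeError, TypeError) as exc:
                # Models sometimes emit malformed or mismatched arguments.
                result = f"Invalid arguments: {exc}"
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": result}
        )
    return messages


# Follow-up request, reusing `client`, `tools`, `message`, and `get_weather`
# from the example above:
#
# history = [
#     {"role": "user", "content": "What is the weather in London?"},
#     message,  # the assistant turn that contains the tool calls
#     *build_tool_messages(message.tool_calls, {"get_weather": get_weather}),
# ]
# final = client.chat.completions.create(
#     model="openai/gpt-oss-20b:free", messages=history, tools=tools
# )
# print(final.choices[0].message.content)
```

Returning an error string instead of raising lets the model see what went wrong and try again, which tends to matter more with smaller free models.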

Cost Optimization with Free Models

With free models on OpenRouter, per-token LLM costs are $0. Here's the routing strategy I use:

example/model_selection.py
def select_free_model(task_type: str, complexity: str = "medium") -> str:
    """Select the best free model based on task and complexity."""
    free_models = {
        "simple": {"chat": "google/gemma-4-31b:free"},
        "medium": {"chat": "nvidia/nemotron-3-super:free"},
        "complex": {"reasoning": "nvidia/nemotron-3-super:free"},
    }
    return free_models.get(complexity, {}).get(task_type, "openrouter/free")

model = select_free_model("chat", "simple")
print(f"Selected model: {model}")

Why OpenRouter Matters in 2026

As agentic AI becomes mainstream, OpenRouter has become essential because it:

  1. Eliminates Vendor Lock-in: Switch models without code changes
  2. Zero Cost Development: Free models for testing and prototyping
  3. Improves Reliability: Automatic failover keeps agents running
  4. Simplifies Operations: One API key, one bill (or $0 with free models)
  5. Enables Experimentation: Test new models instantly

Best Practices for Free Models

Do's

  • Use openrouter/free as default to avoid deprecated model IDs
  • Set explicit timeouts on all requests (30s recommended)
  • Implement application-level retries with backoff
  • Use streaming for user-facing features
  • Check model deprecation dates (some free models expire)
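
The timeout and retry items above can be sketched as a small wrapper. This is one possible shape, not a prescribed implementation: retry_with_backoff is a hypothetical helper, the delays are illustrative, and the sleep parameter exists only so the function can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("attempts must be at least 1")


# Usage with the client from the earlier examples (needs network access):
#
# result = retry_with_backoff(
#     lambda: client.chat.completions.create(
#         model="openrouter/free",
#         messages=[{"role": "user", "content": "Hello"}],
#         timeout=30.0,  # per-request timeout in seconds
#     )
# )
```

The OpenAI SDK client also accepts timeout and max_retries in its constructor. If you add your own retry loop, consider setting max_retries=0 so the two layers do not multiply. In real code, narrowing the except clause to rate-limit and connection errors avoids retrying requests that can never succeed.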

Don'ts

  • Don't use deprecated model IDs (always verify against OpenRouter's model list filtered with max_price=0)
  • Don't forget rate limits on free models (usually 10-50 req/min)
  • Don't use free models for mission-critical production without testing
  • Don't hardcode model names — use config/constants
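
To stay under a free-tier rate limit, a client-side limiter can pace requests before they are sent. The sketch below is a simple sliding-window limiter. The class name RateLimiter is hypothetical, and the budget of 10 calls per 60 seconds in the usage comment is an assumption taken from the low end of the range quoted above, so check the actual limits for your account.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `max_calls` per `period` seconds (sliding window)."""

    def __init__(
        self,
        max_calls: int,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until another call fits inside the window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Wait until the oldest call leaves the window, then reuse its slot.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)


# Usage (assumed budget of 10 requests per minute):
#
# limiter = RateLimiter(max_calls=10, period=60.0)
# limiter.acquire()
# response = client.chat.completions.create(...)
```

Pacing on the client side does not replace handling rate-limit errors from the server, since the free pool is shared and limits can change.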

Conclusion

OpenRouter with free models is a game-changer for developers. You get access to capable LLMs like GPT-OSS, NVIDIA Nemotron, and Gemma 4 at zero cost, with automatic failover and unified APIs.

If you're building AI features, start with openrouter/free — it automatically selects the best available free model without you needing to track changing model IDs.

The era of expensive LLM experimentation is over. With OpenRouter and free models, building AI-powered apps has never been more accessible.


Resources