Route chat completions across multiple AI providers with a single, unified API.
Getting Started · API Reference · Configuration · How It Works
Most AI providers offer generous free tiers: Groq, Cerebras, and Gemini each give you thousands of free API calls per month. The problem? Once you hit the limit on one key, your app stops working.
MultiRouter AI solves this by exposing a single HTTP endpoint that rotates requests across every API key you configure. When one provider's free quota runs out, the next request simply goes to another. You get uninterrupted AI access by stacking free tiers together: no code changes, no manual switching.
- Maximize Free Tiers: Stack free-tier keys from Groq, Cerebras, Gemini, and others behind one API. When one key hits its limit, the next provider picks up automatically.
- Single API, Multiple Providers: Send requests and let the gateway route them to Groq, Cerebras, OpenAI, OpenRouter, or Google Gemini.
- Choose Your Provider & Model: Target a specific provider and model per request, or let the gateway pick one automatically via round-robin.
- Zero-Config Provider Loading: Add an API key and the provider activates.
- Why MultiRouter AI?
- Supported Providers
- Quick Start
- API Reference
- Configuration
- How It Works
- Tech Stack
- License
| Provider | Default Model | Streaming | Status |
|---|---|---|---|
| Groq | `llama-3.3-70b-versatile` | Yes | Stable |
| Cerebras | `llama-3.3-70b` | Yes | Stable |
| OpenAI | `gpt-4o-mini` | Yes | Stable |
| OpenRouter | `meta-llama/llama-3.3-70b-instruct` | Yes | Stable |
| Google Gemini | `gemini-2.5-flash` | Yes | Stable |
Only providers with a configured API key are loaded. You can use one provider or all five; it's up to you.
Each provider exposes a list of available models via the `/health` endpoint, so you can check what's available before sending a request.
```bash
git clone https://github.com/Mykle23/MultiRouter-AI.git
cd MultiRouter-AI
pnpm install
cp .env.example .env
```

Open `.env` and add at least one provider API key. The gateway will auto-detect available providers on startup:

```bash
# Add the providers you want to use
GROQ_API_KEY=gsk_your_key_here
OPENAI_API_KEY=sk-your_key_here
GEMINI_API_KEY=your_key_here
# Optional: protect the gateway with a Bearer token
API_KEY=my-secret-token
```

```bash
# Development (hot reload + pretty logs)
pnpm dev

# Production
pnpm start
```

Send your first request:

```bash
curl -N http://localhost:3000/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "user", "content": "Hello! What can you do?" }
]
}'
```

You should see a streaming text response from one of your configured providers.
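If you prefer to call the gateway from code, here is a minimal TypeScript sketch that sends the same request and prints the response as it streams in. It assumes Node 18+ (built-in `fetch`) and the default local setup; the endpoint and body shape are documented in the API reference below.

```ts
// chat-stream.ts - send a chat request and print the response as it streams in.
// Assumes Node 18+ (global fetch) and a gateway running on localhost:3000.
async function main() {
  const res = await fetch("http://localhost:3000/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [{ role: "user", content: "Hello! What can you do?" }],
    }),
  });

  if (!res.ok || !res.body) {
    throw new Error(`Gateway returned ${res.status}: ${await res.text()}`);
  }

  // Chunks arrive as streamed text (text/event-stream per the API reference).
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value, { stream: true }));
  }
}

main().catch(console.error);
```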
`POST /chat`

Send a chat completion request. The gateway selects a provider based on the request parameters and streams the response.
Headers
| Header | Required | Description |
|---|---|---|
| `Content-Type` | Yes | `application/json` |
| `Authorization` | Conditional | `Bearer <token>`; required only if `API_KEY` is set |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| `messages` | `array` | Yes | Array of `{ role, content }` objects |
| `provider` | `string` | No | Target a specific provider by name (e.g. `"Groq"`, `"Gemini"`) |
| `model` | `string` | No | Override the provider's default model. Requires `provider` |
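Expressed as TypeScript types, the request body looks like this. This is a convenience sketch for client code, not a type exported by the project; the `assistant` role is assumed for multi-turn history, since the examples only show `system` and `user`.

```ts
// Roles: "system" and "user" appear in the examples below; "assistant" is assumed.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[]; // required; must be non-empty
  provider?: string;       // e.g. "Groq", "Gemini"
  model?: string;          // only valid together with provider
}
```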
Routing logic:
| Request | Behavior |
|---|---|
| Only `messages` | Round-robin: the gateway picks the next available provider with its default model |
| `messages` + `provider` | Uses that exact provider with its default model |
| `messages` + `provider` + `model` | Uses that exact provider with the specified model |
| `messages` + `model` (no `provider`) | `400` error: `provider` is required when `model` is specified |
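To make the table concrete, here is a hypothetical TypeScript sketch of the selection rule. It illustrates the behavior above; it is not the project's actual source code.

```ts
interface ProviderInfo {
  name: string;
  defaultModel: string;
}

// Round-robin cursor; advances only when no provider is specified.
let cursor = 0;

function selectProvider(
  providers: ProviderInfo[],
  provider?: string,
  model?: string,
): { provider: ProviderInfo; model: string } {
  if (providers.length === 0) {
    throw Object.assign(new Error("No AI providers available"), { status: 503 });
  }
  if (model !== undefined && provider === undefined) {
    throw Object.assign(
      new Error("provider is required when model is specified"),
      { status: 400 },
    );
  }
  if (provider !== undefined) {
    const match = providers.find((p) => p.name === provider);
    if (!match) {
      // The status code for an unknown provider name is an assumption.
      throw Object.assign(new Error(`Unknown provider: ${provider}`), { status: 400 });
    }
    return { provider: match, model: model ?? match.defaultModel };
  }
  // Only messages: round-robin across all providers with their default models.
  const next = providers[cursor % providers.length];
  cursor += 1;
  return { provider: next, model: next.defaultModel };
}
```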
Example: round-robin (no provider specified):

```bash
curl -N -X POST http://localhost:3000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"messages": [
{ "role": "user", "content": "Hello!" }
]
}'
```

Example: specific provider:

```bash
curl -N -X POST http://localhost:3000/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "Groq",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Explain quantum computing in simple terms." }
]
}'
```

Example: specific provider and model:

```bash
curl -N -X POST http://localhost:3000/chat \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "gemini-2.5-flash",
"messages": [
{ "role": "user", "content": "Write a haiku about TypeScript." }
]
}'
```

Response: Streamed text via `text/event-stream`.
Error Responses
All errors return a JSON object with an `error` field and a human-readable message:
| Code | Reason | Example message |
|---|---|---|
| `400` | Invalid request | `"messages array is required and cannot be empty"` |
| `400` | Model without provider | `"provider is required when model is specified"` |
| `401` | Auth failed | `"Authentication failed - invalid API key for this provider"` |
| `404` | Invalid model | `"Model not found - the requested model does not exist on this provider"` |
| `429` | Rate limited | `"Rate limit exceeded - too many requests, try again later"` |
| `502` | Provider failure | `"Provider internal error - the upstream service failed"` |
| `503` | No providers | `"No AI providers available"` |
Error responses also include `provider` and `model` fields when available, so you know exactly which combination failed:

```json
{
"error": "Rate limit exceeded β too many requests, try again later",
"provider": "Gemini",
"model": "gemini-2.5-pro"
}
```

`GET /health`

Returns server status with the full list of active providers, their default models, and all available models. No authentication required.

```json
{
"status": "ok",
"providers": [
{
"name": "Groq",
"defaultModel": "llama-3.3-70b-versatile",
"availableModels": [
"llama-3.3-70b-versatile",
"llama-3.1-8b-instant",
"openai/gpt-oss-120b",
"openai/gpt-oss-20b",
"meta-llama/llama-4-maverick-17b-128e-instruct",
"meta-llama/llama-4-scout-17b-16e-instruct",
"qwen/qwen3-32b",
"moonshotai/kimi-k2-instruct-0905"
]
},
{
"name": "Cerebras",
"defaultModel": "llama-3.3-70b",
"availableModels": ["llama3.1-8b", "llama-3.3-70b", "gpt-oss-120b", "..."]
}
],
"providerCount": 2,
"timestamp": "2026-02-11T12:00:00.000Z"
}
```

Use this endpoint to discover which providers are active and which models you can use in your requests.
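For instance, a small TypeScript helper can list the usable provider/model combinations before you send a request. A sketch assuming Node 18+ `fetch` and the response shape shown above:

```ts
// list-models.ts - print each active provider and its available models.
interface Health {
  status: string;
  providers: { name: string; defaultModel: string; availableModels: string[] }[];
  providerCount: number;
  timestamp: string;
}

async function listModels() {
  const res = await fetch("http://localhost:3000/health");
  const health = (await res.json()) as Health;

  for (const p of health.providers) {
    console.log(`${p.name} (default: ${p.defaultModel})`);
    console.log(`  models: ${p.availableModels.join(", ")}`);
  }
}

listModels().catch(console.error);
```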
All settings are managed through environment variables. See `.env.example` for the full template.
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | Server port |
| `NODE_ENV` | `development` | `development` / `production` |
| `LOG_LEVEL` | `info` | Log level (`debug`, `info`, `warn`, `error`) |
| `API_KEY` | (empty) | Bearer token for auth; leave empty to disable |
| `RATE_LIMIT_MAX` | `100` | Max requests per minute per IP; set to `0` to disable |
Each provider requires only its API key. The model is optional and falls back to a sensible default.
| Provider | API Key Variable | Model Variable | Default Model |
|---|---|---|---|
| Groq | `GROQ_API_KEY` | `GROQ_MODEL` | `llama-3.3-70b-versatile` |
| Cerebras | `CEREBRAS_API_KEY` | `CEREBRAS_MODEL` | `llama-3.3-70b` |
| OpenAI | `OPENAI_API_KEY` | `OPENAI_MODEL` | `gpt-4o-mini` |
| OpenRouter | `OPENROUTER_API_KEY` | `OPENROUTER_MODEL` | `meta-llama/llama-3.3-70b-instruct` |
| Google Gemini | `GEMINI_API_KEY` | `GEMINI_MODEL` | `gemini-2.5-flash` |
```
              ┌──────────────┐
              │    Client    │
              └──────┬───────┘
                     │ POST /chat
                     │ { provider?, model?, messages }
                     ▼
              ┌──────────────┐
              │   Gateway    │
              │  (Express)   │
              │              │
              │  Auth Check  │
              │  Rate Limit  │
              │  Validation  │
              └──────┬───────┘
                     │
       ┌─────────────┼─────────────┐
       │             │             │
   provider      provider     no provider
   + model         only        specified
       │             │             │
       ▼             ▼             ▼
   Use exact   Use provider   Round-Robin
   provider      + default    across all
   + model         model       providers
       │             │             │
       └─────────────┼─────────────┘
                     │
       ┌─────────────┼─────────────┐
       ▼             ▼             ▼
 ┌──────────┐  ┌──────────┐  ┌──────────┐
 │   Groq   │  │  OpenAI  │  │  Gemini  │  ...
 └─────┬────┘  └─────┬────┘  └─────┬────┘
       │             │             │
       └─────────────┼─────────────┘
                     │ Streaming response
                     ▼
              ┌──────────────┐
              │    Client    │
              └──────────────┘
```
- Request arrives at `POST /chat` with `messages` and optional `provider` / `model` fields.
- Middleware pipeline runs: Helmet headers, rate limiting, Bearer token auth, body validation.
- Provider selection: if a `provider` is specified, that exact provider is used. Otherwise, the round-robin selector picks the next available provider.
- The provider streams the completion back through the gateway to the client in real time.
- If a provider fails, the gateway returns a descriptive error with the HTTP status, provider name, and model, so a client can react (see the sketch after this list).
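Because error payloads name the failing provider and model, clients can layer their own failover on top of the gateway. A minimal sketch, assuming Node 18+ `fetch`; the retry policy and the set of retryable status codes are illustrative choices, not gateway behavior:

```ts
// Try providers in order, moving on when one is rate-limited or down.
async function chatWithFailover(
  messages: { role: string; content: string }[],
  providers: string[], // e.g. ["Groq", "Cerebras", "Gemini"]
): Promise<Response> {
  for (const provider of providers) {
    const res = await fetch("http://localhost:3000/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ provider, messages }),
    });
    if (res.ok) return res; // stream it as in the Quick Start example
    if (![429, 502, 503].includes(res.status)) {
      throw new Error(`Unrecoverable error ${res.status}: ${await res.text()}`);
    }
    // 429/502/503: this provider is exhausted or failing - try the next one.
  }
  throw new Error("All configured providers failed");
}
```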
| Category | Technology |
|---|---|
| Runtime | Node.js 20+ |
| Language | TypeScript 5.9 (strict mode) |
| Framework | Express 5 |
| Logging | Pino + pino-http |
| Security | Helmet, express-rate-limit |
| Provider SDKs | groq-sdk, @cerebras/cerebras_cloud_sdk, openai, @openrouter/sdk, @google/generative-ai |
| Dev Tools | ESLint 9, tsx (hot reload) |
Distributed under the MIT License. See LICENSE for details.