150 changes: 137 additions & 13 deletions README.md
@@ -28,7 +28,7 @@ One wallet, 30+ models, zero API keys.

## Why ClawRouter?

- **100% local routing** — 14-dimension weighted scoring runs on your machine in <1ms
- **100% local routing** — 15-dimension weighted scoring runs on your machine in <1ms
- **Zero external calls** — no API calls for routing decisions, ever
- **30+ models** — OpenAI, Anthropic, Google, DeepSeek, xAI, Moonshot through one wallet
- **x402 micropayments** — pay per request with USDC on Base, no API keys
@@ -94,21 +94,22 @@ Request → Weighted Scorer (14 dimensions)

No external classifier calls. Ambiguous queries default to the MEDIUM tier (DeepSeek/GPT-4o-mini) — fast, cheap, and good enough for most tasks.

### 14-Dimension Weighted Scoring
### 15-Dimension Weighted Scoring

| Dimension | Weight | What It Detects |
| -------------------- | ------ | ---------------------------------------- |
| Reasoning markers | 0.18 | "prove", "theorem", "step by step" |
| Code presence | 0.15 | "function", "async", "import", "```" |
| Simple indicators | 0.12 | "what is", "define", "translate" |
| Multi-step patterns | 0.12 | "first...then", "step 1", numbered lists |
| **Agentic task** | 0.10 | "run", "test", "fix", "deploy", "edit" |
| Technical terms | 0.10 | "algorithm", "kubernetes", "distributed" |
| Token count | 0.08 | short (<50) vs long (>500) prompts |
| Creative markers | 0.05 | "story", "poem", "brainstorm" |
| Question complexity | 0.05 | Multiple question marks |
| Constraint count | 0.04 | "at most", "O(n)", "maximum" |
| Imperative verbs | 0.03 | "build", "create", "implement" |
| Output format | 0.03 | "json", "yaml", "schema" |
| Simple indicators | 0.02 | "what is", "define", "translate" |
| Domain specificity | 0.02 | "quantum", "fpga", "genomics" |
| Reference complexity | 0.02 | "the docs", "the api", "above" |
| Negation complexity | 0.01 | "don't", "avoid", "without" |
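
The table can be read as a plain weighted sum. Here is a minimal sketch, assuming each dimension produces a signal in [-1, 1]. Only the weights come from the 15-dimension table; the signal values, the key names, and the `weightedScore` helper are hypothetical and not taken from the ClawRouter source.

```typescript
// Weights copied from the 15-dimension table; key names are illustrative.
const WEIGHTS: Record<string, number> = {
  reasoningMarkers: 0.18,
  codePresence: 0.15,
  multiStepPatterns: 0.12,
  agenticTask: 0.10,
  technicalTerms: 0.10,
  tokenCount: 0.08,
  creativeMarkers: 0.05,
  questionComplexity: 0.05,
  constraintCount: 0.04,
  imperativeVerbs: 0.03,
  outputFormat: 0.03,
  simpleIndicators: 0.02,
  domainSpecificity: 0.02,
  referenceComplexity: 0.02,
  negationComplexity: 0.01,
};

// Hypothetical helper: a dimension with no signal contributes 0.
function weightedScore(signals: Partial<Record<string, number>>): number {
  let score = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    score += weight * (signals[dimension] ?? 0);
  }
  return score;
}

// The 15 weights sum to 1.0 (up to floating-point rounding).
const totalWeight = Object.values(WEIGHTS).reduce((sum, w) => sum + w, 0);

// Assumed signals: strong reasoning and code cues, one "simple" cue pulling down.
const example = weightedScore({ reasoningMarkers: 1, codePresence: 1, simpleIndicators: -1 });
```

Under these assumptions `example` is 0.18 + 0.15 - 0.02 = 0.31. How the real scorer maps a score to a tier is not shown in this diff.
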
@@ -131,15 +132,93 @@ Mixed-language prompts are supported — keywords from all languages are checked

### Tier → Model Mapping

| Tier | Primary Model | Cost/M | Savings vs Opus |
| --------- | ----------------- | ------ | --------------- |
| SIMPLE | gemini-2.5-flash | $0.60 | **99.2%** |
| MEDIUM | deepseek-chat | $0.42 | **99.4%** |
| COMPLEX | claude-opus-4 | $75.00 | baseline |
| REASONING | deepseek-reasoner | $0.42 | **99.4%** |
| Tier | Primary Model | Cost/M | Savings vs Opus |
| --------- | ---------------------- | ------ | --------------- |
| SIMPLE | gemini-2.5-flash | $0.60 | **99.2%** |
| MEDIUM | grok-code-fast-1 | $1.50 | **98.0%** |
| COMPLEX | gemini-2.5-pro | $10.00 | **86.7%** |
| REASONING | grok-4-fast-reasoning | $0.50 | **99.3%** |

Special rule: 2+ reasoning markers → REASONING at 0.97 confidence.
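
The special rule can be sketched as a short-circuit that runs before normal scoring. This is an illustration only: the marker list is limited to the three phrases in the "Reasoning markers" row, substring matching is a simplification, and `reasoningOverride` is a hypothetical name.

```typescript
type Override = { tier: "REASONING"; confidence: number } | null;

// Phrases taken from the "Reasoning markers" row; the real list is likely longer.
const REASONING_MARKERS = ["prove", "theorem", "step by step"];

// Returns the REASONING override when at least two distinct markers appear.
function reasoningOverride(prompt: string): Override {
  const text = prompt.toLowerCase();
  const hits = REASONING_MARKERS.filter((marker) => text.includes(marker)).length;
  return hits >= 2 ? { tier: "REASONING", confidence: 0.97 } : null;
}
```

A prompt such as "Prove the theorem step by step" contains all three markers and would take the override; "what is 2+2" contains none and would fall through to regular scoring.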

### Agentic Auto-Detection

ClawRouter automatically detects multi-step agentic tasks and routes to models optimized for autonomous execution:

```
"what is 2+2" → gemini-flash (standard)
"build the project then run tests" → kimi-k2.5 (auto-agentic)
"fix the bug and make sure it works" → kimi-k2.5 (auto-agentic)
```

**How it works:**
- Detects agentic keywords: file ops ("read", "edit"), execution ("run", "test", "deploy"), iteration ("fix", "debug", "verify")
- Threshold: 2+ signals trigger an auto-switch to agentic tiers
- No config needed — works automatically
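
The threshold rule above can be sketched as a count of distinct keyword hits. The keyword list below contains only the eight words named in the bullets, so it will not reproduce every routing example shown earlier; the real detector presumably knows more phrases. The function names and the simple suffix handling are assumptions.

```typescript
// Only the keywords named in the list above; the actual set is probably larger.
const AGENTIC_KEYWORDS = ["read", "edit", "run", "test", "deploy", "fix", "debug", "verify"];

// Counts how many distinct keywords occur as whole words (allowing -s, -ed, -ing).
function countAgenticSignals(prompt: string): number {
  const text = prompt.toLowerCase();
  return AGENTIC_KEYWORDS.filter((kw) =>
    new RegExp(`\\b${kw}(s|ed|ing)?\\b`).test(text),
  ).length;
}

// Two or more signals switch routing to the agentic tiers.
function isAgentic(prompt: string): boolean {
  return countAgenticSignals(prompt) >= 2;
}
```

With this sketch, "build the project then run tests" yields two signals ("run", "tests") and is treated as agentic, while "what is 2+2" yields none.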

**Agentic tier models** (optimized for multi-step autonomy):

| Tier | Agentic Model | Why |
| --------- | -------------------- | -------------------------------------- |
| SIMPLE | claude-haiku-4.5 | Fast + reliable tool use |
| MEDIUM | kimi-k2.5 | 200+ tool chains, 76% cheaper |
| COMPLEX | claude-sonnet-4 | Best balance for complex tasks |
| REASONING | kimi-k2.5 | Extended reasoning + execution |

You can also force agentic mode via config:

```yaml
# openclaw.yaml
plugins:
  - id: "@blockrun/clawrouter"
    config:
      routing:
        overrides:
          agenticMode: true # Always use agentic tiers
```

### Tool Detection (v0.5)

When your request includes a `tools` array (function calling), ClawRouter automatically switches to agentic tiers:

```typescript
// Request with tools → auto-agentic mode
{
  model: "blockrun/auto",
  messages: [{ role: "user", content: "Check the weather" }],
  tools: [{ type: "function", function: { name: "get_weather", ... } }]
}
// → Routes to claude-haiku-4.5 (excellent tool use)
// → Instead of gemini-flash (may produce malformed tool calls)
```

**Why this matters:** Some models (like `deepseek-reasoner`) are optimized for chain-of-thought reasoning but can generate malformed tool calls. Tool detection ensures requests with functions go to models proven to handle tool use correctly.
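
A minimal sketch of the check, assuming the router only needs to know whether a non-empty `tools` array is present. The `ChatRequest` shape, `TierProfile`, and `selectTierProfile` are hypothetical names for illustration, not the plugin's API.

```typescript
// Hypothetical request shape: only the fields this sketch inspects.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  tools?: unknown[];
}

type TierProfile = "standard" | "agentic";

// A non-empty tools array signals function calling, so use the agentic tiers.
function selectTierProfile(request: ChatRequest): TierProfile {
  return Array.isArray(request.tools) && request.tools.length > 0 ? "agentic" : "standard";
}
```

An empty `tools` array is treated like a missing one here; whether ClawRouter does the same is not stated in the README.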

### Context-Length-Aware Routing (v0.5)

ClawRouter automatically filters out models that can't handle your context size:

```
150K token request:
Full chain: [grok-4-fast (131K), deepseek (128K), kimi (262K), gemini (1M)]
Filtered: [kimi (262K), gemini (1M)]
→ Skips models that would fail with "context too long" errors
```

This avoids wasted API calls and speeds up fallback to capable models.
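
The filtering step can be sketched as below, using the rounded context sizes from the example (131K, 128K, 262K, 1M). `ModelInfo` and `filterByContext` are hypothetical names, and returning the full chain when nothing fits is a design choice of this sketch rather than documented behavior.

```typescript
interface ModelInfo {
  id: string;
  contextWindow: number; // tokens, rounded as in the example above
}

const chain: ModelInfo[] = [
  { id: "grok-4-fast", contextWindow: 131_000 },
  { id: "deepseek", contextWindow: 128_000 },
  { id: "kimi", contextWindow: 262_000 },
  { id: "gemini", contextWindow: 1_000_000 },
];

// Keeps only models whose window can hold the request, preserving chain order.
function filterByContext(models: ModelInfo[], estimatedTokens: number): ModelInfo[] {
  const fits = models.filter((m) => m.contextWindow >= estimatedTokens);
  // Assumption: if nothing fits, keep the full chain rather than return no candidates.
  return fits.length > 0 ? fits : models;
}

const filtered = filterByContext(chain, 150_000).map((m) => m.id);
```

For the 150K request, `filtered` contains `kimi` and `gemini`, matching the filtered chain shown in the example.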

### Session Persistence (v0.5)

For multi-turn conversations, ClawRouter pins the model to prevent mid-task switching:

```
Turn 1: "Build a React component" → claude-sonnet-4
Turn 2: "Add dark mode support" → claude-sonnet-4 (pinned)
Turn 3: "Now add tests" → claude-sonnet-4 (pinned)
```

Sessions are identified by conversation ID and expire after 1 hour of inactivity.
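
One way to picture the pinning behavior is a map from conversation ID to the pinned model plus a last-used timestamp. The `PinnedModelStore` class below is a hypothetical illustration and is not the `SessionStore` exported by this PR; the clock is passed in so that expiry is easy to reason about.

```typescript
const SESSION_TTL_MS = 60 * 60 * 1000; // 1 hour of inactivity

class PinnedModelStore {
  private pins = new Map<string, { model: string; lastUsedAt: number }>();

  // Returns the pinned model, pinning `candidate` if the session is new or expired.
  // Every call refreshes the inactivity timer.
  resolve(sessionId: string, candidate: string, now: number = Date.now()): string {
    const pin = this.pins.get(sessionId);
    const active = pin !== undefined && now - pin.lastUsedAt < SESSION_TTL_MS;
    const model = active ? pin.model : candidate;
    this.pins.set(sessionId, { model, lastUsedAt: now });
    return model;
  }
}
```

In this sketch the first turn pins whatever the router picked, later turns within the hour reuse it even if the router would now pick something else, and a turn after an hour of silence starts a fresh pin.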

### Cost Savings (Real Numbers)

| Tier | % of Traffic | Cost/M |
@@ -179,8 +258,13 @@ Compared to **$75/M** for Claude Opus = **96% savings** on a typical workload.
| **xAI** | | | | |
| grok-3 | $3.00 | $15.00 | 131K | \* |
| grok-3-mini | $0.30 | $0.50 | 131K | |
| grok-4-fast-reasoning | $0.20 | $0.50 | 131K | \* |
| grok-4-fast | $0.20 | $0.50 | 131K | |
| grok-code-fast-1 | $0.20 | $1.50 | 131K | |
| **Moonshot** | | | | |
| kimi-k2.5 | $0.50 | $2.40 | 128K | \* |
| kimi-k2.5 | $0.50 | $2.40 | 262K | \* |
| **NVIDIA** | | | | |
| gpt-oss-120b | **FREE** | **FREE** | 128K | |

Full list: [`src/models.ts`](src/models.ts)

@@ -446,6 +530,38 @@ console.log(decision);

---

## Cost Tracking with /stats (v0.5)

Track your savings in real time:

```bash
# In any OpenClaw conversation
/stats
```

Output:
```
╔════════════════════════════════════════════════════════════╗
║ ClawRouter Usage Statistics ║
╠════════════════════════════════════════════════════════════╣
║ Period: last 7 days ║
║ Total Requests: 442 ║
║ Total Cost: $1.73 ║
║ Baseline Cost (Opus): $20.13 ║
║ 💰 Total Saved: $18.40 (91.4%) ║
╠════════════════════════════════════════════════════════════╣
║ Routing by Tier: ║
║ SIMPLE ███████████ 55.0% (243) ║
║ MEDIUM ██████ 30.8% (136) ║
║ COMPLEX █ 7.2% (32) ║
║ REASONING █ 7.0% (31) ║
╚════════════════════════════════════════════════════════════╝
```

Stats are stored locally at `~/.openclaw/blockrun/logs/` and aggregated on demand.
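
The "Total Saved" line is arithmetic over logged costs. A minimal sketch, assuming each log entry carries `cost` and `baselineCost` as in the `UsageEntry` type changed in this PR; the `summarize` helper is hypothetical and is not the `getStats` implementation.

```typescript
// Subset of UsageEntry fields that this sketch needs.
interface CostEntry {
  cost: number;
  baselineCost: number;
}

// Aggregates actual cost, baseline cost, and savings across entries.
function summarize(entries: CostEntry[]) {
  const totalCost = entries.reduce((sum, e) => sum + e.cost, 0);
  const baselineCost = entries.reduce((sum, e) => sum + e.baselineCost, 0);
  const saved = baselineCost - totalCost;
  const savedPercent = baselineCost > 0 ? (saved / baselineCost) * 100 : 0;
  return { totalCost, baselineCost, saved, savedPercent };
}

// Totals from the sample output above, treated as a single entry.
const sample = summarize([{ cost: 1.73, baselineCost: 20.13 }]);
console.log(`$${sample.saved.toFixed(2)} (${sample.savedPercent.toFixed(1)}%)`);
// prints "$18.40 (91.4%)"
```

The guard on `baselineCost > 0` avoids a division by zero when the log is empty.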

---

## Why Not OpenRouter / LiteLLM?

They're built for developers. ClawRouter is built for **agents**.
@@ -468,7 +584,7 @@ Agents shouldn't need a human to paste API keys. They should generate a wallet,
### Quick Checklist

```bash
# 1. Check your version (should be 0.3.21+)
# 1. Check your version (should be 0.5.0+)
cat ~/.openclaw/extensions/clawrouter/package.json | grep version

# 2. Check proxy is running
@@ -477,6 +593,9 @@ curl http://localhost:8402/health
# 3. Watch routing in action
openclaw logs --follow
# Should see: gemini-2.5-flash $0.0012 (saved 99%)

# 4. View cost savings
/stats
```

### "Unknown model: blockrun/auto" or "Unknown model: auto"
@@ -586,14 +705,19 @@ BLOCKRUN_WALLET_KEY=0x... npx tsx test-e2e.ts

## Roadmap

- [x] Smart routing — 14-dimension weighted scoring, 4-tier model selection
- [x] Smart routing — 15-dimension weighted scoring, 4-tier model selection
- [x] x402 payments — per-request USDC micropayments, non-custodial
- [x] Response dedup — prevents double-charge on retries
- [x] Payment pre-auth — skips 402 round trip
- [x] SSE heartbeat — prevents upstream timeouts
- [x] Agentic auto-detect — auto-switch to agentic models for multi-step tasks
- [x] Tool detection — auto-switch to agentic mode when tools array present
- [x] Context-aware routing — filter out models that can't handle context size
- [x] Session persistence — pin model for multi-turn conversations
- [x] Cost tracking — /stats command with savings dashboard
- [ ] Cascade routing — try cheap model first, escalate on low quality
- [ ] Spend controls — daily/monthly budgets
- [ ] Analytics dashboard — cost tracking at blockrun.ai
- [ ] Remote analytics — cost tracking at blockrun.ai

---

4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@blockrun/clawrouter",
"version": "0.4.7",
"version": "0.4.9",
"description": "Smart LLM router — save 78% on inference costs. 30+ models, one wallet, x402 micropayments.",
"type": "module",
"main": "dist/index.js",
64 changes: 62 additions & 2 deletions src/index.ts
@@ -34,6 +34,7 @@ import { homedir } from "node:os";
import { join } from "node:path";
import { VERSION } from "./version.js";
import { privateKeyToAccount } from "viem/accounts";
import { getStats, formatStatsAscii } from "./stats.js";

/**
* Detect if we're running in shell completion mode.
@@ -279,6 +280,41 @@ async function startProxyInBackground(api: OpenClawPluginApi): Promise<void> {
api.logger.info(`BlockRun provider active — ${proxy.baseUrl}/v1 (smart routing enabled)`);
}

/**
 * /stats command handler for ClawRouter.
 * Shows usage statistics and cost savings.
 */
async function createStatsCommand(): Promise<OpenClawPluginCommandDefinition> {
  return {
    name: "stats",
    description: "Show ClawRouter usage statistics and cost savings",
    acceptsArgs: true,
    requireAuth: false,
    handler: async (ctx: PluginCommandContext) => {
      // Optional argument: number of days to report.
      // Missing, non-numeric, zero, or negative input falls back to 7; values are capped at 30.
      const parsed = parseInt(ctx.args?.trim() ?? "", 10);
      const days = Number.isFinite(parsed) && parsed > 0 ? Math.min(parsed, 30) : 7;

      try {
        const stats = await getStats(days);
        const ascii = formatStatsAscii(stats);

        return {
          text: [
            "```",
            ascii,
            "```",
          ].join("\n"),
        };
      } catch (err) {
        return {
          text: `Failed to load stats: ${err instanceof Error ? err.message : String(err)}`,
          isError: true,
        };
      }
    },
  };
}

/**
* /wallet command handler for ClawRouter.
* - /wallet or /wallet status: Show wallet address, balance, and key file location
@@ -438,6 +474,17 @@ const plugin: OpenClawPluginDefinition = {
);
});

// Register /stats command for usage statistics
createStatsCommand()
  .then((statsCommand) => {
    api.registerCommand(statsCommand);
  })
  .catch((err) => {
    api.logger.warn(
      `Failed to register /stats command: ${err instanceof Error ? err.message : String(err)}`,
    );
  });

// Register a service with stop() for cleanup on gateway shutdown
// This prevents EADDRINUSE when the gateway restarts
api.registerService({
@@ -477,8 +524,17 @@ export default plugin;
export { startProxy, getProxyPort } from "./proxy.js";
export type { ProxyOptions, ProxyHandle, LowBalanceInfo, InsufficientFundsInfo } from "./proxy.js";
export { blockrunProvider } from "./provider.js";
export { OPENCLAW_MODELS, BLOCKRUN_MODELS, buildProviderModels } from "./models.js";
export { route, DEFAULT_ROUTING_CONFIG } from "./router/index.js";
export {
  OPENCLAW_MODELS,
  BLOCKRUN_MODELS,
  buildProviderModels,
  MODEL_ALIASES,
  resolveModelAlias,
  isAgenticModel,
  getAgenticModels,
  getModelContextWindow,
} from "./models.js";
export { route, DEFAULT_ROUTING_CONFIG, getFallbackChain, getFallbackChainFiltered } from "./router/index.js";
export type { RoutingDecision, RoutingConfig, Tier } from "./router/index.js";
export { logUsage } from "./logger.js";
export type { UsageEntry } from "./logger.js";
@@ -501,3 +557,7 @@ export {
} from "./errors.js";
export { fetchWithRetry, isRetryable, DEFAULT_RETRY_CONFIG } from "./retry.js";
export type { RetryConfig } from "./retry.js";
export { getStats, formatStatsAscii } from "./stats.js";
export type { DailyStats, AggregatedStats } from "./stats.js";
export { SessionStore, getSessionId, DEFAULT_SESSION_CONFIG } from "./session.js";
export type { SessionEntry, SessionConfig } from "./session.js";
3 changes: 3 additions & 0 deletions src/logger.ts
@@ -15,7 +15,10 @@ import { homedir } from "node:os";
export type UsageEntry = {
  timestamp: string;
  model: string;
  tier: string;
  cost: number;
  baselineCost: number;
  savings: number; // fraction in the range 0-1 (not 0-100)
  latencyMs: number;
};
