
Jianshu-She/Token-Routing


🔀 Token-Level Routing for Edge Inference

Efficient collaborative decoding between edge-deployed small models and cloud-based large language models.

🎬 Demo

System Overview

🎥 Watch the demo


🧠 Overview

This project implements Token-Level Routing, a novel collaborative inference system where a small on-device model performs most decoding, selectively routing critical tokens to a powerful cloud-based LLM.

This approach significantly reduces latency and cost while retaining output quality, which makes it well suited to edge scenarios such as mobile phones or IoT devices.
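
To make the idea concrete, here is a minimal sketch of a collaborative decoding loop. It is not the project's implementation: the `slm_step` and `llm_step` callables, the `threshold` value, and the toy models are all hypothetical stand-ins, and the routing rule (defer whenever the small model's confidence falls below a fixed cutoff) is an assumption about what "critical tokens" could mean.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the project):
# the small model returns (next_token, confidence); the large model returns next_token.
SmallModel = Callable[[List[str]], Tuple[str, float]]
LargeModel = Callable[[List[str]], str]


def collaborative_decode(
    prompt: List[str],
    slm_step: SmallModel,
    llm_step: LargeModel,
    threshold: float = 0.5,  # assumed cutoff, not a value from the project
    max_new_tokens: int = 32,
    eos: str = "<eos>",
) -> Tuple[List[str], float]:
    """Decode with the small model, deferring low-confidence tokens to the large one."""
    tokens = list(prompt)
    routed = 0
    generated = 0
    for _ in range(max_new_tokens):
        token, confidence = slm_step(tokens)
        if confidence < threshold:
            # The small model is unsure: ask the large model for this one token.
            token = llm_step(tokens)
            routed += 1
        generated += 1
        if token == eos:
            break
        tokens.append(token)
    ratio = routed / generated if generated else 0.0
    return tokens[len(prompt):], ratio


def toy_slm(prefix: List[str]) -> Tuple[str, float]:
    # Toy small model: unsure whenever the prefix length is a multiple of 4.
    position = len(prefix)
    if position % 4 == 0:
        return "?", 0.2
    return f"s{position}", 0.9


def toy_llm(prefix: List[str]) -> str:
    # Toy large model: always answers, tagged with the position it filled.
    return f"L{len(prefix)}"


output, routed_ratio = collaborative_decode(["a"], toy_slm, toy_llm, max_new_tokens=8)
```

In this toy run the large model supplies two of the eight generated tokens. In a real deployment the routed fraction depends on the confidence measure and the threshold chosen.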


🚀 Key Features

  • ⚡ Efficient: >60% accuracy boost by routing only ~7% of tokens to the LLM.
  • 🌐 Edge-Cloud Collaboration: Combines local lightweight models with cloud intelligence.
  • 🧭 Token-Level Routing: Fine-grained, confidence-driven token control.
  • 📱 Deployable: Lightweight ONNX runtime works on laptops and mobile devices.
  • 🖥️ LLM Backend: Compatible with SGLang for LLM serving and KV-cache extension.
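
One common way to get a per-token confidence signal is the top-1 softmax probability of the small model's logits. The sketch below illustrates that approach only. The README does not say which confidence measure the router uses, so the function names and the 0.5 threshold are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1_confidence(logits: Sequence[float]) -> Tuple[int, float]:
    """Return the index of the most likely token and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def should_route(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Assumed routing rule: defer to the LLM when the top-1 probability is below the threshold."""
    _, confidence = top1_confidence(logits)
    return confidence < threshold


peaked = [5.0, 0.0, 0.0]   # one clearly preferred token: keep it local
flat = [1.0, 1.0, 1.0]     # no preferred token: route to the LLM
decisions = (should_route(peaked), should_route(flat))
```

Raising the threshold routes more tokens to the LLM, and lowering it keeps more decoding on the device, so the threshold is the main knob for trading quality against cloud cost.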

🧩 Architecture

+-------------+           +-------------+           +-------------+
|  User Input |--Prompt-->|  SLM (ONNX) |--Tokens-->|   Router    |
+-------------+           +-------------+           +-------------+
                                                           |
                                              Tokens with low confidence
                                                           v
                                                 +------------------+
                                                 | LLM (Server-side)|
                                                 +------------------+
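
The Router-to-LLM hop in the diagram could be sketched as a request for exactly one next token. This example only builds the request body and does not send it. It assumes the server exposes an OpenAI-compatible completions endpoint (the kind SGLang documents), and the model name `large-model` and the helper name are hypothetical. The project's actual protocol, including its KV-cache extension, is not described in this README and may differ.

```python
import json
from typing import Any, Dict


def build_single_token_request(prefix_text: str, model: str = "large-model") -> Dict[str, Any]:
    """Build a completions-style payload asking the server for one greedy token.

    Assumes an OpenAI-compatible completions schema; `model` is a placeholder name.
    """
    return {
        "model": model,
        "prompt": prefix_text,   # text decoded so far (prompt plus accepted tokens)
        "max_tokens": 1,         # only the single routed token is needed
        "temperature": 0.0,      # greedy choice, so the reply is deterministic
    }


payload = build_single_token_request("The capital of France is")
body = json.dumps(payload)  # JSON string a client would POST to the server
```

Passing text rather than token IDs sidesteps any vocabulary mismatch between the small and large models, at the cost of re-tokenizing the prefix on the server for every routed token.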

📘 Usage

See Guideline.md for setup and usage instructions.


💻 Platform Support

  • ✅ macOS (Apple M1/M2/M3) is already supported.
  • 🔧 Android support is under development.

📫 Contact

For questions or collaborations, feel free to open an issue or email us.
