Short intro
A chat box is a starting interface, not a complete AI product architecture. Once tasks, users, and plans diverge, the system needs routing.
What I was trying to do
I wanted Zenquanta to avoid treating every prompt as the same kind of work. Planning, writing, debugging, analysis, and image tasks should not all hit the same model by default.
What I learned
- Different tasks need different models: routine chat can run on a fast, cheap model, while planning, debugging, and image work justify stronger or specialized ones.
- Cost, latency, quality, modality, and plan limits shape the product experience.
- Assistant families create structure for users before the model is even called.
- Prompt precheck can improve UX by recommending a better assistant or warning about unsupported requests.
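The precheck idea can be sketched as a rules-based pass that runs before any model call, either recommending a better assistant family or warning about an unsupported request. Everything here is illustrative: the keyword table, family names, and plan tiers are made up, not Zenquanta's actual config.

```python
# Hypothetical precheck rules: keyword -> (recommended family, warning for
# restricted plans). All names are placeholders for illustration.
KEYWORD_FAMILIES = {
    "debug": ("debugging", None),
    "traceback": ("debugging", None),
    "roadmap": ("planning", None),
    "draw": ("image", "Image tasks are unavailable on the free plan."),
}

def precheck(prompt: str, current_family: str, plan: str):
    """Return (recommended_family, warning) for a raw prompt.

    Runs before any model call, so it must be cheap: pure string checks,
    no network. Returns the current family unchanged when nothing matches.
    """
    lowered = prompt.lower()
    for keyword, (family, plan_warning) in KEYWORD_FAMILIES.items():
        if keyword in lowered:
            warning = plan_warning if plan == "free" else None
            if family != current_family or warning:
                return family, warning
    return current_family, None
```

A classifier could later replace the keyword table behind the same `(family, warning)` interface, which keeps the open question about rules vs. classifiers contained to one function.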
Technical notes
- Routing can start as simple policy: assistant family + plan + modality -> model.
- Raw provider cost should be tracked separately from user-facing usage.
- Fallbacks matter because provider failures are product failures if they are not handled.
- Streaming complicates accounting because the response is still arriving while usage is counted.
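The "simple policy" routing above can start as a literal lookup table keyed on assistant family, plan, and modality. A minimal sketch, with invented model names and tiers:

```python
# Hypothetical routing table: (family, plan, modality) -> model id.
# Model names and plan tiers are placeholders, not real config.
ROUTES = {
    ("writing",   "free", "text"):  "small-fast-model",
    ("writing",   "pro",  "text"):  "large-quality-model",
    ("debugging", "free", "text"):  "code-model-small",
    ("debugging", "pro",  "text"):  "code-model-large",
    ("image",     "pro",  "image"): "image-model",
}

DEFAULT_MODEL = "small-fast-model"

def route(family: str, plan: str, modality: str) -> str:
    """Pure policy lookup: no classifier, no network calls, trivially testable."""
    return ROUTES.get((family, plan, modality), DEFAULT_MODEL)
```

Keeping the policy as data rather than branching code makes it easy to diff route changes and to later log which route each request took.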
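Separating raw provider cost from user-facing usage can be as simple as recording both numbers per call. The per-token prices and credit conversion below are invented for illustration:

```python
# Hypothetical price sheet (USD per 1K tokens) and a user-facing credit
# rate that is deliberately decoupled from raw provider cost.
PROVIDER_PRICE_PER_1K = {
    "small-fast-model": 0.0005,
    "large-quality-model": 0.01,
}
CREDITS_PER_1K_TOKENS = 1.0  # what the user sees on their plan

def record_call(model: str, tokens: int) -> dict:
    """Record both ledgers for one call: internal cost and billed usage."""
    return {
        "provider_cost_usd": tokens / 1000 * PROVIDER_PRICE_PER_1K[model],
        "user_credits": tokens / 1000 * CREDITS_PER_1K_TOKENS,
    }
```

Because the two numbers live in separate fields, pricing experiments on the user side never touch the internal cost ledger.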
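The fallback note can be sketched as a wrapper that tries the routed model, falls back to a cheaper default on failure, and records which model actually served the request so that fallback behavior is visible later. `call_model` is a stand-in for a real provider client:

```python
# Sketch of fallback handling; the provider client is a placeholder.
def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError  # real provider call goes here

def call_with_fallback(model, fallback, prompt, call=call_model, log=None):
    """Try the routed model, then the fallback; log which one answered."""
    log = log if log is not None else []
    try:
        reply = call(model, prompt)
        log.append({"model": model, "fallback_used": False})
        return reply
    except Exception:
        reply = call(fallback, prompt)
        log.append({"model": fallback, "fallback_used": True})
        return reply
```

The log entries are exactly what an admin view of fallback behavior would read from.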
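One way to handle the streaming problem is to tally usage as each chunk is delivered, with a `finally` clause that finalizes the count even if the client disconnects mid-stream. The whitespace token count here is a crude placeholder for a real tokenizer:

```python
# Streaming accounting sketch: count usage while chunks arrive, so an
# abandoned stream still bills only the tokens actually delivered.
def stream_with_accounting(chunks, usage):
    """Yield chunks while tallying delivered tokens into `usage`."""
    try:
        for chunk in chunks:
            usage["output_tokens"] += len(chunk.split())  # placeholder count
            yield chunk
    finally:
        # Runs whether the stream completes or is abandoned early.
        usage["finalized"] = True
```

The key property is that accounting happens per chunk, not once at the end, so a dropped connection never produces an unbilled or over-billed response.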
Problems / open questions
- When does routing need a classifier instead of rules?
- How transparent should model selection be to users?
- What quality signals should feed future routing decisions?
Next steps
- Document assistant family responsibilities.
- Track model call latency and cost by route.
- Add admin visibility into fallback behavior.
- Connect this to the AI model routing blueprint.