Multi-Model Freedom¶
One of pi's killer features: seamless model switching mid-session across 15+ providers.
Supported Providers¶
Via subscription (OAuth): - Anthropic Claude Pro/Max - OpenAI ChatGPT Plus/Pro (Codex) - GitHub Copilot - Google Gemini CLI - Google Antigravity
Via API key: - Anthropic, OpenAI, Azure OpenAI, Google Gemini, Google Vertex - Amazon Bedrock, Mistral, Groq, Cerebras, xAI - OpenRouter, Vercel AI Gateway, ZAI, OpenCode Zen - Hugging Face, Kimi For Coding, MiniMax
Self-hosted: - Ollama, llama.cpp, vLLM, LM Studio (via OpenAI-compatible API)
Switching Models¶
Ctrl+L -- Open model selector (full list)
Ctrl+P -- Cycle through scoped/favorite models
Shift+Ctrl+P -- Cycle backward
/model -- Switch via command
Context Handoff¶
Pi-ai is designed from the ground up for cross-provider context handoff. When you switch from Anthropic to OpenAI mid-session:
- Thinking traces are converted to
<thinking></thinking>content blocks - Provider-specific signed blobs are handled transparently
- Tool call history is preserved across providers
- Token/cost tracking continues accurately
This is best-effort (providers have different capabilities), but it works well in practice.
From Ewald Benes:
Models have finally become a commodity to me. I'm currently cycling through Anthropic, z.ai, and Moonshot AI (Kimi) within the same session. I can swap the "brain" of the agent mid-stream, and the new model picks up the context seamlessly where the last one left off.
Model Aliases (via pi-model-switch extension)¶
Install the community extension:
Configure aliases in ~/.pi/agent/extensions/model-switch/aliases.json:
{
"cheap": "google/gemini-2.5-flash",
"fast": "google/gemini-2.5-flash",
"coding": "anthropic/claude-opus-4-5",
"budget": ["openai/gpt-5-mini", "google/gemini-2.5-flash"]
}
Then just say: "switch to cheap" or "use the coding model for this refactor."
Custom Providers¶
Add providers via ~/.pi/agent/models.json if they speak a supported API:
import { getModel, stream } from "@mariozechner/pi-ai";
const ollamaModel = {
id: "llama-3.1-8b",
name: "Llama 3.1 8B (Ollama)",
api: "openai-completions",
provider: "ollama",
baseUrl: "http://localhost:11434/v1",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 32000,
};
Scoped Models¶
Configure a subset of models for quick cycling with Ctrl+P:
Use --models <patterns> on the CLI for comma-separated patterns.
Practical Workflow¶
A typical multi-model session:
- Start with Gemini Flash for quick exploration (cheap, fast, huge context)
- Switch to Claude Sonnet for implementation (best coding)
- Bring in GPT-5.2 via oracle extension for review (strong reasoning)
- Switch to a cheap model for boilerplate/tests
All in one session, all context preserved.