# CLI Delegation

Agents Squads enables multi-LLM support by delegating to native provider CLIs. No SDK integrations needed: just shell out to `claude`, `gemini`, `codex`, `grok`, etc.
## Check Available Providers

```text
Provider CLI Status
────────────────────────────────────────
Anthropic   claude   ✓ ready
Google      gemini   ✓ ready
OpenAI      codex    ✓ ready
Mistral     vibe     ✓ ready
xAI         grok     ✓ ready
Aider       aider    ✓ ready
Ollama      ollama   ✓ ready

✓ All providers ready
```
## Run with Specific Provider

```bash
# Override provider for this run
squads run research/analyst --execute --provider=google
# Uses the Gemini CLI instead of Claude
```
## Provider Resolution

The CLI resolves providers in this order:

1. **Agent file** - `provider:` in frontmatter or a `## Provider` header
2. **CLI flag** - `--provider=google`
3. **Squad default** - `providers.default` in SQUAD.md
4. **Fallback** - `anthropic`
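The precedence above reduces to a first-match lookup. A minimal sketch (the `resolve_provider` helper is hypothetical, not part of the squads codebase):

```python
def resolve_provider(agent_frontmatter=None, cli_flag=None, squad_default=None):
    """Return the first provider set, following the precedence order above."""
    for source in (agent_frontmatter, cli_flag, squad_default):
        if source is not None:
            return source
    return "anthropic"  # final fallback
```

Note that an agent file's `provider:` wins even over the `--provider` flag, so per-agent choices survive ad-hoc runs.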
## Configuration

### Squad-Level Providers

Configure default providers in SQUAD.md frontmatter:

```yaml
---
name: intelligence
mission: Market research and competitive analysis
providers:
  default: anthropic   # CLI: claude
  vision: openai       # CLI: codex (for image analysis)
  realtime: xai        # CLI: grok (for real-time data)
  cheap: google        # CLI: gemini (for high-volume)
---
```
### Agent-Level Override

Override the provider for specific agents:

```markdown
---
provider: xai
---

# Social Monitor

## Purpose
Real-time X/Twitter trend detection using Grok.
```
## Supported CLIs

| Provider | CLI | Install | Non-Interactive Flag |
|---|---|---|---|
| Anthropic | `claude` | `npm i -g @anthropic-ai/claude-code` | `--print` |
| Google | `gemini` | `npm i -g @google/gemini-cli` | `--prompt` |
| OpenAI | `codex` | `npm i -g @openai/codex` | `exec` subcommand |
| Mistral | `vibe` | `curl -LsSf https://mistral.ai/vibe/install.sh \| bash` | `--prompt --auto-approve` |
| xAI | `grok` | `bun add -g @vibe-kit/grok-cli` | `--prompt` |
| Multi | `aider` | `pip install aider-install && aider-install` | `--message --yes` |
| Local | `ollama` | `brew install ollama` | `run <model> "<prompt>"` |
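Delegation itself is just a subprocess call. The sketch below is hypothetical: the argv templates follow the table above, but flag spellings change between CLI releases, so verify against each tool's `--help` before relying on them. The Ollama model name is illustrative.

```python
import subprocess

# Argv templates for one-shot (non-interactive) runs, per the table above.
NON_INTERACTIVE = {
    "anthropic": lambda p: ["claude", "--print", p],
    "google": lambda p: ["gemini", "--prompt", p],
    "openai": lambda p: ["codex", "exec", p],
    "xai": lambda p: ["grok", "--prompt", p],
    "local": lambda p: ["ollama", "run", "llama3", p],  # model name is illustrative
}

def build_command(provider: str, prompt: str) -> list[str]:
    """Build the argv for a one-shot run of the given provider's CLI."""
    return NON_INTERACTIVE[provider](prompt)

def delegate(provider: str, prompt: str) -> str:
    """Shell out to the native CLI and return its stdout."""
    result = subprocess.run(
        build_command(provider, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Because each CLI manages its own auth and config, the delegating process never touches API keys directly.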
## Environment Variables

Each CLI reads its own API keys:

```bash
# .env (or shell profile)
ANTHROPIC_API_KEY=sk-ant-...   # claude
GOOGLE_API_KEY=AIza...         # gemini (or OAuth)
OPENAI_API_KEY=sk-...          # codex
MISTRAL_API_KEY=...            # vibe
XAI_API_KEY=...                # grok
```
## Why Use Multiple LLMs?

Different LLMs excel at different tasks. A well-designed agent system can leverage:

- **Claude** - Complex reasoning, nuanced analysis, long context
- **GPT-4** - General purpose, wide knowledge, tool use
- **Gemini** - Multimodal, Google ecosystem integration
- **Grok** - Real-time data, X/Twitter integration
- **Llama / open models** - Privacy, self-hosting, cost control
## Provider Comparison

| Provider | Strengths | Best For |
|---|---|---|
| Claude (Anthropic) | Reasoning, safety, long context | Complex analysis, code review |
| GPT-4 (OpenAI) | Versatility, ecosystem | General tasks, plugins |
| Gemini (Google) | Multimodal, speed | Vision tasks, Google integration |
| Grok (xAI) | Real-time, humor | Social media, current events |
| Llama (Meta) | Open source, self-host | Privacy-sensitive, offline |
| Mistral | European, efficient | EU compliance, edge deployment |
## Model Tiers (Within Providers)

Each provider offers different capability tiers:

### Anthropic (Claude)

| Model | Use Case | Cost |
|---|---|---|
| Claude Opus | Complex reasoning | $$$ |
| Claude Sonnet | Balanced default | $$ |
| Claude Haiku | Fast, simple tasks | $ |

### OpenAI

| Model | Use Case | Cost |
|---|---|---|
| GPT-4o | Multimodal, flagship | $$$ |
| GPT-4o-mini | Fast, efficient | $ |
| o1/o3 | Deep reasoning | $$$$ |

### Google

| Model | Use Case | Cost |
|---|---|---|
| Gemini Ultra | Most capable | $$$ |
| Gemini Pro | Balanced | $$ |
| Gemini Flash | Speed optimized | $ |

### xAI

| Model | Use Case | Cost |
|---|---|---|
| Grok-2 | Full capability | $$$ |
| Grok-2-mini | Faster responses | $$ |
## Squad Configuration

### Agent-Level Provider Selection

Assign different providers to different agents:

```markdown
# SQUAD.md - Intelligence Squad

## Agents

### market-researcher
**Provider**: Claude Sonnet
**Purpose**: Deep market analysis requiring nuanced reasoning

### social-monitor
**Provider**: Grok
**Purpose**: Real-time X/Twitter monitoring and trend detection

### data-analyst
**Provider**: GPT-4o
**Purpose**: Spreadsheet analysis with vision capabilities

### summarizer
**Provider**: Gemini Flash
**Purpose**: Fast summarization of research findings
```
### Environment Configuration

Set up API keys for each provider:

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=AIza...
XAI_API_KEY=xai-...
```
## Provider Selection in Code

### Claude Code

```markdown
# Agent: Market Researcher

**Model**: claude-sonnet-4-20250514

## Instructions
Analyze market trends using deep reasoning...
```

### OpenAI

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
```

### Gemini

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(prompt)
```

### Grok

```python
import os

from openai import OpenAI  # Grok exposes an OpenAI-compatible API

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)
response = client.chat.completions.create(
    model="grok-2",
    messages=[{"role": "user", "content": prompt}],
)
```
## Routing Patterns

### Task-Based Routing

Route tasks to the best provider:

| Task Requirement | Best Provider |
|---|---|
| Real-time data | Grok |
| Image analysis | GPT-4o / Gemini |
| Deep reasoning | Claude Opus |
| Google integration | Gemini |
| General tasks | Claude Sonnet / GPT-4o |
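A routing table like the one above reduces to a dictionary lookup. A sketch, with made-up task tags:

```python
# Map a task tag to the preferred model, falling back to the general default.
ROUTES = {
    "realtime": "grok-2",
    "vision": "gpt-4o",
    "reasoning": "claude-opus",
    "general": "claude-sonnet",
}

def route(task_tag: str) -> str:
    """Pick a model for a task tag; unknown tags get the general default."""
    return ROUTES.get(task_tag, ROUTES["general"])
```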
### Cascade Pattern

Start cheap, escalate when needed:

1. **Start cheap** - Gemini Flash (fastest, cheapest)
2. **If insufficient** - Escalate to Claude Sonnet
3. **If still insufficient** - Escalate to Claude Opus
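The escalation loop is a few lines of code. A sketch; `query` and `is_sufficient` stand in for whatever LLM call and quality check you plug in:

```python
def cascade(prompt, tiers, query, is_sufficient):
    """Try models cheapest-first; escalate until the answer passes the check."""
    answer = None
    for model in tiers:
        answer = query(model, prompt)
        if is_sufficient(answer):
            return model, answer
    return tiers[-1], answer  # best effort from the top tier
```

The quality check is the hard part in practice: a cheap self-grading prompt or a length/format heuristic are common starting points.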
### Consensus Pattern

Use multiple providers for critical decisions:

Critical decision → run in parallel across Claude, GPT-4o, and Gemini → voting/synthesis → final answer
## Cost Optimization

### Price Comparison (approximate, per 1M tokens)

| Provider | Input | Output |
|---|---|---|
| Claude Haiku | $0.25 | $1.25 |
| GPT-4o-mini | $0.15 | $0.60 |
| Gemini Flash | $0.075 | $0.30 |
| Claude Sonnet | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Gemini Pro | $1.25 | $5.00 |
| Claude Opus | $15.00 | $75.00 |

Prices change frequently. Check provider pricing pages for current rates.
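With a price table in hand, per-call cost is simple arithmetic. The numbers below mirror the table above and go stale just as fast:

```python
# USD per 1M tokens: (input, output), copied from the table above.
PRICES = {
    "claude-haiku": (0.25, 1.25),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-flash": (0.075, 0.30),
    "claude-sonnet": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "gemini-pro": (1.25, 5.00),
    "claude-opus": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one call at the rates above."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out
```

For example, 100k input plus 10k output tokens on Claude Sonnet comes to about $0.45.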
### Cost Strategy

Budget allocation example:

- 60% → Gemini Flash / GPT-4o-mini (high-volume tasks)
- 30% → Claude Sonnet / GPT-4o (standard tasks)
- 10% → Claude Opus / o1 (complex reasoning)
## Implementation Examples

### Multi-Provider Squad

```yaml
# .agents/squads/intelligence/SQUAD.md
name: Intelligence Squad
description: Multi-provider research and analysis
agents:
  - name: trend-scanner
    provider: grok
    model: grok-2-mini
    purpose: Real-time social trend detection
  - name: deep-researcher
    provider: anthropic
    model: claude-sonnet-4-20250514
    purpose: In-depth analysis and synthesis
  - name: data-visualizer
    provider: openai
    model: gpt-4o
    purpose: Chart and image generation
  - name: fast-summarizer
    provider: google
    model: gemini-1.5-flash
    purpose: Quick summaries and translations
```
### Provider Abstraction

Create a unified interface:

```typescript
// lib/llm.ts
type Provider = 'anthropic' | 'openai' | 'google' | 'xai';

interface LLMConfig {
  provider: Provider;
  model: string;
  temperature?: number;
}

async function query(config: LLMConfig, prompt: string) {
  switch (config.provider) {
    case 'anthropic':
      return queryAnthropic(config.model, prompt);
    case 'openai':
      return queryOpenAI(config.model, prompt);
    case 'google':
      return queryGemini(config.model, prompt);
    case 'xai':
      return queryGrok(config.model, prompt);
  }
}
```
## Best Practices

- Match provider strengths to task requirements
- Use cheaper models for high-volume, simple tasks
- Reserve expensive models for complex reasoning
- Implement fallbacks across providers for reliability
- Monitor costs per provider weekly
- Abstract provider selection for easy switching

Avoid:

- Using one provider for everything (misses optimization opportunities)
- Ignoring rate limits (each provider's limits differ)
- Hardcoding the provider choice (make it configurable)
- Forgetting about latency differences
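The cross-provider fallback recommended above can be as small as a try/except loop. A sketch; `query` stands in for your per-provider call:

```python
def query_with_fallback(prompt, providers, query):
    """Try providers in order; return the first success."""
    last_error = None
    for provider in providers:
        try:
            return provider, query(provider, prompt)
        except Exception as err:  # rate limit, outage, timeout, ...
            last_error = err
    raise RuntimeError("all providers failed") from last_error
```

Pair this with per-provider rate-limit tracking so the fallback order can skip a provider that is already throttling you.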