## Why Use Multiple LLMs?

Different LLMs excel at different tasks. A well-designed agent system can leverage:

- **Claude**: Complex reasoning, nuanced analysis, long context
- **GPT-4**: General purpose, wide knowledge, tool use
- **Gemini**: Multimodal, Google ecosystem integration
- **Grok**: Real-time data, X/Twitter integration
- **Llama/Open models**: Privacy, self-hosting, cost control
## Provider Comparison

| Provider | Strengths | Best For |
|---|---|---|
| Claude (Anthropic) | Reasoning, safety, long context | Complex analysis, code review |
| GPT-4 (OpenAI) | Versatility, ecosystem | General tasks, plugins |
| Gemini (Google) | Multimodal, speed | Vision tasks, Google integration |
| Grok (xAI) | Real-time, humor | Social media, current events |
| Llama (Meta) | Open source, self-host | Privacy-sensitive, offline |
| Mistral | European, efficient | EU compliance, edge deployment |
## Model Tiers (Within Providers)

Each provider offers different capability tiers:

### Anthropic (Claude)

| Model | Use Case | Cost |
|---|---|---|
| Claude Opus | Complex reasoning | $$$ |
| Claude Sonnet | Balanced default | $$ |
| Claude Haiku | Fast, simple tasks | $ |

### OpenAI

| Model | Use Case | Cost |
|---|---|---|
| GPT-4o | Multimodal, flagship | $$$ |
| GPT-4o-mini | Fast, efficient | $ |
| o1/o3 | Deep reasoning | $$$$ |

### Google

| Model | Use Case | Cost |
|---|---|---|
| Gemini Ultra | Most capable | $$$ |
| Gemini Pro | Balanced | $$ |
| Gemini Flash | Speed optimized | $ |

### xAI

| Model | Use Case | Cost |
|---|---|---|
| Grok-2 | Full capability | $$$ |
| Grok-2-mini | Faster responses | $$ |
## Squad Configuration

### Agent-Level Provider Selection

Assign different providers to different agents:

```markdown
# SQUAD.md - Intelligence Squad

## Agents

### market-researcher
**Provider**: Claude Sonnet
**Purpose**: Deep market analysis requiring nuanced reasoning

### social-monitor
**Provider**: Grok
**Purpose**: Real-time X/Twitter monitoring and trend detection

### data-analyst
**Provider**: GPT-4o
**Purpose**: Spreadsheet analysis with vision capabilities

### summarizer
**Provider**: Gemini Flash
**Purpose**: Fast summarization of research findings
```
### Environment Configuration

Set up API keys for each provider:

```bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=AIza...
XAI_API_KEY=xai-...
```
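A small startup check can fail fast when a key is missing rather than erroring mid-run. This is a minimal sketch; the variable names match the `.env` example above, and `missing_keys` is an illustrative helper, not part of any SDK:

```python
import os

# Keys the squad expects, matching the .env example above.
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "XAI_API_KEY",
]

def missing_keys(env=os.environ):
    """Return the provider keys that are unset or empty in the environment."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Call it once at startup and refuse to launch agents if the returned list is non-empty.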
### Provider Selection in Code

#### Claude Code

```markdown
# Agent: Market Researcher
**Model**: claude-sonnet-4-20250514

## Instructions
Analyze market trends using deep reasoning...
```

#### OpenAI

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
```

#### Gemini

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(prompt)
```

#### Grok

```python
import os

from openai import OpenAI  # Grok uses an OpenAI-compatible API

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1"
)
response = client.chat.completions.create(
    model="grok-2",
    messages=[{"role": "user", "content": prompt}]
)
```
## Routing Patterns

### Task-Based Routing

Route tasks to the best provider:

| Task Requirement | Best Provider |
|---|---|
| Real-time data | Grok |
| Image analysis | GPT-4o / Gemini |
| Deep reasoning | Claude Opus |
| Google integration | Gemini |
| General tasks | Claude Sonnet / GPT-4o |
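The routing table above can be encoded as a plain lookup. This is an illustrative sketch: the requirement keys and model IDs are examples, not a fixed schema:

```python
# Map a task requirement to a preferred (provider, model) pair,
# mirroring the task-based routing table above. Model IDs are examples.
ROUTING_TABLE = {
    "realtime": ("xai", "grok-2"),
    "vision": ("openai", "gpt-4o"),
    "deep_reasoning": ("anthropic", "claude-opus"),
    "google_integration": ("google", "gemini-1.5-pro"),
}
DEFAULT_ROUTE = ("anthropic", "claude-sonnet")  # general-purpose fallback

def route(task_requirement: str) -> tuple:
    """Pick (provider, model) for a requirement, defaulting to a general model."""
    return ROUTING_TABLE.get(task_requirement, DEFAULT_ROUTE)
```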
### Cascade Pattern

Start cheap, escalate when needed:

1. Start with Gemini Flash (fastest, cheapest)
2. If the result is insufficient, escalate to Claude Sonnet
3. If still insufficient, escalate to Claude Opus
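The escalation loop above can be sketched as follows. `ask` and `good_enough` are placeholders for a real LLM call and whatever quality check fits your task (a validator, a confidence score, a judge model):

```python
# Cascade: try the cheapest model first, escalate while the answer is insufficient.
# Model IDs mirror the tiers above and are illustrative.
CASCADE = ["gemini-1.5-flash", "claude-sonnet", "claude-opus"]

def cascade_query(prompt, ask, good_enough):
    """Walk the cascade until `good_enough` accepts an answer.

    ask(model, prompt) -> answer  (placeholder for a real provider call)
    good_enough(answer) -> bool   (placeholder for a quality check)
    """
    answer = None
    for model in CASCADE:
        answer = ask(model, prompt)
        if good_enough(answer):
            return model, answer
    # Nothing passed the check: return the best-effort answer from the top tier.
    return CASCADE[-1], answer
```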
### Consensus Pattern

Use multiple providers for critical decisions:

Critical Decision → Run in parallel across Claude, GPT-4o, and Gemini → Voting/Synthesis → Final Answer
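A majority-vote version of this pattern can be sketched as below. `ask` is a placeholder for a real per-provider call; the sketch queries sequentially for simplicity, where a production version would fan out in parallel and may synthesize rather than vote:

```python
from collections import Counter

def consensus(prompt, providers, ask):
    """Query several providers and return the most common answer (majority vote).

    ask(provider, prompt) -> answer  (placeholder for a real LLM call)
    """
    answers = [ask(p, prompt) for p in providers]  # parallelize in practice
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```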
## Cost Optimization

### Price Comparison (approximate, per 1M tokens)

| Provider | Input | Output |
|---|---|---|
| Claude Haiku | $0.25 | $1.25 |
| GPT-4o-mini | $0.15 | $0.60 |
| Gemini Flash | $0.075 | $0.30 |
| Claude Sonnet | $3.00 | $15.00 |
| GPT-4o | $2.50 | $10.00 |
| Gemini Pro | $1.25 | $5.00 |
| Claude Opus | $15.00 | $75.00 |

Prices change frequently. Check provider pricing pages for current rates.
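Per-request cost is input tokens times the input rate plus output tokens times the output rate, divided by one million. A sketch using the (illustrative, fast-drifting) numbers from the table above:

```python
# Approximate $/1M tokens (input, output) from the comparison table above.
# Prices drift; treat these as illustrative placeholders.
PRICES = {
    "claude-haiku": (0.25, 1.25),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-flash": (0.075, 0.30),
    "claude-sonnet": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
    "gemini-pro": (1.25, 5.00),
    "claude-opus": (15.00, 75.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of a single request."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```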
### Cost Strategy

```markdown
## Budget Allocation Example
- 60% → Gemini Flash / GPT-4o-mini (high-volume tasks)
- 30% → Claude Sonnet / GPT-4o (standard tasks)
- 10% → Claude Opus / o1 (complex reasoning)
```
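Turning that allocation into per-tier spending caps is a one-liner. The tier names and percentages below come straight from the example; the helper itself is illustrative:

```python
# Tier shares from the budget allocation example above.
TIER_SHARE = {"cheap": 0.60, "standard": 0.30, "premium": 0.10}

def allocate(monthly_budget):
    """Split a monthly budget (USD) into per-tier caps."""
    return {tier: round(monthly_budget * share, 2) for tier, share in TIER_SHARE.items()}
```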
## Implementation Examples

### Multi-Provider Squad

```yaml
# .agents/squads/intelligence/SQUAD.md
name: Intelligence Squad
description: Multi-provider research and analysis
agents:
  - name: trend-scanner
    provider: grok
    model: grok-2-mini
    purpose: Real-time social trend detection
  - name: deep-researcher
    provider: anthropic
    model: claude-sonnet-4-20250514
    purpose: In-depth analysis and synthesis
  - name: data-visualizer
    provider: openai
    model: gpt-4o
    purpose: Chart and image generation
  - name: fast-summarizer
    provider: google
    model: gemini-1.5-flash
    purpose: Quick summaries and translations
```
### Provider Abstraction

Create a unified interface:

```typescript
// lib/llm.ts
type Provider = 'anthropic' | 'openai' | 'google' | 'xai';

interface LLMConfig {
  provider: Provider;
  model: string;
  temperature?: number;
}

async function query(config: LLMConfig, prompt: string) {
  switch (config.provider) {
    case 'anthropic':
      return queryAnthropic(config.model, prompt);
    case 'openai':
      return queryOpenAI(config.model, prompt);
    case 'google':
      return queryGemini(config.model, prompt);
    case 'xai':
      return queryGrok(config.model, prompt);
  }
}
```
## Best Practices

- Match provider strengths to task requirements
- Use cheaper models for high-volume, simple tasks
- Reserve expensive models for complex reasoning
- Implement fallbacks across providers for reliability
- Monitor costs per provider weekly
- Abstract provider selection for easy switching
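The cross-provider fallback recommended above can be sketched as a simple loop. `ask` stands in for a real per-provider call; a production version would catch provider-specific errors (rate limits, timeouts) rather than bare `Exception`:

```python
# Fallback: if one provider's call fails (rate limit, outage), try the next.
def query_with_fallback(prompt, providers, ask):
    """Return (provider, answer) from the first provider whose call succeeds.

    ask(provider, prompt) -> answer  (placeholder for a real LLM call)
    """
    last_err = None
    for provider in providers:
        try:
            return provider, ask(provider, prompt)
        except Exception as err:  # narrow this to provider-specific errors in practice
            last_err = err
    raise RuntimeError(f"all providers failed: {last_err}")
```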
**Avoid:**

- Using one provider for everything (you miss per-task optimizations)
- Ignoring rate limits (each provider enforces different limits)
- Hardcoding provider choice (make it configurable)
- Forgetting about latency differences between providers