## Understanding Tokens
Tokens are the unit of LLM pricing. Roughly:
- 1 token ≈ 4 characters (English)
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
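The rules of thumb above can be turned into a quick estimator. This is a sketch only: real token counts depend on each model's tokenizer, so treat these as ballpark figures.

```typescript
// Rough token estimates from the rules of thumb above.
// Real counts vary by tokenizer; use these only for ballpark budgeting.
function tokensFromChars(chars: number): number {
  return Math.ceil(chars / 4); // ~4 characters per token (English)
}

function tokensFromWords(words: number): number {
  return Math.ceil(words / 0.75); // ~0.75 words per token
}
```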
## Pricing by Provider
### Anthropic (Claude)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Claude Opus | $15.00 | $75.00 |

### OpenAI

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| o1 | $15.00 | $60.00 |

### Google (Gemini)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Gemini Flash | $0.075 | $0.30 |
| Gemini Pro | $1.25 | $5.00 |
| Gemini Ultra | $7.00 | $21.00 |

### xAI (Grok)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Grok-2-mini | $0.10 | $0.40 |
| Grok-2 | $2.00 | $10.00 |
Prices change frequently. Check provider pricing pages for current rates. Output tokens typically cost 3-5x more than input.
## Cost Estimation

### Per-Task Estimates
| Task Type | Est. Tokens | Est. Cost (Sonnet) |
|---|---|---|
| Simple Q&A | 1K in + 500 out | $0.01 |
| Code review | 10K in + 2K out | $0.06 |
| Full implementation | 50K in + 10K out | $0.30 |
| Large codebase analysis | 150K in + 5K out | $0.52 |
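The estimates above come from a simple formula: input tokens times the input rate plus output tokens times the output rate. A minimal sketch, with rates given in dollars per million tokens:

```typescript
// Cost of one task, given token counts and per-1M-token rates.
function taskCost(
  inputTokens: number,
  outputTokens: number,
  inputPerM: number, // $ per 1M input tokens
  outputPerM: number, // $ per 1M output tokens
): number {
  return (
    (inputTokens / 1_000_000) * inputPerM +
    (outputTokens / 1_000_000) * outputPerM
  );
}
```

For example, a code review at Sonnet rates ($3 in / $15 out) is `taskCost(10_000, 2_000, 3, 15)`, which matches the $0.06 in the table.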
### Monthly Projections

- Light usage (10 tasks/day): ~$3-5/month
- Medium usage (50 tasks/day): ~$15-30/month
- Heavy usage (200 tasks/day): ~$60-150/month
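These projections are just per-task cost times task volume over a 30-day month. A sketch of the arithmetic:

```typescript
// Monthly cost projection: tasks per day × average cost per task × days.
function monthlyCost(
  tasksPerDay: number,
  avgCostPerTask: number,
  days: number = 30,
): number {
  return tasksPerDay * avgCostPerTask * days;
}
```

At 50 tasks/day averaging $0.01-0.02 per task, this lands in the $15-30/month range shown above.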
## Optimization Strategies

### 1. Reduce Input Tokens

Use targeted reads instead of full files:
```
# Bad: Read entire codebase
Read **/*.ts

# Good: Query specific patterns
Grep "export function" --type ts
```
Leverage project config for persistent context. Each tool reads a standing file, so project context doesn't need repeating in every message:

- Claude Code: `CLAUDE.md`
- Gemini CLI: `GEMINI.md`
- Cursor: `.cursorrules`
- OpenCode: `AGENTS.md`
### 2. Reduce Output Tokens

Request concise responses:

```
Respond in 2-3 sentences. No explanations unless asked.
```

Use structured output:

```
Return only JSON, no prose:
{"status": "success", "changes": [...]}
```
3. Use Appropriate Models
// Expensive: Using Opus for simple lookup
const result = await opus.query("What's in package.json?");
// Optimal: Use Haiku for simple tasks
const result = await haiku.query("What's in package.json?");
// Expensive: Using GPT-4o for simple lookup
const result = await gpt4o.query("What's in package.json?");
// Optimal: Use GPT-4o-mini for simple tasks
const result = await gpt4oMini.query("What's in package.json?");
// Expensive: Using Gemini Pro for simple lookup
const result = await geminiPro.query("What's in package.json?");
// Optimal: Use Gemini Flash for simple tasks
const result = await geminiFlash.query("What's in package.json?");
### 4. Cache Expensive Operations

```typescript
// Cache research results so repeated queries cost nothing
const cacheKey = hash(query);
let result = await cache.get(cacheKey);
if (!result) {
  result = await agent.research(query);
  await cache.set(cacheKey, result, TTL); // TTL: cache entry lifetime
}
```
### 5. Batch Operations

```typescript
// Bad: many small calls
for (const file of files) {
  await analyze(file); // 10 API calls for 10 files
}

// Good: batch into one call
await analyzeBatch(files); // 1 API call
```
## Monitoring Costs

### Track Usage

```shell
# View recent costs
squads feedback stats

# Check specific agent costs
squads memory show engineering
```
### Set Budgets

Define budget limits in agent configurations:

```markdown
# Agent Configuration

## Budget
- Max tokens per task: 50,000
- Max cost per day: $5.00
- Alert threshold: 80%
```
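One way such limits can be enforced in code. This is a sketch; the `Budget` shape and the threshold semantics are assumptions for illustration, not a real configuration schema:

```typescript
interface Budget {
  maxTokensPerTask: number;
  maxCostPerDay: number; // dollars
  alertThreshold: number; // fraction of the daily cap, e.g. 0.8 for 80%
}

type BudgetStatus = "ok" | "alert" | "exceeded";

// Compare today's spend against the daily cap and alert threshold.
function checkBudget(budget: Budget, spentToday: number): BudgetStatus {
  if (spentToday >= budget.maxCostPerDay) return "exceeded";
  if (spentToday >= budget.maxCostPerDay * budget.alertThreshold) return "alert";
  return "ok";
}
```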
### Cost Alerts

```typescript
// Monitor in hooks
if (taskCost > BUDGET_THRESHOLD) {
  notify(`High-cost task: $${taskCost}`);
}
```
## Cost-Saving Patterns

### Progressive Enhancement
**Anthropic**

```
Start: Haiku (cheap, fast)
  ↓
If insufficient: Sonnet
  ↓
If still insufficient: Opus
```

**OpenAI**

```
Start: GPT-4o-mini (cheap, fast)
  ↓
If insufficient: GPT-4o
  ↓
If still insufficient: o1
```

**Google**

```
Start: Gemini Flash (cheap, fast)
  ↓
If insufficient: Gemini Pro
  ↓
If still insufficient: Gemini Ultra
```

**Multi-Provider**

```
Start: Gemini Flash (cheapest)
  ↓
If insufficient: Claude Sonnet
  ↓
If still insufficient: Claude Opus / o1
```
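The ladder amounts to a loop over model tiers, escalating only when a cheaper answer fails a sufficiency check. A sketch with stubbed-out models; `query` and `isSufficient` are hypothetical stand-ins (real SDK calls would be async):

```typescript
interface ModelTier {
  name: string;
  query: (prompt: string) => string; // stub; real clients return promises
}

// Try tiers cheapest-first; stop at the first sufficient answer.
function escalate(
  tiers: ModelTier[],
  prompt: string,
  isSufficient: (answer: string) => boolean,
): { model: string; answer: string } {
  let last = { model: "", answer: "" };
  for (const tier of tiers) {
    last = { model: tier.name, answer: tier.query(prompt) };
    if (isSufficient(last.answer)) break; // cheap answer was good enough
  }
  return last; // falls through to the most capable tier's answer
}
```

Most lookups never leave the first tier, so the average cost per task stays close to the cheapest model's rate.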
### Summarize Before Processing

```
Large document (100K tokens)
  ↓
Haiku summarizes → Summary (5K tokens)
  ↓
Sonnet processes summary
```
### Parallel with Haiku, Synthesize with Sonnet

1. Fan out (cheap): run Task 1, Task 2, and Task 3 in parallel with Haiku.
2. Synthesize (quality): Sonnet combines all results into the final output.
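A sketch of the fan-out pattern with `Promise.all`; `cheapModel` and `qualityModel` are hypothetical stand-ins for a fast model (Haiku) and a stronger one (Sonnet):

```typescript
type Model = (prompt: string) => Promise<string>;

// Fan subtasks out to a cheap model in parallel, then have a
// stronger model synthesize the partial results into one answer.
async function fanOutSynthesize(
  tasks: string[],
  cheapModel: Model,
  qualityModel: Model,
): Promise<string> {
  const partials = await Promise.all(tasks.map((t) => cheapModel(t)));
  return qualityModel(`Combine these results:\n${partials.join("\n")}`);
}
```

The expensive model sees only the short partial results, not the original inputs, which is where the savings come from.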
## ROI Framework

### Calculate Value per Token

```
Value = (Time Saved × Hourly Rate) / Tokens Used
```

Example:

```
Task saves 2 hours of dev time
Dev rate: $100/hour
Tokens used: 50,000

Value = $200 / 50,000 tokens = $0.004 per token
Cost  = 50,000 × $0.000003 ($3 per 1M tokens) = $0.15
ROI   = ($200 - $0.15) / $0.15 ≈ 133,233%
```
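The worked example above, as code. A sketch that blends all tokens at a single per-token rate, as the example does:

```typescript
// ROI of an agent task: value of time saved vs. token spend.
function roiPercent(
  hoursSaved: number,
  hourlyRate: number, // dollars per hour
  tokensUsed: number,
  costPerToken: number, // e.g. $3 per 1M tokens = 0.000003
): number {
  const value = hoursSaved * hourlyRate; // dollars of dev time saved
  const cost = tokensUsed * costPerToken; // dollars spent on tokens
  return ((value - cost) / cost) * 100;
}
```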
### When to Optimize

| Scenario | Priority |
|---|---|
| High volume, low value | High - optimize aggressively |
| Low volume, high value | Low - focus on quality |
| High volume, high value | Medium - balance both |
| Low volume, low value | Question if needed at all |
## Best Practices
- Monitor costs weekly at minimum
- Set per-agent and per-squad budgets
- Use fast/cheap models for high-volume tasks
- Cache expensive research results
- Batch similar operations
- Track ROI, not just costs
**Common cost traps:**
- Reading entire codebases repeatedly
- Using premium models for simple tasks
- Generating verbose explanations
- Not caching repeated queries
- Running agents without monitoring