Understanding Tokens

Tokens are the unit of LLM pricing. Roughly:
  • 1 token ≈ 4 characters (English)
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
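
These ratios are enough for quick back-of-envelope estimates. A minimal sketch (the 4-characters-per-token ratio is an approximation; real tokenizers vary by model):

// Rough token estimate from character count (English prose).
// Real tokenizers vary by model; treat this as a ballpark only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // ~4 characters per token
}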

Pricing by Provider

Model           Input (per 1M)   Output (per 1M)
Claude Haiku    $0.25            $1.25
Claude Sonnet   $3.00            $15.00
Claude Opus     $15.00           $75.00
Prices change frequently. Check provider pricing pages for current rates. Output tokens typically cost 3-5x more than input.
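
Because rates are quoted per million tokens, cost is linear in usage. A minimal sketch (Sonnet rates hardcoded from the table above; verify against current pricing):

// Per-1M-token rates from the table above (subject to change).
const SONNET = { input: 3.0, output: 15.0 };

function estimateCost(
  inputTokens: number,
  outputTokens: number,
  rates: { input: number; output: number },
): number {
  return (inputTokens / 1_000_000) * rates.input
       + (outputTokens / 1_000_000) * rates.output;
}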

Cost Estimation

Per-Task Estimates

Task Type                 Est. Tokens        Est. Cost (Sonnet)
Simple Q&A                1K in + 500 out    $0.01
Code review               10K in + 2K out    $0.06
Full implementation       50K in + 10K out   $0.30
Large codebase analysis   150K in + 5K out   $0.53
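
Each row is the same arithmetic; for example, using the estimateCost sketch above:

estimateCost(10_000, 2_000, SONNET);   // $0.06  — code review row
estimateCost(150_000, 5_000, SONNET);  // ≈$0.53 — large codebase analysis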

Monthly Projections

Light usage (10 tasks/day):
  ~$3-5/month

Medium usage (50 tasks/day):
  ~$15-30/month

Heavy usage (200 tasks/day):
  ~$60-150/month
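
These projections follow from tasks per day times an assumed average cost per task; a minimal sketch (the blended per-task cost is an assumption based on the table above):

// Monthly cost ≈ tasks/day × average cost/task × 30 days.
// avgCostPerTask is an assumed blend of the task types above.
function monthlyCost(tasksPerDay: number, avgCostPerTask: number): number {
  return tasksPerDay * avgCostPerTask * 30;
}

monthlyCost(50, 0.01); // $15 — low end of "medium usage"
monthlyCost(50, 0.02); // $30 — high end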

Optimization Strategies

1. Reduce Input Tokens

Use targeted reads instead of full files:

# Bad: Read entire codebase
Read **/*.ts

# Good: Query specific patterns
Grep "export function" --type ts

Leverage project config for persistent context:

# CLAUDE.md
Project context that doesn't need repeating each message.

2. Reduce Output Tokens

Request concise responses:

Respond in 2-3 sentences. No explanations unless asked.

Use structured output:

Return only JSON, no prose:
{"status": "success", "changes": [...]}

3. Use Appropriate Models

// Expensive: Using Opus for simple lookup
const result = await opus.query("What's in package.json?");

// Optimal: Use Haiku for simple tasks
const result = await haiku.query("What's in package.json?");

4. Cache Expensive Operations

// Cache research results keyed by the query
const cacheKey = hash(query);
const TTL = 24 * 60 * 60; // e.g. keep results for 24 hours (assumed TTL)

let result = await cache.get(cacheKey);
if (!result) {
  result = await agent.research(query); // expensive call runs at most once
  await cache.set(cacheKey, result, TTL);
}

5. Batch Operations

// Bad: Many small calls
for (const file of files) {
  await analyze(file);  // 10 API calls
}

// Good: Batch into one call
await analyzeBatch(files);  // 1 API call

Monitoring Costs

Track Usage

# View recent costs
squads feedback stats

# Check specific agent costs
squads memory show engineering

Set Budgets

Define budget limits in agent configurations:
# Agent Configuration

## Budget
- Max tokens per task: 50,000
- Max cost per day: $5.00
- Alert threshold: 80%

Cost Alerts

// Monitor in hooks: flag any task that exceeds the budget threshold
const BUDGET_THRESHOLD = 1.0; // e.g. alert on tasks above $1.00 (assumed value)

if (taskCost > BUDGET_THRESHOLD) {
  notify(`High cost task: $${taskCost.toFixed(2)}`);
}

Cost-Saving Patterns

Progressive Enhancement

  • Start: Haiku (cheap, fast)
  • If insufficient: Sonnet
  • If still insufficient: Opus
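
A minimal escalation sketch, assuming hypothetical haiku/sonnet/opus clients (matching the snippets above) and an isSufficient check you define for your task:

// Hypothetical model clients and quality check (assumptions, not a real SDK).
type Model = { query(prompt: string): Promise<string> };
declare const haiku: Model, sonnet: Model, opus: Model;
declare function isSufficient(answer: string): boolean;

// Try the cheapest model first; escalate only when the answer falls short.
async function progressiveQuery(prompt: string): Promise<string> {
  let answer = "";
  for (const model of [haiku, sonnet, opus]) {
    answer = await model.query(prompt);
    if (isSufficient(answer)) break; // stop escalating once good enough
  }
  return answer; // the Opus result is the fallback if nothing passed
}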

Summarize Before Processing

  • Large document (100K tokens)
  • Haiku summarizes → summary (5K tokens)
  • Sonnet processes summary
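
A sketch of the same flow, reusing the hypothetical haiku/sonnet clients declared above:

// Compress with the cheap model, then reason over the much smaller summary.
async function summarizeThenProcess(doc: string, task: string): Promise<string> {
  const summary = await haiku.query(`Summarize for downstream analysis:\n${doc}`);
  return sonnet.query(`${task}\n\nSummary:\n${summary}`);
}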

Parallel with Haiku, Synthesize with Sonnet

1. Fan out (cheap): run Task 1, Task 2, Task 3 in parallel with Haiku
2. Synthesize (quality): Sonnet combines all results into final output
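
In code, the fan-out is just a parallel map; a sketch with the same hypothetical clients:

// Fan out subtasks to Haiku in parallel, then have Sonnet synthesize.
async function fanOutSynthesize(tasks: string[]): Promise<string> {
  const partials = await Promise.all(tasks.map((t) => haiku.query(t)));
  return sonnet.query(
    `Combine these results into one final output:\n${partials.join("\n---\n")}`,
  );
}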

ROI Framework

Calculate Value per Token

Value = (Time Saved × Hourly Rate) / Tokens Used

Example:
  Task saves 2 hours of dev time
  Dev rate: $100/hour
  Tokens used: 50,000

  Value = ($200) / (50,000 tokens)
  Value = $0.004 per token

  Cost = 50,000 tokens × $0.000003 (Sonnet input rate) = $0.15

  ROI = ($200 - $0.15) / $0.15 = 133,233%
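
The same arithmetic as a helper, for plugging in your own numbers:

// Value per token and percentage ROI, per the formula above.
function roiPercent(hoursSaved: number, hourlyRate: number, cost: number): number {
  const value = hoursSaved * hourlyRate;
  return ((value - cost) / cost) * 100;
}

roiPercent(2, 100, 0.15); // ≈ 133,233%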

When to Optimize

Scenario                  Priority
High volume, low value    High - optimize aggressively
Low volume, high value    Low - focus on quality
High volume, high value   Medium - balance both
Low volume, low value     Question whether it's needed at all

Best Practices

  • Monitor costs weekly at minimum
  • Set per-agent and per-squad budgets
  • Use fast/cheap models for high-volume tasks
  • Cache expensive research results
  • Batch similar operations
  • Track ROI, not just costs

Common cost traps:
  • Reading entire codebases repeatedly
  • Using premium models for simple tasks
  • Generating verbose explanations
  • Not caching repeated queries
  • Running agents without monitoring