Understanding Tokens
Tokens are the unit of LLM pricing. Roughly:
- 1 token ≈ 4 characters (English)
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
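These ratios make for a quick back-of-envelope estimator. A minimal sketch (function names are illustrative; a real tokenizer such as tiktoken gives exact counts):

```typescript
// Rough token estimates from the rules of thumb above.
// Approximations only; a real tokenizer gives exact counts.
function estimateTokensFromChars(text: string): number {
  return Math.ceil(text.length / 4); // ~4 characters per token
}

function estimateTokensFromWords(wordCount: number): number {
  return Math.ceil(wordCount / 0.75); // ~0.75 words per token
}

console.log(estimateTokensFromChars("a".repeat(400))); // 100
console.log(estimateTokensFromWords(75)); // 100
```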
Pricing by Provider
Anthropic (Claude)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Claude Opus | $15.00 | $75.00 |

OpenAI

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| o1 | $15.00 | $60.00 |

Google (Gemini)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Gemini Flash | $0.075 | $0.30 |
| Gemini Pro | $1.25 | $5.00 |
| Gemini Ultra | $7.00 | $21.00 |

xAI (Grok)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Grok-2-mini | $0.10 | $0.40 |
| Grok-2 | $2.00 | $10.00 |
Prices change frequently. Check provider pricing pages for current rates. Output tokens typically cost 3-5x more than input.
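Given per-million rates, the cost of a single call is a one-liner. A sketch (the example hard-codes the Claude Sonnet row from the table above; check current pricing before relying on these numbers):

```typescript
// Cost of one call given per-1M-token rates.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPerM: number,
  outputPerM: number,
): number {
  return (inputTokens / 1e6) * inputPerM + (outputTokens / 1e6) * outputPerM;
}

// Example: a code review on Claude Sonnet ($3 in / $15 out per 1M)
const reviewCost = estimateCost(10_000, 2_000, 3.0, 15.0);
console.log(reviewCost.toFixed(2)); // "0.06"
```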
Cost Estimation
Per-Task Estimates
| Task Type | Est. Tokens | Est. Cost (Sonnet) |
|---|---|---|
| Simple Q&A | 1K in + 500 out | $0.01 |
| Code review | 10K in + 2K out | $0.06 |
| Full implementation | 50K in + 10K out | $0.30 |
| Large codebase analysis | 150K in + 5K out | $0.52 |
Monthly Projections
- Light usage (10 tasks/day): ~$3-5/month
- Medium usage (50 tasks/day): ~$15-30/month
- Heavy usage (200 tasks/day): ~$60-150/month
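These projections are just average per-task cost × tasks per day × 30. A sketch (the average per-task costs in the examples are assumptions, not measured values):

```typescript
// Monthly cost from daily task volume and an average per-task cost.
function monthlyCost(tasksPerDay: number, avgCostPerTask: number): number {
  return tasksPerDay * avgCostPerTask * 30;
}

console.log(monthlyCost(10, 0.01)); // 3  (light usage, mostly simple tasks)
console.log(monthlyCost(50, 0.02)); // 30 (medium usage)
```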
Optimization Strategies
1. Reduce Input Tokens

Use targeted reads instead of full files:

```shell
# Bad: Read entire codebase
Read **/*.ts

# Good: Query specific patterns
Grep "export function" --type ts
```
Leverage project config for persistent context. Each tool reads a file whose contents don't need repeating in every message:

- Claude Code: `CLAUDE.md`
- Gemini CLI: `GEMINI.md`
- Cursor: `.cursorrules`
- OpenCode: `AGENTS.md`
2. Reduce Output Tokens
Request concise responses:

```
Respond in 2-3 sentences. No explanations unless asked.
```

Use structured output:

```
Return only JSON, no prose:
{"status": "success", "changes": [ ... ]}
```
3. Use Appropriate Models
```ts
// Expensive: Using Opus for simple lookup
const result = await opus.query("What's in package.json?");

// Optimal: Use Haiku for simple tasks
const result = await haiku.query("What's in package.json?");
```

```ts
// Expensive: Using GPT-4o for simple lookup
const result = await gpt4o.query("What's in package.json?");

// Optimal: Use GPT-4o-mini for simple tasks
const result = await gpt4oMini.query("What's in package.json?");
```

```ts
// Expensive: Using Gemini Pro for simple lookup
const result = await geminiPro.query("What's in package.json?");

// Optimal: Use Gemini Flash for simple tasks
const result = await geminiFlash.query("What's in package.json?");
```
4. Cache Expensive Operations
```ts
// Cache research results
const cacheKey = hash(query);
let result = await cache.get(cacheKey);
if (!result) {
  result = await agent.research(query);
  await cache.set(cacheKey, result, TTL);
}
```
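The snippet above assumes a `cache` with `get`/`set` and a TTL. A minimal in-memory version matching that shape (illustrative; production setups often use Redis or similar):

```typescript
// Minimal in-memory cache with per-entry TTL,
// matching the cache.get/cache.set shape used above.
class TTLCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  async get(key: string): Promise<V | undefined> {
    const entry = this.store.get(key);
    if (!entry || Date.now() > entry.expiresAt) {
      this.store.delete(key); // missing or expired
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: V, ttlMs: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
}
```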
5. Batch Operations
```ts
// Bad: Many small calls
for (const file of files) {
  await analyze(file); // 10 API calls
}

// Good: Batch into one call
await analyzeBatch(files); // 1 API call
```
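`analyzeBatch` above is left abstract; one way to implement the idea is to pack every file into a single prompt so one API call covers all of them. A sketch of just the packing step (names are illustrative):

```typescript
// Pack many small items into one request payload so a single
// API call can analyze all of them at once.
interface FileInput {
  name: string;
  content: string;
}

function buildBatchPrompt(files: FileInput[]): string {
  const sections = files.map((f) => `### ${f.name}\n${f.content}`);
  return `Analyze each file below:\n\n${sections.join("\n\n")}`;
}

const prompt = buildBatchPrompt([
  { name: "a.ts", content: "export const x = 1;" },
  { name: "b.ts", content: "export const y = 2;" },
]);
```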
Monitoring Costs
Track Usage
```shell
# View recent costs
squads feedback stats

# Check specific agent costs
squads memory show engineering
```
Set Budgets
Define budget limits in agent configurations:
```md
# Agent Configuration

## Budget
- Max tokens per task: 50,000
- Max cost per day: $5.00
- Alert threshold: 80%
```
Cost Alerts
```ts
// Monitor in hooks
if (taskCost > BUDGET_THRESHOLD) {
  notify(`High cost task: $${taskCost}`);
}
```
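The budget fields above can be enforced with a small check run before or after each task. A sketch (field names mirror the example config; the thresholds are the example values, not defaults):

```typescript
// Budget check mirroring the config above. Returns "ok", "alert"
// (past the alert threshold), or "exceeded".
interface Budget {
  maxTokensPerTask: number;
  maxCostPerDay: number;
  alertThreshold: number; // fraction, e.g. 0.8 for 80%
}

function checkBudget(tokensUsed: number, costToday: number, b: Budget): string {
  if (tokensUsed > b.maxTokensPerTask || costToday > b.maxCostPerDay) {
    return "exceeded";
  }
  if (costToday >= b.maxCostPerDay * b.alertThreshold) {
    return "alert";
  }
  return "ok";
}

const budget = { maxTokensPerTask: 50_000, maxCostPerDay: 5.0, alertThreshold: 0.8 };
console.log(checkBudget(40_000, 4.5, budget)); // "alert"
```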
Cost-Saving Patterns
Progressive Enhancement
Anthropic:

```
Start: Haiku (cheap, fast)
  ↓
If insufficient: Sonnet
  ↓
If still insufficient: Opus
```

OpenAI:

```
Start: GPT-4o-mini (cheap, fast)
  ↓
If insufficient: GPT-4o
  ↓
If still insufficient: o1
```

Google:

```
Start: Gemini Flash (cheap, fast)
  ↓
If insufficient: Gemini Pro
  ↓
If still insufficient: Gemini Ultra
```

Multi-Provider:

```
Start: Gemini Flash (cheapest)
  ↓
If insufficient: Claude Sonnet
  ↓
If still insufficient: Claude Opus / o1
```
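The escalation chains above reduce to a loop over models ordered cheap-to-expensive: try each in turn until the answer passes a sufficiency check. A sketch with an injected model interface (the interface and the check are assumptions, not a real SDK):

```typescript
// Try models cheapest-first until one produces a sufficient answer.
interface Model {
  name: string;
  query(prompt: string): Promise<string>;
}

async function withEscalation(
  models: Model[], // ordered cheapest first
  prompt: string,
  isSufficient: (answer: string) => boolean,
): Promise<string> {
  let last = "";
  for (const model of models) {
    last = await model.query(prompt);
    if (isSufficient(last)) return last;
  }
  return last; // best effort from the most capable model
}
```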
Summarize Before Processing
```
Large document (100K tokens)
  ↓
Haiku summarizes → Summary (5K tokens)
  ↓
Sonnet processes summary
```
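This pipeline, as code: summarize with the cheap model, then hand only the summary to the stronger one. A sketch with injected model clients (the interface and prompts are hypothetical placeholders):

```typescript
// Summarize with a cheap model, then process only the summary
// with a stronger one, so the large document is read just once.
interface Model {
  query(prompt: string): Promise<string>;
}

async function summarizeThenProcess(
  cheap: Model,
  strong: Model,
  largeDoc: string,
  task: string,
): Promise<string> {
  const summary = await cheap.query(`Summarize this document concisely:\n${largeDoc}`);
  return strong.query(`${task}\n\nSummary:\n${summary}`);
}
```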
Parallel with Haiku, Synthesize with Sonnet

1. Fan out (cheap): run Task 1, Task 2, and Task 3 in parallel with Haiku.
2. Synthesize (quality): Sonnet combines all results into the final output.
ROI Framework
Calculate Value per Token
```
Value = (Time Saved × Hourly Rate) / Tokens Used
```

Example:

- Task saves 2 hours of dev time
- Dev rate: $100/hour
- Tokens used: 50,000

```
Value = $200 / 50,000 tokens = $0.004 per token
Cost  = 50,000 × $0.000003   = $0.15   (Sonnet input rate, $3 per 1M)
ROI   = ($200 - $0.15) / $0.15 ≈ 133,233%
```
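The arithmetic above as a small function (numbers from the worked example; the per-token cost assumes Sonnet's $3 per 1M input rate):

```typescript
// ROI of an agent task: value created vs. token cost, as a percentage.
function roiPercent(
  hoursSaved: number,
  hourlyRate: number,
  tokensUsed: number,
  costPerToken: number,
): number {
  const value = hoursSaved * hourlyRate; // $200 in the example
  const cost = tokensUsed * costPerToken; // $0.15 in the example
  return ((value - cost) / cost) * 100;
}

// Example from above: 2 hours saved at $100/hr, 50K tokens at $0.000003 each
console.log(Math.round(roiPercent(2, 100, 50_000, 0.000003))); // 133233
```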
When to Optimize
| Scenario | Priority |
|---|---|
| High volume, low value | High - optimize aggressively |
| Low volume, high value | Low - focus on quality |
| High volume, high value | Medium - balance both |
| Low volume, low value | Question if needed at all |
Best Practices
- Monitor costs weekly at minimum
- Set per-agent and per-squad budgets
- Use fast/cheap models for high-volume tasks
- Cache expensive research results
- Batch similar operations
- Track ROI, not just costs
Common cost traps:
- Reading entire codebases repeatedly
- Using premium models for simple tasks
- Generating verbose explanations
- Not caching repeated queries
- Running agents without monitoring
- Multi-LLM Usage: choose the right model for each task
- Context Optimization: reduce input token usage