## Understanding Tokens
Tokens are the unit of LLM pricing. Roughly:
- 1 token ≈ 4 characters (English)
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
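The rules of thumb above can be turned into a quick estimator. This is a sketch only: real token counts depend on each model's tokenizer, so treat these as ballpark figures.

```typescript
// Rough token estimates from the rules of thumb above.
// Real counts vary by tokenizer; use these only for ballpark budgeting.
function tokensFromChars(chars: number): number {
  return Math.ceil(chars / 4); // ~4 characters per token (English)
}

function tokensFromWords(words: number): number {
  return Math.ceil(words / 0.75); // ~0.75 words per token
}
```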
## Pricing by Provider
### Anthropic (Claude)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Claude Opus | $15.00 | $75.00 |

### OpenAI

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| o1 | $15.00 | $60.00 |

### Google (Gemini)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Gemini Flash | $0.075 | $0.30 |
| Gemini Pro | $1.25 | $5.00 |
| Gemini Ultra | $7.00 | $21.00 |

### xAI (Grok)

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Grok-2-mini | $0.10 | $0.40 |
| Grok-2 | $2.00 | $10.00 |
Prices change frequently. Check provider pricing pages for current rates. Output tokens typically cost 3-5x more than input.
## Cost Estimation

### Per-Task Estimates
| Task Type | Est. Tokens | Est. Cost (Sonnet) |
|---|---|---|
| Simple Q&A | 1K in + 500 out | $0.01 |
| Code review | 10K in + 2K out | $0.06 |
| Full implementation | 50K in + 10K out | $0.30 |
| Large codebase analysis | 150K in + 5K out | $0.52 |
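The estimates above come from a simple formula: input tokens times the input rate plus output tokens times the output rate. A minimal sketch, with rates given in dollars per million tokens:

```typescript
// Cost of one task, given token counts and per-1M-token rates.
function taskCost(
  inputTokens: number,
  outputTokens: number,
  inputPerM: number, // $ per 1M input tokens
  outputPerM: number, // $ per 1M output tokens
): number {
  return (
    (inputTokens / 1_000_000) * inputPerM +
    (outputTokens / 1_000_000) * outputPerM
  );
}
```

For example, a code review at Sonnet rates ($3 in / $15 out) is `taskCost(10_000, 2_000, 3, 15)`, which matches the $0.06 in the table.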
### Monthly Projections

- Light usage (10 tasks/day): ~$3-5/month
- Medium usage (50 tasks/day): ~$15-30/month
- Heavy usage (200 tasks/day): ~$60-150/month
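These projections are just per-task cost times task volume over a 30-day month. A sketch of the arithmetic:

```typescript
// Monthly cost projection: tasks per day × average cost per task × days.
function monthlyCost(
  tasksPerDay: number,
  avgCostPerTask: number,
  days: number = 30,
): number {
  return tasksPerDay * avgCostPerTask * days;
}
```

At 50 tasks/day averaging $0.01-0.02 per task, this lands in the $15-30/month range shown above.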
## Optimization Strategies

### 1. Reduce Input Tokens

Use targeted reads instead of full files:
```
# Bad: Read entire codebase
Read **/*.ts

# Good: Query specific patterns
Grep "export function" --type ts
```
Leverage project config for persistent context. Each tool reads a standing file, so project context doesn't need repeating in every message:

- Claude Code: `CLAUDE.md`
- Gemini CLI: `GEMINI.md`
- Cursor: `.cursorrules`
- OpenCode: `AGENTS.md`
### 2. Reduce Output Tokens

Request concise responses:

```
Respond in 2-3 sentences. No explanations unless asked.
```

Use structured output:

```
Return only JSON, no prose:
{"status": "success", "changes": [...]}
```
3. Use Appropriate Models
// Expensive: Using Opus for simple lookup
const result = await opus.query("What's in package.json?");
// Optimal: Use Haiku for simple tasks
const result = await haiku.query("What's in package.json?");
// Expensive: Using GPT-4o for simple lookup
const result = await gpt4o.query("What's in package.json?");
// Optimal: Use GPT-4o-mini for simple tasks
const result = await gpt4oMini.query("What's in package.json?");
// Expensive: Using Gemini Pro for simple lookup
const result = await geminiPro.query("What's in package.json?");
// Optimal: Use Gemini Flash for simple tasks
const result = await geminiFlash.query("What's in package.json?");
### 4. Cache Expensive Operations

```typescript
// Cache research results so repeated queries cost nothing
const cacheKey = hash(query);
let result = await cache.get(cacheKey);
if (!result) {
  result = await agent.research(query);
  await cache.set(cacheKey, result, TTL); // TTL: cache entry lifetime
}
```
### 5. Batch Operations

```typescript
// Bad: many small calls
for (const file of files) {
  await analyze(file); // 10 API calls for 10 files
}

// Good: batch into one call
await analyzeBatch(files); // 1 API call
```
## Monitoring Costs

### Track Usage

```shell
# View recent costs
squads feedback stats

# Check specific agent costs
squads memory show engineering
```
### Set Budgets

Define budget limits in agent configurations:

```markdown
# Agent Configuration

## Budget
- Max tokens per task: 50,000
- Max cost per day: $5.00
- Alert threshold: 80%
```
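One way such limits can be enforced in code. This is a sketch; the `Budget` shape and the threshold semantics are assumptions for illustration, not a real configuration schema:

```typescript
interface Budget {
  maxTokensPerTask: number;
  maxCostPerDay: number; // dollars
  alertThreshold: number; // fraction of the daily cap, e.g. 0.8 for 80%
}

type BudgetStatus = "ok" | "alert" | "exceeded";

// Compare today's spend against the daily cap and alert threshold.
function checkBudget(budget: Budget, spentToday: number): BudgetStatus {
  if (spentToday >= budget.maxCostPerDay) return "exceeded";
  if (spentToday >= budget.maxCostPerDay * budget.alertThreshold) return "alert";
  return "ok";
}
```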
### Cost Alerts

```typescript
// Monitor in hooks
if (taskCost > BUDGET_THRESHOLD) {
  notify(`High-cost task: $${taskCost}`);
}
```
## Cost-Saving Patterns

### Progressive Enhancement
**Anthropic**

```
Start: Haiku (cheap, fast)
  ↓
If insufficient: Sonnet
  ↓
If still insufficient: Opus
```

**OpenAI**

```
Start: GPT-4o-mini (cheap, fast)
  ↓
If insufficient: GPT-4o
  ↓
If still insufficient: o1
```

**Google**

```
Start: Gemini Flash (cheap, fast)
  ↓
If insufficient: Gemini Pro
  ↓
If still insufficient: Gemini Ultra
```

**Multi-Provider**

```
Start: Gemini Flash (cheapest)
  ↓
If insufficient: Claude Sonnet
  ↓
If still insufficient: Claude Opus / o1
```
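The ladder amounts to a loop over model tiers, escalating only when a cheaper answer fails a sufficiency check. A sketch with stubbed-out models; `query` and `isSufficient` are hypothetical stand-ins (real SDK calls would be async):

```typescript
interface ModelTier {
  name: string;
  query: (prompt: string) => string; // stub; real clients return promises
}

// Try tiers cheapest-first; stop at the first sufficient answer.
function escalate(
  tiers: ModelTier[],
  prompt: string,
  isSufficient: (answer: string) => boolean,
): { model: string; answer: string } {
  let last = { model: "", answer: "" };
  for (const tier of tiers) {
    last = { model: tier.name, answer: tier.query(prompt) };
    if (isSufficient(last.answer)) break; // cheap answer was good enough
  }
  return last; // falls through to the most capable tier's answer
}
```

Most lookups never leave the first tier, so the average cost per task stays close to the cheapest model's rate.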
### Summarize Before Processing

```
Large document (100K tokens)
  ↓
Haiku summarizes → Summary (5K tokens)
  ↓
Sonnet processes summary
```
### Parallel with Haiku, Synthesize with Sonnet

1. Fan out (cheap): run Task 1, Task 2, and Task 3 in parallel with Haiku.
2. Synthesize (quality): Sonnet combines all results into the final output.
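A sketch of the fan-out pattern with `Promise.all`; `cheapModel` and `qualityModel` are hypothetical stand-ins for a fast model (Haiku) and a stronger one (Sonnet):

```typescript
type Model = (prompt: string) => Promise<string>;

// Fan subtasks out to a cheap model in parallel, then have a
// stronger model synthesize the partial results into one answer.
async function fanOutSynthesize(
  tasks: string[],
  cheapModel: Model,
  qualityModel: Model,
): Promise<string> {
  const partials = await Promise.all(tasks.map((t) => cheapModel(t)));
  return qualityModel(`Combine these results:\n${partials.join("\n")}`);
}
```

The expensive model sees only the short partial results, not the original inputs, which is where the savings come from.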
## ROI Framework

### Calculate Value per Token

```
Value = (Time Saved × Hourly Rate) / Tokens Used
```

Example:

```
Task saves 2 hours of dev time
Dev rate: $100/hour
Tokens used: 50,000

Value = $200 / 50,000 tokens = $0.004 per token
Cost  = 50,000 × $0.000003 ($3 per 1M tokens) = $0.15
ROI   = ($200 - $0.15) / $0.15 ≈ 133,233%
```
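The worked example above, as code. A sketch that blends all tokens at a single per-token rate, as the example does:

```typescript
// ROI of an agent task: value of time saved vs. token spend.
function roiPercent(
  hoursSaved: number,
  hourlyRate: number, // dollars per hour
  tokensUsed: number,
  costPerToken: number, // e.g. $3 per 1M tokens = 0.000003
): number {
  const value = hoursSaved * hourlyRate; // dollars of dev time saved
  const cost = tokensUsed * costPerToken; // dollars spent on tokens
  return ((value - cost) / cost) * 100;
}
```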
### When to Optimize

| Scenario | Priority |
|---|---|
| High volume, low value | High - optimize aggressively |
| Low volume, high value | Low - focus on quality |
| High volume, high value | Medium - balance both |
| Low volume, low value | Question if needed at all |
## Best Practices
- Monitor costs weekly at minimum
- Set per-agent and per-squad budgets
- Use fast/cheap models for high-volume tasks
- Cache expensive research results
- Batch similar operations
- Track ROI, not just costs
**Common cost traps:**
- Reading entire codebases repeatedly
- Using premium models for simple tasks
- Generating verbose explanations
- Not caching repeated queries
- Running agents without monitoring