Without measurement, you can’t answer:
- Which agents deliver value?
- Where are bottlenecks?
- Is quality improving over time?
- What’s the ROI of agent investment?
Git-Based Metrics
Git provides a natural audit trail for agent work. Every commit, PR, and change is tracked.
Core Metrics
| Metric | Description | Calculation |
|---|---|---|
| Commits/day | Agent activity level | git log --author="agent" --since="1 day" |
| PR merge rate | Quality of output | Merged PRs / Total PRs |
| Time to merge | Review efficiency | PR created → PR merged |
| Lines changed | Scope of work | Additions + Deletions |
| Revert rate | Error frequency | Reverts / Total commits |
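As a rough illustration of the calculations in the table, the sketch below derives merge rate, time to merge, lines changed, and revert rate from in-memory records. The record shapes (`createdAt`, `mergedAt`, `subject`, `additions`, `deletions`) and the sample data are assumptions made for this example, not a format defined by Agents Squads.

```javascript
// Hypothetical record shapes; adapt the field names to your own data source.
const pullRequests = [
  { createdAt: '2024-05-01T09:00:00Z', mergedAt: '2024-05-01T15:00:00Z' },
  { createdAt: '2024-05-02T10:00:00Z', mergedAt: '2024-05-03T10:00:00Z' },
  { createdAt: '2024-05-03T08:00:00Z', mergedAt: null }, // closed or still open
  { createdAt: '2024-05-04T12:00:00Z', mergedAt: '2024-05-04T14:00:00Z' },
];
const commits = [
  { subject: 'feat: implement user auth', additions: 120, deletions: 10 },
  { subject: 'fix: handle expired sessions', additions: 15, deletions: 4 },
  { subject: 'Revert "fix: handle expired sessions"', additions: 4, deletions: 15 },
  { subject: 'docs: describe auth flow', additions: 30, deletions: 0 },
];

function coreMetrics(prs, commitList) {
  const merged = prs.filter((pr) => pr.mergedAt !== null);
  // Time to merge in hours: PR created -> PR merged
  const hoursToMerge = merged.map(
    (pr) => (Date.parse(pr.mergedAt) - Date.parse(pr.createdAt)) / 3600000
  );
  const reverts = commitList.filter((c) => c.subject.startsWith('Revert'));
  return {
    mergeRate: prs.length ? merged.length / prs.length : 0,
    avgHoursToMerge: hoursToMerge.length
      ? hoursToMerge.reduce((a, b) => a + b, 0) / hoursToMerge.length
      : null,
    linesChanged: commitList.reduce((sum, c) => sum + c.additions + c.deletions, 0),
    revertRate: commitList.length ? reverts.length / commitList.length : 0,
  };
}

console.log(coreMetrics(pullRequests, commits));
```

The guards against empty inputs matter in practice: a new squad with no PRs yet should report a rate of zero or `null`, not `NaN`.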
Tracking Agent Commits
Tag agent commits with consistent metadata:
git commit -m "feat: implement user auth

🤖 Generated with [Agents Squads](https://agents-squads.com)

Co-Authored-By: agents-squads <agents@agents-squads.com>
Agent: auth-implementer
Squad: engineering
Duration: 45m
Tokens: 125,000"
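Once commits carry these metadata lines, they can be read back programmatically. The following is a minimal sketch of a parser for the `Agent`, `Squad`, `Duration`, and `Tokens` lines; the function name `parseAgentMetadata` is hypothetical, and it assumes durations are written in whole minutes (such as `45m`) as in the example commit.

```javascript
// Parse "Key: value" metadata lines from a commit message.
// Hypothetical helper; not part of the squads CLI.
function parseAgentMetadata(message) {
  const fields = {};
  for (const line of message.split('\n')) {
    const match = line.match(/^(Agent|Squad|Duration|Tokens):\s*(.+)$/);
    if (match) fields[match[1].toLowerCase()] = match[2].trim();
  }
  return {
    agent: fields.agent || null,
    squad: fields.squad || null,
    // "45m" -> 45; other units are left unparsed in this sketch
    durationMinutes: /^\d+m$/.test(fields.duration || '')
      ? parseInt(fields.duration, 10)
      : null,
    // "125,000" -> 125000
    tokens: fields.tokens ? Number(fields.tokens.replace(/,/g, '')) : null,
  };
}

const message = [
  'feat: implement user auth',
  '',
  'Co-Authored-By: agents-squads <agents@agents-squads.com>',
  'Agent: auth-implementer',
  'Squad: engineering',
  'Duration: 45m',
  'Tokens: 125,000',
].join('\n');

console.log(parseAgentMetadata(message));
```

Missing fields come back as `null`, so commits written by humans can pass through the same pipeline without special handling.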
# Commits by agent in last week
git log --since="1 week" --grep="Agent:" --oneline | wc -l
# Agent-specific commits (--author matches the author field only, not Co-Authored-By trailers)
git log --author="agents-squads" --since="1 month" --stat
# Commits per agent (the Agent: line lives in the commit body, so print %b rather than %s;
# grep -P requires GNU grep)
git log --since="1 month" --grep="Agent:" --format="%b" | \
grep -oP "^Agent: \K.+" | sort | uniq -c | sort -rn
# Revert rate (--oneline so that wc -l counts commits, not lines of log output)
echo "scale=2; $(git log --grep="^Revert" --since="1 month" --oneline | wc -l) / $(git log --since="1 month" --oneline | wc -l)" | bc
Using squads CLI
# Overall status
squads dashboard
# Agent-specific metrics
squads feedback stats
# View execution history
squads memory show engineering
Custom Metrics Script
#!/bin/bash
# scripts/agent-metrics.sh
SINCE="${1:-1 week}"
AGENT_EMAIL="agents@agents-squads.com"
echo "=== Agent Performance Report ==="
echo "Period: $SINCE"
echo ""
# Total commits
commits=$(git log --author="$AGENT_EMAIL" --since="$SINCE" --oneline | wc -l)
echo "Total commits: $commits"
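# PRs created (requires gh cli; gh pr list returns 30 results unless --limit is raised,
# and these counts cover all time rather than $SINCE)
prs_created=$(gh pr list --author="@me" --state=all --limit 1000 --json createdAt --jq "length")
prs_merged=$(gh pr list --author="@me" --state=merged --limit 1000 --json mergedAt --jq "length")
echo "PRs created: $prs_created"
echo "PRs merged: $prs_merged"
if [ "$prs_created" -gt 0 ]; then
  echo "Merge rate: $(echo "scale=2; $prs_merged * 100 / $prs_created" | bc)%"
else
  echo "Merge rate: n/a"
fi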
# Lines changed
git log --author="$AGENT_EMAIL" --since="$SINCE" --numstat --pretty="" | \
awk '{add+=$1; del+=$2} END {print "Lines added:", add, "deleted:", del}'
# By squad
echo ""
echo "=== By Squad ==="
git log --since="$SINCE" --grep="Squad:" --format="%b" | \
grep -oP "^Squad: \K\w+" | sort | uniq -c | sort -rn
Quality Metrics
Code Review Scores
Track review feedback on agent PRs:
## PR Review Template
### Agent Output Quality
- [ ] Code is correct
- [ ] Code follows conventions
- [ ] Tests included
- [ ] Documentation updated
- [ ] No security issues
**Score**: 4/5
**Notes**: Minor style issues, otherwise good.
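To track review scores over time, the checklist can be tallied automatically. The sketch below counts checked boxes in a filled-in review; it assumes the score equals the number of checked items, which the template itself does not specify, and `checklistScore` is a hypothetical helper name.

```javascript
// Count checked boxes in a markdown checklist.
// Assumption: score = number of checked items out of the total.
function checklistScore(markdown) {
  const boxes = markdown.match(/^- \[( |x|X)\]/gm) || [];
  const checked = boxes.filter((box) => /\[(x|X)\]/.test(box)).length;
  return { checked, total: boxes.length };
}

const review = [
  '### Agent Output Quality',
  '- [x] Code is correct',
  '- [ ] Code follows conventions',
  '- [x] Tests included',
  '- [x] Documentation updated',
  '- [x] No security issues',
].join('\n');

const score = checklistScore(review);
console.log(`Score: ${score.checked}/${score.total}`);
```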
Automated Quality Checks
# .github/workflows/agent-quality.yml
name: Agent Quality Check
on:
  pull_request:
    branches: [main]
permissions:
  contents: read
  pull-requests: write
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Check out the PR head commit; the default checkout is a merge
          # commit whose body does not contain the agent metadata
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Check if agent PR
        id: check
        run: |
          if git log -1 --format="%b" | grep -q "agents-squads"; then
            echo "is_agent=true" >> "$GITHUB_OUTPUT"
          fi
      - name: Run quality metrics
        if: steps.check.outputs.is_agent == 'true'
        run: |
          # Lint check
          npm run lint
          # Test coverage
          npm test -- --coverage
          # Complexity check
          npx complexity-report src/
      - name: Post metrics to PR
        if: steps.check.outputs.is_agent == 'true'
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Agent Quality Report\n...'
            })
Benchmarking
Task Completion Benchmarks
Track how long standard tasks take:
| Task Type | Target | Current Avg |
|---|---|---|
| Bug fix | < 30 min | 25 min |
| Feature (small) | < 2 hours | 1.5 hours |
| Feature (medium) | < 1 day | 6 hours |
| Refactor | < 4 hours | 3 hours |
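A benchmark table like this can be checked mechanically. The sketch below converts the rows to minutes and reports whether each task type is within its target; it assumes "1 day" means an 8-hour working day, which the table does not state, and `compareToTargets` is a hypothetical helper.

```javascript
// Targets and current averages from the table above, in minutes.
const benchmarks = [
  { task: 'Bug fix', targetMin: 30, currentAvgMin: 25 },
  { task: 'Feature (small)', targetMin: 120, currentAvgMin: 90 },
  // Assumption: 1 day = 8 working hours = 480 minutes
  { task: 'Feature (medium)', targetMin: 480, currentAvgMin: 360 },
  { task: 'Refactor', targetMin: 240, currentAvgMin: 180 },
];

function compareToTargets(rows) {
  return rows.map((row) => ({
    task: row.task,
    withinTarget: row.currentAvgMin < row.targetMin,
    // Share of the target budget currently used, e.g. 0.75 = 75%
    budgetUsed: row.currentAvgMin / row.targetMin,
  }));
}

for (const result of compareToTargets(benchmarks)) {
  const percent = Math.round(result.budgetUsed * 100);
  console.log(`${result.task}: ${percent}% of target, within target: ${result.withinTarget}`);
}
```

Tracking the share of budget used, not just pass or fail, makes slow drift toward a target visible before it is breached.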
A/B Testing Agents
Compare different agent configurations:
## Experiment: Prompt Optimization
**Hypothesis**: Adding domain context improves code quality
**Agent A**: Base prompt
**Agent B**: Base prompt + CLAUDE.md context
**Metrics**:
- PR merge rate
- Review iterations needed
- Time to completion
**Results**:
- Agent A: 75% merge rate, 2.3 iterations
- Agent B: 92% merge rate, 1.1 iterations
**Conclusion**: Domain context significantly improves quality
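Before concluding that one configuration beats another, it helps to check whether the difference in merge rates could be due to chance. One common option is a two-proportion z-test, sketched below. The PR counts are hypothetical because the experiment above reports only rates, and the normal CDF uses the Abramowitz-Stegun approximation of the error function.

```javascript
// Standard normal CDF via the Abramowitz-Stegun erf approximation (formula 7.1.26).
function normalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided two-proportion z-test on merged/total PR counts.
// Returns NaN statistics when both groups merged all or none of their PRs.
function twoProportionZTest(mergedA, totalA, mergedB, totalB) {
  const rateA = mergedA / totalA;
  const rateB = mergedB / totalB;
  const pooled = (mergedA + mergedB) / (totalA + totalB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  const z = (rateB - rateA) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { rateA, rateB, z, pValue };
}

// Hypothetical sample sizes: 100 PRs per agent versus 20 PRs per agent.
console.log(twoProportionZTest(75, 100, 92, 100));
console.log(twoProportionZTest(15, 20, 18, 20));
```

With 100 PRs per agent, a 75% versus 92% split yields a p-value well below 0.05, while a similar gap on 20 PRs per agent does not, so recording sample sizes alongside the rates is worth the extra line in the experiment log.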
Monitoring & Alerts
# .agents/performance.yml
thresholds:
  merge_rate:
    warning: 0.80
    critical: 0.60
  revert_rate:
    warning: 0.05
    critical: 0.10
  avg_review_iterations:
    warning: 3
    critical: 5
  tokens_per_task:
    warning: 100000
    critical: 200000
Alerting
// Alert on performance degradation
// getAgentMetrics, alert, and THRESHOLDS are assumed to be defined elsewhere
async function checkPerformance() {
  const metrics = await getAgentMetrics('last_week');
  if (metrics.mergeRate < THRESHOLDS.merge_rate.critical) {
    await alert({
      level: 'critical',
      message: `Agent merge rate dropped to ${metrics.mergeRate}`,
      squad: metrics.squad
    });
  }
  if (metrics.revertRate > THRESHOLDS.revert_rate.warning) {
    await alert({
      level: 'warning',
      message: `High revert rate: ${metrics.revertRate}`,
      squad: metrics.squad
    });
  }
}
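The same check can be generalized to every metric in `.agents/performance.yml`. The sketch below is one possible approach: it adds a `direction` field, which is an assumption of this example because the YAML does not say whether low or high values are bad, and the helper names `classify` and `evaluate` are hypothetical.

```javascript
// Values mirror .agents/performance.yml; "direction" is an added assumption
// marking whether a value below or above the limit counts as a breach.
const THRESHOLDS = {
  merge_rate: { warning: 0.8, critical: 0.6, direction: 'below' },
  revert_rate: { warning: 0.05, critical: 0.1, direction: 'above' },
  avg_review_iterations: { warning: 3, critical: 5, direction: 'above' },
  tokens_per_task: { warning: 100000, critical: 200000, direction: 'above' },
};

// Return 'critical', 'warning', or 'ok' for a single metric value.
function classify(metricName, value) {
  const rule = THRESHOLDS[metricName];
  if (!rule) throw new Error(`Unknown metric: ${metricName}`);
  const breached = (limit) =>
    rule.direction === 'below' ? value < limit : value > limit;
  if (breached(rule.critical)) return 'critical';
  if (breached(rule.warning)) return 'warning';
  return 'ok';
}

// Return only the metrics that need attention.
function evaluate(metrics) {
  return Object.entries(metrics)
    .map(([name, value]) => ({ name, value, level: classify(name, value) }))
    .filter((result) => result.level !== 'ok');
}

console.log(
  evaluate({
    merge_rate: 0.72,
    revert_rate: 0.03,
    avg_review_iterations: 6,
    tokens_per_task: 150000,
  })
);
```

Checking the critical limit first ensures a metric that breaches both limits is reported once, at the higher severity.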
Feedback Loop
Recording Feedback
# After agent task completion
squads feedback add engineering
# Prompts for:
# - Task success (yes/no)
# - Quality score (1-5)
# - Issues encountered
# - Improvement suggestions
Using Feedback
# View feedback trends
squads feedback stats
# Output:
# Squad: engineering
# Tasks: 45
# Success rate: 91%
# Avg quality: 4.2/5
# Common issues:
# - Missing tests (8 occurrences)
# - Style inconsistencies (5 occurrences)
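For teams that keep feedback in their own store, a summary like the one above can be computed with a few lines. This is a sketch under stated assumptions: the record shape (`success`, `quality`, `issues`) and the sample data are hypothetical, and it does not reflect how `squads feedback` stores or aggregates data internally.

```javascript
// Hypothetical feedback records; not the storage format used by the squads CLI.
const feedback = [
  { squad: 'engineering', success: true, quality: 5, issues: [] },
  { squad: 'engineering', success: true, quality: 4, issues: ['Missing tests'] },
  {
    squad: 'engineering',
    success: false,
    quality: 2,
    issues: ['Missing tests', 'Style inconsistencies'],
  },
  { squad: 'engineering', success: true, quality: 4, issues: ['Style inconsistencies'] },
];

function summarizeFeedback(records) {
  // Tally how often each issue label appears
  const issueCounts = new Map();
  for (const record of records) {
    for (const issue of record.issues) {
      issueCounts.set(issue, (issueCounts.get(issue) || 0) + 1);
    }
  }
  const total = records.length;
  return {
    tasks: total,
    successRate: total ? records.filter((r) => r.success).length / total : 0,
    avgQuality: total ? records.reduce((sum, r) => sum + r.quality, 0) / total : 0,
    // Most frequent issues first, as [label, count] pairs
    commonIssues: [...issueCounts.entries()].sort((a, b) => b[1] - a[1]),
  };
}

console.log(summarizeFeedback(feedback));
```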
Continuous Improvement
Measure → Analyze → Improve → Repeat
   │         │         │
   │         │         └── Update prompts, add context
   │         └── Identify patterns in failures
   └── Track metrics over time
Best Practices
- Tag all agent commits with consistent metadata
- Track metrics weekly, review monthly
- Set quality thresholds and alert on breaches
- A/B test prompt and configuration changes
- Record human feedback after task completion
- Use metrics to guide agent improvements
Measurement pitfalls:
- Optimizing for vanity metrics (commits ≠ value)
- Ignoring quality in favor of speed
- Not accounting for task difficulty
- Missing the feedback loop