Metrics Quickstart

Set up essential CD metrics in one day and start improving delivery performance

This guide helps you quickly implement the minimum set of metrics needed to measure and improve your Continuous Delivery performance. Start tracking today, improve tomorrow.

Why Metrics Matter

Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." Track these metrics to understand your delivery system, not as quotas to hit.

Without metrics, improvement is guesswork.

Metrics help you:

  • Identify bottlenecks in your delivery process
  • Measure improvement over time
  • Make data-driven decisions about where to focus
  • Demonstrate value to leadership
  • Prevent regression when optimizing

The Essential Four Metrics

Start with these four DORA metrics that predict delivery performance:

Metric | Good Target | Purpose
Development Cycle Time | < 2 days | Measure delivery speed
Deployment Frequency | Multiple/day | Measure delivery throughput
Change Failure Rate | < 5% | Measure quality
Mean Time to Repair | < 1 hour | Measure recovery speed

These four metrics balance speed (cycle time, deployment frequency) with stability (change failure rate, MTTR).

Day 1: Start Tracking

Step 1: Deployment Frequency (15 minutes)

What it measures: How often you deploy to production

Simplest implementation:

#!/bin/bash
# deploy.sh - add this logging to your existing deployment script

DEPLOY_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
SERVICE_NAME="my-service"
VERSION=${1:?usage: deploy.sh <version>}

# Your deployment logic here
kubectl apply -f deployment.yaml

# Log the deployment
echo "${DEPLOY_TIME},${SERVICE_NAME},${VERSION}" >> /var/log/deployments.csv

# Or send to a metrics service
curl -X POST https://metrics.example.com/deployments \
  -H "Content-Type: application/json" \
  -d "{\"service\":\"${SERVICE_NAME}\",\"timestamp\":\"${DEPLOY_TIME}\",\"version\":\"${VERSION}\"}"

Query deployment frequency:

# Average deployments per day (across days that had at least one deployment)
cat /var/log/deployments.csv | \
  awk -F',' '{print $1}' | \
  cut -d'T' -f1 | \
  sort | uniq -c | \
  awk '{total+=$1; count++} END {print total/count " deployments/day"}'

Step 2: Development Cycle Time (20 minutes)

What it measures: Time from starting work to deploying to production

Track using git commits + deployment log:

#!/bin/bash
# cycle-time.sh

# Get the timestamp of the last deployment
LAST_DEPLOY_TIME=$(tail -1 /var/log/deployments.csv | cut -d',' -f1)

# For each commit since the last deploy
git log --since="$LAST_DEPLOY_TIME" --format="%H|%ct|%s" | while IFS='|' read -r hash timestamp message; do
  # Extract story ID from commit message (e.g., JIRA-123)
  STORY_ID=$(echo "$message" | grep -oE '[A-Z]+-[0-9]+' | head -1)

  if [ -n "$STORY_ID" ]; then
    # Find when story was started (first commit with this ID)
    STARTED=$(git log --all --grep="$STORY_ID" --format="%ct" --reverse | head -1)
    CYCLE_TIME=$(( (timestamp - STARTED) / 3600 )) # hours

    echo "${STORY_ID},${CYCLE_TIME}"
  fi
done

Alternative: Track in your issue tracker

Most teams find it easier to track cycle time in Jira/GitHub Issues:

// Calculate cycle time from issue status transitions
const cycleTime = (issue) => {
  const startedAt = issue.transitions.find(t => t.to === 'In Progress').timestamp;
  const deployedAt = issue.transitions.find(t => t.to === 'Done').timestamp;

  const hours = (deployedAt - startedAt) / (1000 * 60 * 60);
  return hours / 24; // Convert to days
};

See Development Cycle Time for detailed implementation.

Step 3: Change Failure Rate (10 minutes)

What it measures: Percentage of deployments that require remediation

Simple tracking:

#!/bin/bash
# deploy.sh (updated) - mark each deployment as a success or failure

DEPLOY_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
SERVICE_NAME="my-service"
VERSION=${1}
SUCCESS=${2:-true}  # Pass 'false' if the deployment failed

echo "${DEPLOY_TIME},${SERVICE_NAME},${VERSION},${SUCCESS}" >> /var/log/deployments.csv

# Calculate the overall failure rate
total=$(wc -l < /var/log/deployments.csv)
failures=$(grep -c ",false$" /var/log/deployments.csv)
echo "Change Failure Rate: $(awk "BEGIN {printf \"%.1f%%\", ($failures/$total)*100}")"

What counts as a failure:

  • Deployment rollback
  • Hotfix deployed within 24 hours
  • Production incident caused by the change
  • Manual intervention required
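
Failures are often discovered after the deploy script has already logged a success, for example when a rollback or hotfix lands later. A minimal sketch for flipping the most recent record, assuming the CSV layout above and a hypothetical mark-failure.sh helper:

#!/bin/bash
# mark-failure.sh - retroactively mark the latest deployment of a service as failed
# Assumes /var/log/deployments.csv rows of: timestamp,service,version,success
SERVICE=${1:?usage: mark-failure.sh <service>}
LOG=/var/log/deployments.csv

# Line number of the most recent entry for this service
LINE=$(grep -n ",${SERVICE}," "$LOG" | tail -1 | cut -d':' -f1)

if [ -n "$LINE" ]; then
  # Flip the success flag on that line (GNU sed; on macOS use `sed -i ''`)
  sed -i "${LINE}s/,true$/,false/" "$LOG"
fi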

See Change Failure Rate for detailed guidance.

Step 4: Mean Time to Repair (10 minutes)

What it measures: How long it takes to restore service after an incident

Simple incident tracking:

# /var/log/incidents.csv
timestamp,service,severity,detected_at,resolved_at,mttr_minutes
2025-10-20T10:00:00Z,api,high,2025-10-20T10:00:00Z,2025-10-20T10:45:00Z,45
2025-10-20T15:30:00Z,web,medium,2025-10-20T15:30:00Z,2025-10-20T15:50:00Z,20
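
To append rows in that format as incidents are resolved, a small helper works; here is a sketch using a hypothetical log-incident.sh (GNU date is assumed for the timestamp math):

#!/bin/bash
# log-incident.sh - append a resolved incident to /var/log/incidents.csv
# Usage: log-incident.sh <service> <severity> <detected_at> <resolved_at>
# Timestamps are ISO-8601 UTC, e.g. 2025-10-20T10:00:00Z
SERVICE=${1:?service}
SEVERITY=${2:?severity}
DETECTED=${3:?detected_at}
RESOLVED=${4:?resolved_at}

# Time to repair in minutes (GNU date)
MTTR_MINUTES=$(( ($(date -d "$RESOLVED" +%s) - $(date -d "$DETECTED" +%s)) / 60 ))

echo "${DETECTED},${SERVICE},${SEVERITY},${DETECTED},${RESOLVED},${MTTR_MINUTES}" >> /var/log/incidents.csv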

Calculate MTTR:

# Average time to repair across logged incidents (tail -n +2 skips the CSV header)
cat /var/log/incidents.csv | tail -n +2 | \
  awk -F',' '{total+=$6; count++} END {print total/count " minutes (avg)"}'

Better: Integrate with incident management

// PagerDuty, Opsgenie, etc.
const mttr = (incident) => {
  const detected = new Date(incident.created_at);
  const resolved = new Date(incident.resolved_at);
  return (resolved - detected) / (1000 * 60); // minutes
};

See Mean Time to Repair for recovery strategies.

End of Day 1: You’re Tracking!

After one day, you should have:

  • ✅ Deployment frequency logging
  • ✅ Cycle time calculation (even if manual)
  • ✅ Change failure tracking
  • ✅ MTTR measurement

Calculate your baseline:

Baseline Metrics (Week of [DATE])

Deployment Frequency: _____ deployments/day
Development Cycle Time: _____ days (average)
Change Failure Rate: _____%
Mean Time to Repair: _____ minutes

Week 1: Make Metrics Visible

Create a Metrics Dashboard

Option 1: Simple Spreadsheet (30 minutes)

Create a Google Sheet or Excel file:

Week | Deployments | Avg Cycle Time (days) | Change Failure Rate | MTTR (min)
2025-W42 | 23 | 3.2 | 8.7% | 127
2025-W43 | 31 | 2.8 | 6.5% | 89
2025-W44 | 28 | 2.5 | 4.3% | 62

Update weekly and display prominently (print and post, or share link).
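
To avoid copying numbers by hand, a small script can print the deployment and failure-rate columns for the current week; a sketch using a hypothetical weekly-summary.sh, assuming the deployments.csv layout above (timestamp,service,version,success) and GNU date. Cycle time and MTTR come from their own logs the same way:

#!/bin/bash
# weekly-summary.sh - print deployment count and change failure rate for the last 7 days
LOG=/var/log/deployments.csv
WEEK=$(date +%G-W%V)
SINCE=$(date -d "7 days ago" +%Y-%m-%d)

DEPLOYS=$(awk -F',' -v since="$SINCE" '$1 >= since' "$LOG" | wc -l)
FAILS=$(awk -F',' -v since="$SINCE" '$1 >= since && $4 == "false"' "$LOG" | wc -l)
CFR=$(awk "BEGIN {if ($DEPLOYS > 0) printf \"%.1f%%\", ($FAILS/$DEPLOYS)*100; else print \"n/a\"}")

echo "Week ${WEEK}: ${DEPLOYS} deployments, change failure rate ${CFR}"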

Option 2: Grafana Dashboard (2 hours)

# docker-compose.yml
version: '3'
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - ./grafana-data:/var/lib/grafana
      - ./dashboards:/etc/grafana/provisioning/dashboards

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
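
The compose file mounts a prometheus.yml that is not shown above. A minimal sketch, assuming your application exposes its metrics on port 8080 of the Docker host (adjust the target to match your setup):

# Write a minimal Prometheus scrape config next to docker-compose.yml
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'cd-metrics'
    static_configs:
      # host.docker.internal resolves to the host on Docker Desktop;
      # on Linux, use the host's IP or an extra_hosts entry instead
      - targets: ['host.docker.internal:8080']
EOF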

Expose metrics for Prometheus to scrape with prom-client:

// metrics.js
const client = require('prom-client');

const deploymentCounter = new client.Counter({
  name: 'deployments_total',
  help: 'Total number of deployments',
  labelNames: ['service', 'environment']
});

const cycleTimeHistogram = new client.Histogram({
  name: 'development_cycle_time_hours',
  help: 'Development cycle time in hours',
  buckets: [2, 4, 8, 16, 24, 48, 96, 168] // hours
});

const changeFailureCounter = new client.Counter({
  name: 'change_failures_total',
  help: 'Number of failed changes',
  labelNames: ['service']
});

// Record metrics
deploymentCounter.inc({ service: 'api', environment: 'production' });
cycleTimeHistogram.observe(18.5); // 18.5 hours
changeFailureCounter.inc({ service: 'web' });
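
prom-client holds these values in memory; the application still needs to serve them over HTTP (typically a /metrics endpoint backed by the client registry) for Prometheus to scrape. Once that is in place, a quick sanity check from the shell, assuming the app listens on port 8080:

# Verify the counters are being exported
curl -s http://localhost:8080/metrics | grep -E 'deployments_total|change_failures_total'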

Option 3: Cloud Service (1 hour)

Use a hosted metrics service:

  • Datadog - Application monitoring
  • New Relic - Full-stack observability
  • CloudWatch - AWS native
  • Azure Monitor - Azure native
  • Google Cloud Monitoring - GCP native

Example with Datadog:

# Send metric via API
curl -X POST "https://api.datadoghq.com/api/v1/series" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -d '{
    "series": [{
      "metric": "deployment.frequency",
      "type": "count",
      "points": [['$(date +%s)', 1]],
      "tags": ["service:api","env:production"]
    }]
  }'

Week 2: Add Supporting Metrics

Once the four essentials are stable, add these supporting metrics:

Integration Frequency

What it measures: How often code is integrated to trunk

# Integrations per day
git log --since="30 days ago" --format="%cd" --date=short origin/main | \
  sort | uniq -c | \
  awk '{total+=$1; count++} END {print total/count " integrations/day"}'

Target: Multiple times per day per developer
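
To check that target per developer rather than for the whole team, a variant of the query above can break it down by author; a rough sketch that assumes about 20 working days in the 30-day window:

# Integrations per developer per day (last 30 days)
git log --since="30 days ago" --format="%an" origin/main | \
  sort | uniq -c | sort -rn | \
  awk '{count=$1; sub(/^ *[0-9]+ /, ""); printf "%-25s %.1f integrations/day\n", $0, count/20}'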

See Integration Frequency.

Build Duration

What it measures: Time from commit to deployable artifact

# Extract from CI logs
# GitHub Actions example (requires the gh CLI, jq, and gawk):
gh run list --limit 100 --json conclusion,createdAt,updatedAt | \
  jq -r '.[] | select(.conclusion=="success") | "\(.createdAt),\(.updatedAt)"' | \
  awk -F',' '{
    start = mktime(gensub(/[T:-]/, " ", "g", substr($1, 1, 19)))
    end = mktime(gensub(/[T:-]/, " ", "g", substr($2, 1, 19)))
    print (end - start) / 60
  }' | \
  awk '{total+=$1; count++} END {print total/count " minutes (avg)"}'

Target: < 10 minutes

See Build Duration.

Work In Progress (WIP)

What it measures: Number of items in progress simultaneously

Track from your kanban board:

# Jira JQL for items currently in progress
status = "In Progress" AND assignee is not EMPTY

Target: WIP < Team Size (ideally half)

See Work In Progress.

Metric Groups: Use Together

Never optimize a single metric. Use offsetting groups:

Speed vs. Quality Group

Speed Metrics | Quality Metrics
Deployment Frequency ⬆️ | Change Failure Rate ⬇️
Cycle Time ⬇️ | Defect Rate ⬇️

Example: If you increase deployment frequency but change failure rate also increases, you’re going too fast. Slow down and improve quality feedback.
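
A small sketch of checking the pair together, using the daily cd-metrics.csv written by the collector script at the end of this guide (layout assumed: date,deployments,cycle_time,cfr,mttr):

# Compare the last 7 days with the 7 days before that
tail -14 /var/log/cd-metrics.csv | awk -F',' '
  NR <= 7 { prev_deploys += $2; prev_cfr += $4 }
  NR >  7 { curr_deploys += $2; curr_cfr += $4 }
  END {
    printf "Deployments: %d -> %d, CFR: %.1f%% -> %.1f%%\n",
           prev_deploys, curr_deploys, prev_cfr / 7, curr_cfr / 7
    if (curr_deploys > prev_deploys && curr_cfr > prev_cfr)
      print "Speed is up but so is the failure rate - slow down and improve quality feedback"
  }'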

Flow vs. WIP Group

Flow Metrics | Constraint Metrics
Throughput ⬆️ | WIP ⬇️
Velocity ⬆️ | Lead Time ⬇️

Example: High velocity with high WIP means you’re starting too much work. Reduce WIP to improve flow.

See Metrics Cheat Sheet for comprehensive metric relationships.

Analyzing Your Metrics

Week 1: Establish Baseline

Don’t change anything. Just measure.

Week 1 Baseline:
- Deployment Frequency: 5/week
- Cycle Time: 4.2 days
- Change Failure Rate: 12%
- MTTR: 180 minutes

Week 2-4: Identify Bottlenecks

Look for patterns:

High Cycle Time + Low Deployment Frequency = Batch Size Problem

  • Solution: Work in smaller increments and deploy each one

High Change Failure Rate = Quality Feedback Problem

  • Solution: Strengthen automated testing and the quality gates in your pipeline

High MTTR = Recovery Process Problem

  • Solution: Automate rollback, improve monitoring

High WIP + Low Velocity = Context Switching Problem

  • Solution: Lower WIP limits and finish work before starting more

Month 2+: Continuous Improvement

Set improvement targets:

Month 1 Target (achievable):
- Deployment Frequency: 10/week (+100%)
- Cycle Time: 3.0 days (-30%)
- Change Failure Rate: 8% (-33%)
- MTTR: 120 minutes (-33%)

Month 3 Target (stretch):
- Deployment Frequency: Daily
- Cycle Time: < 2 days
- Change Failure Rate: < 5%
- MTTR: < 60 minutes

Common Pitfalls

❌ Pitfall 1: Gaming the Metrics

Example: Moving tickets to “Done” before actually deploying

Solution:

  • Definition of Done must include “deployed to production”
  • Automate metric collection from deployment logs, not issue tracker

❌ Pitfall 2: Vanity Metrics

Example: “We deployed 100 times this month!” (but 20 were rollbacks)

Solution:

  • Always pair speed metrics with quality metrics
  • Track both successes and failures

❌ Pitfall 3: Metric Theater

Example: Spending more time tracking metrics than improving

Solution:

  • Automate data collection
  • Review metrics weekly, not daily
  • Focus on trends, not point-in-time values

❌ Pitfall 4: Using Metrics as Punishment

Example: “Bob’s change had a failure, Bob is a bad developer”

Solution:

  • Metrics measure the system, not individuals
  • Use metrics to identify process problems, not blame people
  • Celebrate improvements, not perfection

Advanced Metrics (Month 2+)

Once the essentials are stable, consider:

  • Quality Metrics
  • Flow Metrics

Metric Automation Tools

Open Source

  • Grafana + Prometheus - Visualization + metrics
  • Loki - Log aggregation
  • Elasticsearch + Kibana - Search and analytics
  • Graphite - Time series database

Commercial

  • Datadog - Full-stack observability
  • New Relic - Application performance
  • Splunk - Log analysis and metrics
  • LinearB - Engineering metrics platform
  • Sleuth - Deployment tracking

CI/CD Native

  • GitHub Insights - Built into GitHub
  • GitLab Analytics - Built into GitLab
  • Azure DevOps Analytics - Built into Azure DevOps
  • CircleCI Insights - Built into CircleCI

Sample Implementation: Complete Script

Here’s a complete bash script to get started:

#!/bin/bash
# metrics-collector.sh - Run daily via cron

METRICS_FILE="/var/log/cd-metrics.csv"
TODAY=$(date +%Y-%m-%d)

# 1. Deployment Frequency (today)
DEPLOYMENTS_TODAY=$(grep "^${TODAY}" /var/log/deployments.csv | wc -l)

# 2. Average Cycle Time in days (last 50 recorded stories; cycle-times.csv stores hours)
CYCLE_TIME=$(tail -50 /var/log/cycle-times.csv | \
  awk -F',' '{total+=$2; count++} END {printf "%.1f", total/(count*24)}')

# 3. Change Failure Rate (approximated from the last 200 deployments)
TOTAL=$(tail -200 /var/log/deployments.csv | wc -l)
FAILURES=$(tail -200 /var/log/deployments.csv | grep -c ",false$")
CFR=$(awk "BEGIN {printf \"%.1f\", ($FAILURES/$TOTAL)*100}")

# 4. Mean Time to Repair (last 50 incidents; skip the CSV header if present)
MTTR=$(tail -50 /var/log/incidents.csv | grep -v '^timestamp' | \
  awk -F',' '{total+=$6; count++} END {printf "%.0f", total/count}')

# Append today's values to the metrics file
echo "${TODAY},${DEPLOYMENTS_TODAY},${CYCLE_TIME},${CFR},${MTTR}" >> "$METRICS_FILE"

# Generate report
cat <<EOF
📊 CD Metrics Report - ${TODAY}

Deployment Frequency: ${DEPLOYMENTS_TODAY} deployments today
Avg Cycle Time: ${CYCLE_TIME} days
Change Failure Rate: ${CFR}%
Mean Time to Repair: ${MTTR} minutes

View full report: https://metrics.example.com/dashboard
EOF

Set up daily collection:

# Add to crontab
crontab -e

# Run daily at 11:59 PM
59 23 * * * /usr/local/bin/metrics-collector.sh

Success Criteria

After implementing metrics, you should have:

  • ✅ All four DORA metrics tracked automatically
  • ✅ Baseline established (1 week of data)
  • ✅ Dashboard visible to the entire team
  • ✅ Weekly review scheduled
  • ✅ Improvement targets defined
  • ✅ Metric groups balanced (speed + quality)

Next Steps

  1. Automate collection - Stop manual tracking
  2. Add visualizations - Trend lines, not just numbers
  3. Set up alerts - Get notified when metrics degrade
  4. Correlate metrics - Find relationships between metrics
  5. Share widely - Make metrics visible to stakeholders

Further Reading