Metrics are crucial for organizational improvement. Without measurement, improvement attempts are aimless. This guide outlines key metrics for Continuous Delivery (CD) and Continuous Integration (CI).
Focus improvement efforts on the group of metrics as a whole, not individual measures
Refer to the Metrics Cheat Sheet for a high-level view of key metrics, their intent, and appropriate usage
Remember
Metrics, like any tool, must be used correctly to drive the improvement we need. Focusing on a single metric can lead to unintended consequences and suboptimal outcomes.
1 - Metrics Quickstart
Set up essential CD metrics in one day and start improving delivery performance
This guide helps you quickly implement the minimum set of metrics needed to measure and improve your Continuous Delivery performance. Start tracking today, improve tomorrow.
Why Metrics Matter
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Without metrics, improvement is guesswork.
Metrics help you:
✅ Identify bottlenecks in your delivery process
✅ Measure improvement over time
✅ Make data-driven decisions about where to focus
✅ Demonstrate value to leadership
✅ Prevent regression when optimizing
Critical
Use metrics in groups, never alone. Optimizing a single metric leads to unintended consequences. Always use offsetting metrics to maintain balance.
The Essential Four Metrics
Start with these four DORA metrics that predict delivery performance:
- Deployment Frequency: how often you deploy to production
- Development Cycle Time: time from starting work to deploying it to production
- Change Failure Rate: percentage of changes that degrade service or require remediation
- Mean Time to Repair (MTTR): time to restore service after an incident
These four metrics balance speed (cycle time, deployment frequency) with stability (change failure rate, MTTR).
Day 1: Start Tracking
Step 1: Deployment Frequency (15 minutes)
What it measures: How often you deploy to production
Simplest implementation:
```bash
# Add to your deployment script
#!/bin/bash
# deploy.sh
DEPLOY_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
SERVICE_NAME="my-service"

# Your deployment logic here
kubectl apply -f deployment.yaml

# Log the deployment
echo "${DEPLOY_TIME},${SERVICE_NAME},${VERSION}" >> /var/log/deployments.csv

# Or send to metrics service
curl -X POST https://metrics.example.com/deployments \
  -d "{\"service\":\"${SERVICE_NAME}\",\"timestamp\":\"${DEPLOY_TIME}\",\"version\":\"${VERSION}\"}"
```
Query deployment frequency:
```bash
# Deployments per day (last 30 days)
cat /var/log/deployments.csv | \
  awk -F',' '{print $1}' | \
  cut -d'T' -f1 | \
  sort | uniq -c | \
  awk '{total+=$1; count++} END {print total/count " deployments/day"}'
```
Step 2: Development Cycle Time (20 minutes)
What it measures: Time from starting work to deploying to production
Track using git commits + deployment log:
```bash
# cycle-time.sh
#!/bin/bash

# Get commits from last deployment to now
LAST_DEPLOY_TIME=$(tail -1 /var/log/deployments.csv | cut -d',' -f1)

# For each commit since last deploy
git log --since="$LAST_DEPLOY_TIME" --format="%H|%ct|%s" | while IFS='|' read hash timestamp message; do
  # Extract story ID from commit message (e.g., JIRA-123)
  STORY_ID=$(echo "$message" | grep -oE '[A-Z]+-[0-9]+' | head -1)
  if [ -n "$STORY_ID" ]; then
    # Find when story was started (first commit with this ID)
    STARTED=$(git log --all --grep="$STORY_ID" --format="%ct" --reverse | head -1)
    CYCLE_TIME=$(( (timestamp - STARTED) / 3600 )) # hours
    echo "${STORY_ID},${CYCLE_TIME}"
  fi
done
```
Alternative: Track in your issue tracker
Most teams find it easier to track cycle time in Jira/GitHub Issues:
```javascript
// Calculate cycle time from issue status transitions
const cycleTime = (issue) => {
  const startedAt = issue.transitions.find(t => t.to === 'In Progress').timestamp;
  const deployedAt = issue.transitions.find(t => t.to === 'Done').timestamp;
  const hours = (deployedAt - startedAt) / (1000 * 60 * 60);
  return hours / 24; // Convert to days
};
```
Query mean time to repair from your incident log:

```bash
# Average time to repair (last 30 days)
cat /var/log/incidents.csv | tail -n +2 | \
  awk -F',' '{total+=$6; count++} END {print total/count " minutes (avg)"}'
```
Baseline Metrics (Week of [DATE])
Deployment Frequency: _____ deployments/day
Development Cycle Time: _____ days (average)
Change Failure Rate: _____%
Mean Time to Repair: _____ minutes
Week 1: Make Metrics Visible
Create a Metrics Dashboard
Option 1: Simple Spreadsheet (30 minutes)
Create a Google Sheet or Excel file:
| Week | Deployments | Avg Cycle Time (days) | Change Failures | MTTR (min) |
|---|---|---|---|---|
| 2025-W42 | 23 | 3.2 | 8.7% | 127 |
| 2025-W43 | 31 | 2.8 | 6.5% | 89 |
| 2025-W44 | 28 | 2.5 | 4.3% | 62 |
Update weekly and display prominently (print and post, or share link).
Option 2: Application Metrics with Prometheus

Instrument your services with the prom-client library:

```javascript
// metrics.js
const client = require('prom-client');

const deploymentCounter = new client.Counter({
  name: 'deployments_total',
  help: 'Total number of deployments',
  labelNames: ['service', 'environment']
});

const cycleTimeHistogram = new client.Histogram({
  name: 'development_cycle_time_hours',
  help: 'Development cycle time in hours',
  buckets: [2, 4, 8, 16, 24, 48, 96, 168] // hours
});

const changeFailureCounter = new client.Counter({
  name: 'change_failures_total',
  help: 'Number of failed changes',
  labelNames: ['service']
});

// Record metrics
deploymentCounter.inc({ service: 'api', environment: 'production' });
cycleTimeHistogram.observe(18.5); // 18.5 hours
changeFailureCounter.inc({ service: 'web' });
```
Collect all four metrics with a daily script:

```bash
#!/bin/bash
# metrics-collector.sh - Run daily via cron
METRICS_FILE="/var/log/cd-metrics.csv"
TODAY=$(date +%Y-%m-%d)

# 1. Deployment Frequency (today)
DEPLOYMENTS_TODAY=$(grep "^${TODAY}" /var/log/deployments.csv | wc -l)

# 2. Average Cycle Time (last 7 days)
CYCLE_TIME=$(tail -50 /var/log/cycle-times.csv | \
  awk -F',' '{total+=$2; count++} END {print total/count}')

# 3. Change Failure Rate (last 30 days)
TOTAL=$(tail -200 /var/log/deployments.csv | wc -l)
FAILURES=$(tail -200 /var/log/deployments.csv | grep ",false$" | wc -l)
CFR=$(awk "BEGIN {printf \"%.1f\", ($FAILURES/$TOTAL)*100}")

# 4. Mean Time to Repair (last 30 days)
MTTR=$(tail -50 /var/log/incidents.csv | \
  awk -F',' 'NR>1 {total+=$6; count++} END {print total/count}')

# Write to metrics file
echo "${TODAY},${DEPLOYMENTS_TODAY},${CYCLE_TIME},${CFR},${MTTR}" >> "$METRICS_FILE"

# Generate report
cat <<EOF
📊 CD Metrics Report - ${TODAY}

Deployment Frequency: ${DEPLOYMENTS_TODAY} deployments today
Avg Cycle Time: ${CYCLE_TIME} days
Change Failure Rate: ${CFR}%
Mean Time to Repair: ${MTTR} minutes

View full report: https://metrics.example.com/dashboard
EOF
```
Set up daily collection:
```bash
# Add to crontab
crontab -e

# Run daily at 11:59 PM
59 23 * * * /usr/local/bin/metrics-collector.sh
```
Success Criteria
After implementing metrics, you should have:
✅ All four DORA metrics tracked automatically
✅ Baseline established (1 week of data)
✅ Dashboard visible to the entire team
✅ Weekly review scheduled
✅ Improvement targets defined
✅ Metric groups balanced (speed + quality)
Next Steps
Automate collection - Stop manual tracking
Add visualizations - Trend lines, not just numbers
Set up alerts - Get notified when metrics degrade
Correlate metrics - Find relationships between metrics
Share widely - Make metrics visible to stakeholders
Metrics are a means, not an end. The goal is to improve delivery performance, not to hit specific metric targets. Use metrics to guide improvement, celebrate progress, and maintain focus on outcomes.
2 - Metrics Cheat Sheet
Quick reference guide for key CD metrics with targets and improvement strategies
Organizational Metrics
These metrics are important for teams and management to track the health of the delivery system.

| Metric | Description | How to improve | Guardrail |
|---|---|---|---|
| Development Cycle Time | Time from when a story is started until it is marked “done” | Reduce the size of work to improve end-user feedback on the value of the work and to improve the quality of the acceptance criteria and testing | |
| Work in Progress (WIP) | The number of items in progress on the team relative to the size of the team | Reduce the number of items in progress so that the team can focus on completing work instead of being busy | Delivery frequency should not degrade |
Team Metrics
These metrics should only be used by teams to inform decision making. They are ineffective for measuring quality, productivity, or
delivery system health.
Velocity / Throughput: The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity.
3 - Average Build Downtime
Time the build stays broken before being fixed - measures team discipline and CI commitment
The average length of time between when a build breaks and when it is fixed.
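For illustration, here is a minimal sketch of the calculation, assuming you can export timestamped pass/fail build results from your CI tool; the event shape used here is hypothetical.

```javascript
// Sketch: average build downtime from timestamped CI results (hypothetical event shape).
// Each event: { timestamp: ISO string, passed: boolean }, ordered oldest to newest.
function averageBuildDowntimeMinutes(buildEvents) {
  let brokenSince = null;
  const downtimes = [];

  for (const event of buildEvents) {
    if (!event.passed && brokenSince === null) {
      brokenSince = new Date(event.timestamp); // build just broke
    } else if (event.passed && brokenSince !== null) {
      downtimes.push(new Date(event.timestamp) - brokenSince); // broken -> fixed
      brokenSince = null;
    }
  }

  if (downtimes.length === 0) return 0;
  const avgMs = downtimes.reduce((a, b) => a + b, 0) / downtimes.length;
  return avgMs / (1000 * 60); // milliseconds to minutes
}

// Example: broken at 10:00, fixed at 10:45 -> 45 minutes
console.log(averageBuildDowntimeMinutes([
  { timestamp: '2025-01-06T09:00:00Z', passed: true },
  { timestamp: '2025-01-06T10:00:00Z', passed: false },
  { timestamp: '2025-01-06T10:45:00Z', passed: true },
]));
```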
What is the intended behavior?
Keep the pipelines always deployable by fixing broken builds as rapidly as possible. Broken builds are the highest priority since
they prevent production fixes from being deployed in a safe, standard way.
How to improve it
Refactor to improve testability and modularity.
Improve tests to locate problems more rapidly.
Decrease the size of the component to reduce complexity.
Add automated alerts for broken builds.
Ensure the proper team practice is in place to support each other in solving the problem as a team.
How to game it
Re-build the previous version.
Remove tests that are failing.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Integration Frequency decreases as additional manual or automated process overhead is
added before integration to trunk.
4 - Build Duration
Time for CI pipeline to complete - critical for fast feedback and should be under 10 minutes
The time from code commit to production deploy. This is the minimum time changes can be applied to production. This is
referenced as “hard lead time” in Accelerate
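For illustration, a minimal sketch of the calculation, assuming hypothetical records that pair each commit timestamp with its production deploy timestamp.

```javascript
// Sketch: build duration ("hard lead time") from code commit to production deploy.
// Hypothetical record shape: { commitTime, deployTime } as ISO strings.
function buildDurationMinutes({ commitTime, deployTime }) {
  return (new Date(deployTime) - new Date(commitTime)) / (1000 * 60);
}

const durations = [
  { commitTime: '2025-01-06T09:12:00Z', deployTime: '2025-01-06T09:31:00Z' }, // 19 min
  { commitTime: '2025-01-06T13:02:00Z', deployTime: '2025-01-06T13:47:00Z' }, // 45 min
].map(buildDurationMinutes);

const average = durations.reduce((a, b) => a + b, 0) / durations.length;
console.log(`Average build duration: ${average.toFixed(1)} minutes`); // 32.0
```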
What is the intended behavior?
Reduce pipeline duration to improve MTTR and improve test efficiency to
give the team more rapid feedback to any issues. Long build cycle times delay quality feedback
and create more opportunity for defect penetration.
How to improve it
Identify areas of the build that can run concurrently.
Replace end to end tests in the pipeline with virtual services and move end to end testing to an asynchronous process.
Break down large services into smaller sub-domains that are easier and faster to build / test.
Add alerts to the pipeline if a maximum duration is exceeded to inform test refactoring priorities.
How to game it
Reduce the number of tests running or test types executed.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Defect rates increase if quality gates are skipped to reduce build time.
5 - Change Fail Rate
Percentage of changes that result in degraded service or require remediation - a key DORA stability metric
The percentage of changes that result in negative customer impact, or rollback.
changeFailRate = failedChangeCount / changeCount
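A minimal sketch of this formula applied to a deployment log, assuming each hypothetical record flags whether the change was rolled back or caused customer impact.

```javascript
// Sketch: change fail rate from a deployment log (hypothetical record shape).
// Each record: { service, timestamp, failed } where `failed` means rollback or customer impact.
function changeFailRate(deployments) {
  if (deployments.length === 0) return 0;
  const failedChangeCount = deployments.filter((d) => d.failed).length;
  return (failedChangeCount / deployments.length) * 100; // percentage
}

console.log(changeFailRate([
  { service: 'api', timestamp: '2025-01-06T10:00:00Z', failed: false },
  { service: 'api', timestamp: '2025-01-07T10:00:00Z', failed: true },
  { service: 'web', timestamp: '2025-01-07T15:00:00Z', failed: false },
  { service: 'web', timestamp: '2025-01-08T09:00:00Z', failed: false },
])); // 25
```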
What is the intended behavior?
Reduce the percentage of failed changes.
How to improve it
Release more, smaller changes to make quality steps more effective and reduce the impact of failure.
Identify root cause for each failure and improve the automated quality checks.
How to game it
Deploy fixes without recording the defect.
Create defect review meetings and re-classify defects as feature requests.
Re-deploy the latest working version to increase deploy count.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency can decrease if focus is placed on “zero defect” changes.
Defect rates can increase as reduced delivery frequency increases code change batch size and delivery risk.
6 - Code Coverage
Percentage of code exercised by tests - useful indicator but can be gamed, use with caution
A measure of the amount of code that is executed by test code.
What is the intended behavior?
Inform the team of risky or complicated portions of the code that are not sufficiently covered by tests. Care should be
taken not to confuse high coverage with good testing.
How to improve it
Write tests for code that SHOULD be covered but isn’t
Refactor the application to improve testability
Remove unreachable code
Delete pointless tests
Refactor tests to test behavior rather than implementation details
How to game it
Tests are written for code that receives no value from testing.
Test code is written without assertions.
Tests are written with meaningless assertions.
Example: The following test will result in 100% function, branch, and line coverage with no behavior tested.
```javascript
/* Return the sum of two integers */
/* Return null if one of the params is not an integer */
function addWholeNumbers(a, b) {
  if (a % 1 === 0 && b % 1 === 0) {
    return a + b;
  } else {
    return null;
  }
}

it('Should not return null if both numbers are integers', () => {
  /*
   * This call will return 4, which is not null.
   * Pass
   */
  expect(addWholeNumbers(2, 2)).not.toBe(null);

  /*
   * This returns "22" because JS sees a string and helpfully concatenates them.
   * Pass
   */
  expect(addWholeNumbers(2, '2')).not.toBe(null);

  /*
   * The function will never return the JS `NaN` constant.
   * Pass
   */
  expect(addWholeNumbers(1.1, 0)).not.toBe(NaN);
});
```
The following is an example of test code with no assertions. This will also produce 100% code coverage reporting but does not test anything because there are no assertions to cause the test to fail.
```javascript
it('Should not return null if both numbers are integers', () => {
  addWholeNumbers(2, 2);
  addWholeNumbers(2, '2');
  addWholeNumbers(1.1, 0);
});
```
Guardrail Metrics
Test coverage should never be used as a goal or an indicator of application health. Measure outcomes. If testing is poor, the following metrics will show poor results.
Defect Rates will increase as poor-quality tests are created to meet coverage targets that do not reliably catch defects.
Development Cycle Time will increase as more emphasis is placed on improper testing methods (manual functional testing, testing teams, etc.) to overcome the lack of reliable tests.
7 - Code Inventory
Amount of code written but not yet delivered to production - represents unrealized value and risk
The lines of code that have been changed but have not been delivered to production. This can be measured at several points in the
delivery flow, starting with code not merged to trunk.
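One way to approximate this is a sketch like the one below, assuming trunk is `origin/main` and a local clone with fetched remote branches; it only counts branches not yet merged to trunk, not merged-but-undeployed code.

```javascript
// Sketch: code inventory as lines changed on branches not yet merged to trunk.
// Assumes trunk is `origin/main`; run from inside a repository clone.
const { execSync } = require('child_process');

function unmergedLines(branch, trunk = 'origin/main') {
  // --shortstat prints e.g. " 3 files changed, 120 insertions(+), 15 deletions(-)"
  const stat = execSync(`git diff --shortstat ${trunk}...${branch}`, { encoding: 'utf8' });
  const insertions = Number((stat.match(/(\d+) insertion/) || [0, 0])[1]);
  const deletions = Number((stat.match(/(\d+) deletion/) || [0, 0])[1]);
  return insertions + deletions;
}

// Remote branches that are not merged to trunk
const branches = execSync('git branch -r --no-merged origin/main', { encoding: 'utf8' })
  .split('\n')
  .map((b) => b.trim())
  .filter((b) => b && !b.includes('->')); // drop blanks and the origin/HEAD alias

const total = branches.reduce((sum, b) => sum + unmergedLines(b), 0);
console.log(`Code inventory: ${total} changed lines on ${branches.length} unmerged branches`);
```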
What is the intended behavior?
Reduce the size of individual changes and reduce the duration of branches to improve quality feedback. We also want to
eliminate stale branches that create a risk of lost changes or of merge conflicts that require additional, risky manual steps.
How to improve it
Improve continuous integration behavior where changes are integrated to the trunk and
verified multiple times per day.
How to game it
Use forks to hide changes.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality can decrease as quality steps are skipped or batch size increases.
8 - Defect Rate
Measure of escaped defects found in production, indicating test effectiveness and quality processes
Defect rates are the total number of defects by severity reported for a period of time.
Defect count / Time range
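A minimal sketch of the formula, assuming a hypothetical defect record that carries a severity and a report date.

```javascript
// Sketch: defects per week by severity over a reporting window (hypothetical record shape).
// Each record: { severity: 'high' | 'medium' | 'low', reportedAt: ISO string }
function defectRatePerWeek(defects, rangeStart, rangeEnd) {
  const weeks = (new Date(rangeEnd) - new Date(rangeStart)) / (1000 * 60 * 60 * 24 * 7);
  const countBySeverity = {};
  for (const d of defects) {
    countBySeverity[d.severity] = (countBySeverity[d.severity] || 0) + 1;
  }
  return Object.fromEntries(
    Object.entries(countBySeverity).map(([severity, count]) => [severity, count / weeks])
  );
}

console.log(defectRatePerWeek(
  [
    { severity: 'high', reportedAt: '2025-01-02T10:00:00Z' },
    { severity: 'low', reportedAt: '2025-01-10T10:00:00Z' },
    { severity: 'low', reportedAt: '2025-01-20T10:00:00Z' },
  ],
  '2025-01-01T00:00:00Z',
  '2025-01-29T00:00:00Z'
)); // { high: 0.25, low: 0.5 } defects per week over a 4-week window
```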
What is the intended behavior?
Use defect rates and trends to inform improvement of upstream quality processes.
Defect rates in production indicate how effective our overall quality process is. Defect rates in lower environments inform us of
specific areas where quality process can be improved. The goal is to push detection closer to the developer.
How to improve it
Track trends over time and identify common issues for the defects. Design test changes that would reduce the time
to detect defects.
How to game it
Mark defects as enhancement requests
Don’t track defects
Deploy changes that do not modify the application to improve the percentage
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency is reduced if too much emphasis is placed on zero defects. This can be
self-defeating as large change batches will contain more defects.
9 - Delivery Frequency
How often changes are deployed to production - a key DORA metric measuring throughput and team capability
How frequently per day the team releases changes to production.
What is the intended behavior?
Small changes deployed very frequently to exercise the ability to fix production
rapidly, reduce MTTR, increase quality, and reduce risk.
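A minimal sketch that mirrors the quickstart query, assuming a hypothetical deployment log with one record per production release.

```javascript
// Sketch: delivery frequency as average production deploys per active day (hypothetical log shape).
// Each record: { service, timestamp } for one production release.
function deploysPerDay(deployments) {
  const daysWithDeploys = new Set(deployments.map((d) => d.timestamp.slice(0, 10)));
  return deployments.length / daysWithDeploys.size;
}

console.log(deploysPerDay([
  { service: 'api', timestamp: '2025-01-06T10:00:00Z' },
  { service: 'api', timestamp: '2025-01-06T15:00:00Z' },
  { service: 'web', timestamp: '2025-01-07T11:00:00Z' },
])); // 1.5
```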
11 - Integration Frequency
How often code is integrated to trunk/main - indicator of CI practice maturity and team collaboration
The average number of production-ready pull requests a team closes per day, normalized by the number of developers on
the team. On a team with 5 developers, healthy CI practice is
at least 5 per day.
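A minimal sketch of the calculation, assuming you already have the merged pull request count, the number of working days, and the team size as inputs.

```javascript
// Sketch: integration frequency from merged, production-ready pull requests (hypothetical inputs).
function integrationFrequency(mergedPrCount, workingDays, developerCount) {
  const perDay = mergedPrCount / workingDays;
  return {
    perDay,
    perDeveloperPerDay: perDay / developerCount,
    healthy: perDay >= developerCount, // at least one trunk integration per developer per day
  };
}

console.log(integrationFrequency(18, 5, 5));
// { perDay: 3.6, perDeveloperPerDay: 0.72, healthy: false }
```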
What is the intended behavior?
Increase the frequency of code integration
Reduce the size of each change
Improve code review processes
Remove unneeded processes
Improve quality feedback
How to improve it
Decompose code changes into smaller units to incrementally deliver features.
12 - Lead Time
Total time from customer request to delivery in production - measures entire value stream efficiency
This shows the average time it takes for a new request to be delivered. This is
measured from the creation date to release date for each unit of work and includes Development Cycle Time.
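A minimal sketch, assuming hypothetical work-item records that carry a creation date and a release date.

```javascript
// Sketch: lead time per work item, from request creation to production release.
// Hypothetical record shape: { createdAt, releasedAt } as ISO strings.
function leadTimeDays({ createdAt, releasedAt }) {
  return (new Date(releasedAt) - new Date(createdAt)) / (1000 * 60 * 60 * 24);
}

const items = [
  { createdAt: '2025-01-01T00:00:00Z', releasedAt: '2025-01-15T00:00:00Z' }, // 14 days
  { createdAt: '2025-01-05T00:00:00Z', releasedAt: '2025-01-11T00:00:00Z' }, // 6 days
];
const average = items.map(leadTimeDays).reduce((a, b) => a + b, 0) / items.length;
console.log(`Average lead time: ${average} days`); // 10
```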
What is the intended behavior?
Identify over-utilized teams and backlogs that need more Product Owner attention, or use in conjunction
with velocity to help teams optimize their processes.
How to improve it
Relentlessly remove old items from the backlog.
Improve team processes to reduce Development Cycle Time.
Use Innersourcing to allow other teams to help when surges of work arrive.
Re-assign, carefully, some components to another team to scale delivery.
How to game it
Requests can be tracked in spreadsheets or other locations and then added to
the backlog just before development. This can be identified by decreased
customer satisfaction.
Reduce feature refining rigour.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality is reduced if less time is spent refining and defining
testable requirements.
13 - Mean Time to Repair (MTTR)
Average time to restore service after an incident - a key DORA stability metric measuring recovery capability
Mean Time to Repair is the average time between when an incident is detected and when it is resolved.
“Software delivery performance is a combination of three metrics: lead time, release
frequency, and MTTR. Change fail rate is not included, though it
is highly correlated.”
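A minimal sketch of the calculation, assuming hypothetical incident records with detection and resolution timestamps.

```javascript
// Sketch: mean time to repair from incident records (hypothetical record shape).
// Each record: { detectedAt, resolvedAt } as ISO strings.
function mttrMinutes(incidents) {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (new Date(i.resolvedAt) - new Date(i.detectedAt)),
    0
  );
  return totalMs / incidents.length / (1000 * 60);
}

console.log(mttrMinutes([
  { detectedAt: '2025-01-06T10:00:00Z', resolvedAt: '2025-01-06T11:30:00Z' }, // 90 min
  { detectedAt: '2025-01-08T14:00:00Z', resolvedAt: '2025-01-08T14:40:00Z' }, // 40 min
])); // 65
```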
14 - Quality
Comprehensive view of quality indicators including defects, test coverage, and technical debt
Quality is measured as the percentage of finished work that is unused, unstable, unavailable, or defective according to the end user.
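A minimal sketch of this measure, assuming each delivered item is flagged (hypothetically) from end-user feedback; since this counts problem work, lower is better.

```javascript
// Sketch: percentage of finished work that is unused, unstable, unavailable, or defective.
// Hypothetical record: { id, unused, unstable, unavailable, defective } booleans per delivered item.
function percentProblemWork(deliveredItems) {
  if (deliveredItems.length === 0) return 0;
  const problematic = deliveredItems.filter(
    (i) => i.unused || i.unstable || i.unavailable || i.defective
  ).length;
  return (problematic / deliveredItems.length) * 100; // lower is better
}

console.log(percentProblemWork([
  { id: 'A', unused: false, unstable: false, unavailable: false, defective: false },
  { id: 'B', unused: true,  unstable: false, unavailable: false, defective: false },
  { id: 'C', unused: false, unstable: false, unavailable: false, defective: true  },
  { id: 'D', unused: false, unstable: false, unavailable: false, defective: false },
])); // 50
```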
What is the intended behavior?
Continuously improve the quality steps in the construction process, reduce the size of delivered change, and increase
the speed of feedback from the end user. Improving this cycle improves roadmap decisions.
How to improve it
Add automated checks to the pipeline to prevent re-occurrence of root causes.
Only begin new work with testable acceptance criteria.
Accelerate feedback loops at every step to alert to quality, performance, or availability issues.
How to game it
Log defects as new features
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency may be reduced if more manual quality steps are added.
Build cycle time may increase as additional tests are added to the pipeline.
Lead time can increase as more time is spent on business analysis.
15 - Velocity / Throughput
Amount of work completed per iteration - team capacity planning metric that should be used carefully, not as productivity measure
The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity. This is commonly misunderstood to be a productivity metric. It is not.
What is the intended behavior?
After a team stabilizes, the standard deviation should be low. This will enable realistic planning of future
deliverables based on relative complexity. Find ways to increase this over time by reducing waste, improving planning,
and focusing on teamwork.
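A minimal sketch of checking velocity stability by computing the mean and standard deviation over recent sprints; the sprint numbers are illustrative.

```javascript
// Sketch: velocity mean and standard deviation across recent sprints.
// A low standard deviation relative to the mean indicates predictable planning data.
function velocityStats(sprintPoints) {
  const mean = sprintPoints.reduce((a, b) => a + b, 0) / sprintPoints.length;
  const variance =
    sprintPoints.reduce((sum, v) => sum + (v - mean) ** 2, 0) / sprintPoints.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

console.log(velocityStats([21, 24, 19, 23, 22]));
// { mean: 21.8, stdDev: ~1.72 } -- stable enough to plan against
```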
How to improve it
Reduce story size so they are easier to understand and more predictable.
Minimize hard dependencies. Each hard dependency reduces the odds of on-time
delivery by 50%.
Swarm stories by decomposing them into tasks that can be executed in parallel so that the team is working as a unit to deliver faster.
How to game it
Cherry pick easy, low priority items.
Increase story points
Skip quality steps.
Prematurely sign-off work only to have defects reported later.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality defect ratio goes up as more defects are reported.
WIP increases as teams start more work to look more
busy.