Metrics are crucial for organizational improvement. Without measurement, improvement attempts are aimless. This guide outlines key metrics for Continuous Delivery (CD) and Continuous Integration (CI).
Focus improvement efforts on the group of metrics as a whole, not individual measures
Refer to the Metrics Cheat Sheet for a high-level view of key metrics, their intent, and appropriate usage
Remember
Metrics, like any tool, must be used correctly to drive the improvement we need. Focusing on a single metric can lead to unintended consequences and suboptimal outcomes.
1 - Metrics Quickstart
Set up essential CD metrics in one day and start improving delivery performance
This guide helps you quickly implement the minimum set of metrics needed to measure and improve your Continuous Delivery performance. Start tracking today, improve tomorrow.
Why Metrics Matter
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Without metrics, improvement is guesswork.
Metrics help you:
✅ Identify bottlenecks in your delivery process
✅ Measure improvement over time
✅ Make data-driven decisions about where to focus
✅ Demonstrate value to leadership
✅ Prevent regression when optimizing
Critical
Use metrics in groups, never alone. Optimizing a single metric leads to unintended consequences. Always use offsetting metrics to maintain balance.
The Essential Four Metrics
Start with these four DORA metrics that predict delivery performance:
- Deployment Frequency: how often you deploy to production
- Development Cycle Time: time from starting work to deploying it to production
- Change Failure Rate: percentage of changes that degrade service or require remediation
- Mean Time to Repair (MTTR): time to restore service after an incident
These four metrics balance speed (cycle time, deployment frequency) with stability (change failure rate, MTTR).
Day 1: Start Tracking
Step 1: Deployment Frequency (15 minutes)
What it measures: How often you deploy to production
Simplest implementation:
```bash
# Add to your deployment script
#!/bin/bash
# deploy.sh
DEPLOY_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
SERVICE_NAME="my-service"

# Your deployment logic here
kubectl apply -f deployment.yaml

# Log the deployment
echo "${DEPLOY_TIME},${SERVICE_NAME},${VERSION}" >> /var/log/deployments.csv

# Or send to metrics service
curl -X POST https://metrics.example.com/deployments \
  -d "{\"service\":\"${SERVICE_NAME}\",\"timestamp\":\"${DEPLOY_TIME}\",\"version\":\"${VERSION}\"}"
```
Query deployment frequency:
```bash
# Deployments per day (last 30 days)
cat /var/log/deployments.csv | \
  awk -F',' '{print $1}' | \
  cut -d'T' -f1 | \
  sort | uniq -c | \
  awk '{total+=$1; count++} END {print total/count " deployments/day"}'
```
Step 2: Development Cycle Time (20 minutes)
What it measures: Time from starting work to deploying to production
Track using git commits + deployment log:
```bash
# cycle-time.sh
#!/bin/bash

# Get commits from last deployment to now
LAST_DEPLOY_TIME=$(tail -1 /var/log/deployments.csv | cut -d',' -f1)

# For each commit since last deploy
git log --since="$LAST_DEPLOY_TIME" --format="%H|%ct|%s" | while IFS='|' read hash timestamp message; do
  # Extract story ID from commit message (e.g., JIRA-123)
  STORY_ID=$(echo "$message" | grep -oE '[A-Z]+-[0-9]+' | head -1)
  if [ -n "$STORY_ID" ]; then
    # Find when story was started (first commit with this ID)
    STARTED=$(git log --all --grep="$STORY_ID" --format="%ct" --reverse | head -1)
    CYCLE_TIME=$(( (timestamp - STARTED) / 3600 )) # hours
    echo "${STORY_ID},${CYCLE_TIME}"
  fi
done
```
Alternative: Track in your issue tracker
Most teams find it easier to track cycle time in Jira/GitHub Issues:
```javascript
// Calculate cycle time from issue status transitions
const cycleTime = (issue) => {
  const startedAt = issue.transitions.find(t => t.to === 'In Progress').timestamp;
  const deployedAt = issue.transitions.find(t => t.to === 'Done').timestamp;
  const hours = (deployedAt - startedAt) / (1000 * 60 * 60);
  return hours / 24; // Convert to days
};
```
Query mean time to repair from your incident log:

```bash
# Average time to repair (last 30 days)
cat /var/log/incidents.csv | tail -n +2 | \
  awk -F',' '{total+=$6; count++} END {print total/count " minutes (avg)"}'
```
Baseline Metrics (Week of [DATE])
Deployment Frequency: _____ deployments/day
Development Cycle Time: _____ days (average)
Change Failure Rate: _____%
Mean Time to Repair: _____ minutes
Week 1: Make Metrics Visible
Create a Metrics Dashboard
Option 1: Simple Spreadsheet (30 minutes)
Create a Google Sheet or Excel file:
| Week | Deployments | Avg Cycle Time (days) | Change Failures | MTTR (min) |
|---|---|---|---|---|
| 2025-W42 | 23 | 3.2 | 8.7% | 127 |
| 2025-W43 | 31 | 2.8 | 6.5% | 89 |
| 2025-W44 | 28 | 2.5 | 4.3% | 62 |
Update weekly and display prominently (print and post, or share link).
Option 2: Application Metrics with Prometheus

Instrument your services with the prom-client library:

```javascript
// metrics.js
const client = require('prom-client');

const deploymentCounter = new client.Counter({
  name: 'deployments_total',
  help: 'Total number of deployments',
  labelNames: ['service', 'environment']
});

const cycleTimeHistogram = new client.Histogram({
  name: 'development_cycle_time_hours',
  help: 'Development cycle time in hours',
  buckets: [2, 4, 8, 16, 24, 48, 96, 168] // hours
});

const changeFailureCounter = new client.Counter({
  name: 'change_failures_total',
  help: 'Number of failed changes',
  labelNames: ['service']
});

// Record metrics
deploymentCounter.inc({ service: 'api', environment: 'production' });
cycleTimeHistogram.observe(18.5); // 18.5 hours
changeFailureCounter.inc({ service: 'web' });
```
Collect all four metrics with a daily script:

```bash
#!/bin/bash
# metrics-collector.sh - Run daily via cron
METRICS_FILE="/var/log/cd-metrics.csv"
TODAY=$(date +%Y-%m-%d)

# 1. Deployment Frequency (today)
DEPLOYMENTS_TODAY=$(grep "^${TODAY}" /var/log/deployments.csv | wc -l)

# 2. Average Cycle Time (last 7 days)
CYCLE_TIME=$(tail -50 /var/log/cycle-times.csv | \
  awk -F',' '{total+=$2; count++} END {print total/count}')

# 3. Change Failure Rate (last 30 days)
TOTAL=$(tail -200 /var/log/deployments.csv | wc -l)
FAILURES=$(tail -200 /var/log/deployments.csv | grep ",false$" | wc -l)
CFR=$(awk "BEGIN {printf \"%.1f\", ($FAILURES/$TOTAL)*100}")

# 4. Mean Time to Repair (last 30 days)
MTTR=$(tail -50 /var/log/incidents.csv | \
  awk -F',' 'NR>1 {total+=$6; count++} END {print total/count}')

# Write to metrics file
echo "${TODAY},${DEPLOYMENTS_TODAY},${CYCLE_TIME},${CFR},${MTTR}" >> "$METRICS_FILE"

# Generate report
cat <<EOF
📊 CD Metrics Report - ${TODAY}

Deployment Frequency: ${DEPLOYMENTS_TODAY} deployments today
Avg Cycle Time: ${CYCLE_TIME} days
Change Failure Rate: ${CFR}%
Mean Time to Repair: ${MTTR} minutes

View full report: https://metrics.example.com/dashboard
EOF
```
Set up daily collection:
```bash
# Add to crontab
crontab -e

# Run daily at 11:59 PM
59 23 * * * /usr/local/bin/metrics-collector.sh
```
Success Criteria
After implementing metrics, you should have:
✅ All four DORA metrics tracked automatically
✅ Baseline established (1 week of data)
✅ Dashboard visible to the entire team
✅ Weekly review scheduled
✅ Improvement targets defined
✅ Metric groups balanced (speed + quality)
Next Steps
Automate collection - Stop manual tracking
Add visualizations - Trend lines, not just numbers
Set up alerts - Get notified when metrics degrade
Correlate metrics - Find relationships between metrics
Share widely - Make metrics visible to stakeholders
Metrics are a means, not an end. The goal is to improve delivery performance, not to hit specific metric targets. Use metrics to guide improvement, celebrate progress, and maintain focus on outcomes.
2 - Metrics Cheat Sheet
Quick reference guide for key CD metrics with targets and improvement strategies
Organizational Metrics
These metrics are important for teams and management to track the health of the delivery system.

| Metric | Description | How to improve | Guardrail |
|---|---|---|---|
| Development Cycle Time | Time from when a story is started until it is marked “done” | Reduce the size of work to improve end-user feedback on the value of the work and to improve the quality of the acceptance criteria and testing | |
| Work in Progress (WIP) | The number of items in progress on the team relative to the size of the team | Reduce the number of items in progress so that the team can focus on completing work instead of being busy | Delivery frequency should not degrade |
Team Metrics
These metrics should only be used by teams to inform decision making. They are ineffective for measuring quality, productivity, or
delivery system health.
Velocity / Throughput: The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity.
3 - Average Build Downtime
Time the build stays broken before being fixed - measures team discipline and CI commitment
The average length of time between when a build breaks and when it is fixed.
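For illustration, here is a minimal sketch of the calculation, assuming you can export timestamped pass/fail build results from your CI tool; the event shape used here is hypothetical.

```javascript
// Sketch: average build downtime from timestamped CI results (hypothetical event shape).
// Each event: { timestamp: ISO string, passed: boolean }, ordered oldest to newest.
function averageBuildDowntimeMinutes(buildEvents) {
  let brokenSince = null;
  const downtimes = [];

  for (const event of buildEvents) {
    if (!event.passed && brokenSince === null) {
      brokenSince = new Date(event.timestamp); // build just broke
    } else if (event.passed && brokenSince !== null) {
      downtimes.push(new Date(event.timestamp) - brokenSince); // broken -> fixed
      brokenSince = null;
    }
  }

  if (downtimes.length === 0) return 0;
  const avgMs = downtimes.reduce((a, b) => a + b, 0) / downtimes.length;
  return avgMs / (1000 * 60); // milliseconds to minutes
}

// Example: broken at 10:00, fixed at 10:45 -> 45 minutes
console.log(averageBuildDowntimeMinutes([
  { timestamp: '2025-01-06T09:00:00Z', passed: true },
  { timestamp: '2025-01-06T10:00:00Z', passed: false },
  { timestamp: '2025-01-06T10:45:00Z', passed: true },
]));
```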
What is the intended behavior?
Keep the pipelines always deployable by fixing broken builds as rapidly as possible. Broken builds are the highest priority since
they prevent production fixes from being deployed in a safe, standard way.
How to improve it
Refactor to improve testability and modularity.
Improve tests to locate problems more rapidly.
Decrease the size of the component to reduce complexity.
Add automated alerts for broken builds.
Ensure the proper team practice is in place to support each other in solving the problem as a team.
How to game it
Re-build the previous version.
Remove tests that are failing.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Integration Frequency decreases as additional manual or automated process overhead is
added before integration to trunk.
4 - Build Duration
Time for CI pipeline to complete - critical for fast feedback and should be under 10 minutes
The time from code commit to production deploy. This is the minimum time changes can be applied to production. This is
referenced as “hard lead time” in Accelerate
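For illustration, a minimal sketch of the calculation, assuming hypothetical records that pair each commit timestamp with its production deploy timestamp.

```javascript
// Sketch: build duration ("hard lead time") from code commit to production deploy.
// Hypothetical record shape: { commitTime, deployTime } as ISO strings.
function buildDurationMinutes({ commitTime, deployTime }) {
  return (new Date(deployTime) - new Date(commitTime)) / (1000 * 60);
}

const durations = [
  { commitTime: '2025-01-06T09:12:00Z', deployTime: '2025-01-06T09:31:00Z' }, // 19 min
  { commitTime: '2025-01-06T13:02:00Z', deployTime: '2025-01-06T13:47:00Z' }, // 45 min
].map(buildDurationMinutes);

const average = durations.reduce((a, b) => a + b, 0) / durations.length;
console.log(`Average build duration: ${average.toFixed(1)} minutes`); // 32.0
```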
What is the intended behavior?
Reduce pipeline duration to improve MTTR and improve test efficiency to
give the team more rapid feedback to any issues. Long build cycle times delay quality feedback
and create more opportunity for defect penetration.
How to improve it
Identify areas of the build that can run concurrently.
Replace end to end tests in the pipeline with virtual services and move end to end testing to an asynchronous process.
Break down large services into smaller sub-domains that are easier and faster to build / test.
Add alerts to the pipeline if a maximum duration is exceeded to inform test refactoring priorities.
How to game it
Reduce the number of tests running or test types executed.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Defect rates increase if quality gates are skipped to reduce build time.
5 - Change Fail Rate
Percentage of changes that result in degraded service or require remediation - a key DORA stability metric
The percentage of changes that result in negative customer impact, or rollback.
changeFailRate = failedChangeCount / changeCount
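A minimal sketch of this formula applied to a deployment log, assuming each hypothetical record flags whether the change was rolled back or caused customer impact.

```javascript
// Sketch: change fail rate from a deployment log (hypothetical record shape).
// Each record: { service, timestamp, failed } where `failed` means rollback or customer impact.
function changeFailRate(deployments) {
  if (deployments.length === 0) return 0;
  const failedChangeCount = deployments.filter((d) => d.failed).length;
  return (failedChangeCount / deployments.length) * 100; // percentage
}

console.log(changeFailRate([
  { service: 'api', timestamp: '2025-01-06T10:00:00Z', failed: false },
  { service: 'api', timestamp: '2025-01-07T10:00:00Z', failed: true },
  { service: 'web', timestamp: '2025-01-07T15:00:00Z', failed: false },
  { service: 'web', timestamp: '2025-01-08T09:00:00Z', failed: false },
])); // 25
```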
What is the intended behavior?
Reduce the percentage of failed changes.
How to improve it
Release more, smaller changes to make quality steps more effective and reduce the impact of failure.
Identify root cause for each failure and improve the automated quality checks.
How to game it
Deploy fixes without recording the defect.
Create defect review meetings and re-classify defects as feature requests.
Re-deploy the latest working version to increase deploy count.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency can decrease if focus is placed on “zero defect” changes.
Defect rates can increase as reduced delivery frequency increases code change batch size and delivery risk.
6 - Code Coverage
Percentage of code exercised by tests - useful indicator but can be gamed, use with caution
A measure of the amount of code that is executed by test code.
What is the intended behavior?
Inform the team of risky or complicated portions of the code that are not sufficiently covered by tests. Care should be
taken not to confuse high coverage with good testing.
How to improve it
Write tests for code that SHOULD be covered but isn’t
Refactor the application to improve testability
Remove unreachable code
Delete pointless tests
Refactor tests to test behavior rather than implementation details
How to game it
Tests are written for code that receives no value from testing.
Test code is written without assertions.
Tests are written with meaningless assertions.
Example: The following test will result in 100% function, branch, and line coverage with no behavior tested.
```javascript
/* Return the sum of two integers */
/* Return null if one of the params is not an integer */
function addWholeNumbers(a, b) {
  if (a % 1 === 0 && b % 1 === 0) {
    return a + b;
  } else {
    return null;
  }
}

it('Should not return null if both numbers are integers', () => {
  /*
   * This call will return 4, which is not null.
   * Pass
   */
  expect(addWholeNumbers(2, 2)).not.toBe(null);

  /*
   * This returns "22" because JS sees a string and helpfully concatenates them.
   * Pass
   */
  expect(addWholeNumbers(2, '2')).not.toBe(null);

  /*
   * The function will never return the JS `NaN` constant.
   * Pass
   */
  expect(addWholeNumbers(1.1, 0)).not.toBe(NaN);
});
```
The following is an example of test code with no assertions. This will also produce 100% code coverage reporting but does not test anything because there are no assertions to cause the test to fail.
```javascript
it('Should not return null if both numbers are integers', () => {
  addWholeNumbers(2, 2);
  addWholeNumbers(2, '2');
  addWholeNumbers(1.1, 0);
});
```
Guardrail Metrics
Test coverage should never be used as a goal or an indicator of application health. Measure outcomes. If testing is poor, the following metrics will show poor results.
Defect Rates will increase as poor-quality tests are created to meet coverage targets that do not reliably catch defects.
Development Cycle Time will increase as more emphasis is placed on improper testing methods (manual functional testing, testing teams, etc.) to overcome the lack of reliable tests.
7 - Code Inventory
Amount of code written but not yet delivered to production - represents unrealized value and risk
The lines of code that have been changed but have not been delivered to production. This can be measured at several points in the
delivery flow, starting with code not merged to trunk.
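One way to approximate this is a sketch like the one below, assuming trunk is `origin/main` and a local clone with fetched remote branches; it only counts branches not yet merged to trunk, not merged-but-undeployed code.

```javascript
// Sketch: code inventory as lines changed on branches not yet merged to trunk.
// Assumes trunk is `origin/main`; run from inside a repository clone.
const { execSync } = require('child_process');

function unmergedLines(branch, trunk = 'origin/main') {
  // --shortstat prints e.g. " 3 files changed, 120 insertions(+), 15 deletions(-)"
  const stat = execSync(`git diff --shortstat ${trunk}...${branch}`, { encoding: 'utf8' });
  const insertions = Number((stat.match(/(\d+) insertion/) || [0, 0])[1]);
  const deletions = Number((stat.match(/(\d+) deletion/) || [0, 0])[1]);
  return insertions + deletions;
}

// Remote branches that are not merged to trunk
const branches = execSync('git branch -r --no-merged origin/main', { encoding: 'utf8' })
  .split('\n')
  .map((b) => b.trim())
  .filter((b) => b && !b.includes('->')); // drop blanks and the origin/HEAD alias

const total = branches.reduce((sum, b) => sum + unmergedLines(b), 0);
console.log(`Code inventory: ${total} changed lines on ${branches.length} unmerged branches`);
```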
What is the intended behavior?
Reduce the size of individual changes and reduce the duration of branches to improve quality feedback. We also want to
eliminate stale branches that create a risk of lost changes or of merge conflicts that require additional, risky manual steps.
How to improve it
Improve continuous integration behavior where changes are integrated to the trunk and
verified multiple times per day.
How to game it
Use forks to hide changes.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality can decrease as quality steps are skipped or batch size increases.
8 - Defect Rate
Measure of escaped defects found in production, indicating test effectiveness and quality processes
Defect rates are the total number of defects by severity reported for a period of time.
Defect count / Time range
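A minimal sketch of the formula, assuming a hypothetical defect record that carries a severity and a report date.

```javascript
// Sketch: defects per week by severity over a reporting window (hypothetical record shape).
// Each record: { severity: 'high' | 'medium' | 'low', reportedAt: ISO string }
function defectRatePerWeek(defects, rangeStart, rangeEnd) {
  const weeks = (new Date(rangeEnd) - new Date(rangeStart)) / (1000 * 60 * 60 * 24 * 7);
  const countBySeverity = {};
  for (const d of defects) {
    countBySeverity[d.severity] = (countBySeverity[d.severity] || 0) + 1;
  }
  return Object.fromEntries(
    Object.entries(countBySeverity).map(([severity, count]) => [severity, count / weeks])
  );
}

console.log(defectRatePerWeek(
  [
    { severity: 'high', reportedAt: '2025-01-02T10:00:00Z' },
    { severity: 'low', reportedAt: '2025-01-10T10:00:00Z' },
    { severity: 'low', reportedAt: '2025-01-20T10:00:00Z' },
  ],
  '2025-01-01T00:00:00Z',
  '2025-01-29T00:00:00Z'
)); // { high: 0.25, low: 0.5 } defects per week over a 4-week window
```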
What is the intended behavior?
Use defect rates and trends to inform improvement of upstream quality processes.
Defect rates in production indicate how effective our overall quality process is. Defect rates in lower environments inform us of
specific areas where quality process can be improved. The goal is to push detection closer to the developer.
How to improve it
Track trends over time and identify common issues for the defects. Design test changes that would reduce the time
to detect defects.
How to game it
Mark defects as enhancement requests
Don’t track defects
Deploy changes that do not modify the application to improve the percentage
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency is reduced if too much emphasis is placed on zero defects. This can be
self-defeating as large change batches will contain more defects.
9 - Delivery Frequency
How often changes are deployed to production - a key DORA metric measuring throughput and team capability
How frequently per day the team releases changes to production.
What is the intended behavior?
Small changes deployed very frequently to exercise the ability to fix production
rapidly, reduce MTTR, increase quality, and reduce risk.
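A minimal sketch that mirrors the quickstart query, assuming a hypothetical deployment log with one record per production release.

```javascript
// Sketch: delivery frequency as average production deploys per active day (hypothetical log shape).
// Each record: { service, timestamp } for one production release.
function deploysPerDay(deployments) {
  const daysWithDeploys = new Set(deployments.map((d) => d.timestamp.slice(0, 10)));
  return deployments.length / daysWithDeploys.size;
}

console.log(deploysPerDay([
  { service: 'api', timestamp: '2025-01-06T10:00:00Z' },
  { service: 'api', timestamp: '2025-01-06T15:00:00Z' },
  { service: 'web', timestamp: '2025-01-07T11:00:00Z' },
])); // 1.5
```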
11 - Integration Frequency
How often code is integrated to trunk/main - indicator of CI practice maturity and team collaboration
The average number of production-ready pull requests a team closes per day, normalized by the number of developers on
the team. On a team with 5 developers, healthy CI practice is
at least 5 per day.
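A minimal sketch of the calculation, assuming you already have the merged pull request count, the number of working days, and the team size as inputs.

```javascript
// Sketch: integration frequency from merged, production-ready pull requests (hypothetical inputs).
function integrationFrequency(mergedPrCount, workingDays, developerCount) {
  const perDay = mergedPrCount / workingDays;
  return {
    perDay,
    perDeveloperPerDay: perDay / developerCount,
    healthy: perDay >= developerCount, // at least one trunk integration per developer per day
  };
}

console.log(integrationFrequency(18, 5, 5));
// { perDay: 3.6, perDeveloperPerDay: 0.72, healthy: false }
```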
What is the intended behavior?
Increase the frequency of code integration
Reduce the size of each change
Improve code review processes
Remove unneeded processes
Improve quality feedback
How to improve it
Decompose code changes into smaller units to incrementally deliver features.
12 - Lead Time
Total time from customer request to delivery in production - measures entire value stream efficiency
This shows the average time it takes for a new request to be delivered. This is
measured from the creation date to release date for each unit of work and includes Development Cycle Time.
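A minimal sketch, assuming hypothetical work-item records that carry a creation date and a release date.

```javascript
// Sketch: lead time per work item, from request creation to production release.
// Hypothetical record shape: { createdAt, releasedAt } as ISO strings.
function leadTimeDays({ createdAt, releasedAt }) {
  return (new Date(releasedAt) - new Date(createdAt)) / (1000 * 60 * 60 * 24);
}

const items = [
  { createdAt: '2025-01-01T00:00:00Z', releasedAt: '2025-01-15T00:00:00Z' }, // 14 days
  { createdAt: '2025-01-05T00:00:00Z', releasedAt: '2025-01-11T00:00:00Z' }, // 6 days
];
const average = items.map(leadTimeDays).reduce((a, b) => a + b, 0) / items.length;
console.log(`Average lead time: ${average} days`); // 10
```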
What is the intended behavior?
Identify over-utilized teams and backlogs that need more Product Owner attention, or use in conjunction
with velocity to help teams optimize their processes.
How to improve it
Relentlessly remove old items from the backlog.
Improve team processes to reduce Development Cycle Time.
Use Innersourcing to allow other teams to help when surges of work arrive.
Re-assign, carefully, some components to another team to scale delivery.
How to game it
Requests can be tracked in spreadsheets or other locations and then added to
the backlog just before development. This can be identified by decreased
customer satisfaction.
Reduce feature refining rigour.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality is reduced if less time is spent refining and defining
testable requirements.
13 - Mean Time to Repair (MTTR)
Average time to restore service after an incident - a key DORA stability metric measuring recovery capability
Mean Time to Repair is the average time between when an incident is detected and when it is resolved.
“Software delivery performance is a combination of three metrics: lead time, release
frequency, and MTTR. Change fail rate is not included, though it
is highly correlated.”
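A minimal sketch of the calculation, assuming hypothetical incident records with detection and resolution timestamps.

```javascript
// Sketch: mean time to repair from incident records (hypothetical record shape).
// Each record: { detectedAt, resolvedAt } as ISO strings.
function mttrMinutes(incidents) {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (new Date(i.resolvedAt) - new Date(i.detectedAt)),
    0
  );
  return totalMs / incidents.length / (1000 * 60);
}

console.log(mttrMinutes([
  { detectedAt: '2025-01-06T10:00:00Z', resolvedAt: '2025-01-06T11:30:00Z' }, // 90 min
  { detectedAt: '2025-01-08T14:00:00Z', resolvedAt: '2025-01-08T14:40:00Z' }, // 40 min
])); // 65
```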
14 - Quality
Comprehensive view of quality indicators including defects, test coverage, and technical debt
Quality is measured as the percentage of finished work that is unused, unstable, unavailable, or defective according to the end user.
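A minimal sketch of this measure, assuming each delivered item is flagged (hypothetically) from end-user feedback; since this counts problem work, lower is better.

```javascript
// Sketch: percentage of finished work that is unused, unstable, unavailable, or defective.
// Hypothetical record: { id, unused, unstable, unavailable, defective } booleans per delivered item.
function percentProblemWork(deliveredItems) {
  if (deliveredItems.length === 0) return 0;
  const problematic = deliveredItems.filter(
    (i) => i.unused || i.unstable || i.unavailable || i.defective
  ).length;
  return (problematic / deliveredItems.length) * 100; // lower is better
}

console.log(percentProblemWork([
  { id: 'A', unused: false, unstable: false, unavailable: false, defective: false },
  { id: 'B', unused: true,  unstable: false, unavailable: false, defective: false },
  { id: 'C', unused: false, unstable: false, unavailable: false, defective: true  },
  { id: 'D', unused: false, unstable: false, unavailable: false, defective: false },
])); // 50
```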
What is the intended behavior?
Continuously improve the quality steps in the construction process, reduce the size of delivered change, and increase
the speed of feedback from the end user. Improving this cycle improves roadmap decisions.
How to improve it
Add automated checks to the pipeline to prevent re-occurrence of root causes.
Only begin new work with testable acceptance criteria.
Accelerate feedback loops at every step to alert to quality, performance, or availability issues.
How to game it
Log defects as new features
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency may be reduced if more manual quality steps are added.
Build cycle time may increase as additional tests are added to the pipeline.
Lead time can increase as more time is spent on business analysis.
15 - Velocity / Throughput
Amount of work completed per iteration - team capacity planning metric that should be used carefully, not as productivity measure
The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity. This is commonly misunderstood to be a productivity metric. It is not.
What is the intended behavior?
After a team stabilizes, the standard deviation should be low. This will enable realistic planning of future
deliverables based on relative complexity. Find ways to increase this over time by reducing waste, improving planning,
and focusing on teamwork.
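A minimal sketch of checking velocity stability by computing the mean and standard deviation over recent sprints; the sprint numbers are illustrative.

```javascript
// Sketch: velocity mean and standard deviation across recent sprints.
// A low standard deviation relative to the mean indicates predictable planning data.
function velocityStats(sprintPoints) {
  const mean = sprintPoints.reduce((a, b) => a + b, 0) / sprintPoints.length;
  const variance =
    sprintPoints.reduce((sum, v) => sum + (v - mean) ** 2, 0) / sprintPoints.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

console.log(velocityStats([21, 24, 19, 23, 22]));
// { mean: 21.8, stdDev: ~1.72 } -- stable enough to plan against
```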
How to improve it
Reduce story size so they are easier to understand and more predictable.
Minimize hard dependencies. Each hard dependency reduces the odds of on-time
delivery by 50%.
Swarm stories by decomposing them into tasks that can be executed in parallel so that the team is working as a unit to deliver faster.
How to game it
Cherry pick easy, low priority items.
Increase story points
Skip quality steps.
Prematurely sign-off work only to have defects reported later.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality defect ratio goes up as more defects are reported.
WIP increases as teams start more work to look more
busy.