1 - Metrics Overview

Metrics are key to organizational improvement. If we do not measure, then any attempt at improvement is aimless. Metrics, like any tool, must be used correctly to drive the improvement we need. It’s important to use metrics in offsetting groups and to focus improvement efforts on the group of metrics as a whole, not as individual measures.

The Metrics Cheat Sheet gives a high-level view of the key metrics, their intent, and how to use them appropriately.

Goodhart’s Law

“When a measure becomes a target, it ceases to be a good measure.”

CD Execution

When measuring the performance of continuous delivery, we are measuring our ability to reliably and sustainably deliver high-quality changes. We do this by focusing on delivering small, high-quality batches very frequently.

  • Change frequency is important to make sure that waste is driven out of the process. This reduces costs, improves the sustainability of flow, and ensures there is a verified quality process for emergency changes.

  • Small batches are easier to inspect for quality and limit the impact of any quality issues.

  • Change success is an important offsetting metric. If we focus only on change size and change frequency, quality will suffer. If we focus only on reducing the number of defects and eliminating impacting changes, batch size and frequency suffer. The data shows that this actually results in more defects and higher costs.

  • Throughput

    • Development Cycle Time: Time from when a task is started until it is “Done”. The suggested definition of “Done” is “delivered to production”. KPI for how big a unit of work is. Indicator of possible upstream quality issues with requirements definition and teamwork.
    • Delivery Frequency: How often changes are delivered to production. KPI for batch size, cost, and the efficiency of the quality process.
  • Stability

    • Change Failure Rate: Percentage of changes that require remediation. KPI for effectiveness of the quality process.
    • Defect Rate: Rate of defect creation over time relative to change frequency, generally counting only P1 and P2 defects.
    • Mean Time to Repair: KPI for the maturity of our disaster response preparations and the forethought to design for recovery instead of just for delivery.

CI Execution

Continuous delivery stands on the bedrock of continuous integration. If code is not continuously integrating, it cannot be safely delivered.

The focus of CI is to amplify quality feedback. The more frequently code is integrated and tested, the sooner any quality issues will be found and the smaller those issues will be.

  • Integration Frequency: Frequency of code integrating to the trunk. KPI for efficacy of refining requirements, quality process, and teamwork.
    • When a team is mob programming, this should occur several times a day.
    • When a team is pair programming, this should occur several times a day per pair.
    • When the team is working on individual tasks, this should occur several times a day per developer.
  • Build Cycle Time: Time from commit to production deploy. KPI for the stability of the pipeline and efficiency of the quality process. Long build cycle times directly worsen MTTR and batch size. They also encourage abandoning defined quality checks in emergencies, making emergency changes the riskiest changes to make.

Workflow Management

  • Velocity / Throughput: Planning metric to allow the team to predict date ranges for delivery. The standard deviation of this metric is a KPI for predictability. The average value of the metric has no meaning outside the team.
  • Lead Time: Total time from when a request is made until it is delivered. KPI for team over-utilization. As the team’s utilization approaches 100%, this metric approaches infinity.
  • Work In Process (WIP): Key flow metric. Excessive WIP results in rework and delivery delays.
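The claim that lead time approaches infinity as utilization approaches 100% can be illustrated with basic queueing theory. The sketch below assumes, purely for illustration, that a team behaves like an M/M/1 queue (random arrivals, a single server, random service times). In that model the expected time a request spends in the system is serviceTime / (1 − utilization). The function and parameter names are hypothetical.

```javascript
// Expected lead time under an M/M/1 queueing model (an illustrative
// assumption; real teams are not M/M/1 queues):
//   W = serviceTime / (1 - utilization)
function expectedLeadTime(serviceTimeDays, utilization) {
  if (utilization < 0) {
    throw new RangeError("utilization must be >= 0");
  }
  if (utilization >= 1) {
    // At or above 100% utilization the queue grows without bound.
    return Infinity;
  }
  return serviceTimeDays / (1 - utilization);
}

// A request needing 2 days of work, at increasing utilization levels.
for (const u of [0.5, 0.8, 0.9, 0.95, 0.99]) {
  const days = expectedLeadTime(2, u).toFixed(1);
  console.log(`${Math.round(u * 100)}% utilized: ${days} days`);
}
```

In this model, each halving of the team's remaining slack doubles the expected lead time, which is why lead time climbs sharply well before utilization reaches 100%.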

2 - Metrics Cheat Sheet

Organizational Metrics

These metrics are important for teams and management to track the health of the delivery system.

| Metric | Meaning | Goal of Measuring | Guardrail Metrics |
|---|---|---|---|
| Integration/Merge Frequency | How frequently code changes are integrated to the trunk for testing | Reduce the size of change to improve quality and reduce risk | Defect Rates should not increase |
| Build Cycle Time | Total duration from commit to production delivery | Improve the ability to deliver changes to improve feedback and reduce MTTR | Defect Rates should not increase |
| Change Fail % | The % of production deploys that are reverted | Improve the upstream quality processes | Development Cycle Time should not increase |
| Code Inventory | Lines of code added or removed that have not been delivered to production | Reduce the amount of code inventory and move closer to Just In Time delivery | Change Fail % and Defect Rate should not increase |
| Defect Rate | Number of defects created during a set interval | Improve the quality processes in the delivery flow | Delivery Frequency should not decrease |
| Development Cycle Time | Time from when a story is started until it is marked “done” | Reduce the size of work to improve the feedback from the end user on the value of the work and to improve the quality of the acceptance criteria and testing | Defect Rate should not increase |
| MTTR | The time from when customer impact begins until it is resolved | Improve the stability and resilience of both the application and the system of delivery | Quality should not decrease |
| Delivery Frequency | The frequency with which changes are delivered to production | Reduce the size of delivered change, improve the feedback loop on quality, and increase the speed of value delivery | Defect Rates should not increase |
| Work in Progress | The number of items in progress on the team relative to the size of the team | Reduce the number of items in progress so that the team can focus on completing work rather than being busy | Delivery Frequency should not decrease |

Team Metrics

These metrics should only be used by teams to inform decision making. They are ineffective for measuring quality, productivity, or delivery system health.

| Metric | Meaning | Goal of Measuring | Issues with Metric |
|---|---|---|---|
| Code Coverage | The % of code that is executed by test code | Prevent unexpected reduction of code coverage; find code that should be better tested | When coverage goals are set, teams can generate tests that meet the goals but are ineffective as tests |
| Velocity/Throughput | The average amount of the backlog delivered during a sprint by the team | Used by the product team for planning | There is no such thing as good or bad velocity |

3 - Average Build Downtime

The average length of time between when a build breaks and when it is fixed.
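As a minimal sketch, average build downtime can be computed from a pipeline's build history by measuring each stretch from the first failing build to the next passing one. The record shape ({ finishedAt, passed }) and the function name are hypothetical; a real CI server would supply this data in its own format.

```javascript
// Hypothetical build record: { finishedAt: Date, passed: boolean },
// for a single pipeline, sorted oldest first.
function averageBuildDowntimeMs(builds) {
  const outages = [];
  let brokenSince = null;
  for (const build of builds) {
    if (!build.passed && brokenSince === null) {
      brokenSince = build.finishedAt; // first failure: the build just broke
    } else if (build.passed && brokenSince !== null) {
      outages.push(build.finishedAt - brokenSince); // first pass after breaking
      brokenSince = null;
    }
  }
  // A build that is still broken is not counted until it is fixed.
  if (outages.length === 0) return 0;
  return outages.reduce((sum, ms) => sum + ms, 0) / outages.length;
}

const builds = [
  { finishedAt: new Date("2024-01-01T09:00:00Z"), passed: true },
  { finishedAt: new Date("2024-01-01T10:00:00Z"), passed: false },
  { finishedAt: new Date("2024-01-01T10:30:00Z"), passed: false },
  { finishedAt: new Date("2024-01-01T11:00:00Z"), passed: true },
  { finishedAt: new Date("2024-01-01T13:00:00Z"), passed: false },
  { finishedAt: new Date("2024-01-01T13:20:00Z"), passed: true },
];
console.log(averageBuildDowntimeMs(builds) / 60000); // 40 (minutes)
```

The two outages above last 60 and 20 minutes, so the average is 40 minutes.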

What is the intended behavior?

Keep the pipelines always deployable by fixing broken builds as rapidly as possible. Broken builds are the highest priority since they prevent production fixes from being deployed in a safe, standard way.

How to improve it

  • Refactor to improve testability and modularity.
  • Improve tests to locate problems more rapidly.
  • Decrease the size of the component to reduce complexity.
  • Add automated alerts for broken builds.
  • Ensure the team has practices in place to work together on solving the problem.

How to game it

  • Re-build the previous version.
  • Remove tests that are failing.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Integration Frequency decreases as additional manual or automated process overhead is added before integration to trunk.

4 - Build Cycle Time

The time from code commit to production deploy. This is the minimum time in which changes can be applied to production. This is referred to as “hard lead time” in Accelerate.

What is the intended behavior?

Reduce pipeline duration to improve MTTR and improve test efficiency to give the team more rapid feedback to any issues. Long build cycle times delay quality feedback and create more opportunity for defect penetration.

How to improve it

  • Identify areas of the build that can run concurrently.
  • Replace end-to-end tests in the pipeline with virtual services and move end-to-end testing to an asynchronous process.
  • Break down large services into smaller sub-domains that are easier and faster to build / test.
  • Add alerts to the pipeline if a maximum duration is exceeded to inform test refactoring priorities.
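The alerting idea above can be sketched as follows, assuming each pipeline run records its commit time and production deploy time. The record shape, the function names, and the 60-minute threshold are hypothetical placeholders, not recommended values.

```javascript
// Hypothetical run record: { id: string, commitAt: Date, deployedAt: Date }.
// Illustrative threshold only; pick one that matches the team's goals.
const MAX_CYCLE_MINUTES = 60;

// Build cycle time: commit to production deploy, in minutes.
function buildCycleMinutes(run) {
  return (run.deployedAt - run.commitAt) / 60000;
}

// Runs whose cycle time exceeded the threshold.
function runsOverLimit(runs, maxMinutes = MAX_CYCLE_MINUTES) {
  return runs.filter((run) => buildCycleMinutes(run) > maxMinutes);
}

const runs = [
  { id: "a1", commitAt: new Date("2024-01-01T09:00:00Z"), deployedAt: new Date("2024-01-01T09:25:00Z") },
  { id: "b2", commitAt: new Date("2024-01-01T10:00:00Z"), deployedAt: new Date("2024-01-01T11:30:00Z") },
];
for (const run of runsOverLimit(runs)) {
  console.warn(`Run ${run.id} took ${buildCycleMinutes(run)} minutes; review slow stages.`);
}
```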

How to game it

  • Reduce the number of tests running or test types executed.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Defect rates increase if quality gates are skipped to reduce build time.

5 - Change Fail Rate

The percentage of changes that result in negative customer impact or require a rollback.

changeFailRate = failedChangeCount / changeCount
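A minimal sketch of the formula, assuming each deploy is recorded with the artifact it shipped and whether it caused customer impact or was rolled back. Counting distinct artifacts rather than raw deploys is one way (an assumption here, not a rule from this guide) to keep re-deploys of a known-good version from inflating changeCount. The record shape and names are hypothetical.

```javascript
// Hypothetical deploy record: { artifact: string, failed: boolean }, where
// `failed` means the change caused customer impact or was rolled back.
function changeFailRate(deploys) {
  // One entry per distinct artifact; a change counts as failed if any of
  // its deploys failed. Re-deploying the same artifact adds no new change.
  const changes = new Map();
  for (const { artifact, failed } of deploys) {
    changes.set(artifact, (changes.get(artifact) ?? false) || failed);
  }
  if (changes.size === 0) return 0;
  const failedChangeCount = [...changes.values()].filter(Boolean).length;
  return failedChangeCount / changes.size;
}

const deploys = [
  { artifact: "v1", failed: false },
  { artifact: "v2", failed: true },
  { artifact: "v3", failed: false },
  { artifact: "v3", failed: false }, // re-deploy: not a new change
  { artifact: "v4", failed: false },
];
console.log(changeFailRate(deploys)); // 0.25
```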

What is the intended behavior?

Reduce the percentage of failed changes.

How to improve it

  • Release more, smaller changes to make quality steps more effective and reduce the impact of failure.
  • Identify root cause for each failure and improve the automated quality checks.

How to game it

  • Deploy fixes without recording the defect.
  • Create defect review meetings and re-classify defects as feature requests.
  • Re-deploy the latest working version to increase deploy count.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Delivery frequency can decrease if focus is placed on “zero defect” changes.
  • Defect rates can increase as reduced delivery frequency increases code change batch size and delivery risk.

6 - Code Coverage

Measure of the amount of code that is executed by test code.

What is the intended behavior?

Inform the team of risky or complicated portions of the code that are not sufficiently covered by tests. Care should be taken not to confuse high coverage with good testing.

How to improve it

  • Write tests for code that SHOULD be covered but isn’t.
  • Refactor the application to improve testability.
  • Remove unreachable code.

How to game it

  • Tests are written for code that receives no value from testing.
  • Test code is written without assertions.
  • Code is inappropriately excluded from test coverage reporting.

Example: The following test will result in 100% function, branch, and line coverage with no behavior tested.

/* Returns the sum of two integers */
/* Returns NaN for non-integers */
function addWholeNumbers(a, b) {
  if (a % 1 === 0 && b % 1 === 0) {
    return a + b;
  } else {
    return NaN;
  }
}

// Executes every line and both branches, but never checks that 2 + 2 is 4
// or that non-integer input produces NaN.
it('Should add two whole numbers', () => {
  expect(addWholeNumbers(2, 2)).to.not.be.NaN;
  expect(addWholeNumbers(1.1, 0)).to.not.be.null;
});

The following test, which has no assertions at all, reports the same code coverage results:

it('Should add two whole numbers', () => {
  addWholeNumbers(2, 2);
  addWholeNumbers(1.1, 0);
});
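For contrast, here is a sketch of checks that would produce the same coverage report while actually verifying the documented behavior. To run standalone it uses a small hypothetical `check` helper instead of a test framework; in a real suite the same assertions would be written with the framework's own matchers.

```javascript
/* Returns the sum of two integers; returns NaN for non-integers */
function addWholeNumbers(a, b) {
  if (a % 1 === 0 && b % 1 === 0) {
    return a + b;
  }
  return NaN;
}

// Hypothetical minimal assertion helper standing in for a test framework.
function check(condition, message) {
  if (!condition) throw new Error(`Check failed: ${message}`);
}

// Same lines and branches executed as the tests above, but each check pins
// down a specific documented result.
check(addWholeNumbers(2, 2) === 4, "2 + 2 should be 4");
check(addWholeNumbers(-3, 5) === 2, "negative whole numbers should be added");
check(Number.isNaN(addWholeNumbers(1.1, 0)), "a non-integer first argument should give NaN");
check(Number.isNaN(addWholeNumbers(0, 2.5)), "a non-integer second argument should give NaN");
```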

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Development Cycle Time increases with additional development time dedicated to chasing the coverage metric.
  • Defect rates can increase as poor quality tests are created to meet the coverage minimums.

7 - Code Integration Frequency

The average number of production-ready pull requests a team merges to the trunk per day, normalized by the number of developers on the team. Healthy CI practice is at least one per developer per day; on a team with 5 developers, that is at least 5 per day.
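The normalization can be sketched as below: divide merged, production-ready pull requests by working days and by team size, then compare the result with one integration per developer per day (the 5-per-day guideline for a 5-developer team). The function names are hypothetical; the pull request count would come from whatever source-control tool the team uses.

```javascript
// Average production-ready PRs merged to trunk per developer per working day.
function integrationsPerDeveloperPerDay(mergedPrCount, workingDays, developerCount) {
  return mergedPrCount / workingDays / developerCount;
}

// Healthy CI practice per the guideline above: at least 1 per developer per day.
function meetsIntegrationTarget(mergedPrCount, workingDays, developerCount) {
  return integrationsPerDeveloperPerDay(mergedPrCount, workingDays, developerCount) >= 1;
}

// A 5-developer team over a 10-day sprint.
console.log(integrationsPerDeveloperPerDay(40, 10, 5)); // 0.8
console.log(meetsIntegrationTarget(60, 10, 5)); // true
```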

What is the intended behavior?

  • Increase the frequency of code integration
  • Reduce the size of each change
  • Improve code review processes
  • Remove unneeded processes
  • Improve quality feedback

How to improve it

  • Decompose code changes into smaller units to incrementally deliver features.
  • Use BDD to aid functional breakdown.
  • Use TDD to design more modular code that can be integrated more frequently.
  • Use feature flags, branch by abstraction, or other coding techniques to control the release of new features.

How to game it

  • Meaningless changes integrated to trunk.
  • Breaking changes integrated to trunk.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality decreases if testing is skipped.

8 - Code Inventory

The lines of code that have been changed but have not been delivered to production. This can be measured at several points in the delivery flow, starting with code not merged to trunk.
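A minimal sketch of measuring inventory at two points in the flow, assuming each unmerged branch or undeployed trunk change is recorded with its added and removed line counts (in practice these might come from `git diff --shortstat` against the last deployed commit). The record shape and stage names are hypothetical.

```javascript
// Hypothetical change record:
//   { stage: "branch" | "trunk" | "production", linesAdded: number, linesRemoved: number }
// "branch" = not yet merged to trunk; "trunk" = merged but not yet deployed.
function codeInventory(changes) {
  const byStage = { branch: 0, trunk: 0 };
  for (const { stage, linesAdded, linesRemoved } of changes) {
    if (stage === "production") continue; // delivered: no longer inventory
    if (!Object.prototype.hasOwnProperty.call(byStage, stage)) {
      throw new Error(`Unknown stage: ${stage}`);
    }
    byStage[stage] += linesAdded + linesRemoved;
  }
  return { ...byStage, total: byStage.branch + byStage.trunk };
}

const inventory = codeInventory([
  { stage: "branch", linesAdded: 120, linesRemoved: 30 },
  { stage: "trunk", linesAdded: 10, linesRemoved: 5 },
  { stage: "production", linesAdded: 500, linesRemoved: 200 },
]);
console.log(inventory); // { branch: 150, trunk: 15, total: 165 }
```

Reporting per stage shows where code is waiting: a large branch figure points at integration habits, while a large trunk figure points at the delivery pipeline.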

What is the intended behavior?

Reduce the size of individual changes and the lifespan of branches to improve quality feedback. We also want to eliminate stale branches, which risk lost changes or merge conflicts that require additional, risky manual steps.

How to improve it

  • Improve continuous integration behavior where changes are integrated to the trunk and verified multiple times per day.

How to game it

  • Use forks to hide changes.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality can decrease as quality steps are skipped or batch size increases.

9 - Defect Rate

Defect rates are the total number of defects by severity reported for a period of time.

defectRate = defectCount / timeRange
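A sketch of the calculation, grouped by severity as described above and expressed per week. The defect record shape and the choice of a weekly unit are assumptions for illustration.

```javascript
const MS_PER_WEEK = 7 * 24 * 60 * 60 * 1000;

// Hypothetical defect record: { severity: string, reportedAt: Date }.
// Returns defects per week for each severity reported in [start, end).
function defectRatePerWeek(defects, start, end) {
  const weeks = (end - start) / MS_PER_WEEK;
  const counts = {};
  for (const { severity, reportedAt } of defects) {
    if (reportedAt >= start && reportedAt < end) {
      counts[severity] = (counts[severity] ?? 0) + 1;
    }
  }
  const rates = {};
  for (const [severity, count] of Object.entries(counts)) {
    rates[severity] = count / weeks;
  }
  return rates;
}

const rates = defectRatePerWeek(
  [
    { severity: "P1", reportedAt: new Date("2024-01-03T00:00:00Z") },
    { severity: "P2", reportedAt: new Date("2024-01-05T00:00:00Z") },
    { severity: "P2", reportedAt: new Date("2024-01-10T00:00:00Z") },
    { severity: "P1", reportedAt: new Date("2024-01-20T00:00:00Z") }, // outside range
  ],
  new Date("2024-01-01T00:00:00Z"),
  new Date("2024-01-15T00:00:00Z")
);
console.log(rates); // { P1: 0.5, P2: 1 }
```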

What is the intended behavior?

Use defect rates and trends to inform improvement of upstream quality processes.

Defect rates in production indicate how effective our overall quality process is. Defect rates in lower environments inform us of specific areas where quality process can be improved. The goal is to push detection closer to the developer.

How to improve it

Track trends over time and identify common causes of defects. Make changes to test design that would reduce the time to detect those defects.

How to game it

  • Mark defects as enhancement requests
  • Don’t track defects
  • Deploy changes that do not modify the application to improve the percentage

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Delivery frequency is reduced if too much emphasis is placed on zero defects. This can be self-defeating, as large change batches will contain more defects.

10 - Delivery Frequency

How frequently per day the team releases changes to production.

What is the intended behavior?

Small changes deployed very frequently to exercise the ability to fix production rapidly, reduce MTTR, increase quality, and reduce risk.

How to improve it

How to game it

  • Re-deploying the same artifact repeatedly.
  • Building new artifacts that contain no changes.
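One way to blunt both gaming patterns above is to count distinct source commits delivered rather than deploys or artifacts, assuming each deploy records the commit it was built from. The record shape and function name are hypothetical, and this sketch only deduplicates within the measured period.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical deploy record: { commit: string, deployedAt: Date }, where
// `commit` is the source revision the deployed artifact was built from.
// Re-deploys and rebuilds with no source changes share a commit, so they
// count once.
function deliveriesPerDay(deploys, start, end) {
  const days = (end - start) / MS_PER_DAY;
  const delivered = new Set();
  for (const { commit, deployedAt } of deploys) {
    if (deployedAt >= start && deployedAt < end) delivered.add(commit);
  }
  return delivered.size / days;
}

const perDay = deliveriesPerDay(
  [
    { commit: "c1", deployedAt: new Date("2024-01-01T10:00:00Z") },
    { commit: "c1", deployedAt: new Date("2024-01-01T15:00:00Z") }, // re-deploy
    { commit: "c2", deployedAt: new Date("2024-01-02T09:00:00Z") },
    { commit: "c3", deployedAt: new Date("2024-01-02T16:00:00Z") },
  ],
  new Date("2024-01-01T00:00:00Z"),
  new Date("2024-01-03T00:00:00Z")
);
console.log(perDay); // 1.5
```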

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Change Fail Rate increases as focus shifts to speed instead of quality.
  • Quality decreases if steps are skipped in refining work for the sake of output.

11 - Development Cycle Time

The average time from starting work until release to production.

What is the intended behavior?

Reduce the time it takes to deliver refined work to production to mitigate the effects of priorities changing and get rapid feedback on quality.

How to improve it

  • Decompose work so it can be delivered in smaller increments and by more team members.
  • Identify and remove process waste, handoffs, and delays in the construction process.
  • Improve test design.
  • Automate and standardize the build and deploy pipeline.

How to game it

  • Move things to “Done” status that are not in production.
  • Move items directly from “Backlog” to “Done” after deploying to production.
  • Split work into functional tasks that should be considered part of development (development task, testing task, etc.).

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality decreases if quality processes are skipped.
  • Standard deviation of the control chart can show issues being closed too rapidly.
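A sketch of the control-chart check, assuming each work item records when work started and when it was released to production. Items whose cycle time falls well below the mean may have been closed before they were really done. The record shape, function names, and the one-standard-deviation cutoff are illustrative assumptions, not standard control limits.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical work item: { id: string, startedAt: Date, releasedAt: Date }.
function cycleTimeDays(item) {
  return (item.releasedAt - item.startedAt) / MS_PER_DAY;
}

// Flags items more than `k` (population) standard deviations below the mean.
function suspiciouslyFastItems(items, k = 1) {
  if (items.length === 0) return { mean: 0, stdDev: 0, flagged: [] };
  const times = items.map(cycleTimeDays);
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  const variance = times.reduce((sum, t) => sum + (t - mean) ** 2, 0) / times.length;
  const stdDev = Math.sqrt(variance);
  const lowerLimit = mean - k * stdDev;
  return { mean, stdDev, flagged: items.filter((_, i) => times[i] < lowerLimit) };
}

// Helper for the example: an item started at `start` that took `days` days.
const item = (id, start, days) => {
  const startedAt = new Date(start);
  return { id, startedAt, releasedAt: new Date(startedAt.getTime() + days * MS_PER_DAY) };
};
const items = [3, 4, 5, 4, 0.1, 4, 3, 5].map((days, i) =>
  item(`story-${i + 1}`, "2024-01-01T09:00:00Z", days)
);
const { flagged } = suspiciouslyFastItems(items);
console.log(flagged.map((f) => f.id)); // [ 'story-5' ]
```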

12 - Lead Time

This shows the average time it takes for a new request to be delivered. This is measured from the creation date to release date for each unit of work and includes Development Cycle Time.
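Because Lead Time includes Development Cycle Time, subtracting one from the other shows how long requests wait before work starts, which is where neglected backlogs and over-utilization tend to show up. A sketch, assuming each work item records creation, start, and release dates; the field and function names are hypothetical.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical work item: { createdAt: Date, startedAt: Date, releasedAt: Date }.
function leadTimeBreakdown(item) {
  const leadDays = (item.releasedAt - item.createdAt) / MS_PER_DAY;
  const developmentDays = (item.releasedAt - item.startedAt) / MS_PER_DAY;
  // Time spent waiting between the request and the start of work.
  return { leadDays, developmentDays, waitingDays: leadDays - developmentDays };
}

// Average lead time across items, in days.
function averageLeadTimeDays(items) {
  if (items.length === 0) return 0;
  return items.reduce((sum, item) => sum + leadTimeBreakdown(item).leadDays, 0) / items.length;
}

const request = {
  createdAt: new Date("2024-01-01T00:00:00Z"),
  startedAt: new Date("2024-01-11T00:00:00Z"),
  releasedAt: new Date("2024-01-14T00:00:00Z"),
};
console.log(leadTimeBreakdown(request)); // { leadDays: 13, developmentDays: 3, waitingDays: 10 }
```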

What is the intended behavior?

Identify over-utilized teams and backlogs that need more Product Owner attention, or, in conjunction with velocity, help teams optimize their processes.

How to improve it

Relentlessly remove old items from the backlog. Improve team processes to reduce Development Cycle Time. Use Innersourcing to allow other teams to help when surges of work arrive. Carefully re-assign some components to another team to scale delivery.

How to game it

  • Requests can be tracked in a spreadsheet or other locations and then added to the backlog just before development. This can be identified by decreased customer satisfaction.
  • Reduce feature refining rigour.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality is reduced if less time is spent refining and defining testable requirements.

13 - MTTR

Mean Time to Repair is the average time between when an incident is detected and when it is resolved.

“Software delivery performance is a combination of three metrics: lead time, release frequency, and MTTR. Change fail rate is not included, though it is highly correlated.”

“Accelerate” uses the term Lead Time for what this guide calls Development Cycle Time.
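A minimal sketch of the average, assuming each incident records when it was detected and when it was resolved (null while still open). Open incidents are left out here because they have no repair time yet; that choice, the record shape, and the function name are assumptions.

```javascript
// Hypothetical incident record: { detectedAt: Date, resolvedAt: Date | null }.
function meanTimeToRepairMinutes(incidents) {
  // Only resolved incidents have a repair time to average.
  const resolved = incidents.filter((incident) => incident.resolvedAt !== null);
  if (resolved.length === 0) return 0;
  const totalMs = resolved.reduce(
    (sum, incident) => sum + (incident.resolvedAt - incident.detectedAt),
    0
  );
  return totalMs / resolved.length / 60000;
}

const incidents = [
  { detectedAt: new Date("2024-01-01T10:00:00Z"), resolvedAt: new Date("2024-01-01T10:30:00Z") },
  { detectedAt: new Date("2024-01-02T12:00:00Z"), resolvedAt: new Date("2024-01-02T13:30:00Z") },
  { detectedAt: new Date("2024-01-03T08:00:00Z"), resolvedAt: null }, // still open
];
console.log(meanTimeToRepairMinutes(incidents)); // 60
```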

What is the intended behavior?

Improve the ability to more rapidly resolve system instability and service outages.

How to improve it

  • Make sure the pipeline is always deployable.
  • Keep build cycle time short to allow roll-forward.
  • Implement feature flags for larger feature changes to allow them to be deactivated without re-deploying.
  • Identify stability issues and prioritize them in the backlog.

How to game it

  • Updating support incidents to “closed” before service is restored.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality decreases if issues recur because pipeline quality gates were not improved.

14 - Quality

Quality is measured as the percentage of finished work that is unused, unstable, unavailable, or defective according to the end user.
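As a sketch, assuming each finished item carries end-user feedback flags for the four failure modes listed above, the metric is the share of items with at least one flag. The field and function names are hypothetical.

```javascript
// Hypothetical finished-work record with end-user feedback flags:
//   { unused: boolean, unstable: boolean, unavailable: boolean, defective: boolean }
// An item with several problems still counts once.
function poorQualityPercent(finishedItems) {
  if (finishedItems.length === 0) return 0;
  const poor = finishedItems.filter(
    (item) => item.unused || item.unstable || item.unavailable || item.defective
  ).length;
  return (poor / finishedItems.length) * 100;
}

const ok = { unused: false, unstable: false, unavailable: false, defective: false };
console.log(
  poorQualityPercent([
    ok,
    { ...ok, defective: true },
    { ...ok, unused: true, unstable: true },
    ok,
  ])
); // 50
```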

What is the intended behavior?

Continuously improve the quality steps in the construction process, reduce the size of delivered change, and increase the speed of feedback from the end user. Improving this cycle improves roadmap decisions.

How to improve it

  • Add automated checks to the pipeline to prevent re-occurrence of root causes.
  • Only begin new work with testable acceptance criteria.
  • Accelerate feedback loops at every step to alert to quality, performance, or availability issues.

How to game it

  • Log defects as new features

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

15 - Velocity / Throughput

The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity. This is commonly misunderstood to be a productivity metric. It is not.

What is the intended behavior?

After a team stabilizes, the standard deviation should be low. This enables realistic planning of future deliverables based on relative complexity. Over time, look for ways to increase throughput by reducing waste, improving planning, and focusing on teamwork.
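A sketch of the statistics involved, assuming a list of recent sprint velocities in whatever unit the team uses (story points or item counts). It computes the mean and sample standard deviation and turns them into a rough range of sprints for a backlog. The function names and the mean ± one standard deviation range are illustrative assumptions, not a forecasting standard.

```javascript
// Mean and sample standard deviation of recent sprint velocities.
function velocityStats(velocities) {
  const n = velocities.length;
  if (n < 2) throw new RangeError("need at least two sprints");
  const mean = velocities.reduce((a, b) => a + b, 0) / n;
  const variance = velocities.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  return { mean, stdDev: Math.sqrt(variance) };
}

// Rough sprint range for a backlog, using mean ± one standard deviation.
function forecastSprints(backlogSize, { mean, stdDev }) {
  const slow = mean - stdDev;
  if (slow <= 0) throw new RangeError("velocity too unstable to forecast");
  return {
    best: Math.ceil(backlogSize / (mean + stdDev)),
    worst: Math.ceil(backlogSize / slow),
  };
}

const stats = velocityStats([20, 22, 18, 21, 19]);
console.log(stats.mean, stats.stdDev.toFixed(2)); // 20 1.58
console.log(forecastSprints(100, stats)); // { best: 5, worst: 6 }
```

A lower standard deviation narrows the forecast range. As noted above, these numbers are only meaningful inside the team.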

How to improve it

  • Reduce story size so they are easier to understand and more predictable.
  • Minimize hard dependencies. Each hard dependency reduces the odds of on-time delivery by 50%.
  • Swarm stories by decomposing them into tasks that can be executed in parallel so that the team is working as a unit to deliver faster.
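Reading “reduces the odds of on-time delivery by 50%” as halving the probability for each hard dependency (an interpretive assumption), the effect compounds quickly. The function name and the `baseline` parameter are hypothetical.

```javascript
// Assumed rule: each hard dependency halves the chance of on-time delivery.
// `baseline` is the on-time probability with no hard dependencies.
function onTimeProbability(hardDependencies, baseline) {
  return baseline * 0.5 ** hardDependencies;
}

for (let n = 0; n <= 3; n++) {
  console.log(`${n} hard dependencies: ${(onTimeProbability(n, 0.9) * 100).toFixed(1)}%`);
}
```

Under this reading, three hard dependencies leave one-eighth of the original chance of delivering on time.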

How to game it

  • Cherry pick easy, low priority items.
  • Inflate story point estimates.
  • Skip quality steps.
  • Prematurely sign-off work only to have defects reported later.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality defect ratio goes up as more defects are reported.
  • WIP increases as teams start more work to look more busy.

16 - WIP

Work in Progress (WIP) is the total work that has been started but not completed. This includes all work, defects, tasks, stories, etc.

What is the intended behavior?

Focus the team on finishing work and delivering it rather than switching between tasks but not finishing them.

How to improve it

  • The team should focus on finishing items closest to being ready for production.
    • Prioritize code review over starting new work
    • Prioritize pairing to solve a problem over starting new work
  • Set and do not exceed WIP limits for the team.
    • Total WIP should not exceed team size.
  • Keep the Kanban board visible at all times to monitor WIP
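The team-size limit above can be checked mechanically. A sketch, assuming board items carry a status and that anything started but not done counts toward WIP; the status names (“In Progress”, “Review”) and the function name are hypothetical.

```javascript
// Hypothetical statuses that mean "started but not finished".
const IN_PROGRESS_STATUSES = new Set(["In Progress", "Review"]);

// Hypothetical board item: { id: string, status: string }.
// WIP limit per the guideline above: total WIP should not exceed team size.
function wipStatus(boardItems, teamSize) {
  const wip = boardItems.filter((item) => IN_PROGRESS_STATUSES.has(item.status)).length;
  return { wip, limit: teamSize, overLimit: wip > teamSize };
}

const board = [
  { id: "A", status: "In Progress" },
  { id: "B", status: "In Progress" },
  { id: "C", status: "In Progress" },
  { id: "D", status: "Review" },
  { id: "E", status: "Review" },
  { id: "F", status: "Done" },
  { id: "G", status: "Backlog" },
];
console.log(wipStatus(board, 4)); // { wip: 5, limit: 4, overLimit: true }
```

Per the guidance above, the response to an over-limit board is to finish or review work already in progress before starting anything new.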

How to game it

  • Update incomplete work to “done” before it is delivered to production.
  • Create stories for each step of development instead of for value to be delivered.
  • Do not update work to “in progress” when working on it.