MTTR

Mean Time to Repair is the average time between when a incidents is detected and when it is resolved.

“Software delivery performance is a combination of three metrics: lead time, release frequency, and MTTR. Change fail rate is not included, though it is highly correlated.”

“Accelerate” uses Lead Time for Development Cycle Time.

What is the intended behavior?

Improve the ability to more rapidly resolve system instability and service outages.

How to improve it

  • Make sure the pipeline alway deployable.
  • Keep build cycle time short to allow roll-forward.
  • Implement feature flags for larger feature changes to allow the them to be deactivated without re-deploying.
  • Identify stability issues and prioritize them in the backlog.

How to game it

  • Updating support incidents to “closed” before service is restored.

Guardrail Metrics

Metrics to use in combination with this metric to prevent unintended consequences.

  • Quality decreases if issues re-occur due to lack of improving pipeline quality gates.

References

Last modified February 2, 2022: Switch to hugo (#2) (a244f74)