These are the core skills we recommend everyone learn to execute CD.
Behavior-Driven Development
Every step in CD requires clear, testable acceptance criteria as a prerequisite. BDD is not test automation. BDD is the
discussion that informs acceptance test driven development.
Conway's Law
“Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.” - Melvin Conway
Loosely coupled teams create loosely coupled systems. The opposite is also true.
Continuous Delivery
The ability to deliver the latest changes to production on demand.
Continuous Deployment
Delivering the latest changes to production as they occur.
Continuous Integration
Continuous integration requires that every time somebody commits any change, the entire application is built and a comprehensive
set of automated tests is run against it. Crucially, if the build or test process fails, the development team stops whatever they
are doing and fixes the problem immediately. The goal of continuous integration is that the software is in a working state all the
time.
Continuous integration is a practice, not a tool. It requires a degree of commitment and discipline from your development team.
You need everyone to check in small incremental changes frequently to mainline and agree that the highest priority task on the
project is to fix any change that breaks the application. If people don’t adopt the discipline necessary for it to work, your
attempts at continuous integration will not lead to the improvement in quality that you hope for.
– Jez Humble & David Farley, “Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation”
Hard Dependency
A hard dependency is something that must be in place before a feature is delivered. In most cases, a hard dependency can be converted to a soft dependency with feature flags.
Soft Dependency
A soft dependency is something that must be in place before a feature can be fully functional, but does not block the
delivery of code.
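As an illustration, here is a minimal sketch of shipping a feature dark behind a flag so a new dependency stays soft; the flag store and function names are hypothetical:

```js
// A simple in-memory flag store stands in for whatever feature-flag
// service you use (names here are illustrative).
const enabledFlags = new Set(); // e.g. enabledFlags.add('new-recommendation-engine')

const legacyRecommendations = (user) => ['default-item'];
const engineRecommendations = (user) => ['personalized-item'];

function getRecommendations(user) {
  // The new engine is a soft dependency: this code can ship dark today,
  // and the flag flips on once the downstream service is live.
  return enabledFlags.has('new-recommendation-engine')
    ? engineRecommendations(user)
    : legacyRecommendations(user);
}

console.log(getRecommendations({ id: 1 })); // ['default-item'] while the flag is off
```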
Story Points
A measure of the relative complexity of delivering a story. Historically, 1 story point was 1 “ideal
day”. An ideal day is a day where there are no distractions, the code is flowing, and we aren’t waiting on anything. No
such day exists. :wink:
There are many common story point dysfunctions: pointing defects, unplanned work, and spikes are some of the more
common. Adjusting points after work is done is another common mistake. The need for story points is a good indication
that we do not understand the work. If we have decomposed the work correctly, everything should be 1 point.
Toil
The repetitive, predictable, constant stream of tasks related to
maintaining an application.
Unplanned Work
Any work that the team inserts before the current planned work. Critical defects and “walk up” requests are unplanned work. It’s important that the team track all unplanned work and its cause so that steps can be taken to reduce its future impact.
Vertical Sliced Story
A story should represent a response to a request that can be deployed
independently of other stories. It should be aligned across the tech stack so
that no other story needs to be deployed in concert to make the function work.
Examples:
Submitting a search term and returning results.
Requesting user information from a service and receiving a response.
WIP
Work in progress is any work that has been started but not yet delivered to the end user.
2 - Starting CD
Migrating your system to Continuous Delivery
Continuous Delivery (CD) is the ability to deliver the latest changes on-demand, with no human touchpoints between code integration and production delivery.
Overview
Continuous Delivery extends beyond automation. It encompasses the entire cycle of identifying value, delivering it, and verifying with the end-user that the expected value was delivered.
Goals
CD aims to:
Uncover external dependencies and organizational process issues
Reduce overhead
Improve quality feedback
Enhance end-user outcomes and team work/life balance
CD Maturity
While avoiding rigid “maturity models,” we can outline competency levels:
Minimums
Daily integration of tested changes to the trunk
Consistent delivery process for all changes
No manual quality gates
Same artifact used in all environments
Good
New work delivered in less than 2 days
All changes delivered from the trunk
Commit-to-production time under 60 minutes
Less than 5% of changes require remediation
Service restoration time under 60 minutes
Continuous Integration (CI)
CI Working Agreement
Branches originate from the trunk and are merged back and deleted within 24 hours
Changes must pass existing tests before merging
Team prioritizes completing work in progress over starting new work
Fixing a broken build is the highest priority
Desired Outcomes
More frequent integration of smaller, higher quality changes
Efficient test architecture
Lean code review process
Reduced Work In Progress (WIP)
Continuous Delivery/Deploy
Aims to achieve:
Increased delivery frequency and stability
Improved deploy success and time to restore service
Reduced development cycle time and process waste
Smaller, less risky production releases
High-performing product teams with domain expertise
Implement a single CD automated pipeline per repository
Note
A valid CD process has only one method to build and deploy any change. Deviations indicate an incomplete process that puts the team and business at risk.
Pipeline Best Practices
Focus on hardening the pipeline to block bad changes
Integrate outside the pipeline, virtualize inside
Limit stage gates (ideally one or fewer)
Developers own the full pipeline
Key Metrics
CI cycle time: < 10 minutes from commit to artifact creation
CD cycle time: < 60 minutes from commit to Production
Tips
Use trunk merge frequency, development cycle time, and delivery frequency to uncover pain points
The following are very frequent issues that teams encounter when working to improve the flow of delivery.
Work Breakdown
Stories without testable acceptance criteria
All stories should be defined with declarative and testable acceptance criteria. This reduces the amount
of waiting and rework once coding begins and enables a much smoother testing workflow.
Acceptance criteria should define “done” for the story. No behavior other than that specified by the acceptance
criteria should be implemented. This ensures we are consistently delivering what was agreed to.
Stories too large
It’s common for teams using two week sprints to have stories that require five to ten days to complete. Large stories hide complexity, uncertainty, and dependencies.
Stories represent the smallest user-observable behavior change.
To enable rapid feedback, higher quality acceptance criteria, and more predictable delivery, stories should require no more than two days for a team to deliver.
No definition of “ready”
Teams should have a working agreement about the definition of “ready” for a story or task. Until the team agrees it has
the information it needs, no commitments should be made and the story should not be added to the “ready” backlog.
Definition of Ready
- Story
- Acceptance criteria aligned with the value statement, agreed to, and understood.
- Dependencies noted and resolution process for each in place
- Spikes resolved.
- Sub-task
- Contract changes documented
- Component acceptance tests defined
No definition of “Done”
Having an explicit definition of done is important to keeping WIP low and finishing work.
Definition of Done
- Sub-task
- Acceptance criteria met
- Automated tests verified
- Code reviewed
- Merged to Trunk
- Demoed to team
- Deployed to production
- Story
- PO Demo completed
- Acceptance criteria met
- All tasks "Done" - Deployed to production
Team Workflow
Assigning tasks for the sprint
Work should always be pulled by the next available team member. Assigning tasks results in each team member working in isolation on a task list instead of the team
focusing on delivering the next high value item. It also means that people are less invested in the work other people
are doing. New work should be started only after helping others
complete work in progress.
Co-dependent releases
Multi-component release trains increase batch size and reduce delivered quality. Teams cannot improve efficiency if they
are constantly waiting. Handle dependencies with code, do not manage them with process. If you need a person to
coordinate releases, things are seriously broken.
Handoffs to other teams
If the normal flow of work requires waiting on another team then batch sizes increase and quality is reduced. Teams
should be organized so they can deliver their work without coordinating outside the team.
Early story refining
As soon as we decide a story has been refined to where we can begin developing it, the information begins to age because
we will never fully capture everything we decided on. The longer a story is “ready” before we begin working, the less
context we retain from the conversation. Warehoused stories age like milk. Limit the inventory and spend more time on
delivering current work.
Manual test as a stage gate
In this context, a test is a repeatable, deterministic activity to verify the releasability of the system. There are
manual activities related to exploration of edge cases and how usable the application is for the intended consumer, but these
are not tests.
There should be no manual validation as a step before we deploy a change. This includes, but is not limited to manual
acceptance testing, change advisory boards (CAB), and manual security testing.
Meaningless retrospectives
Retrospectives should be metrics driven. Improvement items should be treated as business features.
Hardening / Testing / Tech Debt Sprints
Just no. These are not real things. Sprints represent work that can be
delivered to production.
Moving “resources” on and off teams to meet “demand”
Teams take time to grow; they cannot be “constructed”. Adding or removing anyone
from a team lowers the team’s maturity and average problem space expertise. Changing too many people on a team
reboots the team.
One delivery per sprint
Sprints are planning increments, not delivery increments. Plan what will be delivered daily during the sprint.
Skipping demo
If the team has nothing to demo, demo that. Never skip demo.
Committing to distant dates
Uncertainty increases with time. Distant deliverables need detailed analysis.
Not committing to dates
Commitments drive delivery. Commit to the next Minimum Viable Feature.
Velocity as a measure of productivity
Velocity is a planning metric: “We can typically get this much done in this much time.” It’s an estimate of relative capacity for new work that tends to change over time, and these changes don’t necessarily indicate a shift in productivity. It’s also an arbitrary measure that varies wildly between organizations, teams, and products. There’s no credible means of translating it into a normalized figure that can be used for meaningful comparison.
Equating velocity with productivity creates an incentive to optimize velocity at the expense of developing quality software.
CD Anti-Patterns
Work Breakdown

| Issue | Description | Good Practice |
| --- | --- | --- |
| Unclear requirements | Stories without testable acceptance criteria | Work should be defined with acceptance tests to improve clarity and enable developer-driven testing. |
| Long development time | Stories take too long to deliver to the end user | Use BDD to decompose work to testable acceptance criteria to find smaller deliverables that can be completed in less than 2 days. |
Workflow Management

| Issue | Description | Good Practice |
| --- | --- | --- |
| Rubber band scope | Scope that keeps expanding over time | Use BDD to clearly define the scope of a story and never expand it after it begins. |
| Focusing on individual productivity | Attempting to manage a team by reporting the “productivity” of individual team members. This is the fastest way to destroy teamwork. | Measure team efficiency, effectiveness, and morale. |
| Estimation based on resource assignment | Pre-allocating backlog items to people based on skill and hoping that those people do not have life events. | The whole team should own the team’s work. Work should be pulled in priority sequence and the team should work daily to remove knowledge silos. |
| Meaningless retrospectives | Having a retrospective where the outcome does not result in team improvement items. | Focus the retrospective on the main constraints to daily delivery of value. |
| Skipping demo | No work that can be demoed was completed. | Demo the fact that no work is ready to demo. |
| No definition of “Done” or “Ready” | Obvious | Make sure there are clear entry gates for “ready” and “done” and that the gates are applied without exception. |
| One or fewer deliveries per sprint | The sprint results in one or fewer changes that are production ready. | Sprints are planning increments, not delivery increments. Plan what will be delivered daily during the sprint. Uncertainty increases with time. Distant deliverables need detailed analysis. |
| Pre-assigned work | Assigning the list of tasks each person will do as part of sprint planning. This results in each team member working in isolation on a task list instead of the team focusing on delivering the next high value item. | The whole team should own the team’s work. Work should be pulled in priority sequence and the team should work daily to remove knowledge silos. |
Teams

| Issue | Description | Good Practice |
| --- | --- | --- |
| Unstable team tenure | People are frequently moved between teams | Teams take time to grow. Adding or removing anyone from a team lowers the team’s maturity and average expertise in the solution. Be mindful of change management. |
| Poor teamwork | Poor communication between team members due to time delays or “expert knowledge” silos | Make sure there is sufficient time overlap and that specific portions of the system are not assigned to individuals. |
| Multi-team deploys | Requiring more than one team to deliver synchronously reduces the ability to respond to production issues in a timely manner and slows delivery of every feature to the speed of the slowest team. | Make sure all dependencies between teams are handled in ways that allow teams to deploy independently in any sequence. |
Testing Process

| Issue | Description | Good Practice |
| --- | --- | --- |
| Outsourced testing | Some or all of acceptance testing performed by a different team or an assigned subset of the product team. | Building in the quality feedback and continuously improving the same is the responsibility of the development team. |
| Manual testing | Using manual testing for functional acceptance testing. | Manual tests should only be used for things that cannot be automated. In addition, manual tests should not be blockers to delivery but should be asynchronous validations. |
2.2 - Pipeline & Application Architecture
A guide to improving your delivery pipeline and application architecture for Continuous Delivery
This guide provides steps and best practices for improving your delivery pipeline and application architecture. Please review the CD Getting Started guide for context.
1. Build a Deployment Pipeline
The first step is to create a single, automated deployment pipeline to production. Human intervention should be limited to approving stage gates where necessary.
Entangled Architecture - Requires Remediation
Characteristics
No clear ownership of components or quality
Delayed quality signal
Difficult to implement Continuous Delivery
Common Entangled Practices
Team Structure: Feature teams focused on cross-cutting deliverables
Development Process: Long-lived feature branches
Branching: Team branches with daily integration to trunk
Testing: Inverted test pyramid common
Pipeline: Focus on establishing reliable build/deploy automation
Deploy Cadence / Risk: Extended delivery cadence, high risk
Entangled Improvement Plan
Find architectural boundaries to divide sub-systems between teams, creating product teams. This will realign to a tightly coupled architecture.
Tightly Coupled Architecture - Transitional
Characteristics
Changes in one part can affect other parts unexpectedly
Sub-assemblies assigned to product teams
Requires a more complex integration pipeline
Common Tightly Coupled Practices
Team Structure: Product teams focused on decoupling sub-systems
Development Process: Continuous integration
Branching: Trunk-Based Development
Testing: Developer Driven Testing
Pipeline: Working towards continuous delivery
Deploy Cadence / Risk: More frequent deliveries, lower risk
Tightly Coupled Improvement Plan
Extract independent domain services with well-defined APIs
Consider wrapping infrequently changed, poorly tested components in APIs
Loosely Coupled Architecture - Goal
Characteristics
Components delivered independently
Reduced complexity
Improved quality feedback loops
Relies on clean team separations and mature testing practices
Common Loosely Coupled Practices
Team Structure: Product teams maintain independent components
Development Process: Continuous integration
Branching: Trunk-Based Development
Testing: Developer Driven Testing
Pipeline: One or more independently deployable CD pipelines
Deploy Cadence / Risk: On-demand or immediate delivery, lowest risk
2. Stabilize the Quality Signal
After establishing a production pipeline, focus on improving the quality signal:
Remove flaky tests from the pipeline
Identify causes for test instability and take corrective action
Bias towards testing enough, but not over-testing
Track pipeline duration and set a quality gate for maximum duration
3 - Metrics
An overview of key metrics for measuring and improving Continuous Delivery performance
Metrics are crucial for organizational improvement. Without measurement, improvement attempts are aimless. This guide outlines key metrics for Continuous Delivery (CD) and Continuous Integration (CI).
CD Execution Metrics
These metrics measure our ability to reliably and sustainably deliver high-quality changes through frequent, small batches.
Focus improvement efforts on the group of metrics as a whole, not individual measures
Refer to the Metrics Cheat Sheet for a high-level view of key metrics, their intent, and appropriate usage
Remember
Metrics, like any tool, must be used correctly to drive the improvement we need. Focusing on a single metric can lead to unintended consequences and suboptimal outcomes.
3.1 - Metrics Cheat Sheet
Organizational Metrics
These metrics are important for teams and management to track the health of the delivery system
Development Cycle Time
Time from when a story is started until marked “done”. Reduce the size of work to improve the feedback from the end user on the value of the work and to improve the quality of the acceptance criteria and testing.
Work in Progress (WIP)
The number of items in progress on the team relative to the size of the team. Reduce the number of items in progress so that the team can focus on completing work vs. being busy. Delivery frequency should not degrade.
Team Metrics
These metrics should only be used by teams to inform decision making. They are ineffective for measuring quality, productivity, or
delivery system health.
Velocity
The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity.
3.2 - Average Build Downtime
The average length of time between when a build breaks and when it is fixed.
What is the intended behavior?
Keep the pipelines always deployable by fixing broken builds as rapidly as possible. Broken builds are the highest priority since
they prevent production fixes from being deployed in a safe, standard way.
How to improve it
Refactor to improve testability and modularity.
Improve tests to locate problems more rapidly.
Decrease the size of the component to reduce complexity.
Add automated alerts for broken builds.
Ensure the proper team practice is in place to support each other in solving the problem as a team.
How to game it
Re-build the previous version.
Remove tests that are failing.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Integration Frequency decreases as additional manual or automated process overhead is
added before integration to trunk.
3.3 - Build Cycle Time
The time from code commit to production deploy. This is the minimum time changes can be applied to production. This is referenced as “hard lead time” in Accelerate.
What is the intended behavior?
Reduce pipeline duration to improve MTTR and improve test efficiency to
give the team more rapid feedback to any issues. Long build cycle times delay quality feedback
and create more opportunity for defect penetration.
How to improve it
Identify areas of the build that can run concurrently.
Replace end to end tests in the pipeline with virtual services and move end to end testing to an asynchronous process.
Break down large services into smaller sub-domains that are easier and faster to build / test.
Add alerts to the pipeline if a maximum duration is exceeded to inform test refactoring priorities.
How to game it
Reduce the number of tests running or test types executed.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Defect rates increase if quality gates are skipped to reduce build time.
3.4 - Change Fail Rate
The percentage of changes that result in negative customer impact, or rollback.
changeFailRate = failedChangeCount / changeCount
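A quick sketch of the calculation over hypothetical deployment records (the record shape is illustrative, not prescribed by this guide):

```js
// Hypothetical deployment records; `failed` marks customer impact or rollback.
const changes = [
  { id: 101, failed: false },
  { id: 102, failed: true },
  { id: 103, failed: false },
  { id: 104, failed: false },
];

const failedChangeCount = changes.filter((c) => c.failed).length;
const changeFailRate = failedChangeCount / changes.length;

console.log(changeFailRate); // 0.25 -> a 25% change fail rate
```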
What is the intended behavior?
Reduce the percentage of failed changes.
How to improve it
Release more, smaller changes to make quality steps more effective and reduce the impact of failure.
Identify root cause for each failure and improve the automated quality checks.
How to game it
Deploy fixes without recording the defect.
Create defect review meetings and re-classify defects as feature requests.
Re-deploy the latest working version to increase deploy count.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency can decrease if focus is placed on “zero defect” changes.
Defect rates can increase as reduced delivery frequency increases code change batch size and delivery risk.
3.5 - Code Coverage
A measure of the amount of code that is executed by test code.
What is the intended behavior?
Inform the team of risky or complicated portions of the code that are not sufficiently covered by tests. Care should be
taken not to confuse high coverage with good testing.
How to improve it
Write tests for code that SHOULD be covered but isn’t
Refactor the application to improve testability
Remove unreachable code
Delete pointless tests
Refactor tests to test behavior rather than implementation details
How to game it
Tests are written for code that receives no value from testing.
Test code is written without assertions.
Tests are written with meaningless assertions.
Example: The following test will result in 100% function, branch, and line coverage with no behavior tested.
```js
/* Return the sum of two integers.
   Return null if one of the params is not an integer. */
function addWholeNumbers(a, b) {
  if (a % 1 === 0 && b % 1 === 0) {
    return a + b;
  } else {
    return null;
  }
}

it('Should not return null if both numbers are integers', () => {
  // This call will return 4, which is not null. Pass.
  expect(addWholeNumbers(2, 2)).not.toBe(null);
  // This returns '22' because JS sees a string and will helpfully concatenate. Pass.
  expect(addWholeNumbers(2, '2')).not.toBe(null);
  // The function will never return the JS `NaN` constant. Pass.
  expect(addWholeNumbers(1.1, 0)).not.toBe(NaN);
});
```
The following is an example of test code with no assertions. This will also produce 100% code coverage reporting but does not test anything because there are no assertions to cause the test to fail.
```js
it('Should not return null if both numbers are integers', () => {
  addWholeNumbers(2, 2);
  addWholeNumbers(2, '2');
  addWholeNumbers(1.1, 0);
});
```
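For contrast, here is a sketch of the same suite with meaningful assertions. Note that the second test would fail against the function above, exposing the string-concatenation bug the gamed tests let through:

```js
it('adds two whole numbers', () => {
  expect(addWholeNumbers(2, 2)).toBe(4);
});

it('returns null when either argument is not an integer', () => {
  expect(addWholeNumbers(1.1, 0)).toBe(null);

  // This assertion fails: '2' % 1 coerces to 0, so the function returns '22'.
  // The failure is the point: a real assertion surfaces the defect.
  expect(addWholeNumbers(2, '2')).toBe(null);
});
```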
Guardrail Metrics
Test coverage should never be used as a goal or an indicator of application health. Measure outcomes. If testing is poor, the following metrics will show poor results.
Defect Rates will increase as poor-quality tests are created to meet coverage targets that do not reliably catch defects.
Development Cycle Time will increase as more emphasis is placed on improper testing methods (manual functional testing, testing teams, etc.) to overcome the lack of reliable tests.
3.6 - Code Integration Frequency
The average number of production-ready pull requests a team closes per day, normalized by the number of developers on
the team. On a team with 5 developers, healthy CI practice is
at least 5 per day.
What is the intended behavior?
Increase the frequency of code integration
Reduce the size of each change
Improve code review processes
Remove unneeded processes
Improve quality feedback
How to improve it
Decompose code changes into smaller units to incrementally deliver features.
3.7 - Code Inventory
The lines of code that have been changed but have not been delivered to production. This can be measured at several points in the delivery flow, starting with code not merged to trunk.
What is the intended behavior?
Reduce the size of individual changes and reduce the duration of branches to improve quality feedback. We also want to
eliminate stale branches that represent risk of lost change or merge conflicts that result in additional
manual steps that add risk.
How to improve it
Improve continuous integration behavior where changes are integrated to the trunk and
verified multiple times per day.
How to game it
Use forks to hide changes.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality can decrease as quality steps are skipped or batch size increases.
3.8 - Defect Rate
Defect rates are the total number of defects by severity reported for a period of time.
Defect count / Time range
What is the intended behavior?
Use defect rates and trends to inform improvement of upstream quality processes.
Defect rates in production indicate how effective our overall quality process is. Defect rates in lower environments inform us of
specific areas where quality process can be improved. The goal is to push detection closer to the developer.
How to improve it
Track trends over time and identify common issues for the defects. Design test changes that would reduce the time to detect defects.
How to game it
Mark defects as enhancement requests
Don’t track defects
Deploy changes that do not modify the application to improve the percentage
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency is reduced if too much emphasis is placed on zero defects. This can be self-defeating as large change batches will contain more defects.
3.9 - Delivery Frequency
How frequently per day the team releases changes to production.
What is the intended behavior?
Small changes deployed very frequently to exercise the ability to fix production
rapidly, reduce MTTR, increase quality, and reduce risk.
3.11 - Lead Time
This shows the average time it takes for a new request to be delivered. This is measured from the creation date to release date for each unit of work and includes Development Cycle Time.
What is the intended behavior?
Identify over-utilized teams and backlogs that need more Product Owner attention, or use in conjunction with velocity to help teams optimize their processes.
How to improve it
Relentlessly remove old items from the backlog.
Improve team processes to reduce Development Cycle Time.
Use Innersourcing to allow other teams to help when surges of work arrive.
Re-assign, carefully, some components to another team to scale delivery.
How to game it
Requests can be tracked in spreadsheets or other locations and then added to the backlog just before development. This can be identified by decreased customer satisfaction.
Reduce feature refining rigour.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality is reduced if less time is spent refining and defining
testable requirements.
3.12 - Mean Time to Repair (MTTR)
Mean Time to Repair is the average time between when an incident is detected and when it is resolved.
“Software delivery performance is a combination of three metrics: lead time, release
frequency, and MTTR. Change fail rate is not included, though it
is highly correlated.”
3.13 - Quality
Quality is measured as the percentage of finished work that is unused, unstable, unavailable, or defective according to the end user.
What is the intended behavior?
Continuously improve the quality steps in the construction process, reduce the size of delivered change, and increase
the speed of feedback from the end user. Improving this cycle improves roadmap decisions.
How to improve it
Add automated checks to the pipeline to prevent re-occurrence of root causes.
Only begin new work with testable acceptance criteria.
Accelerate feedback loops at every step to alert to quality, performance, or availability issues.
How to game it
Log defects as new features
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Delivery frequency may be reduced if more manual quality steps are added.
Build cycle time may increase as additional tests are added to the pipeline.
Lead time can increase as more time is spent on business analysis.
3.14 - Velocity / Throughput
The average amount of the backlog delivered during a sprint by the team. Used by the product team for planning. There is no such thing as good or bad velocity. This is commonly misunderstood to be a productivity metric. It is not.
What is the intended behavior?
After a team stabilizes, the standard deviation should be low. This will enable realistic planning of future
deliverables based on relative complexity. Find ways to increase this over time by reducing waste, improving planning,
and focusing on teamwork.
How to improve it
Reduce story size so they are easier to understand and more predictable.
Minimize hard dependencies. Each hard dependency reduces the odds of on-time
delivery by 50%.
Swarm stories by decomposing them into tasks that can be executed in parallel so that the team is working as a unit to deliver faster.
How to game it
Cherry pick easy, low priority items.
Increase story points
Skip quality steps.
Prematurely sign-off work only to have defects reported later.
Guardrail Metrics
Metrics to use in combination with this metric to prevent unintended consequences.
Quality defect ratio goes up as more defects are reported.
WIP increases as teams start more work to look more
busy.
3.15 - Work in Progress (WIP)
Work in Progress (WIP) is the total work that has been started but not completed. This includes all work, defects, tasks, stories, etc.
What is the intended behavior?
Focus the team on finishing work and delivering it rather than switching between tasks but not finishing them.
How to improve it
The team should focus on finishing items closest to being ready for
production.
Prioritize code review over starting new work
Prioritize pairing to solve a problem over starting new work
Set and do not exceed WIP limits for the team.
Total WIP should not exceed team size.
Keep the Kanban board visible at all times to monitor WIP
How to game it
Update incomplete work to “done” before it is delivered to production.
Create stories for each step of development instead of for value to be delivered.
Do not update work to “in progress” when working on it.
4 - Team Workflow
Working together as a team is how we move things from “In Progress” to “Done”, as rapidly as possible in value sequence. It’s important for minimizing WIP that the team looks at the backlog as the team’s work and does not pre-assign work to individuals.
Make Work Visible
To create and maintain the flow of delivery, we need the following:
A way to visualize the workflow, virtual or physical, with a
prioritized backlog that has not been refined too far in the future.
Plan Work
Unplanned work is anything coming into the backlog that has not been committed
to, or prioritized. This can include feature requests, support tickets, etc.
Teams commonly struggle with how to handle unplanned work.
Completed work meets the Definition of Ready when work begins, the Definition of Done when work is delivered, and can be completed in less than two days.
Process smells identified for completing work include:
Context switching
Ineffective demos that prevent early feedback
Multiple teams own pieces of the process (Build, Test, Deploy, etc.)
In order to plan and complete work effectively, there must be an improvement process in place. The improvement process is centered around feedback loops and comes with its own challenges.
Branching
Use the right pattern for the right reason. Branches are the primary flow for CI
and are critical for allowing the team to have visibility to work in progress that the team is responsible for completing. Forks
are how proposed, unplanned changes are made from outside the team to ensure quality control and to reduce confusion from
unexpected branches.
Use forks for:
Contribution from a contributor outside the team to ensure proper quality controls are followed and to prevent
cluttering up the team’s repository with external contributions that may be abandoned.
Use branches for:
All internal work to keep that work visible to the team.
Tips
Story Slicing helps break
development work into more easily consumable, testable chunks.
You don’t have to wait for a story/feature to be complete to integrate, as long as the changes are tested and won’t break production.
Pull requests should be small and should be prioritized over starting any new development.
Common Issues
Trunk-based development and continuous integration often require workflow adjustments from the team.
The main reason teams struggle with CI is shared source ownership: delivery and quality are significantly impacted by teams sharing ownership of the source code. This adds process overhead to ensure everyone knows what’s happening in the code and dilutes quality responsibility.
Recommended Practices
Utilize automated pipelines to help validate that the product remains releasable before and after any code is merged to the trunk.
Limit ownership of a repository to a single “Two Pizza Team” that decides what code to merge.
Give all developers on the team access to merge code to the trunk. Give read access to everyone else.
Use an innersourcing policy so that people outside of the team know how to contribute to your product.
Tips
Teams looking to create an InnerSourcing policy can start by applying their Definition of Done to any external contributions.
No contributions will bypass the team’s quality process.
Automated pipelines validate that PRs from internal and external contributors conform to quality standards.
All team members have access to merge to the trunk.
InnerSourcing and/or external contributions fork the repository; they do not branch.
Teams no larger than 10 people, including all roles.
Definition of Done
All teams need a Definition of Done. The Definition of Done is an agreement made within the team that a unit of work isn’t complete until it meets certain conditions.
Recommended Practices
We use the Definition of Done most commonly for user stories. The team and
product owner must agree that the story has met all criteria for it to be
considered done.
A definition of done can include anything a team cares about, but must include
these criteria:
All tests passed
All acceptance criteria have been met
Code reviewed by team member and merged to trunk
Demoed to team/stakeholders as close to prod as possible
All code associated with the acceptance criteria deployed to production
Once your team has identified all criteria that a unit of work needs to be
considered done, you must hold yourself accountable to your Definition of Done.
Value
As a development team, we want to understand our team’s definition of done, so
that we can ensure a unit of work is meeting the criteria acceptable for it to
be delivered to our customers.
Acceptance Criteria
Identify what your team cares about as a Definition of Done.
Use your Definition of Done as a tool to ensure quality stories are being
released into production.
Revisit and evaluate your Definition of Done.
4.5 - Retrospectives
Retrospectives are critical for teams that are serious about continuous
improvement. They allow the team an opportunity to take a moment to inspect and
adapt how they work. The importance of this cannot be overstated. Entropy is
always at work, so we must choose to change so that change doesn’t choose us.
Recommended Practices
Successful Retrospectives
A successful retrospective has five parts:
Go over the mission of the team and the purpose of retrospective.
The team owns where they are right now using Key Performance Indicators
(KPIs) they’ve agreed on as a team.
The team identifies whether experiments they are running are working or not.
If an experiment is working, the team works to standardize the changes as
part of daily work.
If an experiment is not working, the team either adjusts the experiment
based on feedback or abandons the experiment to try something else.
Both are totally acceptable and expected results. In either case, the
learnings should be shared publicly so that anyone in the organization can
benefit from them.
The team determines whether they are working towards the right goal and
whether the experiments they are working on are moving them towards it.
If the answer to either question is “No,” then the team adjusts as necessary.
Open and honest conversation about wins and opportunities throughout.
Example Retro Outline
Go over the team’s mission statement and the purpose of retrospective (2 min)
Go over the team’s Key Performance Indicators and make sure everyone knows
where we are (5-10 min)
Go over what experiments the team decided to run and what we expected to
happen (5 minutes)
What did we learn this week? (10-15 minutes)
Should we modify any team documents? (2 minutes)
What went well this week? (5-10 minutes)
What sinks our battleship? (5-10 minutes)
Are we working towards the right things? What are we going to try this week?
How will we measure it? (10-15 minutes)
Organizing Retros
There are some important things to consider when scheduling a retrospective.
Ensure Psychological Safety
If the team feels like they can’t speak openly and honestly, they won’t.
Any issues with psychological safety must be addressed before any real
progress can be made.
Make them Regular
Agree to a time, day, frequency as a team to meet.
Include everyone responsible for delivery
Ideally this will include business colleagues (PO), operations, testing,
and developers involved in the process.
If there are more than 10-12 people in the meeting, your team is
probably too big.
Co-location concerns
If the team is split across timezones, then accommodations
should be made so that the team can effectively communicate.
If the time separation is extreme (i.e. India/US), then it may be better to have each hemisphere retro separately and compare notes asynchronously.
Schedule meetings to be inclusive of the most remote. Don’t schedule rooms
with bad audio/no video if there are remote participants. Have it via a
remote meeting solution (Zoom, etc.)
Tips
Create cards on whatever board you are using to track your work for action
items that come out of retrospective
Treating team improvement as a deliverable will help the team treat them
more seriously.
Do not work on more than a few actions/experiments at a time
If the retrospective has remote attendees, ask that everyone turn on their
cameras so that the team can look everyone in the eyes.
Outcome over output: If the format of retro isn’t helping you improve, change
it or seek help on how to make it better. The teams that cancel retro are
almost always the teams that need it most.
Known Impediments
“Typical” Retrospectives
Normally, a scrum-like retro involves 3 questions about the previous iteration:
What went well?
What could we improve?
What are some actions we can take?
This is a pretty open-ended format that is very simple to go over in a training class. The challenge is the nuance of facilitating the format.
While it can be effective, we have found that this particular format can actually stunt the improvement of many teams when used incorrectly. And since the format is so open-ended, that’s extremely easy to do.
Retrospectives that follow the above format are something that many teams
struggle with. They can…
Feel Ineffective, where the same issues crop up again and again without resolution.
End with a million action items that never get done or tracked.
“Improve” things that don’t actually move the needle on team productivity or happiness
End up as a gripe session where there are no actionable improvements identified.
This is such a waste of time. I'd rather be coding...
It can be extremely frustrating to team members when it feels like
retrospectives are just another meeting that they have to go to. If that ever
becomes the case, that should signal a huge red flag! Something is wrong!
Psychological Safety
If the team feels like they are going to be judged, punished, or generally
negatively affected by participating in retrospective, then they are going to
keep their opinions to themselves. Without the safety to have their voices heard or to take moderate, hypothesis-driven risks, the team will not improve as fast as it could (if at all).
However, if leadership feels disrespected, ignored, or negatively impacted by the outcomes of the team, they are more likely to restrain the team from reaching its full potential.
It’s a delicate balancing act that takes trust, respect, and empathy from all
sides to come to win-win solutions.
4.6 - Unplanned Work
Unplanned work is any interruption that prevents us from finishing something as planned. There are times when unplanned work is necessary and understandable, but you
should be wary of increased risk, uncertainty, and reduced predictability.
Cost of Delay
Work that has not been prioritized is work that has not been planned. When there are
competing features, requests, support tickets, etc., it can be difficult to prioritize
what should come first.
Most of the time, teams prioritize based on what the customer wants, what the
stakeholders want, etc.
Cost of Delay makes it easier to decide priorities based on value and urgency. How much money are we costing (or saving) the organization if Feature A is
delivered over Feature B?
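A hypothetical illustration of weighing two features; the numbers and the CD3 (Cost of Delay Divided by Duration) heuristic are assumptions for the example, not figures from this guide:

```js
// Hypothetical numbers: value each feature delivers per week once live.
const featureA = { valuePerWeek: 50_000, durationWeeks: 2 };
const featureB = { valuePerWeek: 30_000, durationWeeks: 1 };

// CD3 (Cost of Delay Divided by Duration) is one common sequencing heuristic.
const cd3 = (f) => f.valuePerWeek / f.durationWeeks;
console.log(cd3(featureA)); // 25000
console.log(cd3(featureB)); // 30000 -> schedule B first

// Cost of delaying the second feature while the first is built:
console.log(featureB.valuePerWeek * featureA.durationWeeks); // 60000 lost if A goes first
console.log(featureA.valuePerWeek * featureB.durationWeeks); // 50000 lost if B goes first
```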
Capacity Planning
The most common pitfall that keeps teams from delivering work is unrealistic
capacity planning.
Teams that plan for 100% of their capacity are unable to fit unknowns
into their cadence, whether that be unplanned work, spikes, or continuous experimentation
and learning.
Planned capacity should fall between 60% and 80% of a team’s max capacity.
Tips
Plan for unplanned work. Pay attention to the patterns that present themselves, and analyze
what kind of unplanned work is making it to your team’s backlog.
Make work visible, planned and unplanned, and categorize unplanned work based on value and urgency.
4.7 - Visualizing Workflow
Making work visible to ourselves, as well as our stakeholders, is imperative in our workflow management process. People are visual beings. Workflows give everyone a sense of ownership and accountability.
Make use of a Kanban board
Kanban boards help you to make work and problems visible and improve workflow
efficiency.
Kanban boards are a recommended practice for all agile development methods. Kanban signals your availability to do work. When an individual pulls
something from the backlog into progress, they are committing to being
available to do the work the card represents.
With Kanban boards, your team knows who’s working on what, what the status of
that work is, and how long that work has been in progress.
Building a Kanban Board
To make a Kanban board you need to create lanes on your board that represent
your team’s workflow. Adding work in progress (WIP) limits to swim-lanes will
enhance the visibility of your team’s workflow.
The team only works on cards that are in the “Ready to Start” lane and
team members always pick from the top. No “Cherry Picking”.
The following is a good starting point for most teams.
Backlog
Ready to Start
Development
Ready to Review
Blocked
Done
Tips
Track everything:
Stories, tasks, spikes, etc.
Improvement items
Training development
Extra meetings
Work is work, and without visibility to all of the team’s work it’s impossible to identify and reduce the waste created by unexpected work.
Bring visibility to dependencies across teams, to help people anticipate
what’s headed their way, and prevent delays from unknowns and invisible work.
Limiting WIP
Work in Progress is defined as work that has started but is not yet finished. Limiting WIP helps teams reduce context switching, find workflow issues, and keep teams focused on collaboration and finishing work.
How do we limit WIP?
Start with one lane on your board.
Set your WIP limit to N+2 (“N” being the number of people contributing to that lane)
Continue setting WIP lower.
Once the WIP limit is reached, no more cards can enter that lane until one exits.
Capacity Utilization
There is a direct correlation between WIP and capacity utilization.
Attempting to load people and resources to 100% capacity utilization creates
wait times. Unpredictable events equal variability, which equals capacity overload.
The more individuals and resources used, the higher the cost and risk.
In order to lessen work in progress, be aggressive in prioritization, push
back when necessary, and set hard WIP limits. Select a WIP limit that is
doable but challenges you to say no some of the time.
Conflicting Priorities
When we start a new task before finishing an older task, our work in
progress goes up and things take longer. Business value that could have been
realized sooner gets delayed because of too much WIP.
Be wary of falling back into the old habit of starting everything because of
the pressure to say yes to everything.
Look at ways of prioritizing work:
Assigned priority
Cost of delay
First-in, first-out
Tips
Swarming Stories
Having more than one person work on a task at the same time avoids situations
where team understanding is mostly limited to a subset of what’s being built.
With multiple people involved early, there is less chance that rework will
be needed later.
By having more than one developer working on a task, you are getting a
real-time code review.
5 - Testing
There are common patterns to show how much of each kind of test is generally recommended. The most used are the Test Pyramid and the Test Trophy. Both are trying to communicate the same thing: design a test suite that is fast, gives you confidence, and is not more expensive to maintain than the value it brings.
Testing Principles
Balance cost and confidence
Move failure detection as close to the developer as possible
Increase the speed of feedback
CI should take less than 10 minutes.
Recommended Test Pattern
Most of the tests are integration tests and emphasize maximizing deterministic test coverage in process with the development cycle, so developers can find errors sooner. E2E & functional tests should primarily focus on happy/critical path and tests that absolutely require a browser/app.
When executing continuous delivery, test code is a first-class citizen that requires as much design and maintenance as production code. Flaky tests undermine confidence and should be terminated with extreme prejudice.
Testing Matrix

| Feature | Static | Unit | Integration | Functional | Visual Regression | Contract | E2E |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deterministic | Yes | Yes | Yes | Yes | Yes | No | No |
| PR Verify, Trunk Verify | Yes | Yes | Yes | Yes | Yes | No | No |
| Break Build | Yes | Yes | Yes | Yes | Yes | No | No |
| Test Doubles | Yes | Yes | Yes | Yes | Yes | See Definition | No |
| Network Access | No | No | localhost only | localhost only | No | Yes | Yes |
| File System Access | No | No | No | No | No | No | Yes |
| Database | No | No | localhost only | localhost only | No | Yes | Yes |
Testing Anti-patterns
“Ice cream cone testing” is the anti-pattern where the most expensive, fragile, non-deterministic tests are prioritized over faster and less expensive deterministic tests because it “feels” right.
Testing terms are notoriously overloaded. If you ask 3 people what integration testing means, you will get 4 different answers. This ambiguity within an organization slows down the engineering process as the lack of ubiquitous language causes communication errors. For us to help each other improve our quality processes, it is important that we align on a common language. In doing so, we understand that many may not agree 100% on the definitions we align to. That is ok. It is more important to be aligned to consensus than to be 100% in agreement. We’ll iterate and adjust as needed.
Note: Our definitions are based on the following sources:
Deterministic Test
A deterministic test is any test that always returns the same results for the same beginning state and action. Deterministic tests should always be able to run in any sequence or in parallel. Only deterministic tests should be executed in a CI build or automatically block delivery during CD.
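For example, a hidden input like the clock can make a test non-deterministic; pinning it restores determinism. A minimal Jest sketch, assuming Jest’s modern fake timers (`jest.setSystemTime`, Jest 26+):

```js
// The clock is a hidden input; pinning it makes the test deterministic.
function greeting(now = new Date()) {
  return now.getHours() < 12 ? 'Good morning' : 'Good afternoon';
}

describe('greeting', () => {
  beforeAll(() => {
    jest.useFakeTimers();
    jest.setSystemTime(new Date('2024-01-01T09:00:00'));
  });
  afterAll(() => jest.useRealTimers());

  it('returns the same result on every run', () => {
    expect(greeting()).toBe('Good morning');
  });
});
```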
Non-deterministic Test
A non-deterministic test is any test that may fail for reasons unrelated to adherence to specification. Reasons for this could include network instability, availability of external dependencies, state management issues, etc.
Static Test
A static test is a test that evaluates non-running code against rules for known good practices to check for security, structure, or practice issues.
Unit Test
Unit tests are deterministic tests that exercise a discrete unit of the application, such as a function, method, or UI component, in isolation to determine whether it behaves as expected.
Integration Test
An integration test is a deterministic test to verify how the unit under test interacts with other units without directly accessing external sub-systems. For the purposes of clarity, “integration test” is not a test that broadly integrates multiple sub-systems. That is an E2E test.
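A minimal sketch of this idea in Jest, assuming `axios` and `nock`; the pricing service and its API shape are hypothetical. The external sub-system is replaced with a test double so the test stays deterministic:

```js
const axios = require('axios');
const nock = require('nock');

// Unit under test: a thin service that calls an external pricing API.
async function fetchPrice(sku) {
  const res = await axios.get(`https://pricing.example.com/items/${sku}`);
  return res.data.price;
}

it('returns the price from the pricing sub-system', async () => {
  // Stand-in for the external sub-system keeps the test deterministic.
  nock('https://pricing.example.com')
    .get('/items/abc-123')
    .reply(200, { price: 42 });

  await expect(fetchPrice('abc-123')).resolves.toBe(42);
});
```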
Contract Test
A contract test is used to validate the test doubles used in a network integration test. Contract tests are run against the live external sub-system and exercise the portion of the code that interfaces to the sub-system. Because of this, they are non-deterministic tests and should not break the build, but should trigger work to review why they failed and potentially correct the contract.
A contract test validates contract format, not specific data.
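A sketch of what that can look like; the endpoint is hypothetical, and the test asserts on types rather than values:

```js
const axios = require('axios');

// Runs against the live sub-system, so it may fail for environmental
// reasons: report and review the failure, don't break the build.
it('pricing API still honors the agreed contract', async () => {
  const res = await axios.get('https://pricing.example.com/items/abc-123');

  // Validate format, not specific data.
  expect(typeof res.data.price).toBe('number');
  expect(typeof res.data.currency).toBe('string');
});
```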
Functional Test
A functional test is a deterministic test that verifies that all modules of a sub-system are working together. They should avoid integrating with other sub-systems as this tends to reduce determinism. Instead, test doubles are preferred. Examples could include testing the behavior of a user interface through the UI or testing the business logic of individual services through the API.
End-to-End Test
End-to-end tests are typically non-deterministic tests that validate the software system along with its integration with external interfaces. The purpose of an end-to-end test is to exercise a complete production-like scenario. Along with the software system, it also validates batch/data processing from other upstream/downstream systems; hence the name “end-to-end”. End-to-end testing is usually executed after functional testing, using production-like data and a production-like test environment to simulate real-time settings.
Customer Experience Alarm
Customer Experience Alarms are a type of active alarm: a piece of software that sends requests to your system much like a user would. We use them to test the happy path of critical customer workflows. These requests happen every minute (ideally, but can be as infrequent as every 5 minutes). If they fail to work, or fail to run, we emit metrics that cause alerts. We run these in all of our environments, not just production, to ensure that they work and we catch errors early.
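A minimal sketch of the pattern; the URL and `metrics-client` module are hypothetical stand-ins for your own workflow and telemetry library:

```js
const axios = require('axios');
// `metrics` is a hypothetical client; substitute your telemetry library.
const metrics = require('./metrics-client');

async function searchHappyPathAlarm() {
  try {
    const res = await axios.get('https://shop.example.com/api/search?q=socks');
    metrics.emit('cx.search.success', res.status === 200 ? 1 : 0);
  } catch (err) {
    // A failed or unreachable run is itself a signal the alerting needs.
    metrics.emit('cx.search.success', 0);
  }
}

// Exercise the critical workflow every minute, in every environment.
setInterval(searchHappyPathAlarm, 60_000);
```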
Test Double
Test doubles are one of the main concepts we use to create fast, independent, deterministic, and reliable tests. Similar to the way Hollywood uses a *stunt double* to film dangerous scenes in a movie to avoid the costly risk of a highly paid actor getting hurt, we use a test double in early test stages to avoid the speed and dollar cost of using the piece of software the test double is standing in for. We also use test doubles to force certain conditions or states of the application we want to test. Test doubles can be used in any stage of testing, but in general they are heavily used during the initial testing stages in our CD pipeline and used much less in the later stages. There are many different kinds of test doubles, such as stubs, mocks, spies, etc.
5.2 - E2E Testing
Understanding and implementing End-to-End (E2E) testing in software development
End-to-end tests validate the entire software system, including its integration with external interfaces. They exercise complete production-like scenarios, typically executed after functional testing.
Types of E2E Tests
Vertical E2E Tests
Target features under the control of a single team. Examples:
Favoriting an item and persisting across refresh
Creating a new saved list and adding items to it
Horizontal E2E Tests
Span multiple teams. Example:
Going from homepage through checkout (involves homepage, item page, cart, and checkout teams)
Note
Due to their complexity, horizontal tests are unsuitable for blocking release pipelines.
Recommended Best Practices
E2E tests should be the least used due to their cost in run time and in maintenance required.
Focus on happy-path validation of business flows
E2E tests can fail for reasons unrelated to coding issues. Capture the frequency and cause of failures so that efforts can be made to make them more stable.
Vertical E2E tests should be maintained by the team at the start of the flow and versioned with the component (UI or service).
CD pipelines should be optimized for the rapid recovery of production issues. Therefore, horizontal E2E tests should not be used to block delivery due to their size and relative failure surface area.
A team may choose to run vertical E2E in their pipeline to block delivery, but efforts must be made to decrease false positives to make this valuable.
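For illustration, a vertical E2E happy-path sketch covering the “favoriting persists across refresh” example above. Playwright is an assumption here; any browser-driving framework works, and the page and selectors are hypothetical:

```js
const { test, expect } = require('@playwright/test');

test('favorite survives a page refresh', async ({ page }) => {
  await page.goto('https://shop.example.com/items/abc-123');
  await page.getByRole('button', { name: 'Favorite' }).click();

  await page.reload();

  // The button state persists because the favorite was stored server-side.
  await expect(page.getByRole('button', { name: 'Unfavorite' })).toBeVisible();
});
```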
Alternate Terms
“Integration test” and “end-to-end test” are often used interchangeably.
5.3 - Functional Testing
Understanding and implementing Functional Testing in software development
Functional testing is a deterministic test that verifies all modules of a sub-system are working together. It avoids integrating with other sub-systems, preferring test doubles instead.
Overview
Functional testing verifies a system’s specification and fundamental requirements systematically and deterministically. It introduces an actor (typically a user or service consumer) and validates the ingress and egress of that actor within specific consumer environments.
Framework: jest
Assertion & Mocking: expect (jest), supertest, nock, apollo
Code Coverage: istanbul/nyc
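A sketch of a functional test using the tools above: `supertest` drives the sub-system through its API while `nock` doubles out another sub-system. The `./app` module and endpoints are hypothetical:

```js
const request = require('supertest');
const nock = require('nock');
const app = require('./app'); // hypothetical Express app for the sub-system

it('returns an item with its price for a known SKU', async () => {
  // Double out the other sub-system so the test stays deterministic.
  nock('https://pricing.example.com')
    .get('/items/abc-123')
    .reply(200, { price: 42 });

  const res = await request(app).get('/api/items/abc-123');

  expect(res.status).toBe(200);
  expect(res.body).toMatchObject({ sku: 'abc-123', price: 42 });
});
```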
5.4 - Test Doubles
Understanding and implementing Test Doubles in software testing
Test doubles are used to create fast, independent, deterministic, and reliable tests. They stand in for real components, similar to how stunt doubles are used in movies.
Types of Test Doubles
Key Concepts
Test Double: Generic term for any production object replacement in testing
Dummy: Passed around but never used; fills parameter lists
Fake: Has a working implementation, but not suitable for production
Stub: Provides canned answers to calls made during the test
Spy: A stub that records information about how it was called
Mock: Pre-programmed with expectations, forming a specification of expected calls
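A compact Jest sketch showing these doubles around one hypothetical unit under test:

```js
// Unit under test (hypothetical): stores a user and sends a welcome mail.
function registerUser(store, mailer, clock, logger, name) {
  store.set(name, { name, createdAt: clock.now() });
  mailer.send(name, 'Welcome!');
}

it('stores the user and welcomes them exactly once', () => {
  const dummyLogger = null;                     // dummy: fills a parameter, never used
  const fakeStore = new Map();                  // fake: working in-memory stand-in
  const stubClock = { now: () => new Date(0) }; // stub: canned answer
  const spyMailer = { send: jest.fn() };        // spy: records its calls

  registerUser(fakeStore, spyMailer, stubClock, dummyLogger, 'ada');

  expect(fakeStore.get('ada').createdAt).toEqual(new Date(0));
  expect(spyMailer.send).toHaveBeenCalledTimes(1); // asserting on calls makes it a mock
});
```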
Android
- Provides a common “when this → then that” mocking API in an idiomatic Kotlin DSL; built-in support for mocking top-level functions, extensions, and static objects; detailed documentation with examples.
- Process-local mock server; embedded in tests, no separate mock execution; simplistic but powerful API that can support state.
iOS
iOS Approach
For iOS, we prefer using Apple test frameworks with homegrown solutions on top. This approach helps manage rapid API changes and reduces dependency on potentially discontinued third-party solutions.
Benefits: lower cost to maintain, faster execution, less time to develop, and greater confidence and stability.
Use Case Coverage
One of the main points behind testing is to be able to code with confidence. Code coverage is one way developers have traditionally represented how confident they feel about working on a given code base. That said, how much confidence is needed will likely vary by team and the type of application being tested; e.g., if working on life-saving medical software, you probably want all the confidence in the world. The following discusses how code coverage, if misused, can be misleading and create a false sense of confidence in the code being worked on and, as a result, hurt quality. Recommendations on how to manage code coverage in a constructive way are presented below, along with concrete approaches on how to implement them.
In simple terms, coverage refers to a measurement of how much of your code is executed while tests are running. As such, it’s entirely possible to achieve 100% coverage by running through your code without really testing for anything, which is what opens the door for coverage having the potential to hurt quality if you don’t follow best practices around it. A recommended practice is to look at coverage from the perspective of the set of valid use cases supported by your code. For this, you would follow an approach similar to the following:
Start writing code and writing tests to cover for the use cases you’re supporting with your code.
Refine this by going over the tests and making sure valid edge cases and alternative scenarios are covered as well.
When done, look at your code’s coverage report and identify gaps in your testing
For each gap, decide if the benefit of covering it (odds of it failing and impact if it does) outweighs the cost (how complicated / time consuming would it be to cover it)
Write more tests where appropriate
This practice shifts the value of coverage from being a representation of your code’s quality to being a tool for finding untested parts of your code. When looking at coverage through this lens, you might also uncover parts of the code with low coverage because they no longer support a valid use case. Rather than writing tests for that code, remove it from the code base if at all possible.
You might ask yourself, “How do I know I have good coverage? What’s the magic number?” We believe there is no magic number; it depends on your team’s needs. If you are writing tests for the use cases you build into your application, your team feels confident when modifying the code base, and your post-production error rate is very low, your coverage is probably fine, whatever the numbers say.

Forcing a coverage percentage can actively hurt quality. By chasing every single code path, you can easily miss the use cases that hurt the most when they go wrong. There is also the false sense of confidence created by high coverage numbers obtained by “gaming the system”, or as Martin Fowler put it, “The trouble is that high coverage numbers are too easy to reach with low quality testing” (Fowler, 2012).

We do recognize there is such a thing as too little coverage. If your coverage is very low (e.g. < 50%), something is probably off: you may have a lot of unnecessary code you should remove, or your tests may not be hitting the critical use cases in your application. There are also methods to make sure no one is “gaming the system” in your test code. One of these is to create linting rules that look for those practices and fail the build when they are found. We recommend plugins like eslint-plugin-jest to make sure things like missing assertions or disabled tests cause the build to break.
Another recommendation when managing your code coverage is to track when it goes down. Generally it shouldn’t, and when it does, the drop should be explainable and should trigger a build failure. Along the same line, raising the bar whenever coverage increases is a good practice, as it ensures the level of coverage already achieved is maintained. We recommend automating this so that whenever your coverage percentage increases, so do your minimum thresholds. Once you have reached a reasonable level of coverage through the methods discussed above (covering use cases, handling valid edge cases, etc.), we don’t recommend actively working to increase the percentage. Instead, coverage should go up as a side effect of building good software: as you increase delivery frequency while monitoring your key stability metrics (e.g. post-production defects, performance or service degradations), you should see your code coverage increase.
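As a sketch of the ratcheting approach, assuming jest as the test runner (the numbers are illustrative, not targets):

```javascript
// jest.config.js - enforce coverage as a ratchet: builds fail if coverage
// drops below the last recorded high-water mark.
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 72,
      functions: 80,
      lines: 81,
      statements: 81,
    },
  },
};
```

A small pipeline script can rewrite these thresholds upward whenever measured coverage exceeds them, so the minimums only ever rise.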
Test-First Approach: BDD and TDD
Defining tests prior to writing code is the best way to lock in behavior and produce clean code. BDD and TDD are complementary processes to accomplish this goal and we recommend teams use both to first uncover requirements (BDD) and then do development against these requirements (TDD).
BDD
Behavior Driven Development is the process of defining business requirements as testable acceptance criteria and then implementing them using a test-first development approach. Examples and references for BDD can be found in the playbook on BDD.
When coding tests, the test statements should clearly describe what is being executed so that we create a shared understanding of what’s getting built among all stakeholders. Tests are the living documentation for what the application is doing, and test results should be effective onboarding documentation.
TDD
Test-driven development is the practice of writing a failing test before the implementation of a feature or bug fix. Red -> Green -> Refactor refers to the TDD process of adding a failing (red) test, implementing that failing test to make it pass (green) and then cleaning up the code after that (refactor). This approach to testing gives you confidence as it avoids any false positives and also serves as a design mechanism to help you write code that is decoupled and free of unnecessary extra code. TDD also drives up code coverage organically due to the fact that each use case gets a new test added.
People often confuse writing tests in general with TDD. Writing tests after implementing a use case is not the same as TDD; that would be test-oriented application development (TOAD), and like a toad, it has many warts. The process for TOAD would be green, green, then refactor at a later date, maybe. The lack of a failing test in that process opens the door for false-positive tests and often takes more time, as the code and the tests both end up needing to be refactored. In addition, the design of an API is not considered, because things are developed from the bottom up rather than from the top down. This can lead to tight coupling, unnecessary logic, and other forms of tech debt in the codebase.
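A minimal red-green-refactor sketch in jest; createTimeclock and its behavior are hypothetical:

```javascript
// Red: write the failing test first. createTimeclock does not exist yet, so this fails.
it("reports an associate as clocked in after they clock in", () => {
  const timeclock = createTimeclock();
  timeclock.clockIn("A123");
  expect(timeclock.isClockedIn("A123")).toBe(true);
});

// Green: the simplest implementation that passes the test.
function createTimeclock() {
  const clockedIn = new Set();
  return {
    clockIn: (id) => clockedIn.add(id),
    isClockedIn: (id) => clockedIn.has(id),
  };
}
// Refactor: with the test green, clean up names and structure with confidence.
```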
Naming Conventions
Test names should generally be descriptive and inclusive of what is being tested. A good rule of thumb when deciding a test name is to follow the “given-when-then” or “arrange-act-assert” conventions focusing on the “when” and “act” terms respectively. In both of these cases there is an implied action or generalized assertion that is expected, a test name should include this implication explicitly with an appropriate result effect description. For example:
```javascript
// Jest Example
// "input validator with valid inputs should contain a single valid field caller receives success state"
describe("input validator", () => {
  describe("with valid inputs", () => {
    it("should contain a single valid field caller receives success state", () => {});
  });
});
```
```java
// JUnit Example
// "input validator with valid inputs should contain a single valid field caller receives success state"
@DisplayName("input validator")
public class InputValidationTest {
  @Nested
  @DisplayName("with valid inputs")
  class ValidScenarios {
    @Test
    @DisplayName("should contain a single valid field caller receives success state")
    public void containsSingleValidField() {
      //
    }
  }
}
```
Casing
For test environments that require method names to describe their tests and suites, it is recommended that they follow their language and environment conventions. See formatting under static testing for further best practices.
Grouping
Where possible, suites and their respective tests should be grouped to allow for higher readability and easier identification. If the environment supports it, nested groups are also a useful practice to employ. A logical nesting of “unit-scenario-expectation”, for instance, encapsulates multiple scenarios that apply to a single unit under test. For example:
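A sketch of that nesting in jest, with a hypothetical cart totaler as the unit under test:

```javascript
const total = (items) =>
  items.reduce((sum, item) => sum + item.price * (1 - (item.discount || 0)), 0);

describe("cart totaler", () => {                 // unit
  describe("with an empty cart", () => {         // scenario
    it("returns a total of zero", () => {        // expectation
      expect(total([])).toBe(0);
    });
  });

  describe("with discounted items", () => {      // another scenario for the same unit
    it("applies the discount to the total", () => {
      expect(total([{ price: 10, discount: 0.5 }])).toBe(5);
    });
  });
});
```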
A contract test is used to validate the test doubles used in a network integration test. Contract tests are run against the live external sub-system and exercise the portion of the code that interfaces with that sub-system. Because of this, they are non-deterministic and should not break the build, but a failure should trigger work to review why it failed and potentially correct the contract.
A contract test validates contract format, not specific data.
Provider
Providers are responsible for validating that all API changes are backwards compatible unless otherwise indicated by changing API versions. Every build should validate the API contract to ensure no unexpected changes occur.
Consumer
Consumers are responsible for validating that they can consume the properties they need (see Postel’s Law) and that no change
breaks their ability to consume the defined contract.
Recommended Best Practices
Provider contract tests are typically implemented as unit tests of the schema and response codes of an interface. As such, they should be deterministic and should run on every commit, pull request, and verification of the trunk (see the sketch after the provider responsibilities below).
Consumer contract tests should avoid testing the behavior of a dependency, but should focus on comparing that the contract double still matches the responses from the dependency. This should be running on a schedule and any failures reviewed for cause. The frequency of the test run should be proportional to the volatility of the interface.
When dependencies are tightly aligned, consumer-driven contracts should be used
The consuming team writes automated tests with all consumer expectations
They publish the tests for the providing team
The providing team runs the CDC tests continuously and keeps them green
Both teams talk to each other once the CDC tests break
Provider Responsibilities:
Providers should publish machine-readable documentation of their interface to facilitate consumer testing and discoverability.
Even better, publish a dedicated technical compatibility kit that is tested on every build that provides a trusted virtual service to eliminate the need for consumer contract testing.
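A provider-side sketch, assuming jest + supertest against a hypothetical Express app; it pins status codes and schema, not specific data values:

```javascript
const request = require("supertest");
const app = require("../src/app"); // hypothetical provider application

describe("GET /users/:id contract", () => {
  it("responds 200 with the published schema", async () => {
    const response = await request(app).get("/users/42");

    expect(response.status).toBe(200);
    // Validate shape, not data
    expect(response.body).toEqual(
      expect.objectContaining({
        id: expect.any(String),
        name: expect.any(String),
      })
    );
  });

  it("responds 404 for an unknown user", async () => {
    const response = await request(app).get("/users/does-not-exist");
    expect(response.status).toBe(404);
  });
});
```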
Customer Experience Alarms are a type of active alarm: a piece of software that sends requests to your system much like a user would. We use them to test the happy path of critical customer workflows. These requests ideally run every minute (but no less often than every 5 minutes). If they fail to work, or fail to run, we emit metrics that cause alerts. We run these in all of our environments, not just production, to ensure that they work and that we catch errors early.
These are different from log-based alarms because we can’t guarantee that someone is exercising all of the golden-path workflows of our system at all times. If we relied entirely on logs, we wouldn’t know whether the golden workflows still work when an automated process deploys at 3am on a Saturday.
These tests have a few important characteristics (a sketch follows the list):
They are run in all environments, including production.
They aren’t generated from UI workflows, but rather from direct API access
They ideally run every minute.
If they don’t work (in production) they page someone. Even at 3am.
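A minimal probe sketch in Node; the endpoint, response shape, and emitMetric wiring are all hypothetical:

```javascript
const axios = require("axios");

async function probeSearch() {
  const started = Date.now();
  try {
    const res = await axios.get("https://shop.example.com/api/search?q=red+shirt", {
      timeout: 5000,
    });
    const healthy = res.status === 200 && Array.isArray(res.data.results);
    emitMetric("search.probe.success", healthy ? 1 : 0, Date.now() - started);
  } catch (err) {
    // A failed run must still emit a metric so the alarm fires on errors and "no data"
    emitMetric("search.probe.success", 0, Date.now() - started);
  }
}

function emitMetric(name, value, latencyMs) {
  // Hypothetical: forward to your metrics backend; page someone when success stays at 0
  console.log(JSON.stringify({ name, value, latencyMs }));
}

setInterval(probeSearch, 60 * 1000); // ideally every minute, in every environment
```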
Alternate Terms
Synthetic Probes (Google)
Canary (Amazon, although it doesn’t mean what Canary means here)
5.8 - Integration Testing
An integration test is a deterministic test to verify how the unit under test interacts with other units without directly accessing external sub-systems. For the purposes of clarity, “integration test” is not a test that broadly integrates multiple sub-systems. That is an E2E test.
Some examples of an integration test are validating how multiple units work together (sometimes called a “sociable unit test”) or validating the portion of the code that interfaces to an external network sub-system while using a test double to represent that sub-system.
When designing network integration tests, it’s recommended to also have contract tests running asynchronously to validate the service test doubles.
Recommended Best Practices
Integration tests provide the best balance of speed, confidence, and cost when building tests to ensure your system is properly functioning. The goal of testing is to give developers confidence when refactoring, adding features, or fixing bugs. Integration tests that are decoupled from implementation details give you this confidence without creating extra work when you refactor. Too many unit tests, however, lead to very brittle tests. If you refactor code (i.e. change the implementation without changing the functionality), the goal should be to NOT break any tests, and ideally not to touch them at all. If lots of tests break when you refactor, it’s probably a sign of too many unit tests and not enough integration tests.
Tests should be written from the perspective of how the actor experiences it.
Avoid hasty abstractions. Duplication in tests is not the enemy; in fact, it’s often better to have duplicated code in tests than to have complex abstractions. Tests should be DAMP (descriptive and meaningful phrases), not DRY.
Design tests that alert to failure as close to defect creation as possible.
“Don’t poke too many holes in reality.” Only use mocks or test doubles when absolutely necessary to maintain determinism in your test. Justin Searls has a great talk about this.
Flaky tests need to be corrected to prevent false positives that degrade the ability of the tests to act as an effective code gate.
Write tests from the actor’s perspective and don’t introduce a test user. (e.g. When I give this input, I expect this outcome)
End-User - when building a UI, what response will each input provide to the user?
Consumer - when building a library or service, what output will be expected for a given input?
Don’t test implementation details. Tests should focus on what the outcomes are, not how the outcomes occurred.
Examples of testing implementation details include:
internal state
private methods/properties etc
things a user won’t see/know about.
Integration tests are normally run with unit tests.
Service Integration Tests
Service integration tests are focused on validating how the system under test responds to information from an external service and that service contracts can be consumed as expected. They should be deterministic and should not test the behavior of the external service. The integration can be from UI to service or service to service. A typical service integration test is a set of unit tests focused on interface schema and response codes for the expected interaction scenarios.
Use virtual services or static mocks instead of live services to ensure the test is repeatable and deterministic (a sketch follows this list).
Implement contract tests to continuously validate the virtual service or mock is current.
Don’t over-test. When validating service interactions, testing that a dependency returns a specific value is testing the behavior of the dependency instead of the behavior of the SUT.
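A sketch of a service integration test, assuming jest + nock with a hypothetical inventoryClient as the system under test; the test checks how the SUT handles the interaction, not the dependency’s behavior:

```javascript
const nock = require("nock");
const inventoryClient = require("../src/inventoryClient"); // hypothetical SUT

describe("inventory client", () => {
  it("maps a 200 response into the internal item shape", async () => {
    nock("https://inventory.example.com")
      .get("/items/sku-1")
      .reply(200, { sku: "sku-1", qty: 3 });

    const item = await inventoryClient.getItem("sku-1");
    expect(item).toEqual({ sku: "sku-1", quantity: 3 });
  });

  it("surfaces an error state on a 503 response", async () => {
    nock("https://inventory.example.com").get("/items/sku-1").reply(503);

    await expect(inventoryClient.getItem("sku-1")).rejects.toThrow();
  });
});
```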
Database Integration Tests
Test data management is one of the more complex testing problems, so whenever possible avoid using live data.
Good practices include (an in-memory example follows this list):
In-memory databases
Personalized datasets
Isolated DB instances
Mocked data transfer objects
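A sketch of an isolated in-memory database test; better-sqlite3 is one option here, not a requirement:

```javascript
const Database = require("better-sqlite3");

describe("user repository", () => {
  let db;

  beforeEach(() => {
    db = new Database(":memory:"); // isolated, personalized dataset per test
    db.exec("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)");
    db.prepare("INSERT INTO users (id, name) VALUES (?, ?)").run("1", "Ada");
  });

  afterEach(() => db.close());

  it("finds a user by id", () => {
    const row = db.prepare("SELECT name FROM users WHERE id = ?").get("1");
    expect(row.name).toBe("Ada");
  });
});
```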
Front End Driven Integration Tests
Don’t use tools like Enzyme that let you peek behind the curtain.
Follow the Accessibility order of operations to get a reference to elements, in prioritized order (a sketch follows this list):
Things accessible to all users (Text, placeholder, label, etc)
Accessibility features (role, title, alt tag, etc)
Only after exhausting the first 2, then use test ID or CSS/XPath selectors as an escape hatch. But remember, the user doesn’t know about these so try to avoid them.
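A sketch of that priority order, assuming React Testing Library; the LoginForm component is hypothetical, and toBeInTheDocument comes from jest-dom:

```javascript
import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import LoginForm from "./LoginForm";

it("shows a success message after submitting valid credentials", async () => {
  render(<LoginForm />);

  // 1. Things all users can see: text, placeholder, label
  await userEvent.type(screen.getByLabelText("Username"), "ada");
  // 2. Accessibility features: role, title, alt
  await userEvent.click(screen.getByRole("button", { name: "Log in" }));
  // 3. Escape hatch only after 1 and 2: screen.getByTestId("login-result")
  expect(await screen.findByText("Welcome back, ada")).toBeInTheDocument();
});
```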
Alternate Terms
Sociable Unit Test
Alternate Definitions
When integrating multiple sub-systems into a larger system: this is an End to End Test.
When testing all modules within a sub-system through the API or user interface: this is a Functional Test.
describe("retrieving Hygieia data",()=>{it("should return counts of merged pull requests per day",async()=>{constsuccessStatus=200;constresult=awaithygieiaConnector.getResultsByDay(hygieiaConnector.hygieiaConfigs.integrationFrequencyRoute,testConfig.HYGIEIA_TEAMS[0],testConfig.getTestStartDate(),testConfig.getTestEndDate());expect(result.status).to.equal(successStatus);expect(result.data).to.be.an("array");expect(result.data[0]).to.haveOwnProperty("value");expect(result.data[0]).to.haveOwnProperty("dateStr");expect(result.data[0]).to.haveOwnProperty("dateTime");expect(result.team).to.be.an("object");expect(result.team).to.haveOwnProperty("totalAllocation");});it("should return an empty array if the team does not exist",async()=>{constresult=awaithygieiaConnector.getResultsByDay(hygieiaConnector.hygieiaConfigs.integrationFrequencyRoute,0,testConfig.getTestStartDate(),testConfig.getTestEndDate());expect(result.status).to.equal(successStatus);expect(result.data).to.be.an("array");expect(result.data.length).to.equal(0);});});
Recommended Tooling
Integration Tooling is the same as recommended for Unit Tests
5.9 - Static Testing
A static test is a test that evaluates non-running code against rules for known good practices to check for security, structure, or practice issues.
It warns of excessive complexity in the code that will degrade the ability to change it safely.
It identifies issues that could expose vulnerabilities.
It shows anti-patterns that violate good practices.
It alerts to issues with dependencies that may prevent delivery, create a vulnerability, or even expose the company to lawsuits.
It catches errors.
Principles
When implementing any test, the test should be designed to provide alerts as close to the moment of creation as possible.
Many static analysis scans can run in real time in IDEs; others run during the build or as a pre-commit scan; still others require tooling that can only be used on the CI server. Whatever the test, drive it left.
Recheck everything on CI while verifying HEAD
Types of static tests
Linting: This automates catching of common errors in code and the enforcement of best practices
Formatting: Enforcement of code style rules. It removes subjectivity from code reviews
Complexity: Are code blocks too deep or too long? Complexity causes defects and simple code is better.
Type checking: Type checking can prevent hard-to-identify defects, replacing certain classes of tests and logic otherwise required (e.g. unit tests validating internal APIs)
Security: Checking for known vulnerabilities and coding patterns that provide attack vectors is critical
Dependency scanning:
Are your dependencies up to date?
Has the dependency been hijacked?
Are there known security issues in this version that require immediate resolution?
Is it licensed appropriately?
Recommended Best Practices
IDE plugins to identify problems in realtime
Pre-commit hooks to prevent committing problems
Verification during PR and during the CI build on the HEAD to verify that earlier verification happened and was effective.
Discourage disabling of static tests (e.g. skipping tests, ignoring warnings, ignoring code on coverage evaluation, etc)
Write custom rules (lint, formatting, etc) for common code review feedback (a sketch follows)
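A sketch of codifying such rules, assuming ESLint with eslint-plugin-jest; the limits are illustrative:

```javascript
// .eslintrc.js - enforce complexity limits and block common test anti-patterns
module.exports = {
  plugins: ["jest"],
  rules: {
    complexity: ["error", { max: 10 }],   // flag overly complex blocks
    "max-depth": ["error", 4],            // flag deeply nested code
    "jest/no-disabled-tests": "error",    // no skipped tests
    "jest/no-focused-tests": "error",     // no .only left behind
    "jest/expect-expect": "error",        // every test must assert something
  },
};
```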
5.10 - Unit Testing
Unit tests are deterministic tests that exercise a discrete unit of the application, such as a function, method, or UI component, in isolation to determine whether it behaves as expected.
When testing the specs of functions, prefer testing public API (methods, interfaces, functions) to private API: the spec of private functions and methods are meant to change easily in the future, and unit-testing them would amount to writing a Change Detector Test, which is an anti-pattern.
The purpose of unit tests is to:
Verify the functionality of a unit (method, class, function, etc.) in isolation
Test high-complexity logic where there may be many permutations (e.g. business logic)
Keep Cyclomatic Complexity low through good separations of concerns and architecture
Principles
Unit tests are low-level and focus on discrete units of the application
All dependencies are typically replaced with test-doubles to remove non-determinism
Unit tests are fast to execute
The test suite is run after every code change
Recommended Best Practices
Run a subset of your test suite based on the part of the code you are currently working on
Following TDD practices plus the watch functionality of certain testing frameworks is an easy way to achieve this
Pre-commit hooks to run the test suite before committing code to version control
Verification during PR and during the CI build on the HEAD to verify that earlier verification happened and was effective.
Discourage disabling of tests (e.g. skipping tests, ignoring warnings, ignoring code on coverage evaluation, etc)
Write custom rules (lint, formatting, etc) for common code review feedback
```javascript
// Example from lodash
describe('castArray', () => {
  it('should wrap non-array items in an array', () => {
    const values = falsey.concat(true, 1, 'a', { a: 1 });
    const expected = lodashStable.map(values, (value) => [value]);
    const actual = lodashStable.map(values, castArray);
    expect(actual).toEqual(expected);
  });

  it('should return array values by reference', () => {
    const array = [1];
    expect(castArray(array)).toBe(array);
  });

  it('should return an empty array when no arguments are given', () => {
    expect(castArray()).toEqual([]);
  });
});
```
```java
@Test
public void verifyMockedUserDetails() throws Exception {
    // =============== Arrange ===============
    ObjectMapper mapper = new ObjectMapper();
    User userMockData = mapper.readValue(
        new File(TestConstants.DATA_FILE_ROOT + "user_mock.json"), User.class);

    // Mock the getUserInfo method for userService: any call to getUserInfo
    // returns userMockData instead of making the actual call
    Mockito.when(userService.getUserInfo(TestConstants.userId)).thenReturn(userMockData);

    // =============== Act ===============
    RequestBuilder requestBuilder = MockMvcRequestBuilders
        .get("/user/" + TestConstants.userId)
        .accept(MediaType.APPLICATION_JSON);
    MvcResult mvcResponse = mockMvc.perform(requestBuilder).andReturn();
    String responsePayload = mvcResponse.getResponse().getContentAsString();
    String status = JsonPath.parse(responsePayload).read("$.STATUS");
    Map<String, String> userMap = JsonPath.parse(responsePayload).read("$.payload");

    // =============== Assert ===============
    JSONAssert.assertEquals(TestConstants.PARTIAL_MOCK_SUCCESS_PAYLOAD, responsePayload, false); // disable strict

    // Validate the expected userMockData matches the actual userMap data
    Assert.assertEquals(TestConstants.SUCCESS, status);
    Assert.assertEquals(userMockData.getManager(), userMap.get("manager"));
    Assert.assertEquals(userMockData.getVp(), userMap.get("vp"));
    Assert.assertEquals(userMockData.getOrganization(), userMap.get("organization"));
    Assert.assertEquals(userMockData.getDirector(), userMap.get("director"));
    Assert.assertEquals(userMockData.getCostcenter(), userMap.get("costcenter"));
}
```
Recommended Tooling
| Platform | Tools |
| -------- | ----- |
| Android | Framework: JUnit5. Assertion: Google Truth |
| iOS | XCTest |
| Web | Framework: jest. Assertion & Mocking: expect (jest), jest-dom, others as necessary. Code Coverage: istanbul/nyc (jest) |
| Service (Node) | Framework: jest. Assertion & Mocking: expect (jest) for generic assertions, supertest or nock for HTTP server endpoints, apollo for GraphQL server testing. Code Coverage: istanbul/nyc (jest) |
6 - Work Decomposition
Tips for breaking down work to “small enough”.
Reducing the batch size of delivered work is one of the most important things we can do to drive improved workflow,
quality, and outcomes. Why?
We have fewer assumptions in the acceptance criteria because we had to define how to test them. The act of defining them as tests brings out questions. “How can we validate that?”
We are less subject to hope creep. We can tell within a day that we bit off more than we thought and can communicate that.
When we deliver and discover the story was wrong, we’ve invested less in money, time, and emotional attachment so we can easily pivot.
It makes us predictable
It helps to reset our brains on what “small” is. What many people consider small turns out to be massive once they see what small really is.
The following playbooks have proven useful in helping teams achieve this outcome.
6.1 - From Roadmap to User Story
A guide to aligning priorities and breaking down work across multi-team products
Aligning priorities across multi-team products can be challenging. This guide outlines how to effectively break down work from program-level roadmaps to team-level user stories.
Program Roadmap
Key Point
Establishing and understanding goals and priorities is crucial for an effective work breakdown process.
Stakeholders and leadership teams must define high-level initiatives and their priorities
Work can then be dispersed among product teams
Leadership teams can be composed of a core group of product owners
Product Roadmap
The program roadmap should break down into the product roadmap, which includes the prioritized list of epics for each product.
The leadership team should define:
Product vision
Roadmap
Dependencies for each product
Team Backlog
The team backlog should comprise the prioritized epics from the product roadmap.
Effective Work Breakdown
The core group needed to effectively break down high-level requirements includes:
Product owners
Tech leads
Project managers
Product teams should use processes effective for Work Decomposition to break down epics into:
Smaller epics
Stories
Tasks
6.2 - Work Decomposition
A guide to effectively breaking down work into manageable, deliverable units
Effective work decomposition is crucial for delivering value faster with less rework. This guide outlines the process and best practices for breaking down work from ideas to tasks.
Prerequisites
Before implementing the work breakdown flow, ensure your team has:
Behavior Driven Development is the collaborative process where we discuss the intent and behaviors of a feature and
document the understanding in a declarative, testable way. These testable acceptance criteria should be the
Definition of Done for a
user story.
BDD is not a technology or automated tool. BDD is the process of defining the behavior. We can then
automate tests for those behaviors.
Example:
Feature: I need to smite a rabbit so that I can find the Holy Grail
Scenario: Use the Holy Hand Grenade of Antioch
Given I have the Holy Hand Grenade of Antioch
When I pull the pin
And I count to 3
But I do not count to 5
And I lob it towards my foe
And the foe is naughty in my sight
Then my foe should snuff it
Recommended Practices
Gherkin is the domain specific
language that allows acceptance criteria to be expressed in “Arrange, Act, Assert” in a
way that is understandable to all stakeholders.
Example:
Feature: As an hourly associate I want to be able to log my arrival time so that I can be
paid correctly.
Scenario: Clocking in
Given I am not clocked in
When I enter my associate number
Then my arrival time will be logged
And I will be notified of the time
Scenario: Clocking out
Given I am clocked in
When I enter my associate number
And I have been clocked in for more than 5 minutes
Then I will be clocked out
And I will be notified of the time
Scenario: Clocking out too little time
Given I am clocked in
When I enter my associate number
And I have been clocked in for less than 5 minutes
Then I will receive an error
Using Acceptance Criteria to Negotiate and Split
With the above criteria, it may be acceptable to remove the time validation and accelerate the delivery of the time
logging ability. After delivery, we may learn that the range validation
isn’t required. If true, we’ve saved money and time by NOT delivering
unneeded features.
First, we deliver the ability to clock in and see if we really do need the ability
to verify.
Feature: As an hourly associate I want to be able to log my arrival time so that I can be
paid correctly.
Scenario: Clocking in
Given I am not clocked in
When I enter my associate number
Then my arrival time will be logged
And I will be notified of the time
Scenario: Clocking out
Given I am clocked in
When I enter my associate number
And I have been clocked in for more than 5 minutes
Then I will be clocked out
And I will be notified of the time
If, in production, we discover that the sanity check is required to prevent time
clock issues, we can quickly add that behavior.
Feature: As an hourly associate I want to be prevented from clocking out immediately after
clocking in.
Scenario: Clocking out more than 5 minutes after arrival
Given I am clocked in
And I have been clocked in for more than 5 minutes
When I enter my associate number
Then I will be clocked out
And I will be notified of the time
Scenario: Clocking out less than 5 minutes after arrival
Given I am clocked in
And I have been clocked in for less than 5 minutes
When I enter my associate number
Then I will receive an error
Tips
Scenarios should be written from the point of view of the consumer, whether the consumer is a user, a UI, or another service.
Scenarios should be focused on a specific function and should not attempt to
describe multiple behaviors.
If a story has more than 6 acceptance criteria, it can probably be split.
No acceptance test should contain more than 10 conditions. In fact, much less
is recommended.
Acceptance tests can be used to describe a full end-to-end user experience. They are also recommended for describing
the behavior of a single component in the flow of the overall behavior.
A development task is the smallest independently deployable change to implement
acceptance criteria.
Recommended Practices
Create tasks that are meaningful and take less than two days to complete.
Example:
Given I have data available for Integration Frequency
Then score entry for Integration Frequency will be updated for teams

Task: Create Integration Frequency Feature Flag.
Task: Add Integration Frequency as Score Entry.
Task: Update Score Entry for Integration Frequency.
Use Definition of Done as your
checklist for completing a development task.
Tips
If a task includes integration to another dependency, add a simple contract
mock to the task so that parallel development of the consumer and provider will
result in minimal integration issues.
Decomposing stories into tasks allows teams to swarm stories and deliver value
faster
6.5 - Contract Driven Development
Contract Driven Development is the process of defining the contract changes
between two dependencies during design and prior to construction. This allows
the provider and consumer to work out how components should interact so that
mocks and fakes can be created that allow the components to be developed and
delivered asynchronously.
Recommended Practices
For services, define the expected behavior changes for the affected verbs along
with the payload. These should be expressed as contract tests, the unit test of
an API, that both provider and consumer can use to validate the integration independently.
For more complicated interactions that require something more than simple canned
responses, a common repository that represents a fake of the new service or tools
like Mountebank or WireMock
can be used to virtualize more complex behavior. It’s important that both
components are testing the same behaviors.
Contract tests should follow Postel’s Law:
"Be conservative in what you do, be liberal in what you accept from others".
Tips
For internal services, define the payload and responses in the developer task
along with the expected functional test for that change.
For external services, use one of the open source tools that allow recording
and replaying responses.
Always create contract tests before implementation of behavior.
6.6 - Defining Product Goals
Product Goals
Product goals are a way to turn your vision for your product into easy-to-understand objectives that can be measured and achieved in a certain amount of time.
Example:
Goal: Increased transparency into product metrics
Measurable Outcome: Increased traffic to product page
When generating product goals, you need to understand what problem you are
solving, who you are solving it for, and how you measure that you achieved the goals.
Initiatives
Product goals can be broken down into initiatives that, when accomplished, deliver against the product strategy. For example:
Provide one view for all product KPIs.
Ensure products have appropriate metrics associated with them.
Initiatives can then be broken down into epics, stories, tasks, etc. among
product teams, with high-level requirements associated.
Epics
An epic is a complete business feature with outcomes defined before
stories are written. Epics should never be open ended buckets of work.
Example: I want to be able to review the CI metrics trends of teams who have completed a DevOps Dojo engagement.
Tips
Product goals need a description and key results needed to achieve
them.
Initiatives need enough information to help the team understand the expected
value, the requirements, measure of success, and the time frame associated to completion.
6.7 - Definition of Ready
Is it REALLY Ready?
A Definition of Ready is a set of criteria, decided by the team, that defines when work is ready to begin. The goal of the Definition of Ready is to help the team decide how much uncertainty they are comfortable taking on with respect to their work. Without that guidance, any work is fair game, which is a recipe for confusion and disaster.
Recommended Practices
When deciding on a Definition of Ready, there are certain minimum criteria that
should always be there. These are:
Description of the value the work provides (Why do we want to do this?)
Testable Acceptance Criteria (When do we know we’ve done what we need to?)
The team has reviewed and agreed the work is ready (Has the team seen it?)
However, the context of a team can make many other criteria applicable. Other
criteria could include:
Wireframes for new UI components
Contracts for APIs/services we depend on
All relevant test types identified for subtasks
Team estimate of the size of the story is no more than 2 days
The Definition of Ready is a living document that should evolve over time as
the team works to make their delivery system more predictable. The most
important thing is to actually enforce the Definition of Ready. If it’s not
enforced, it’s completely useless.
If any work in “Ready to Start” does not meet the Definition of Ready, move
it back to the Backlog until it is refined.
Any work that is planned for a sprint/iteration must meet the Definition of
Ready. Do not accept work that isn’t ready!
If work needs to be expedited, it needs to go through the same process.
(Unless there is immediate production impact, of course)
Definition of Ready is also useful for support tickets or other types of work
that the team can be responsible for. It’s not just for development work!
It’s up to everyone on the team, including the Product Owner, to make sure
that non-ready work is refined appropriately.
The recommended DoR for CD is that any story can be completed, either by the team or a single developer, in 2 days or less
6.8 - Spikes
Spikes are an exploration of potential solutions for work or research items that cannot be estimated. They
should be time-boxed in short increments (1-3 days).
Recommended Practices
Since all work has some amount of uncertainty and risk, spikes should be used
infrequently when the team has no idea on how to proceed with a work item. They
should result in information that can be used to better refine work into something
valuable, for some iteration in the future.
Spikes should follow a Definition of Done,
with acceptance criteria, that can be demoed at the end of its timebox.
A spike should have a definite timebox with frequent feedback to the team on what’s been learned so far. It can be
tempting to learn everything about the problem and all of the solutions before trying anything, but the best way to
learn is to learn using the problem in front of us right now. Batching learning is worse than batching other kinds of
work because effective learning requires applying the learning immediately or it’s lost.
Tips
Use spikes sparingly, only when high uncertainty exists.
Spikes should be focused on discovery and experimentation.
Stay within the parameters of the spike. Anything else is waste.
6.9 - Story Slicing
Story slicing is the activity of taking large stories and splitting them into
smaller, more predictable deliveries. This allows the team to deliver higher
priority changes more rapidly instead of tying those changes to others that may
be of lower relative value.
Recommended Practices
Stories should be sliced vertically.
That is, the story should be aligned such that it fulfills a consumer request
without requiring another story being deployed. After slicing, they should still
meet the INVEST principle.
Example stories:
As an hourly associate I want to be able to log my arrival time so that I can be
paid correctly.
As a consumer of item data, I want to retrieve item information by color so that
I can find all red items.
Stories should not be sliced along tech stack layer or by activity. If you
need to deploy a UI story and a service story to implement a new behavior, you
have sliced horizontally.
Do not slice by tech stack layer
UI “story”
Service “story”
Database “story”
Do not slice by activity
Coding “story”
Review “story”
Testing “story”
Tips
If you’re unsure if a story can be sliced thinner, look at the acceptance
tests from the BDD activity and see if it
makes sense to defer some of the tests to a later release.
While stories should be sliced vertically, it’s quite possible that multiple
developers can work the story with each developer picking up a task that
represents a layer of the slice.
Minimize hard dependencies in a story. The odds of delivering on time for any activity are 1 in 2^n, where n is the number of hard dependencies; with three hard dependencies, for example, the odds are 1 in 8.
7 - 24 Capabilities to Drive Improvement
“Our research has uncovered 24 key capabilities that drive improvements in software delivery performance in a statistically significant way. Our book details these findings.”
Use version control for all production artifacts
Version control is the use of a version control system, such as GitHub or Subversion, for all production artifacts, including application code, application configurations, system configurations, and scripts for automating build and configuration of the environment.
Automate your deployment process
Deployment automation is the degree to which deployments are fully automated and do not require manual intervention.
Implement continuous integration
Continuous integration (CI) is the first step towards continuous delivery.
This is a development practice where code is regularly
checked in, and each check-in triggers a set of quick tests to discover serious regressions, which developers fix immediately. The
CI process creates canonical builds and packages that are ultimately deployed and released.
Use trunk-based development methods
Trunk-based development has been shown to be a predictor of high performance in software development and delivery. It is
characterized by fewer than three active branches in a code repository; branches and forks having very short lifetimes
(e.g., less than a day) before being merged into trunk; and application teams rarely or never having code lock periods
when no one can check in code or do pull requests due to merging conflicts, code freezes, or stabilization phases.
Implement test automation
Test automation is a practice where software tests are run automatically (not manually) continuously throughout the
development process. Effective test suites are reliable—that is, tests find real failures and only pass releasable code.
Note that developers should be primarily responsible for creation and maintenance of automated test suites.
Support test data management
Test data requires careful maintenance, and test data management is becoming an increasingly important part of automated
testing. Effective practices include having adequate data to run your test suite, the ability to acquire necessary data
on demand, the ability to condition your test data in your pipeline, and the data not limiting the amount of tests you
can run. We do caution, however, that teams should minimize, whenever possible, the amount of test data needed to run
automated tests.
Shift left on security
Integrating security into the design and testing phases of the software development process is key to driving IT
performance. This includes conducting security reviews of applications, including the Infosec team in the design and
demo process for applications, using pre-approved security libraries and packages, and testing security features as a
part of the automated testing suite.
Implement continuous delivery (CD)
CD is a development practice where software is in a deployable state throughout its lifecycle, and the team prioritizes keeping the
software in a deployable state over working on new features. Fast feedback on the quality and deployability of the system is
available to all team members, and when they get reports that the system isn’t deployable, fixes are made quickly.
Finally, the system can be deployed to production or end users at any time, on demand.
Architecture Capabilities
Use a loosely coupled architecture
This affects the extent to which a team can test and deploy their applications on demand, without requiring orchestration with other
services. Having a loosely coupled architecture allows your teams to work independently, without relying on other teams for support
and services, which in turn enables them to work quickly and deliver value to the organization.
Architect for empowered teams
Our research shows that teams that can choose which tools to use do better at continuous delivery and, in turn, drive
better software development and delivery performance. No one knows better than practitioners what they need to be
effective.
Product and Process Capabilities
Gather and implement customer feedback
Our research has found that whether organizations actively and regularly seek customer feedback and incorporate this
feedback into the design of their products is important to software delivery performance.
Make the flow of work visible through the value stream
Teams should have a good understanding of and visibility into the flow of work from the business all the way through to
customers, including the status of products and features. Our research has found this has a positive impact on IT
performance.
Work in small batches
Teams should slice work into small pieces that can be completed in a week or less. The key is to have work decomposed
into small features that allow for rapid development, instead of developing complex features on branches and releasing
them infrequently. This idea can be applied at the feature and the product level. (An MVP is a prototype of a product
with just enough features to enable validated learning about the product and its business model.) Working in small
batches enables short lead times and faster feedback loops.
Foster and enable team experimentation
Team experimentation is the ability of developers to try out new ideas and create and update specifications during the
development process, without requiring approval from outside of the team, which allows them to innovate quickly and
create value. This is particularly impactful when combined with working in small batches, incorporating customer
feedback, and making the flow of work visible.
Lean Management and Monitoring Capabilities
Have a lightweight change approval process
Our research shows that a lightweight change approval process based on peer review (pair programming or intra-team code
review) produces superior IT performance than using external change approval boards (CABs).
Monitor across application and infrastructure to inform business decisions
Use data from application and infrastructure monitoring tools to take action and make business decisions. This goes
beyond paging people when things go wrong.
Check system health proactively
Monitor system health, using threshold and rate-of-change warnings, to enable teams to preemptively detect and mitigate problems.
Improve processes and manage work with work-in-progress (WIP) limits
The use of work-in-progress limits to manage the flow of work is well known in the Lean community. When used
effectively, this drives process improvement, increases throughput, and makes constraints visible in the system.
Visualize work to monitor quality and communicate throughout the team
Visual displays, such as dashboards or internal websites, used to monitor quality and work in progress have been shown
to contribute to software delivery performance.
Cultural Capabilities
Support a generative culture (as outlined by Westrum)
This measure of organizational culture is based on a typology developed by Ron Westrum, a sociologist who studied
safety-critical complex systems in the domains of aviation and healthcare. Our research has found that this measure of
culture is predictive of IT performance, organizational performance, and decreasing burnout. Hallmarks of this measure
include good information flow, high cooperation and trust, bridging between teams, and conscious inquiry.
Encourage and support learning
Is learning, in your culture, considered essential for continued progress? Is learning thought of as a cost or an
investment? This is a measure of an organization’s learning culture.
Support and facilitate collaboration among teams
This reflects how well teams, which have traditionally been siloed, interact in development, operations, and information security.
Provide resources and tools that make work meaningful
This particular measure of job satisfaction is about doing work that is challenging and meaningful, and being empowered
to exercise your skills and judgment. It is also about being given the tools and resources needed to do your job well.
Support or embody transformational leadership
Transformational leadership supports and amplifies the technical and process work that is so essential in DevOps. It is
comprised of five factors: vision, intellectual stimulation, inspirational communication, supportive leadership, and
personal recognition.
8 - Value Stream Mapping
A guide to conducting a Value Stream Mapping Workshop to optimize your development process.
The Value Stream Mapping Workshop uncovers all steps from idea conception to production, aiming to identify removable steps, bottlenecks, and high-defect areas.
Overview
Value Stream Mapping helps teams:
Identify and remove unnecessary steps
Uncover waiting periods between steps
Highlight steps with high defect rates
The outcome guides the design of an improved value stream, prioritizing changes to reduce waste in the current flow.
Prerequisites
An established process for value delivery (for a “to be” value stream)
Participation from all stakeholders in the value stream
Understanding of key terms:
Wait time/non-value time
Process time/value add time
Percent Complete/Accurate (%C/A)
Recommended Practices
Start mapping from delivery and move backward to ensure no steps are missed.
Process
1. Identify the Source
Example: Team Demo
For each source of Requests, determine:
Average process time
Involved stakeholders
Percentage of work rejected by the next step
2. Identify Rework Loops
Rework loops are interruptions where steps need correction.
3. Identify Wait Time
Calculate wait time between steps, considering your team’s cadence.
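As a hypothetical worked example: if a story waits 2 days in “Ready to Start”, takes half a day of process time, and then waits 3 more days for approval before release, lead time is 5.5 days while value-add time is 0.5 days, a flow efficiency of roughly 9% (0.5 ÷ 5.5).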
Outcomes
Process time/wait time of your flow
Visual representation of the value stream(s)
Potential constraints (represented as kaizen bursts)
Tips
Regularly review and update the value stream map
Consider all potential flows for team processes
Value Proposition
Understanding how to value stream map team processes helps identify delivery constraints and improvement opportunities.
Acceptance Criteria
Value stream all processes associated with delivering value
Create actionable improvement items from the exercise