Improving the flow of delivery requires a combination of tooling, process improvement, and teamwork. Continuous delivery is an excellent tool for finding what needs to be improved next.
Flow Improvement
- 1: Getting Started with CD
- 2: Glossary
- 3: Common Blockers
- 4: DevOps Learning Path
- 5: 24 Capabilities to Drive Improvement
- 6: Team Workflow
- 6.1: Code Review
- 6.2: Source Branching
- 6.3: Workflow Process
- 6.4: Definition of Done
- 6.5: Feedback Loops
- 6.6: Retrospectives
- 6.7: Unplanned Work
- 6.8: Visualizing Workflow
- 6.9: Work in Progress
- 7: Work Decomposition
- 7.1: Work Decomposition
- 7.2: Task Decomposition
- 7.3: Behavior Driven Development
- 7.4: Complexity and Estimation
- 7.5: Contract Driven Development
- 7.6: Defining Product Goals
- 7.7: Definition of Ready
- 7.8: From Program to User Story
- 7.9: Spikes
- 7.10: Story Slicing
- 8: Cloud Native Checklist
- 9: Value Stream Mapping
1 - Getting Started with CD
Introduction to CD
Continuous delivery is the ability to deliver the latest changes on-demand. CD is not build/deploy automation. It is the continuous flow of changes to the end-user with no human touchpoints between code integrating to the trunk and delivery to production. This can take the form of triggered delivery of small batches or the immediate release of the most recent code change.
CD is not a reckless throwing of random change into production. Instead, it is a disciplined team activity of relentlessly automating all of the validations required for a release candidate, improving the speed and reliability of quality feedback, and collaborating to improve the quality of the information used to develop changes.
CD is based on and extends the extreme programming practice of continuous integration. There is no CD without CI.
The path to continuous integration and continuous delivery may seem daunting to teams that are just starting out. We offer this guide to getting started with a focus on outcome metrics to track progress.
Continuous Delivery is far more than automation. It is the entire cycle of identifying value, delivering the value, and verifying with the end user that we delivered the expected value. The shorter we can make that feedback loop, the better our bottom line will be.
Goals
Both CI and CD are practices intended to drive improvement toward specific goals. CI is very effective at uncovering issues in work decomposition and testing within the team’s processes so that the team can improve them. CD is effective at uncovering external dependencies, organizational process issues, and test architecture issues that add waste and cost.
The relentless improvement of how we implement CD reduces overhead, improves quality feedback, and improves both outcomes for the end-user and the work/life balance of the team.
CD Maturity
It has been common for organizations to apply “maturity models” to activities such as CD. However, this has been found to lead to cargo culting and aligning goals to the process instead of the outcomes. Understanding what capabilities you have and what capabilities need to be added to fully validate and operate changes is important, but the goals should always align to improving the flow of value delivery to the end-user. This requires analyzing every process from idea to delivery and identifying what should be removed, what should be automated, and how we can continuously reduce the size of changes delivered.
There should never be an understanding that we are “mature” or “immature” with delivery. We can always improve. However, there should be an understanding of what competency looks like.
Minimums
- Each developer integrates tested changes to the trunk at least daily.
- Changes always use the same process to deliver.
- There are no process differences between deploying a feature or a fix.
- There are no manual quality gates.
- All test and production environments use the same artifact.
- If the release cadence requires release branches, then the release branches deploy to all test environments and production.
Good
- New work requires less than 2 days from start to delivery.
- All changes deliver from the trunk.
- The time from committing a change to delivering it to production is less than 60 minutes.
- Less than 5% of changes require remediation.
- The time to restore service is less than 60 minutes.
Continuous Integration
This working agreement for CI focuses on developing teamwork and delivering quality outcomes while removing waste.
- Branches originate from the trunk.
- Branches are deleted in less than 24 hours.
- Changes must be tested and not break existing tests before merging to the trunk.
- Changes are not required to be “feature complete”.
- Helping the team complete work in progress (code review, pairing) is more important than starting new work.
- Fixing a broken build is the team’s highest priority.
Desired outcomes:
- More frequent integration of smaller, higher quality, lower risk changes.
- More efficient and effective test architecture
- Lean code review process
- Reduced Work In Progress (WIP)
Continuous Delivery/Deploy
- Increased delivery frequency
- Increased stability
- Improved deploy success
- Reduced development cycle time
- Improved time to restore service
- Reduced process waste
- Smaller, less risky production releases.
- Small, cohesive, high morale, high-performing product teams with business domain expertise.
CD Anti-Patterns
Work Breakdown
Issue | Description | Good Practice |
---|---|---|
Unclear requirements | Stories without testable acceptance criteria | Work should be defined with acceptance tests to improve clarity and enable developer driven testing. |
Long development time | Stories take too long to deliver to the end user | Use BDD to decompose work into testable acceptance criteria to find smaller deliverables that can be completed in less than 2 days. |
Workflow Management
Issue | Description | Good Practice |
---|---|---|
Rubber band scope | Scope that keeps expanding over time | Use BDD to clearly define the scope of a story and never expand it after it begins. |
Focusing on individual productivity | Attempting to manage a team by reporting the “productivity” of individual team members. This is the fastest way to destroy teamwork. | Measure team efficiency, effectiveness, and morale |
Estimation based on resource assignment | Pre-allocating backlog items to people based on skill and hoping that those people do not have life events. | The whole team should own the team’s work. Work should be pulled in priority sequence and the team should work daily to remove knowledge silos. |
Meaningless retrospectives | Having a retrospective where the outcome does not result in team improvement items. | Focus the retrospective on the main constraints to daily delivery of value. |
Skipping demo | No work that can be demoed was completed. | Demo the fact that no work is ready to demo |
No definition of “Done” or “Ready” | Obvious | Make sure there are clear entry gates for “ready” and “done” and that the gates are applied without exception |
One or fewer deliveries per sprint | The sprint results in one or fewer changes that are production ready | Sprints are planning increments, not delivery increments. Plan what will be delivered daily during the sprint. Uncertainty increases with time. Distant deliverables need detailed analysis. |
Pre-assigned work | Assigning the list of tasks each person will do as part of sprint planning. This results in each team member working in isolation on a task list instead of the team focusing on delivering the next high value item. | The whole team should own the team’s work. Work should be pulled in priority sequence and the team should work daily to remove knowledge silos. |
Teams
Issue | Description | Good Practice |
---|---|---|
Unstable Team Tenure | People are frequently moved between teams | Teams take time to grow. Adding or removing anyone from a team lowers the team’s maturity and average expertise in the solution. Be mindful of change management |
Poor teamwork | Poor communication between team members due to time delays or “expert knowledge” silos | Make sure there is sufficient time overlap and that specific portions of the system are not assigned to individuals |
Multi-team deploys | Requiring more than one team to deliver synchronously reduces the ability to respond to production issues in a timely manner and delays delivery of any feature to the speed of the slowest team. | Make sure all dependencies between teams are handled in ways that allow teams to deploy independently in any sequence. |
Testing Process
Issue | Description | Good Practice |
---|---|---|
Outsourced testing | Some or all acceptance testing performed by a different team or an assigned subset of the product team. | Building quality feedback into the process and continuously improving it is the responsibility of the development team. |
Manual testing | Using manual testing for functional acceptance testing. | Manual tests should only be used for things that cannot be automated. In addition, manual tests should not be blockers to delivery but should be asynchronous validations. |
Recommended Practices
While implementation is contextual to the product, there are key steps that should be done whenever starting the CD journey.
- Value Stream Map: This is a standard Lean tool to make the development process visible and highlight any constraints the team has. This is a critical step to begin improvement. Build a road map of the constraints and use a disciplined improvement process to remove them.
- Align to the Continuous Integration team working agreement and use the impediments to feed the team’s improvement process.
  - We always branch from Trunk.
  - Branches last less than 24 hours.
  - Changes must be tested and not break existing tests.
  - Changes are not required to be “feature complete”.
  - Code review is more important than starting new work.
  - Fixing a broken build is the team’s highest priority.
- Build and continuously improve a single CD automated pipeline for each repository. There should only be a single configuration for each repository that will deploy to all test and production environments.
A valid CD process will have only a single method to build and deploy any change. Any deviation for emergencies indicates an incomplete CD process that puts the team and business at risk and must be improved.
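As an illustration only (not from the original text), a minimal sketch of what a single delivery method can look like: one parameterized entry point that promotes the same artifact through every environment, so the emergency path and the normal path are identical. The environment names and the rollout script are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical single deploy entry point: the same artifact and the same
command are used for every environment, so there is no separate emergency path.
Environment names and the rollout script are illustrative only."""
import argparse
import subprocess

ENVIRONMENTS = ["test", "staging", "production"]  # hypothetical environment names


def deploy(artifact: str, environment: str) -> None:
    """Promote one artifact to one environment using the one and only process."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    # Identical command everywhere; only environment-specific configuration differs.
    subprocess.run(["./scripts/rollout.sh", artifact, environment], check=True)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Single deploy path for all environments")
    parser.add_argument("artifact", help="the artifact already built and tested by CI")
    parser.add_argument("environment", choices=ENVIRONMENTS)
    args = parser.parse_args()
    deploy(args.artifact, args.environment)
```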
Pipeline
Focus on hardening the pipeline. Its job is to block bad changes. The team’s job is to develop its ability to do that. The pipeline should be the same process used in an emergency: if a step will not be used to resolve a critical outage, it should not be happening in the CD pipeline.
Integrate outside the pipeline. Virtualize inside the pipeline. Direct integration is not a good testing method for validating behavior because the data returned is not controlled. It IS a good way to validate service mocks. However, if done in the pipeline it puts fixing production at risk if the dependency is unavailable.
There should be one or fewer stage gates. Until release and deploy are decoupled, require at most one approval for production and no other stage gates.
Developers are responsible for the full pipeline. No handoffs. Handoffs cause delays and dilute ownership. The team owns its pipelines and the applications they deploy from birth to death.
Short CI Cycle Time
CI cycle time should be less than 10 minutes from commit to artifact creation. CD cycle time should be less than 60 minutes from commit to Production.
Integrate outside the pipeline. Virtualize inside the pipeline
Direct integration to stateful dependencies (end-to-end testing) should be avoided in the pipeline. Tests in the pipeline should be deterministic and the larger the number of integration points the more difficult it is to manage state and maintain determinism. It is a good way to validate service mocks. However, if done in the pipeline it puts fixing production at risk if the dependency is unstable/unavailable.
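A minimal sketch of virtualizing a dependency inside the pipeline, using Python's unittest.mock; the pricing function and tax-service call are hypothetical stand-ins for real code. The test controls the data returned, so it stays deterministic even when the real dependency is unavailable.

```python
"""Sketch: the pipeline test controls the data returned by a dependency instead
of calling it directly. The tax lookup and pricing function are hypothetical
stand-ins for real code."""
from unittest.mock import patch


def tax_service_lookup(region: str) -> dict:
    """Stand-in for a live call to a stateful external dependency."""
    raise RuntimeError("pipeline tests should never hit the live service")


def price_with_tax(amount: float, region: str) -> float:
    rate = tax_service_lookup(region)["rate"]
    return round(amount * (1 + rate), 2)


@patch(f"{__name__}.tax_service_lookup")
def test_price_includes_tax(mock_lookup):
    # Controlled, deterministic response; no live integration inside the pipeline.
    mock_lookup.return_value = {"rate": 0.07}
    assert price_with_tax(100.00, "CA") == 107.00
```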
All test automation pre-commit
Tests should be co-located with the system under test and all acceptance testing should be done by the development team. The goal is not 100% coverage. The goal is efficient, fast, effective testing.
No manual steps
There should be no manual intervention after the code is integrated into the trunk. Manual steps inject defects.
Tips
Use trunk merge frequency, development cycle time, and delivery frequency to uncover pain points. The team has complete control of merge frequency and development cycle time and can uncover most issues by working to improve those two metrics.
Make sure to keep all metrics visible and refer to them often to help drive the change.
See CD best practices and CD anti-patterns for more tips on effectively introducing CI/CD improvements to your team processes.
References
- Continuous Delivery
- Jez Humble: Continuous Delivery sounds great, but it won’t work here @ DevOn Summit 2017
2 - Glossary
- Continuous Delivery
- Continuous Deployment
- Continuous Integration
- Dependency (Hard)
- Dependency (Soft)
- Story Points
- Toil
- Unplanned Work
- Vertical Sliced Story
- WIP
Continuous Delivery
The ability to deliver the latest changes to production on demand.
Continuous Deployment
Delivering the latest changes to production as they occur.
Continuous Integration
A development process where each developer integrates tested changes to trunk very frequently, at least once per day. Trunk is kept ready to deploy at all times.
Continuous integration requires that every time somebody commits any change, the entire application is built and a comprehensive set of automated tests is run against it. Crucially, if the build or test process fails, the development team stops whatever they are doing and fixes the problem immediately. The goal of continuous integration is that the software is in a working state all the time.
Continuous integration is a practice, not a tool. It requires a degree of commitment and discipline from your development team. You need everyone to check in small incremental changes frequently to mainline and agree that the highest priority task on the project is to fix any change that breaks the application. If people don’t adopt the discipline necessary for it to work, your attempts at continuous integration will not lead to the improvement in quality that you hope for.
Excerpt From: Jez Humble & David Farley. “Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation.”
Dependency (Hard)
A hard dependency is something that must be in place before a feature is delivered. In most cases, a hard dependency can be converted to a soft dependency with feature flags.
Dependency (Soft)
A soft dependency is something that must be in place before a feature can be fully functional, but does not block the delivery of code.
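For illustration (not part of the original glossary), a sketch of how a feature flag can convert a hard dependency into a soft one; the flag store and recommendation service are hypothetical.

```python
"""Sketch: a feature flag turns a hard dependency (a recommendation service
that does not exist yet) into a soft one. The flag store and service call are
hypothetical."""

FEATURE_FLAGS = {"show_recommendations": False}  # e.g., loaded from configuration


def fetch_recommendations(product_id: str) -> list:
    """Stand-in for the downstream service that is not yet available."""
    return []


def render_product_page(product: dict) -> dict:
    page = {"title": product["name"], "price": product["price"]}
    if FEATURE_FLAGS["show_recommendations"]:
        # Only reached once the dependency exists and the flag is enabled,
        # so this code can ship to production before the service does.
        page["recommendations"] = fetch_recommendations(product["id"])
    return page
```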
Story Points
A measure of the relative complexity of delivering a story. Historically, 1 story point was 1 “ideal day”. An ideal day is a day where there are no distractions, the code is flowing, and we aren’t waiting on anything. No such day actually exists. ;)
Story points should only be used for planned work. Unplanned work or spikes should not be story pointed after the fact. Doing so artificially inflates the average capacity of the team and results in teams over-committing to delivery.
Toil
The repetitive, predictable, constant stream of tasks related to maintaining an application.
SRE Workbook: Eliminating Toil
Unplanned Work
Any work that the team inserts before the current planned work. Critical defects and “walk up” requests are unplanned work. It’s important that the team track all unplanned work and the reason so that steps can be taken by the team to reduce the future impact.
Vertical Sliced Story
A story should represent a response to a request that can be deployed independently of other stories. It should be aligned across the tech stack so that no other story needs to be deployed in concert to make the function work.
Examples:
- Submitting a search term and returning results.
- Requesting user information from a service and receiving a response.
WIP
Work in progress is any work that has been started but not delivered to the end-user.
3 - Common Blockers
The following are very frequent issues that teams encounter when working to improve the flow of delivery.
Work Breakdown
Stories without testable acceptance criteria
All stories should be defined with declarative and testable acceptance criteria. This dramatically reduces the amount of waiting and rework once coding begins and enables a much smoother testing workflow.
Acceptance criteria should define “done” for the story. No behavior other than that specified by the acceptance criteria should be implemented. This prevents scope creep and gold-plating and makes delivery much more consistent.
Stories too large
Commonly, stories are between 5 and 10 days in duration. This far exceeds the 1 to 2 day intended duration of stories. Large stories hide complexity, uncertainty, and dependencies.
- Stories represent the smallest user observable behavior change.
- To enable rapid feedback, higher quality acceptance criteria, and more predictable delivery, stories should require no more than two days for a team to deliver.
No definition of “ready”
Teams should have a working agreement about the definition of “ready” for a story or task. Until the team agrees it has the information it needs, no commitments should be made and the story should not be added to the “ready” backlog.
Definition of Ready
- Story
  - Acceptance criteria aligned with the value statement, agreed to, and understood.
  - Dependencies noted and resolution process for each in place
  - Spikes resolved.
- Sub-task
  - Contract changes documented
  - Component acceptance tests defined
No definition of “Done”
Having an explicit definition of done is important to keeping WIP low and finishing work.
Definition of Done
- Sub-task
  - Acceptance criteria met
  - Automated tests verified
  - Code reviewed
  - Merged to Trunk
  - Demoed to team
  - Deployed to production
- Story
  - PO Demo completed
  - Acceptance criteria met
  - All tasks "Done"
  - Deployed to production
Workflow Management
- Rubber band scope
- No definition of “Done”
- No definition of “Ready”
- Focusing on individual tasks completed
- Assigning tasks
- Estimation based on resource assignment
Assigning tasks for the sprint
Work should always be pulled by the next available team member. Assigning tasks results in each team member working in isolation on a task list instead of the team focusing on delivering the next high value item. New work should be started only after helping others complete work in progress.
Co-dependent releases
Multi-component release trains increase batch size and reduce delivered quality. Teams cannot improve efficiency if they are constantly waiting. Handle dependencies with code, do not manage them with process.
Handoffs to other teams
If the normal flow of work requires waiting on another team then batch sizes increase and quality is reduced.
Excessive backlog
Large product backlogs extend lead time and reduce customer satisfaction. Large backlogs also indicate over utilized teams or ineffective backlog review.
- Continuously review backlog for expired requests and remove them.
- Reassign components to less utilized teams.
Early story refining
Stories refined too far in advance create overproduction waste. Odds are that they will require re-refining. Time is better spent delivering the current priorities.
Manual test as a stage gate
- Manual testing is neither repeatable nor deterministic.
- Use continuous exploratory testing to find missing tests that should be added.
Meaningless retrospectives
Retrospectives should be metrics driven. Improvement items should be treated as business features.
Hardening / Testing / Tech Debt Sprints
Just no. These are not real things. Sprints represent work that can be delivered to production.
Moving “resources” on and off teams to meet “demand”
Teams take time to grow, they cannot be “constructed”. Adding or removing anyone from a team lowers the team’s maturity and average problem space expertise. Changing too many people on a team reboots the team.
One delivery per sprint
Sprints are planning increments, not delivery increments. Plan what will be delivered daily during the sprint.
Skipping demo
If the team has nothing to demo, demo that. Never skip demo.
Committing to distant dates
Uncertainty increases with time. Distant deliverables need detailed analysis.
Not committing to dates
Commitments drive delivery. Commit to the next Minimum Viable Feature.
Velocity as a measure of productivity
Velocity is a planning metric. “We can typically get this much done in this much time.” It’s an estimate of relative capacity for new work that tends to change over time, and these changes don’t necessarily indicate a shift in productivity. It’s also an arbitrary measure that varies wildly between organizations, teams, and products. There’s no credible means of translating it into a normalized figure that can be used for meaningful comparison.
Equating velocity with productivity creates an incentive to optimize velocity at the expense of developing quality software.
4 - DevOps Learning Path
These are the core skills we recommend everyone learn to execute CD.
Behavior-Driven Development
Every step in CD requires clear, testable acceptance criteria as a prerequisite. BDD is not test automation. BDD is the discussion that informs acceptance test driven development.
- Videos
- What is BDD Dave Farley co-author of Continuous Delivery - 16:28 min
- Acceptance Testing By Dave Farley co-author of Continuous Delivery - 14:49 min
- Recommended Reading
- BDD In Action by John Ferguson Smart
- Behavior-Driven Development with Cucumber: Better Collaboration for Better Software by Richard Lawrence, Paul Rayner
Continuous Integration
Continuous integration is a requirement for CD. It requires very frequent integration of non-breaking code.
- Videos
- Top 10 Rules For Continuous Integration Dave Farley - 17 min.
- Continuous Integration Practices on LinkedIn Learning. Instructed by Ernest Mueller and James Wickett - 4 min.
- Continuous Integration on LinkedIn Learning. Instructed by Laura Stone - 4 min.
- Recommended Reading
- Continuous Integration: Improving Software Quality and Reducing Risk by Paul M. Duvall, Steve Matyas, Andrew Glover.
Conway’s Law
“Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.” - Melvin Conway
Loosely coupled teams create loosely coupled systems. The opposite is also true.
- Videos
- Don’t Forget Conway’s Law Sarah Novotny - 8:50 mins.
- Recommended Reading
- Conway’s Law and System Design by Sam Newman.
- How to Design Our Organization and Architecture with Conway’s Law in Mind co-author Gene Kim.
Domain-Driven Design
This is another key design tool both for organizational and system design. This is a critical skill for developing microservices.
- Videos
- What is DDD Eric Evans - 57:06 min.
- Software Architecture: Domain-Driven Design LinkedIn Training Course.
- Recommended Reading
- What Is Domain-Driven Design? by Vladik Khononov.
Pipeline Steps
Architecting a system of delivery is about designing efficient quality gates for the system’s context.
- Videos
- Understanding A DevOps Pipeline David Farley - 13:24 mins.
- Recommended Reading
- Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation Jez Humble and David Farley
Test-Driven Development (Design)
TDD highly correlates with application architecture that is easy to maintain and easy to upgrade.
- Videos
- Does TDD Lead to Better Software Design? Dave Farley co-author of Continuous Delivery - 18:32 min.
- Three Mindsets of TDD Dave Farley co-author of Continuous Delivery - 18:57 min.
- TDD and DDD with .NET Core and VSCode - 1 hour
- Recommended Reading
- Test Driven Development: By Example by Kent Beck.
Three Ways
The core principles that define DevOps:
- Consider the system of delivery as a whole
- Amplify feedback loops
- Continuously learn and improve the delivery system
- Videos
- The 3 Ways of The Phoenix Project co-author Gene Kim - 3:30 mins.
- Recommended Reading
- The Three Ways: The Principles Underpinning DevOps by Gene Kim
- The DevOps Handbook co-author Gene Kim
Value Stream Mapping
The primary process analysis tool used to help identify and attack constraints to delivery.
- Videos
- How we used Value Stream Mapping to accelerate DevOps adoption Marcus Robinson - 45:26 min.
- Recommended Reading
- Value Stream Mapping: How to Visualize Work and Align Leadership for Organizational Transformation - Karen Martin and Mike Osterling
Wastes
Our goal is to remove waste daily. We must first learn to recognize it.
- Videos
- The 7 Types of Waste in Software Development Alex Green - 10:34 mins.
- Recommended Reading
- Making Work Visible by Dominica DeGrandis.
- The Art of Lean Software Development by Curt Hibbs; Mike Sullivan; Steve Jewett.
5 - 24 Capabilities to Drive Improvement
“Our research has uncovered 24 key capabilities that drive improvements in software delivery performance in a statistically significant way. Our book details these findings.”
- Excerpt From: Nicole Forsgren PhD, Jez Humble & Gene Kim. Accelerate
Continuous Delivery Capabilities
Use version control for all production artifacts
Version control is the use of a version control system, such as GitHub or Subversion, for all production artifacts, including application code, application configurations, system configurations, and scripts for automating build and configuration of the environment.
Automate your deployment process
Deployment automation is the degree to which deployments are fully automated and do not require manual intervention.
Implement continuous integration
Continuous integration (CI) is the first step towards continuous delivery. This is a development practice where code is regularly checked in, and each check-in triggers a set of quick tests to discover serious regressions, which developers fix immediately. The CI process creates canonical builds and packages that are ultimately deployed and released.
Use trunk-based development methods
Trunk-based development has been shown to be a predictor of high performance in software development and delivery. It is characterized by fewer than three active branches in a code repository; branches and forks having very short lifetimes (e.g., less than a day) before being merged into trunk; and application teams rarely or never having code lock periods when no one can check in code or do pull requests due to merging conflicts, code freezes, or stabilization phases.
Implement test automation
Test automation is a practice where software tests are run automatically (not manually) continuously throughout the development process. Effective test suites are reliable—that is, tests find real failures and only pass releasable code. Note that developers should be primarily responsible for creation and maintenance of automated test suites.
Support test data management
Test data requires careful maintenance, and test data management is becoming an increasingly important part of automated testing. Effective practices include having adequate data to run your test suite, the ability to acquire necessary data on demand, the ability to condition your test data in your pipeline, and the data not limiting the amount of tests you can run. We do caution, however, that teams should minimize, whenever possible, the amount of test data needed to run automated tests.
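As a hedged sketch of having "adequate data to run your test suite" without large shared datasets, a small test-data builder that creates only the records a test needs; the customer fields and shipping rule are hypothetical.

```python
"""Sketch: a small builder produces just enough data for each test instead of
relying on large shared datasets. The customer fields and shipping rule are
hypothetical."""
import uuid


def build_customer(**overrides) -> dict:
    """Create a minimal valid customer; tests override only the fields they assert on."""
    customer = {
        "id": str(uuid.uuid4()),
        "name": "Test Customer",
        "loyalty_tier": "standard",
    }
    customer.update(overrides)
    return customer


def shipping_cost(customer: dict, order_total: float) -> float:
    """Hypothetical rule under test."""
    if customer["loyalty_tier"] == "gold" or order_total >= 50.00:
        return 0.00
    return 4.99


def test_gold_tier_gets_free_shipping():
    customer = build_customer(loyalty_tier="gold")
    assert shipping_cost(customer, order_total=20.00) == 0.00
```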
Shift left on security
Integrating security into the design and testing phases of the software development process is key to driving IT performance. This includes conducting security reviews of applications, including the Infosec team in the design and demo process for applications, using pre-approved security libraries and packages, and testing security features as a part of the automated testing suite.
Implement continuous delivery (CD)
CD is a development practice where software is in a deployable state throughout its lifecycle, and the team prioritizes keeping the software in a deployable state over working on new features. Fast feedback on the quality and deployability of the system is available to all team members, and when they get reports that the system isn’t deployable, fixes are made quickly. Finally, the system can be deployed to production or end users at any time, on demand.
Architecture Capabilities
Use a loosely coupled architecture
This affects the extent to which a team can test and deploy their applications on demand, without requiring orchestration with other services. Having a loosely coupled architecture allows your teams to work independently, without relying on other teams for support and services, which in turn enables them to work quickly and deliver value to the organization.
Architect for empowered teams
Our research shows that teams that can choose which tools to use do better at continuous delivery and, in turn, drive better software development and delivery performance. No one knows better than practitioners what they need to be effective.
Product and Process Capabilities
Gather and implement customer feedback
Our research has found that whether organizations actively and regularly seek customer feedback and incorporate this feedback into the design of their products is important to software delivery performance.
Make the flow of work visible through the value stream
Teams should have a good understanding of and visibility into the flow of work from the business all the way through to customers, including the status of products and features. Our research has found this has a positive impact on IT performance.
Work in small batches
Teams should slice work into small pieces that can be completed in a week or less. The key is to have work decomposed into small features that allow for rapid development, instead of developing complex features on branches and releasing them infrequently. This idea can be applied at the feature and the product level. (An MVP is a prototype of a product with just enough features to enable validated learning about the product and its business model.) Working in small batches enables short lead times and faster feedback loops.
Foster and enable team experimentation
Team experimentation is the ability of developers to try out new ideas and create and update specifications during the development process, without requiring approval from outside of the team, which allows them to innovate quickly and create value. This is particularly impactful when combined with working in small batches, incorporating customer feedback, and making the flow of work visible.
Lean Management and Monitoring Capabilities
Have a lightweight change approval process
Our research shows that a lightweight change approval process based on peer review (pair programming or intra-team code review) produces superior IT performance than using external change approval boards (CABs).
Monitor across application and infrastructure to inform business decisions
Use data from application and infrastructure monitoring tools to take action and make business decisions. This goes beyond paging people when things go wrong.
Check system health proactively
Monitor system health, using threshold and rate-of-change warnings, to enable teams to preemptively detect and mitigate problems.
Improve processes and manage work with work-in-progress (WIP) limits
The use of work-in-progress limits to manage the flow of work is well known in the Lean community. When used effectively, this drives process improvement, increases throughput, and makes constraints visible in the system.
Visualize work to monitor quality and communicate throughout the team
Visual displays, such as dashboards or internal websites, used to monitor quality and work in progress have been shown to contribute to software delivery performance.
Cultural Capabilities
Support a generative culture (as outlined by Westrum)
This measure of organizational culture is based on a typology developed by Ron Westrum, a sociologist who studied safety-critical complex systems in the domains of aviation and healthcare. Our research has found that this measure of culture is predictive of IT performance, organizational performance, and decreasing burnout. Hallmarks of this measure include good information flow, high cooperation and trust, bridging between teams, and conscious inquiry.
Encourage and support learning
Is learning, in your culture, considered essential for continued progress? Is learning thought of as a cost or an investment? This is a measure of an organization’s learning culture.
Support and facilitate collaboration among teams
This reflects how well teams, which have traditionally been siloed, interact in development, operations, and information security.
Provide resources and tools that make work meaningful
This particular measure of job satisfaction is about doing work that is challenging and meaningful, and being empowered to exercise your skills and judgment. It is also about being given the tools and resources needed to do your job well.
Support or embody transformational leadership
Transformational leadership supports and amplifies the technical and process work that is so essential in DevOps. It is comprised of five factors: vision, intellectual stimulation, inspirational communication, supportive leadership, and personal recognition.
6 - Team Workflow
6.1 - Code Review
As Wikipedia puts it, “Code review is systematic examination of computer source code. It is intended to find and fix mistakes overlooked in the initial development phase, improving both the overall quality of software and the developers’ skills.”
Recommended Practices
- Small changes allow for faster code review and enhance the feedback loops.
- Everyone on the team is capable of performing code review.
- Code reviews are the second highest priority for a team behind blocked issues and ahead of WIP.
Tips
- Automate code review processes like linting and static code analysis.
- During code review, verify that there are tests that meet the acceptance criteria agreed upon by the team.
- Keep pull requests small. Look into Work Decomposition for guidance.
- As the person being reviewed, remember the 10 Commandments of Code Review
- Thou shalt not take it personally
- Thou shalt not marry thy code
- Thou shalt consider all feedback
- Thou shalt articulate thy rationale
- Thou shalt be willing to compromise
- Thou shalt contribute to others’ code reviews
- Thou shalt treat submitters how thou would like to be treated
- Thou shalt not be intimidated by the number of comments
- Thou shalt not repeat the same mistakes
- Thou shalt embrace the nits
Value
- Finds issues before deployment, saving time and money.
- Increased Quality.
- Decreased Change Failure Rate.
Acceptance Criteria
- Automated checks for standards and complexity.
- Code is reviewed for testing and clarity.
- Pull requests are small and last no more than a day.
- CI tests run upon opening and modifying pull requests.
6.2 - Source Branching
Trunk-based development is a requirement for Continuous Integration.
Recommended Practices
- All changes begin with a branch from trunk and integrate back to the trunk.
- Branches should live no longer than 24 hours. The smaller the PR, the easier it is to identify issues. The smaller the change, the less risk associated with that change.
- Pull requests reviewed by a second party are a compliance requirement.
- Trunk can always be built and deployed without breaking production.
- When needed, use techniques like the Branch by Abstraction pattern or feature flags to ensure backwards compatibility (see the sketch after this list).
- All changes to trunk include all appropriate automated tests.
- Branching vs. Forking: It is important that the right process be used for the right reason. Branches are the primary flow for CI and are critical for allowing the team to have visibility to work in progress that the team is responsible for completing. Forks are how proposed, unplanned changes are made from outside the team to ensure quality control and to reduce confusion from unexpected branches.
- Forks should be used for:
  - Contribution from a contributor outside the team to ensure proper quality controls are followed.
  - Contribution that may be abandoned and whose lost work will not impact team goals.
- Branches should be used for:
  - All planned work done inside the team, to prevent lost work due to illness or emergency.
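A minimal sketch of the Branch by Abstraction pattern mentioned above, assuming a hypothetical payment gateway: the abstraction ships first with behavior unchanged, the replacement grows behind it on trunk in small releasable steps, and a toggle selects which implementation runs.

```python
"""Sketch of Branch by Abstraction with a hypothetical payment gateway: the
abstraction ships first with behavior unchanged, the new implementation grows
behind it on trunk, and a toggle selects which one runs."""
from abc import ABC, abstractmethod


class PaymentGateway(ABC):
    """Abstraction introduced first, while behavior stays the same."""

    @abstractmethod
    def charge(self, amount_cents: int) -> str: ...


class LegacyGateway(PaymentGateway):
    def charge(self, amount_cents: int) -> str:
        return f"legacy-receipt-{amount_cents}"


class NewGateway(PaymentGateway):
    """Built incrementally behind the abstraction; dark until switched on."""

    def charge(self, amount_cents: int) -> str:
        return f"new-receipt-{amount_cents}"


def gateway(use_new_gateway: bool = False) -> PaymentGateway:
    # A configuration toggle selects the implementation; removing LegacyGateway
    # is the final, small, low-risk step of the migration.
    return NewGateway() if use_new_gateway else LegacyGateway()
```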
Tips
- Story Slicing helps break development work into more easily consumable, testable chunks.
- You don’t have to wait until a story or feature is complete, as long as the changes are tested and won’t break production.
- Pull requests should be small and should be prioritized over starting new development.
Common Issues
Trunk-based continuous integration often requires workflow adjustments for the team. The main reasons teams struggle with CI are:
- Test architecture
- Work that is too big and / or lacks proper refinement
- Issues with source code ownership (one repo owned by more than one team)
- Workflow management within the team
Value
As a team, the use of trunk-based development enhances our ability to deliver small, value adding, functional enhancements to trunk while also decreasing the time it takes to get feedback on our changes.
Acceptance Criteria
- All Development begins with branching from trunk.
- The artifact resulting from changes to trunk is running in production.
- Branches live an average of less than 24 hours.
- Team members hold each other accountable to the size and frequency of integrations.
- Repositories only have short-lived branches and trunk.
FAQ
- Pre-requisites for TBD
- Benefits of using TBD
- Releasing From Trunk
- Handling Infrequent Releases
- Handling Bug Fixes
- Handling Incomplete Code/Features
- Validating Release Quality
- Transitioning to Trunk Based Development
- Trunk Based Development- You’re doing it wrong
6.3 - Workflow Process
Workflow management is the process the team uses to move things from “In Progress” to “Done”, as rapidly as possible in value sequence. It’s important for minimizing WIP that the team looks at the backlog as the team’s work and does not pre-assign work to individuals.
Workflow Management Process
In order to streamline business tasks, minimize room for error, and increase overall efficiency, teams need to have the following prerequisites.
- Definition of Done
- Kanban Board, virtual or physical, with a prioritized backlog
Plan Work
Unplanned work is anything coming into the backlog that has not been committed to or prioritized. This can include feature requests, support tickets, etc.
Common struggles teams face with unplanned work can be:
Do Work
Completed work meets the Definition of Ready when it begins, meets the Definition of Done when it is delivered, and takes less than two days to complete.
Process smells identified for completing work include:
- Context switching
- Ineffective demos, or lack thereof
- Multiple teams owning pieces of the process
- Status and visibility of work is unclear
- Siloed work
Improve Work
In order to plan and complete work effectively, there must be an improvement process in place. The improvement process is centered around feedback loops.
Challenges associated with the improvement process:
Measuring Your Workflow
A good measure to implement in your team’s workflow is WIP. Limiting work in progress can help reduce constraints in your workflow.
Development cycle time is a key measure of success when trying to optimize and automate your team’s workflow.
6.4 - Definition of Done
Is it DONE, is it DONE DONE, or is it DONE DONE DONE?
All teams need a Definition of Done. The Definition of Done is an agreement within the team that a unit of work isn’t actually complete until it meets certain conditions.
Recommended Practices
We use the Definition of Done most commonly for user stories. The team and product owner must agree that the story has met all criteria for it to be considered done.
A definition of done can include anything a team cares about, but must include these criteria:
- All tests passed
- All acceptance criteria have been met
- Code reviewed by team member and merged to trunk
- Demoed to team/stakeholders as close to prod as possible
- All code associated with the acceptance criteria deployed to production
Once your team has identified all criteria that a unit of work needs to be considered done, you must hold yourself accountable to your Definition of Done.
Value
As a development team, we want to understand our team’s definition of done, so that we can ensure a unit of work is meeting the criteria acceptable for it to be delivered to our customers.
Acceptance Criteria
- Identify what your team cares about as a Definition of Done.
- Use your Definition of Done as a tool to ensure quality stories are being released into production.
- Revisit and evaluate your Definition of Done.
6.5 - Feedback Loops
The most important part of the workflow process is its feedback loops and how they affect the speed and quality of value delivery.
The ultimate goal is to deliver quality software to our customers. Instead of speculating how an end-user might consume your team’s product, feedback loops improve your existing workflow so that you may meet your customer’s needs rapidly and with less waste.
Examples of Critical Feedback Loops
Feedback loops are as follows: You produce something, measure information on that production, and use that information to improve.
- How well does the team understand the requirements? Teams need to work with leadership to flesh out requirements. How well those requirements are understood can be shown by how often developers are requesting additional information, or how often the team is committing code.
- How fast can the team detect defects? Defects are weaknesses in the system. The systematic approach of detecting where defects are occurring, and how far downstream they are, directly affects a team’s Mean Time to Detect.
- How effective are our tests? Testing is one of the most effective feedback loops a team can have in place. Automated tests, for example, provide feedback about your system in seconds.
- How well does what we’re producing match the users’ actual needs? Understanding if we’re meeting the needs of the consumer is critical feedback. How fast can we determine that the customer is using the feature and is happy with it? The longer the duration from the time we start work to the time we find out this information, the more expensive it is.
Tips
- Use value stream mapping to uncover feedback loops, not just bottlenecks between specific steps.
- Focus on feedback loops that involve human communication, not just system alerts.
- Not all feedback loops are positive. Amplify feedback loops that promote positive change.
Value
As a development team, we want to identify and shorten our feedback loops, so that we can analyze and optimize our workflow processes.
6.6 - Retrospectives
Retrospectives are critical for teams that are serious about continuous improvement. They allow the team an opportunity to take a moment to inspect and adapt how they work. The importance of this cannot be overstated. Entropy is always at work, so we must choose to change so that change doesn’t choose us.
Recommended Practices
Successful Retrospectives
A successful retrospective has five parts:
- Go over the mission of the team and the purpose of retrospective.
- The team owns where they are right now using Key Performance Indicators (KPIs) they’ve agreed on as a team.
- The team identifies whether experiments they are running are working or not.
  - If an experiment is working, the team works to standardize the changes as part of daily work.
  - If an experiment is not working, the team either adjusts the experiment based on feedback or abandons it to try something else.
  - Both are totally acceptable and expected results. In either case, the learnings should be shared publicly so that anyone in the organization can benefit from them.
- The team determines whether they are working towards the right goal and whether the experiments they are working on are moving them towards it.
  - If the answer to either question is “No,” then the team adjusts as necessary.
- Open and honest conversation about wins and opportunities throughout.
Example Retro Outline
- Go over the team’s mission statement and the purpose of retrospective (2 min)
- Go over the team’s Key Performance Indicators and make sure everyone knows where we are (5-10 min)
- Go over what experiments the team decided to run and what we expected to happen (5 minutes)
- What did we learn this week? (10-15 minutes)
- Should we modify any team documents? (2 minutes)
- What went well this week? (5-10 minutes)
- What sinks our battleship? (5-10 minutes)
- Are we working towards the right things? What are we going to try this week? How will we measure it? (10-15 minutes)
Organizing Retros
There are some important things to consider when scheduling a retrospective.
- Ensure Psychological Safety
  - If the team feels like they can’t speak openly and honestly, they won’t.
  - Any issues with psychological safety must be addressed before any real progress can be made.
- Make them Regular
  - Agree to a time, day, and frequency as a team to meet.
- Include everyone responsible for delivery
  - Ideally this will include business colleagues (PO), operations, testing, and developers involved in the process.
  - If there are more than 10-12 people in the meeting, your team is probably too big.
- Co-location concerns
  - If the team is split across timezones, then accommodations should be made so that the team can communicate effectively.
  - If the time separation is extreme (i.e. India/US), then it may be better to have each hemisphere retro separately and compare notes asynchronously.
  - Schedule meetings to be inclusive of the most remote. Don’t schedule rooms with bad audio or no video if there are remote participants. Have it via a remote meeting solution (Zoom, etc.)
Tips
- Create cards on whatever board you are using to track your work for action items that come out of retrospective.
  - Treating team improvement items as deliverables will help the team take them more seriously.
- Do not work on more than a few actions/experiments at a time
- If the retrospective has remote attendees, ask that everyone turn on their cameras so that the team can look everyone in the eyes.
- Outcome over output: If the format of retro isn’t helping you improve, change it or seek help on how to make it better. The teams that cancel retro are almost always the teams that need it most.
Known Impediments
“Typical” Retrospectives
Normally, a scrum-like retro involves 3 questions about the previous iteration:
- What went well?
- What could we improve?
- What are some actions we can take?
This is a pretty open-ended format that is very simple to go over in a training class. The challenge is the nuance of facilitating the format.
While it can be effective, what we have found is that this particular format can actually stunt the improvement of many teams when used incorrectly. And since the format is so open-ended, that’s extremely easy to do.
Retrospectives that follow the above format are something that many teams struggle with. They can…
- Feel Ineffective, where the same issues crop up again and again without resolution.
- End with a million action items that never get done or tracked.
- “Improve” things that don’t actually move the needle on team productivity or happiness
- End up as a gripe session where there are no actionable improvements identified.
This is such a waste of time. I'd rather be coding...
It can be extremely frustrating to team members when it feels like retrospectives are just another meeting that they have to go to. If that ever becomes the case, that should signal a huge red flag! Something is wrong!
Psychological Safety
If the team feels like they are going to be judged, punished, or generally negatively affected by participating in retrospective, then they are going to keep their opinions to themselves. Without the safety to have their voices heard or to take moderate, hypothesis-driven risks, the team will not improve as fast as they can (if at all).
However, if leadership feels like they are being disrespected, that they aren’t being listened to or considered, or that they are going to be negatively impacted by the outcomes of the team, they are more likely to restrain the team from reaching its full potential.
It’s a delicate balancing act that takes trust, respect, and empathy from all sides to come to win-win solutions.
6.7 - Unplanned Work
Unplanned work is any interruption that prevents one from finishing something or from stopping at a better breaking point. It increases uncertainty in the system, and makes the system less predictable as a result.
There are times when unplanned work is necessary and understandable, but you should be wary of increased risk, uncertainty, and reduced predictability.
Cost of Delay
Work that has not been prioritized is work that has not been planned. When there are competing features, requests, support tickets, etc., it can be difficult to prioritize what should come first.
Most of the time, teams prioritize based on what the customer wants, what the stakeholders want, etc.
Cost of Delay makes it easier to decide priority based on value and urgency.
How much money are we costing (or saving) the organization if Feature A is delivered over Feature B?
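A worked example of that question, using the common cost-of-delay-divided-by-duration (CD3) heuristic with hypothetical numbers:

```python
"""Worked example with hypothetical numbers: divide each feature's cost of
delay by its duration (CD3) and deliver the highest score first."""

features = {
    # name: (cost of delay per week in dollars, estimated duration in weeks)
    "Feature A": (10_000, 4),
    "Feature B": (4_000, 1),
}

ranked = sorted(
    features.items(),
    key=lambda item: item[1][0] / item[1][1],  # cost of delay / duration
    reverse=True,
)

for name, (cost_of_delay, duration) in ranked:
    print(f"{name}: CD3 = {cost_of_delay / duration:,.0f} per week of work")

# Feature B scores 4,000 versus Feature A's 2,500, so the quick, urgent item
# goes first even though Feature A has the larger total value.
```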
Capacity Planning
The most common pitfall that keeps teams from delivering work is unrealistic capacity planning.
Teams that plan for 100% of their capacity are unable to fit unknowns into their cadence, whether that be unplanned work, spikes, or continuous experimentation and learning.
Planned capacity should fall between 60% and 80% of a team’s max capacity.
Tips
- Plan for unplanned work. Pay attention to the patterns that present themselves, and analyze what kind of unplanned work is making it to your team’s backlog.
- Make work visible, planned and unplanned, and categorize unplanned work based on value and urgency.
Value
As a development team, we want to understand how to plan for unplanned work, so that we can reduce risk and uncertainty for our deliverables.
6.8 - Visualizing Workflow
Making work visible to ourselves, as well as our stakeholders, is imperative to our workflow management process. People are visual beings, and a visible workflow gives everyone a sense of ownership and accountability.
Make use of a Kanban board
Kanban boards help you to make work and problems visible and improve workflow efficiency.
Kanban boards are a recommended practice for all agile development methods. Kanban signals your availability to do work. When an individual pulls something from the backlog into progress, they are committing to being available to do the work the card represents.
With Kanban boards, your team knows who’s working on what, what the status of that work is, and how long that work has been in progress.
Building a Kanban Board
To make a Kanban board you need to create lanes on your board that represent your team’s workflow. Adding work in progress (WIP) limits to swim-lanes will enhance the visibility of your team’s workflow.
The team only works on cards that are in the “Ready to Start” lane and team members always pick from the top. No “Cherry Picking”.
The following is a good starting point for most teams.
- Backlog
- Ready to Start
- Development
- Ready to Review
- Blocked
- Done
Tips
Track everything:
- Stories, tasks, spikes, etc.
- Improvement items
- Training development
- Extra meetings
Work is work, and without visibility to all of the team’s work it’s impossible to identify and reduce the waste created by unexpected work.
Bring visibility to dependencies across teams, to help people anticipate what’s headed their way, and prevent delays from unknowns and invisible work.
Value
As a development team, we want to visualize our workflow, so that we may improve workflow efficiency.
Acceptance Criteria
- Use a visual board
- Show any and all work
References
Making Work Visible - Dominica DeGrandis
6.9 - Work in Progress
Why Limit WIP?
Work in Progress is defined as work that has started but is not yet finished. Limiting WIP helps teams reduce context switching, find workflow issues, and keeps teams focused on collaboration and finishing work.
How do we limit WIP?
Limiting Work in Progress is an experiment. Start with one lane on your board.
Set your WIP limit to n+2 (“n” being the number of people contributing to that lane).
Continue setting WIP lower.
Once the WIP limit is reached, no more cards can enter that lane until one exits.
Capacity Utilization
There is a direct correlation between WIP and capacity utilization. Attempting to load people and resources to 100% capacity utilization creates wait times. Unpredictable events equal variability, which equals capacity overload. The more individuals and resources used, the higher the cost and risk.
In order to lessen work in progress, be aggressive in prioritization, push back when necessary, and set hard WIP limits. Select a WIP limit that is doable but challenges you to say no some of the time.
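To see why full utilization creates wait time, here is a small illustration assuming a simple M/M/1 queueing model (an assumption, not from the original text): average queue wait grows with utilization / (1 - utilization).

```python
"""Illustration assuming a simple M/M/1 queueing model (an assumption, not from
the original text): average wait grows with utilization / (1 - utilization)."""

service_time_days = 1.0  # hypothetical average time to finish one item

for utilization in (0.60, 0.80, 0.90, 0.99):
    wait = service_time_days * utilization / (1 - utilization)
    print(f"{utilization:.0%} busy -> about {wait:.1f} days of queue wait per item")

# Wait time more than doubles between 60% and 80% utilization and explodes as
# utilization approaches 100%, which is why hard WIP limits and slack matter.
```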
Conflicting Priorities
When we start a new task before finishing an older task, our work in progress goes up and things take longer. Business value that could have been realized sooner gets delayed because of too much WIP.
Be wary of falling back into the old habit of starting everything because of the pressure to say yes to everything.
Consider these ways of prioritizing work:
- Assigned priority
- Cost of delay
- First-in, first-out
Tips
Swarming Stories
Having more than one person work on a task at the same time avoids situations where team understanding is mostly limited to a subset of what’s being built. With multiple people involved early, there is less chance that rework will be needed later.
By having more than one developer working on a task, you are getting a real-time code review.
Story assignment
Visually distinguish important information.
- Who’s working on what?
- Has this work been in progress for too long?
- Is this work blocked from progressing?
- Have we reached our WIP limit?
Value
As a team, we want to limit our WIP, so that we may deliver the most valuable thing first.
Acceptance Criteria
- Set a WIP limit within reason and follow it shamelessly.
- Work on one thing at a time.
References
Making Work Visible - Dominica DeGrandis
7 - Work Decomposition
7.1 - Work Decomposition
In order to effectively understand and implement the work breakdown flow, the team needs to have the following prerequisites and understandings.
- Definition of Ready
- Definition of Done
- Backlog refinement cadence with the appropriate team members and stakeholders involved
Work Breakdown Process
The goal of the work breakdown process is to decompose work into small batches that can be delivered frequently, multiple times a week, in order to deliver value faster with less rework.
The general work breakdown process involves:
It is important that the team keep these tips in mind when decomposing work:
- Known poor quality should not flow downstream. This includes acceptance criteria that require interpretation. If the acceptance criteria cannot be understood by the whole team then we are developing defects, not value.
- Refining work requires significant brainpower and is the primary quality process. Meetings should be planned around this. Hold them when people are mentally alert and time box them to prevent mental fatigue.
- Good acceptance criteria come from good communication. Avoid the following anti-patterns:
- Someone outside the team writes acceptance criteria and hands it to the team. Since the team was not involved with the conversation, there’s no chance to uncover assumptions and the team has less investment in the outcomes.
- One person on the team writes acceptance criteria. Same problem as above.
- Each team member is assigned work based on their expertise. This removes communication and also ensures that people are only focused on understanding their tasks. Again, the team as a whole isn’t invested in the outcomes. This typically results in finger pointing when something fails. Also, if someone is unavailable, the rest of the team lacks context to pick it up.
- Refining should be focused on outcomes, not volume. If we have a 1 hour meeting and 10 stories to refine, it’s better to have one fully refined story we can work on than 10 partially refined stories that we’ll “figure out during development”. Stop refining a story when we agree on the acceptance criteria or agree it’s blocked and needs more information. Only then should we move to the next story. Stop the meeting at the scheduled time.
Intake/Product Ideas
Ideas become epics with defined outcomes, clear goals and value. Epics become a list of features.
Teams commonly struggle when breaking ideas down into epics and features.
Refining Epics/Features into Stories
Stories are observable changes that have clear acceptance criteria and can be completed in less than two days. Stories are made up of one or more tasks.
Typical problems teams experience with decomposition are:
- Stories are too big
- Stories are too complex
- Stories lack testable acceptance criteria
- Lack of dependency knowledge
- Managing research tasks
Refining Stories into Development Tasks
Tasks are independently deployable changes that can be merged to trunk daily.
Breaking stories down into tasks gives teams the ability to swarm work and deliver value faster.
In order for teams to visualize tasks required to implement scenarios, they need to understand what a good task looks like.
Measuring Success
Tracking the team’s Development Cycle Time is the best way to judge improvements to decomposition. Stories should take 1-2 days to deliver and should not have rework, delays waiting for explanations, or dependencies on other stories or teams.
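As a rough illustration of tracking Development Cycle Time, the sketch below computes elapsed days from start to delivery for each story and flags anything over the two-day target. The story data and field names are hypothetical; in practice they would come from your work tracking tool.

```python
from datetime import datetime

# Hypothetical export from a work tracking tool: when each story
# entered "In Progress" and when it was delivered.
stories = [
    {"id": "ST-101", "started": "2024-03-04T09:00", "delivered": "2024-03-05T16:00"},
    {"id": "ST-102", "started": "2024-03-04T09:00", "delivered": "2024-03-08T11:00"},
]

TARGET_DAYS = 2  # stories should take 1-2 days to deliver

for story in stories:
    started = datetime.fromisoformat(story["started"])
    delivered = datetime.fromisoformat(story["delivered"])
    cycle_days = (delivered - started).total_seconds() / 86400
    note = "" if cycle_days <= TARGET_DAYS else "  <- candidate for further decomposition"
    print(f"{story['id']}: {cycle_days:.1f} days{note}")
```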
7.2 - Task Decomposition
What does a good task look like?
A development task is the smallest independently deployable change to implement acceptance criteria.
Recommended Practices
Create tasks that are meaningful and take less than two days to complete. For example, the acceptance criterion below might decompose into the three tasks that follow it; a sketch of the feature flag involved comes after the task list.
Given I have data available for Integration Frequency
Then score entry for Integration Frequency will be updated for teams
Task: Create Integration Frequency Feature Flag.
Task: Add Integration Frequency as Score Entry.
Task: Update Score Entry for Integration Frequency.
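As a loose illustration of why each of these tasks can merge to trunk independently, a hypothetical feature flag can keep the partially built score entry hidden until the last task lands. This is a sketch, not a prescribed implementation; real teams would typically use a flag service or configuration system rather than an in-memory dictionary.

```python
# Hypothetical, minimal feature-flag gate.
FLAGS = {"integration-frequency-score": False}  # Task 1: create the flag, off by default

def build_score_entries(team_metrics: dict) -> list:
    entries = [{"name": "Deploy Frequency", "value": team_metrics.get("deploy_frequency")}]
    if FLAGS["integration-frequency-score"]:
        # Tasks 2 and 3: add and update the Integration Frequency score entry.
        entries.append({"name": "Integration Frequency",
                        "value": team_metrics.get("integration_frequency")})
    return entries

# With the flag off, trunk stays releasable while the remaining tasks are finished.
print(build_score_entries({"deploy_frequency": 5, "integration_frequency": 3}))
```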
Use Definition of Done as your checklist for completing a development task.
Tips
- If a task includes integration to another dependency, add a simple contract mock to the task so that parallel development of the consumer and provider will result in minimal integration issues.
- Decomposing stories into tasks allows teams to swarm stories and deliver value faster.
7.3 - Behavior Driven Development
Behavior Driven Development is the collaborative process where we discuss the intent and behaviors of a feature and document the understanding in a declarative, testable way. These testable acceptance criteria should be the Definition of Done for a user story. BDD is not a technology or automated tool. BDD is the process for defining the behavior. We can then write automated tests for those behaviors.
Example:
Feature: I need to smite a rabbit so that I can find the Holy Grail
Scenario: Use the Holy Hand Grenade of Antioch
Given I have the Holy Hand Grenade of Antioch
When I pull the pin
And I count to 3
But I do not count to 5
And I lob it towards my foe
And the foe is naughty in my sight
Then my foe should snuff it
Recommended Practices
Gherkin is the domain-specific language that allows acceptance criteria to be expressed as "Arrange, Act, Assert" in a way that is understandable to all stakeholders. Example:
Feature: As an hourly associate I want to be able to log my arrival time so that I can be
paid correctly.
Scenario: Clocking in
Given I am not clocked in
When I enter my associate number
Then my arrival time will be logged
And I will be notified of the time
Scenario: Clocking out
Given I am clocked in
When I enter my associate number
And I have been clocked in for more than 5 minutes
Then I will be clocked out
And I will be notified of the time
Scenario: Clocking out too soon
Given I am clocked in
When I enter my associate number
And I have been clocked in for less than 5 minutes
Then I will receive an error
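These scenarios can then be wired to automated tests. Below is a minimal sketch of step definitions for the "Clocking in" scenario, assuming the Python behave library and a hypothetical in-memory TimeClock class standing in for the real system.

```python
# features/steps/clocking.py - hypothetical step definitions for "Clocking in".
from datetime import datetime
from behave import given, when, then


class TimeClock:
    """Hypothetical in-memory stand-in for the real time clock service."""
    def __init__(self):
        self.arrival_time = None
        self.notification = None

    def clock_in(self, associate_number: str) -> None:
        self.arrival_time = datetime.now()
        self.notification = f"Clocked in at {self.arrival_time:%H:%M}"


@given("I am not clocked in")
def step_not_clocked_in(context):
    context.clock = TimeClock()

@when("I enter my associate number")
def step_enter_associate_number(context):
    context.clock.clock_in("12345")

@then("my arrival time will be logged")
def step_arrival_logged(context):
    assert context.clock.arrival_time is not None

@then("I will be notified of the time")
def step_notified(context):
    assert context.clock.notification is not None
```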
Using Acceptance Criteria to Negotiate and Split
With the above criteria, it may be acceptable to remove the time validation and accelerate delivery of the core time-logging ability. We deliver the ability to clock in and out first, and then see whether the validation is really needed. If it turns out not to be, we have saved money and time by NOT delivering an unneeded feature.
Feature: As an hourly associate I want to be able to log my arrival time so that I can be
paid correctly.
Scenario: Clocking in
Given I am not clocked in
When I enter my associate number
Then my arrival time will be logged
And I will be notified of the time
Scenario: Clocking out
Given I am clocked in
When I enter my associate number
And I have been clocked in for more than 5 minutes
Then I will be clocked out
And I will be notified of the time
If, in production, we discover that the sanity check is required to prevent time clock issues, we can quickly add that behavior.
Feature: As an hourly associate I want to be prevented from clocking out immediately after
clocking in.
Scenario: Clocking out more than 5 minutes after arrival
Given I am clocked in
And I have been clocked in for more than 5 minutes
When I enter my associate number
Then I will be clocked out
And I will be notified of the time
Scenario: Clocking out less than 5 minutes after arrival
Given I am clocked in
And I have been clocked in for less than 5 minutes
When I enter my associate number
Then I will receive an error
Tips
- Scenarios should be written from the point of view of the consumer, whether that consumer is a user, a UI, or another service.
- Scenarios should be focused on a specific function and should not attempt to describe multiple behaviors.
- If a story has more than 6 acceptance criteria, it can probably be split.
- No acceptance test should contain more than 10 conditions; far fewer is recommended.
- Acceptance tests can be used to describe a full end to end user experience. They are also recommended for describing the behavior of a single component in the flow of the overall behavior.
References
- Gherkin Reference
- BDD Primer - Liz Keogh
- Better Executable Specifications - Dave Farley
- A Real world Example of BDD - Dave Farley
- ATDD - How to Guide - Dave Farley
7.4 - Complexity and Estimation
When refining work, teams should focus on reducing complexity, minimizing dependencies, and estimating based on complexity and effort, not time.
Small things can be estimated more accurately than big things because the margin of error is lower and dependencies are clear. Eliminating or reducing hard dependencies is critical because the probability that something will be delivered late doubles for every hard dependency. Those could include database changes, coordination with other teams, or changes that are tightly coupled with another component.
Recommended Practices
Decompose stories using Behavior Driven Development. This not only helps with feature discovery and with uncovering dependencies, but also aids with story slicing since each acceptance test is naturally a thin, vertical slice.
Prior to refining, use relative sizing to produce order-of-magnitude estimates for delivery. However, these should not be used for commitments. Committing to unrefined work fills the team's capacity with re-work and "Date Driven Development", reduces quality, and lowers end-user satisfaction.
To avoid hard dependencies, first slice stories as small as possible to minimize the number of possible dependencies. After that, attempt to make any hard dependencies “soft” with feature flags, API versioning, or other coding solutions.
Tips
- Use Cynefin to aid in estimating complexity.
- If the team does not agree with the estimate, refine further. Avoid “averaging” the team’s estimate.
- Track estimates against actuals to see how consistent the team is.
7.5 - Contract Driven Development
Contract Driven Development is the process of defining the contract changes between two dependencies during design and prior to construction. This allows the provider and consumer to work out how components should interact so that mocks and fakes can be created that allow the components to be developed and delivered asynchronously.
Recommended Practices
For services, define the expected behavior changes for the affected verbs along with the payload. These should be expressed as contract tests (the unit tests of an API) that both provider and consumer can use to validate the integration independently.
For more complicated interactions that require more than simple canned responses, a common repository that holds a fake of the new service, or tools like Mountebank or WireMock, can be used to virtualize more complex behavior. It's important that both components are tested against the same behaviors.
Contract tests should follow Postel's Law:
"Be conservative in what you do, be liberal in what you accept from others."
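As a minimal sketch of what a shared contract check might look like (the endpoint and field names here are hypothetical), the provider runs the check against its real handler and the consumer runs the same check against its mock, while staying liberal by asserting only on the fields it actually needs and ignoring extras.

```python
import json

# Hypothetical contract for GET /api/v1/associates/{id}: only the fields the
# consumer relies on. Extra fields in the real response are tolerated.
REQUIRED_FIELDS = {"id": str, "arrivalTime": str}

def assert_meets_contract(response_body: str) -> None:
    payload = json.loads(response_body)
    for field, field_type in REQUIRED_FIELDS.items():
        assert field in payload, f"missing field: {field}"
        assert isinstance(payload[field], field_type), f"wrong type for field: {field}"

# The consumer validates its mock against the contract; the provider validates
# its real response the same way, so the two cannot silently drift apart.
mock_response = json.dumps({"id": "12345", "arrivalTime": "2024-03-04T09:00:00Z",
                            "site": "store-42"})  # extra field is ignored
assert_meets_contract(mock_response)
```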
Tips
- For internal services, define the payload and responses in the developer task along with the expected functional test for that change.
- For external services, use one of the open source tools that allow recording and replaying responses.
- Always create contract tests before implementation of behavior.
7.6 - Defining Product Goals
Product Goals
Product goals are a way to turn your vision for your product into easy-to-understand objectives that can be measured and achieved in a certain amount of time. For example:
Goal: Increased transparency into product metrics.
Measurable Outcome: Increased traffic to the product page.
When generating product goals, you need to understand what problem you are solving, who you are solving it for, and how you measure that you achieved the goals.
Initiatives
Product goals can be broken down into initiatives that, when accomplished, deliver against the product strategy. Example initiatives:
- Provide one view for all product KPIs.
- Ensure products have appropriate metrics associated with them.
Initiatives can then be broken down into epics, stories, tasks, etc. among product teams, with high-level requirements associated.
Epics
An epic is a complete business feature with outcomes defined before stories are written. Epics should never be open-ended buckets of work. Example epic:
I want to be able to review the CI metrics trends of teams who have completed a DevOps Dojo engagement.
Tips
- Product goals need a description and key results needed to achieve them.
- Initiatives need enough information to help the team understand the expected value, the requirements, the measure of success, and the time frame associated with completion.
7.7 - Definition of Ready
Is it REALLY Ready?
A Definition of Ready is a set of criteria, decided by the team, that defines when work is ready to begin. The goal of the Definition of Ready is to help the team decide how much uncertainty they are comfortable taking on with respect to their work. Without that guidance, any work is fair game, which is a recipe for confusion and disaster.
Recommended Practices
When deciding on a Definition of Ready, there are certain minimum criteria that should always be there. These are:
- Description of the value the work provides (Why do we want to do this?)
- Testable Acceptance Criteria (When do we know we’ve done what we need to?)
- The team has reviewed and agreed the work is ready (Has the team seen it?)
However, the context of a team can make many other criteria applicable. Other criteria could include:
- Wireframes for new UI components
- Contracts for APIs/services we depend on
- All relevant test types identified for subtasks
- Team estimate of the size of the story is no more than 2 days
The Definition of Ready is a living document that should evolve over time as the team works to make their delivery system more predictable. The most important thing is to actually enforce the Definition of Ready. If it’s not enforced, it’s completely useless.
- If any work in “Ready to Start” does not meet the Definition of Ready, move it back to the Backlog until it is refined.
- Any work that is planned for a sprint/iteration must meet the Definition of Ready. Do not accept work that isn’t ready!
- If work needs to be expedited, it needs to go through the same process. (Unless there is immediate production impact, of course)
Tips
- Using Behavior Driven Development is one of the best ways to define testable acceptance criteria.
- Definition of Ready is also useful for support tickets or other types of work that the team can be responsible for. It’s not just for development work!
- It’s up to everyone on the team, including the Product Owner, to make sure that non-ready work is refined appropriately.
- The recommended DoR for CD is that any story can be completed, either by the team or by a single developer, in two days or less.
7.8 - From Program to User Story
Aligning priorities across multi-team products can be challenging. However, the process used at the team level to decompose work functions just as well at the program level.
Program Roadmap
In order to have an effective work breakdown process, goals and priorities need to be established and understood.

Stakeholders and leadership teams must define the high-level initiatives, and their priorities, so that work may be dispersed among product teams.
Leadership teams can be made up of a core group of product owners.
Product Roadmap
The program roadmap should break down into the product roadmap, which includes the prioritized list of epics for each product.

The leadership team should define the product vision, roadmap, and dependencies for each product.
Team Backlog
The team backlog should be composed of the prioritized epics from the product roadmap.

The core group needed to effectively break down high level requirements so that the team may decompose work includes product owners, tech leads, and project managers.
Product teams should use the practices described in Work Decomposition to break epics down into smaller epics, stories, and tasks.
7.9 - Spikes
Spikes are an exploration of potential solutions for work or research items that cannot be estimated. They should be time-boxed in short increments (2-3 days).
Recommended Practices
Since all work has some amount of uncertainty and risk, spikes should be used infrequently, only when the team has no idea how to proceed with a work item. They should result in information that can be used to refine the work into something valuable for a future iteration.
Spikes should follow a Definition of Done, with acceptance criteria, that can be demoed at the end of its timebox.
A spike should have a definite timebox, usually within 1-3 days. At the end of this timebox, the team should be able to decide how, when, and even if the work can be considered for upcoming iterations.
Tips
- Use spikes sparingly, only when high uncertainty exists.
- Spikes should be focused on discovery and experimentation.
- Stay within the parameters of the spike. Anything beyond that is waste.
7.10 - Story Slicing
Story slicing is the activity of taking large stories and splitting them into smaller, more predictable deliveries. This allows the team to deliver higher priority changes more rapidly instead of tying those changes to others that may be of lower relative value.
Recommended Practices
Stories should be sliced vertically. That is, the story should be aligned such that it fulfills a consumer request without requiring another story being deployed. After slicing, they should still meet the INVEST principle.
Example stories:
As an hourly associate I want to be able to log my arrival time so that I can be
paid correctly.
As a consumer of item data, I want to retrieve item information by color so that
I can find all red items.
Stories should not be sliced along tech stack layer or by activity. If you need to deploy a UI story and a service story to implement a new behavior, you have sliced horizontally.
Do not slice by tech stack layer
- UI “story”
- Service “story”
- Database “story”
Do not slice by activity
- Coding “story”
- Review “story”
- Testing “story”
Tips
- If you're unsure whether a story can be sliced thinner, look at the acceptance tests from the BDD activity and see if it makes sense to defer some of the tests to a later release.
- While stories should be sliced vertically, it's quite possible for multiple developers to work the story, with each developer picking up a task that represents a layer of the slice.
- Minimize hard dependencies in a story. The odds of delivering on time for any activity are 1 in 2^n, where n is the number of hard dependencies (for example, three hard dependencies mean a 1-in-8 chance of delivering on time).
8 - Cloud Native Checklist
Cloud Native checklist
| Capability | Yes / No |
|---|---|
| Domain Context diagram current with dependencies shown | |
| Exception logging | |
| Logs stream or self-purge | |
| Dynamically configurable log levels | |
| Database connections self-heal | |
| Dependency connections self-heal | |
| Service auto-restarts on failure | |
| Automated resource and performance monitoring | |
| Have NFRs & SLAs defined for each service | |
| Automated alerting for SLAs and NFRs | |
| No manual install steps | |
| Utilize Correlation ID | |
| Load balanced | |
| Automated smoke tests after each deployment | |
| Heartbeat responds in less than 1 minute after startup | |
| No start-up ordering required | |
| Minimal critical dependencies | |
| Graceful degradation for non-critical dependencies | |
| Circuit breakers and request throttles in place | |
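As one way to automate a couple of the items above ("Automated smoke tests after each deployment" and the heartbeat check), a post-deployment smoke test might poll a health endpoint until it responds. This is a sketch only; the URL, timings, and endpoint are assumptions.

```python
import time
import urllib.request

HEALTH_URL = "https://example.internal/my-service/health"  # hypothetical endpoint

def smoke_test(url: str, deadline_seconds: int = 60) -> bool:
    """Poll the heartbeat endpoint until it responds or the deadline passes."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_seconds:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # not up yet; keep polling
        time.sleep(5)
    return False

if __name__ == "__main__":
    assert smoke_test(HEALTH_URL), "Heartbeat did not respond within 1 minute of startup"
```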
Principles and Practices
While practices may change over time, principles are expected to be less volatile.
Small, autonomous, highly-cohesive services
- Prefer event-driven, asynchronous communications between services.
- Prefer eventual consistency / replication of select data elements over shared data structures.
- Be cautious about creating shared binary dependencies across services.
- Services are able to be checked out and run locally using embedded DBs, and/or mocked endpoint dependencies as necessary.
Hypermedia-driven service interactions
- Model resources on the domain.
- Use embedded links to drive resource state transitions.
- HATEOAS Reference
Modeled around business concepts
- Produce a system context diagram to understand your system boundaries. Consider following c4 architecture diagramming techniques.
- Follow Domain Driven Design practices to understand your domain early in development, and model your domain in your code.
- Use bounded contexts to isolate service boundaries and converse with canonical-model-based systems.
Hide internal implementation details
- Model bounded contexts
- Use packaging to scope components.
- Services own their data & hide their databases.
- No database-driven integration.
- Technology-agnostic APIs (ReST).
Decentralize everything
- Self-service whenever possible.
- Teams own their services (but also consider internal open source practices).
- Align teams to the organization.
- Prefer choreography over orchestration.
- Dumb middleware, smart endpoints.
- Deployable to cloud and local (DC/store) environments
Deploy independently
- Coexist versioned endpoints.
- Prefer targeted releases of individual services over habitual mass-installs of several services at once.
- Avoid tightly bound client/server stub generation.
- One service per host.
- Blue/green release testing techniques.
- Consumer-driven upgrade decisions.
Isolate failure
- Don’t treat remote calls like local calls.
- Set timeouts appropriately (consider TCP connect and read timeouts informed by ~90th-percentile latencies); see the sketch after this list.
- Apply bulk-heading & circuit breaker patterns to limit fallout of failure.
- Understand and design for what should happen during network partitioning (network failures)
- Use redundancy & load balancing
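A rough sketch of the timeout and circuit-breaker ideas using only the Python standard library. The thresholds are assumptions, and urllib exposes a single timeout that covers blocking socket operations rather than separate connect and read timeouts; dedicated HTTP clients and resilience libraries give finer control.

```python
import time
import urllib.request

TIMEOUT_SECONDS = 2.0    # assumption: well above the ~90th-percentile latency
FAILURE_THRESHOLD = 5    # consecutive failures before the circuit opens
COOL_DOWN_SECONDS = 30   # how long to fail fast before trying again

class CircuitOpenError(RuntimeError):
    pass

class SimpleCircuitBreaker:
    """Very small circuit breaker: after repeated failures, fail fast for a while."""
    def __init__(self):
        self.consecutive_failures = 0
        self.open_until = 0.0

    def call(self, url: str) -> bytes:
        if time.monotonic() < self.open_until:
            raise CircuitOpenError("circuit open; failing fast instead of calling the remote")
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
                self.consecutive_failures = 0
                return response.read()
        except OSError:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURE_THRESHOLD:
                self.open_until = time.monotonic() + COOL_DOWN_SECONDS
            raise
```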
Highly observable
- Monitored endpoints.
- Use synthetic transactions to simulate real user behavior.
- Aggregate logs and statistics.
- Use correlation IDs to trace calls throughout the system (a minimal propagation sketch follows this list).
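A minimal sketch of correlation ID propagation, assuming the common (but not standardized) X-Correlation-ID header: reuse the inbound ID if the caller sent one, otherwise create one, then attach it to logs and to any outbound calls.

```python
import uuid

CORRELATION_HEADER = "X-Correlation-ID"  # common convention, not a standard

def get_or_create_correlation_id(inbound_headers: dict) -> str:
    return inbound_headers.get(CORRELATION_HEADER, str(uuid.uuid4()))

def handle_request(inbound_headers: dict) -> dict:
    correlation_id = get_or_create_correlation_id(inbound_headers)
    print(f"correlation_id={correlation_id} msg='processing request'")  # structured log line
    # Pass the same ID on every downstream call so the whole trace can be stitched together.
    return {CORRELATION_HEADER: correlation_id}

handle_request({})  # no inbound ID: a new one is generated
handle_request({CORRELATION_HEADER: "3f2c9a10-1111-2222-3333-444455556666"})  # inbound ID reused
```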
Culture of automation
- Automated developer-driven testing: unit, functional, contract, integration, performance, etc.
- Deploy the same way everywhere.
- Implement continuous delivery practices.
- Trunk based development over branching by feature/team/release to promote continuous integration practices.
- In the face of a lack of automation/provisioning/monitoring, prefer a properly structured monolith over many segregated smaller services.
9 - Value Stream Mapping
The purpose of a Value Stream Mapping workshop is to understand all of the steps needed to deliver value, from conception to production. We can then use the resulting map as a tool to identify constraints and propose improvements to the value stream.
Prerequisites
- Everyone who has a touch point in the value stream should be present for the exercise. This includes, but is not limited to, developers, managers, product owners, and representatives from external teams that have required steps between conception and production.
- Understand the terms associated with value stream mapping:
  - Wait time/non-value time: time between processes where no activity is occurring.
  - Process time/value add time: time spent executing a step in the value stream.
  - Percent Complete/Accurate (%C/A): the percentage of work that is not rejected by the next step in the process, e.g. if code fails code review 20% of the time, the %C/A is 80%.
Recommended Practices
When value stream mapping your team's process, start from production and move backwards through each step; you are less likely to miss steps in the process.
Identify the Source of Requests
Example: Team Demo
For each source of requests, identify:
- What is the average process time for this step?
- Who is involved in this step?
- What percentage of work is rejected by the next step in the process?
Your team will need to identify these things for each step in the process. Don't forget to identify where your intake process originates, whether that is stakeholder conversations, a service desk, etc.
Identify Rework Loops
After your team has completed the initial value stream map, you have most likely identified a few rework loops. Rework loops are interruptions in the value stream where steps have to be corrected.
In this example, the team had to address code review feedback 10% of the time before the change could be merged to master.
Identify Wait Time
Once your team has completed the above steps, go back through the value stream and identify the wait time between each step in the process. Make sure to take your cadence into account when calculating.
Add the total process time and wait time together to get an average lead time. Understand that the value stream map is an estimate/average based on your team's feedback.
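A small sketch of that arithmetic with hypothetical numbers. Flow efficiency (process time divided by total lead time) is a useful additional figure for spotting where the biggest waits are.

```python
# Hypothetical value stream: process (value-add) time and the wait time
# before each step, in hours.
steps = [
    {"name": "Refine story", "process_hours": 1.0, "wait_hours": 8.0},
    {"name": "Develop",      "process_hours": 12.0, "wait_hours": 4.0},
    {"name": "Code review",  "process_hours": 1.0, "wait_hours": 16.0},
    {"name": "Deploy",       "process_hours": 0.5, "wait_hours": 24.0},
]

process_time = sum(step["process_hours"] for step in steps)
wait_time = sum(step["wait_hours"] for step in steps)
lead_time = process_time + wait_time

print(f"Total process time: {process_time} hours")
print(f"Total wait time:    {wait_time} hours")
print(f"Average lead time:  {lead_time} hours")
print(f"Flow efficiency:    {process_time / lead_time:.0%}")
```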
Outcomes
- Process time/wait time of your flow.
- Visual representation of the value stream(s) of the team.
- Possible constraints to your flow based on process time/wait time, rework loops, and percent complete/accurate. You can present these on your VSM as kaizen bursts.

Tips
- Review and maintain the value stream map to show wins associated with your team's improvement plan.
- Take into account all potential flows for team processes, and value stream map those as well.
Value
As a team, we want to understand how to value stream map our team processes, so that we may understand our constraints to delivery and identify ways to improve.
Acceptance Criteria
- Value stream map everything associated with delivering value.
- Create improvement action items from the exercise.