Changing a software system carries many risks. A defect can cause a minor frustration, or it can cause a loss of 440 million USD in 45 minutes and bring a firm like Knight Capital Group to the brink of bankruptcy, or worse. Software defects have also contributed to human deaths: in the Therac-25 incidents, patients were given massive overdoses of radiation, and a defective flight control system on the 737 Max contributed to the deaths of 346 people. While these are extreme examples, we have all been personally affected by a software glitch. Why is changing software so risky?
Traditionally, risks in software have been mitigated with manual, human-based controls. These controls can take the shape of change requests, documentation, checklists, approvals, and code freezes. This class of controls is inadequate because it does not scale at the same rate as the complex systems it is designed to govern. The complexity of our systems grows faster than the ability of humans to understand them, reason about them, and determine the risks involved in changing them. We need more scalable solutions to reduce these risks. The capability to make risk-free changes is critical for companies that want to modernize their information technology (IT) or digitally transform their customer experience.
The core foundation of any digital transformation or IT modernization is to have the capability to deliver quality software. At Lean TECHniques we promote the engineering practices of Extreme Programming (XP), not because we are dogmatic, but because we haven’t found a better way.
Over the years these techniques have been misinterpreted and debated, even as they have become generally accepted. Today it is difficult to find anyone suggesting that we should have zero automated tests. Yet prescribing test-driven development (TDD), in which those tests are written before the code, will start a passionate debate.
Using examples and scenarios to define the requirements for a feature is also generally accepted in the software industry. However, the suggestion that those same examples and scenarios should be used as automated specifications (aka tests) may cause a revolt from either developers, quality assurance teams, or both.
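To make the idea of examples doubling as automated specifications concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `calculate_price` function, the `SALES_TAX_RATES` table, and the rates themselves are hypothetical and are not taken from a real project or real tax data. The point is only that the examples a team agrees on in conversation can be written down once and then checked by a machine on every change.

```python
from decimal import Decimal

# Hypothetical rates, for illustration only; not real tax data.
SALES_TAX_RATES = {"IA": Decimal("0.06"), "OR": Decimal("0.00")}


def calculate_price(subtotal: Decimal, state: str) -> Decimal:
    """Return the subtotal plus sales tax for the state, rounded to cents."""
    rate = SALES_TAX_RATES[state]
    return (subtotal * (1 + rate)).quantize(Decimal("0.01"))


# Each row is one example the team agreed on: (subtotal, state, expected price).
EXAMPLES = [
    (Decimal("100.00"), "IA", Decimal("106.00")),
    (Decimal("19.99"), "IA", Decimal("21.19")),
    (Decimal("100.00"), "OR", Decimal("100.00")),
]


def check_examples() -> list[str]:
    """Run every example and return a description of each one that fails."""
    failures = []
    for subtotal, state, expected in EXAMPLES:
        actual = calculate_price(subtotal, state)
        if actual != expected:
            failures.append(f"{subtotal} in {state}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    problems = check_examples()
    print("all examples pass" if not problems else "\n".join(problems))
```

In a TDD flow, the `EXAMPLES` table would be written first and would fail until `calculate_price` exists. Whether a team writes it before or after the code, the same table serves as both the requirement and the regression check.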
Everyone has had different experiences with these techniques—some good and some not so good. So when working with a team, we’ve found it’s more important to start with our desired outcome framed as a challenge. We don’t tell a team we are going to use a specific technique; we simply ask, “How are we going to make this change in a risk-free way?”
This simple question focuses the team on the goal and their energy on solving the problem. They don’t get defensive or share a story about the time TDD didn’t work. There is nothing to get defensive about. The team wants to deliver quality products that delight their customers.
Every team surprises us. They don’t argue about a specific practice. They accept the purpose of the practice and use it to make progress towards the goal. Their solutions are iterative. They know they want a complete automated test suite, but they may settle on automated tests around a new feature or around the most complex feature that is at risk of breaking. The team will blend manual procedures with automated tasks until they have time to script away the boring work.
In rare cases, the team may be paralyzed by the term “risk-free”. They are concerned about being perfect, not better. When this occurs, we simply introduce a constraint. It won’t be risk-free, but it will reduce risk within the scope of the constraint. Here’s an example that introduces time, audience, and feature constraints: How can we reduce the risk of change in the price calculator (feature) for people in Iowa (audience) over the next 2 days (time)?
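One way a team might encode such a constraint is as a small, explicit scope that a change must fall within before it is applied. The sketch below is only an illustration under stated assumptions: the `ChangeScope` class, its field names, and the dates are hypothetical, and a team could just as well express the same constraint with a feature-flag tool or a deployment rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeScope:
    """Hypothetical description of where and when a risky change is allowed."""

    feature: str            # feature constraint, e.g. the price calculator
    audience_state: str     # audience constraint, e.g. people in Iowa
    starts_at: datetime     # start of the time constraint
    duration: timedelta     # length of the time constraint

    def applies_to(self, feature: str, state: str, now: datetime) -> bool:
        """Return True only when all three constraints are satisfied."""
        in_window = self.starts_at <= now < self.starts_at + self.duration
        return (
            feature == self.feature
            and state == self.audience_state
            and in_window
        )


# The example from the text: price calculator, Iowa, the next 2 days.
# The start date is an arbitrary placeholder.
scope = ChangeScope(
    feature="price_calculator",
    audience_state="IA",
    starts_at=datetime(2024, 1, 1),
    duration=timedelta(days=2),
)

if __name__ == "__main__":
    print(scope.applies_to("price_calculator", "IA", datetime(2024, 1, 2, 12, 0)))
    print(scope.applies_to("price_calculator", "NE", datetime(2024, 1, 2, 12, 0)))
```

Passing `now` in as an argument, rather than reading the clock inside the method, keeps the check easy to test. Narrowing any one of the three fields shrinks the area in which the change can cause harm.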
Once the team has accepted the challenge of creating changes in a risk-free way, the job of the leader is to maintain the momentum and keep continuous improvement moving forward. This can be done in a number of ways. A technique that has proven effective is to reserve a portion of the team’s time for improving how they work. Generally this is 10-20% of their time. In theory this makes sense; in practice, this block of time is the first thing sacrificed when the team encounters time pressure. To counter this, we have the team’s leader open every demo with this question: “What improvements did the team make with the 10-20% of time we invested in improving the work?” Because it is the first question asked of the team, it sends a clear signal that the most important work the team does is to improve how they work.
Helping a team achieve the capability of making risk-free changes can be started by:
Framing the outcome as a challenge
Creating the environment that supports time to find a solution to the challenge
Ultimately, there will always be risk whenever modifications are made to a system. Our goal is to create the incentives, habits, and environment that move a team toward the ideal of making risk-free changes in software.