Breaking The Quality–Speed Compromise
In the hotly contested commodity business of assembling computers, Dell
enjoys a 50% cost advantage over its competitors.[1]
This commanding advantage comes from Dell’s exceptional responsiveness
to customers, flawless operations, and remarkable speed of execution.
Conventional wisdom once held that the low-cost producer could not
provide customized, high-quality products. But Dell decided that its
customers could have it all – low-cost computers with the latest
technology, built to order and delivered in a week.
In the book
Hardball, George Stalk notes that when an industry imposes a compromise
on its customers, the company that breaks the compromise stands to gain
a significant competitive advantage.[2]
For example, the airline industry imposes a big compromise on
travelers: if you want low-cost tickets, you have to make your plans
early and pay a stiff penalty to change them. Southwest Airlines breaks
this compromise: its customers can apply the cost of unused tickets to a
future flight without a change fee.
In the software
development industry, we impose many compromises on our customers. We
tell them that high quality software takes a lot of time; we ask them to
decide exactly what they want when they don’t really know; we make it
clear that changes late in the development process will be very
expensive. There’s a significant competitive advantage waiting for
companies that can break these compromises. In particular, I’d like to
focus on breaking the compromise between quality and speed, because many
companies have achieved great leverage by competing on the basis of
time.
When I teach classes
on Lean Software Development, the first thing we do is draw value stream
maps of existing software development processes. Starting with a
customer request, the class draws each step that the request goes
through as it is turned into deployed software which solves the
customer’s problem. The average time for each step is noted, as well as
the time between steps, giving a picture of the total time it takes to
respond to a customer.
Next the class
determines how much of the time between request and deployment is spent
actually working on the problem. Typically, less than 20% of the total
time is spent doing work on the request; for more than 80% of the time
the request is waiting in some queue. For starters, driving down this
queue time will let us deliver software much faster without compromising
quality.
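The arithmetic behind a value stream map can be sketched in a few lines. This is only an illustration: the step names and durations below are invented, and `Step` and `cycle_efficiency` are hypothetical helpers, not part of any tool used in the classes.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_days: float  # time spent actively working on the request
    wait_days: float  # time the request waits in a queue before this step


def cycle_efficiency(steps):
    """Return (total elapsed days, fraction of that time spent working)."""
    work = sum(s.work_days for s in steps)
    wait = sum(s.wait_days for s in steps)
    total = work + wait
    return total, (work / total if total else 0.0)


# Hypothetical value stream, invented for illustration only.
steps = [
    Step("requirements", 2, 10),
    Step("coding", 4, 9),
    Step("verification", 2, 16),
    Step("deployment", 1, 6),
]

total, efficiency = cycle_efficiency(steps)
print(f"{total:.0f} days total, {efficiency:.0%} spent working")
```

With these made-up numbers the request takes 50 days, of which only 9 are spent working on it, which is the kind of ratio the maps tend to reveal.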
But reducing wait
time is not the only opportunity for faster software development.
Typically the value stream maps in my classes show a big delay just
before deployment, at a step which is usually called ‘verification’.
Now, I don’t have any problem with verification just before deployment,
but when I ask, “Do you find any defects during verification?” the
answer is always “Yes.” Therein lies the problem. When a computer hits
the end of Dell’s assembly line, it is powered on and it is expected to
work. The verification step is not the time to find defects; by the
time software hits verification, it should work.
The way to get rid
of the big delay at verification is to move testing closer to coding –
much closer. In fact, testing should happen immediately upon coding; if
possible the test should have been written before the code. New code
should be integrated into the overall system several times a day, with a
suite of automated unit tests run each time. Acceptance tests for a
feature should pass as soon as the feature is complete, and regression
testing should be run on the integrated code daily or perhaps weekly.
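As a minimal sketch of writing the test before the code, consider a hypothetical pricing rule (a 10% discount on orders of ten or more units). The function `order_total`, the rule itself, and the prices are assumptions made up for this example; the tests use Python's standard `unittest` module.

```python
import unittest


# Written first: these tests state what the feature must do.
class OrderTotalTest(unittest.TestCase):
    def test_small_order_pays_list_price(self):
        self.assertEqual(order_total(unit_price_cents=20000, quantity=2), 40000)

    def test_bulk_order_gets_ten_percent_off(self):
        self.assertEqual(order_total(unit_price_cents=20000, quantity=10), 180000)

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            order_total(unit_price_cents=20000, quantity=0)


# Written second: just enough code to make the tests pass.
def order_total(unit_price_cents, quantity):
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    total = unit_price_cents * quantity
    if quantity >= 10:
        total = total * 90 // 100  # hypothetical bulk discount
    return total


def run_suite():
    """Run the tests in-process, as an integration build might on each commit."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

A suite like this runs in milliseconds, so running it on every integration, several times a day, costs almost nothing.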
Of course, this
testing regime is not feasible with manual testing; automated unit and
acceptance tests are required. While this may have been impractical a
few years ago, the tools exist today to make automated testing
practical. Obviously not all tests can be automated and not all
automated test suites are fast enough to run frequently. But there are
many ways to make automated testing more effective; for example, each
layer is usually tested separately – i.e., the business rules are tested
below the GUI with most database calls mocked out.
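Testing a business rule below the GUI with the database mocked out might look like the following sketch. `DiscountService`, its `count_orders` repository method, and the loyalty rule are all hypothetical names invented for illustration; `Mock` comes from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


class DiscountService:
    """Business-rule layer; data access is delegated to a repository object."""

    def __init__(self, repository):
        self.repository = repository

    def discount_rate(self, customer_id):
        # Hypothetical rule: customers with five or more orders get 10% off.
        orders = self.repository.count_orders(customer_id)
        return 0.10 if orders >= 5 else 0.0


def test_loyal_customer_gets_discount():
    repository = Mock()  # stands in for the real database layer
    repository.count_orders.return_value = 7
    service = DiscountService(repository)
    assert service.discount_rate("c-42") == 0.10
    repository.count_orders.assert_called_once_with("c-42")


def test_new_customer_gets_no_discount():
    repository = Mock()
    repository.count_orders.return_value = 0
    assert DiscountService(repository).discount_rate("c-1") == 0.0
```

Because no GUI is driven and no database is touched, tests in this style stay fast enough to run on every integration.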
In most of the value
stream maps I see in my classes, there is a huge opportunity to move
tests far forward in the process and catch defects at their source.
Many companies spend a great deal of time tracking, prioritizing, and
fixing a long queue of defects. Far better to never let a defect into
the queue in the first place.
There is another
area of my classes’ value stream maps that raises a flag. Toward the
beginning of the map there is usually a step called ‘requirements’ which
often interacts with a queue of change requests. Dealing with change
requests takes a lot of time and approved changes create significant
churn. There has been a feeling that if only we could get the
requirements right, this ‘change churn’ would go away. But I generally
find that the real problem is that the requirements were specified too
early, when it was not really clear what was needed. The way to reduce
requirements churn is to delay the detailed clarification of
requirements, moving this step much closer to coding. This greatly
reduces the change request queue, because you don’t need to change a
decision that has not yet been made!
Toward the end of my
classes, we draw a future value stream map, and invariably the new value
stream maps show a dramatically shortened cycle time, the result of
eliminating wait time, moving tests forward, and delaying detailed
specification of requirements. We usually end up with a process in
which cross-functional teams produce small, integrated, tested,
deployment-ready packages of software at a regular cadence.
This kind of
software development process exposes another compromise: conventional
wisdom says that changes late in the development cycle are costly. If
we are developing small bits of code without full knowledge of
everything that the system will require, then we are going to have to be
able to add new features late in the development process at about the
same cost as incorporating them earlier.
The cost of adding
or changing features depends on three things: the size of the change,
the number of dependencies in the code, and whether or not the change is
structural. Since we just agreed to keep development chunks small,
let’s also agree to keep changes small. Then let’s agree that we are
going to get the structural stuff right – including proper layering,
modularization that fits the domain, appropriate scalability, etc.
We are left to
conclude that the cost of non-structural change depends on the
complexity of the code. There are several measures of complexity,
including the amount of duplicated code (the target is zero), the use of
patterns (which reduce complexity), and McCabe cyclomatic complexity
scores (which rise with the number of decision points in a module). Code
with low complexity scores tends to have the fewest defects, so the
number of defects is itself a useful indicator of complexity.
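To make the McCabe measure concrete, here is a rough sketch that approximates it for Python source by counting decision points with the standard `ast` module and adding one. The function name `cyclomatic_complexity` and the choice of which nodes count as decisions are simplifying assumptions; dedicated tools apply more complete rules.

```python
import ast

# Node types treated as one decision point each (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe score: 1 + number of decision points in the source."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            score += len(node.values) - 1
    return score


sample = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(sample))
```

Straight-line code scores 1, and each added branch raises the score, which makes a rising number an early warning that a module is becoming expensive to change.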
Which brings us back
to our testing regime. The most important thing we can do to break the
compromises we impose on customers is to move testing forward and put it
in-line with (or prior to) coding. Build suites of automated unit and
acceptance tests, integrate code frequently, and run the tests as often
as possible. In other words, find and fix the defects before they even
count as defects.
Companies that
respond to customers a lot faster than their industry average can expect
to grow three times faster and enjoy twice the profits of their
competitors.[3]
So there is a lot of competitive advantage available for the software
development organization that can break the speed–quality
compromise, and compete on the basis of time.