The Economic Efficiency Of Small Batch Software Testing

Yuval Yeret
Yiddish proverb: A fool should not be shown an unfinished job.

Bug line chart

Editor’s note: Guest author Yuval Yeret is a senior enterprise agility coach and head of the US office for AgileSparks, an international lean agile consulting company with offices in Boston, Israel and India.

In software development, the mindset of the Yiddish proverb is pervasive. Why waste time running premature test cycles when the product will change before it is finished and the tests will have to be rerun? Let’s just finish developing the functionality and then we’ll test it.

This philosophy may drive product development teams towards “late testing,” where testing is performed towards the end of the cycle. These teams are trying to take an economic view of product development: they want to minimize their development costs while maximizing the value they deliver. To them, any wasted investment in testing takes away from investment in the value delivered.

Taking an economic view is the right approach, but we’re missing a major part of the formula here.

Let’s take a step back. What these teams are saying is, “we incur overhead every time we test so let’s minimize that overhead by testing less often. Since we have to test at the end anyway, let’s just defer testing to that point.”

This overhead is called “transaction cost.” Product developers pay it every time they run a test cycle. It consists of deploying the test candidate build, populating test data and running through regression testing. On top of the transaction cost, each test cycle also carries the cost of testing new functionality.

The transaction cost is fixed per test cycle, while the cost of testing new functionality is variable: it depends on how much new functionality this test candidate contains compared to the last one tested. The amount of functionality we wait for before we run a test cycle is called “batch size.”

Here is how batch size works with regard to testing:

A batch size of one means we run a test cycle for every small set of functionality added. A batch size of a day would mean running a test cycle every day. A batch size of a month would mean running a test cycle every month. A batch size of a release would mean running a test cycle just before every release. If we look at just the transaction costs and the new functionality testing costs, a big batch size looks preferable, since it means fewer test cycles and therefore lower overall cost.

economic batch

Dealing With One Zombie At A Time

If we’re missing a major part of the formula here, what is it?

Let me share a story from my days as a development team lead. I was leading a team working on a complex Linux file system layer that straddled the divide between user space and kernel space. We had a simulator that we could use to test some of our code, but other aspects required the real appliance in the lab. Still others required running the appliances over a simulated wide area network (WAN).

We had an agreement that we would not check in code without rebasing on the main branch, running the simulator and fixing all issues. That meant that, even though we were developing complex code, fixing most of our defects was pretty easy since we could pinpoint them quickly.

Like dealing with one zombie at a time. No big deal, right?

Our problem was with the defects that the simulator could not find. Those defects lurked in the darkness, waiting for the test cycle to arrive. They were joined by more and more friends as we checked in more and more code. Eventually, when the test cycle arrived, they all came running out of the dark like a horde of zombies that threatened to overrun us.

test chart

Why? For one, we had moved on to other things. The longer it took for a defect to surface, the more we had forgotten about the context and our rationale, and the harder it was to understand what was going on and why we had built the system in a certain way. This made it harder to find the source of the defect and to come up with a solution that did not break anything else.

Another factor is that the more check-ins and changes there were between the last healthy test cycle and the new one, the harder it was to figure out which change a defect related to. Without this information, it is much harder to figure out the problem.

The 1-10-100 Rule

This chart, from Steve McConnell’s book Code Complete, calls it the “cost to repair defects.” The quality assurance ecosystem sometimes refers to it as the 1-10-100 rule.

What we are missing in the formula is the cost of zombie defects lurking in the dark. Or, as we more professionally call it, the “holding cost” or “cost of delayed feedback.”

batch size chart

If we look at both the transaction cost and the holding cost (the cost of delay), we can see why testing late might not actually be the best economic decision. Even if you have high transaction costs every time you test, it makes sense to start testing in smaller batch sizes.
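Here is a minimal sketch of that trade-off. It rests on a simplifying assumption of mine: each unit of functionality waits, on average, for half a batch before it gets tested, and every unit of waiting costs a fixed `holding_rate`. Real holding costs may well grow faster than that. The function name and all numbers are hypothetical.

```python
def total_cost(total_units, batch_size, transaction_cost, holding_rate):
    """Transaction cost plus holding cost for a given batch size.

    Assumed model (not measured data): holding cost grows linearly with
    how long functionality waits for feedback, and the average wait is
    half a batch.
    """
    cycles = total_units / batch_size
    transaction = cycles * transaction_cost
    holding = total_units * holding_rate * batch_size / 2
    return transaction + holding

candidates = [1, 5, 10, 20, 50, 100]
costs = {
    batch: total_cost(100, batch, transaction_cost=8, holding_rate=0.04)
    for batch in candidates
}

# The cheapest option sits between the two extremes.
best = min(costs, key=costs.get)

for batch in candidates:
    marker = "  <- lowest" if batch == best else ""
    print(f"batch size {batch:>3}: total cost {costs[batch]:7.1f}{marker}")
```

With these made-up numbers, testing after every single unit is expensive because of transaction costs, testing once at the end is expensive because of holding costs, and the lowest total sits in between.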

The chart also explains why, in many cases, moving to small batch testing does not make sense for organizations that do not have the right capabilities and competence. For these organizations the question is now, “how could we reduce our testing transaction costs in order to enable us to test even earlier and reduce the overall product development cost even further?”

Now you might understand why investing in continuous integration and test automation is so critical to the economics of product development. Whatever you can do to automate as much of your testing as possible and reduce the cost per test cycle will enable you to get earlier feedback and better overall economic outcomes in the form of cheaper-to-fix defects. The effect is higher product quality and happier customers.
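To make the effect of cheaper test cycles concrete, here is a small sketch under an assumed model: every test cycle has a fixed transaction cost, and every unit of functionality accrues a holding cost for each unit of time it waits untested, with an average wait of half a batch. Under that assumption the cost-minimizing batch size is the square root of twice the transaction cost divided by the holding rate, which is the classic economic order quantity result. The function name and the numbers are hypothetical.

```python
import math

def optimal_batch_size(transaction_cost, holding_rate):
    """Batch size that minimizes transaction cost plus holding cost.

    Assumes a linear holding cost and an average wait of half a batch,
    which gives the economic order quantity formula sqrt(2 * T / h).
    """
    return math.sqrt(2 * transaction_cost / holding_rate)

# Hypothetical transaction costs as more of the test cycle is automated.
scenarios = {
    "manual regression": 8.0,
    "partly automated": 0.5,
    "fully automated": 0.02,
}

for name, cost in scenarios.items():
    size = optimal_batch_size(cost, holding_rate=0.04)
    print(f"{name:>18}: optimal batch size is about {size:.0f}")
```

In this sketch, cutting the cost of a test cycle by a factor of 400 shrinks the economically sensible batch by a factor of 20, so the cheaper each cycle gets, the earlier it pays to test.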

For test types you cannot automate (exploratory testing, user experience testing and so on), look for whatever ways you can find to run frequent and efficient test cycles, to the point that progression testing and most high-risk regression testing happen within hours or days of development.

Shift Testing To The Left Of The Product Development Life Cycle

If we look at the product development value stream as moving from left to right, what we want is to shift testing activities from their legacy position on the far right as far to the left as possible.

linear test movement

At a minimum, the shift left means running as much test design and execution as close to coding and development activities as possible. This is what we talk about in agile development and, specifically, in the agile testing practices employed by teams using Scrum, Kanban, XP or other agile approaches. But even if you do not follow a formal agile development approach, making sure that testing happens as early and as frequently as possible is a best practice that helps teams achieve quality and velocity.
