How to Assess Testing Resource Allocation
When I come in as a consultant to look at how software testing is done, it's common to start with a conversation about strategy — in other words, the major pieces of the test effort, and how they should ideally fit together to reduce risk.
One of the worst responses I hear is, "Well, this is how we do it," with no explanation of why. On one project, the test staff could not even explain what they were doing — ask me about that story another day.
You might expect a test strategy document to explain the organization's test effort better. Sadly, I find these documents are fairly useless; most of them are just a list of ideas. A typical test strategy document mixes test types and techniques, such as:
unit tests
performance tests
test features as they are created
regression tests at the GUI and API levels
continuous integration
crowdtesting
browser compatibility
security
accessibility
localization.
The result is a big list with very little clarity, detail or emphasis on any particular area.
Now imagine that your company has 100 points of "test effort" to spend on the items listed above. How much do you spend on each, and why? For that matter, imagine that every project has a percentage of its budget to spend on testing. You'd expect critical projects to receive a higher percentage of budget relative to the number of features covered. Yet most companies do not allocate resources this way. Instead of "testers" who have an integrated vision, there are people doing testing activities. So people with a role involving testing take the things out of the strategy document they know how to do, or the things they did in the last sprint, and do some of them, maybe.
We used to ask "How much testing is enough?" and shrug, because testing ended when we hit a key date and had no known show-stopper defects. Today, we could ask "How much of each of these testing types is enough?" To take it a step further, we could ask the organization, "What is the value produced by investing this much in each of these types of testing? Should we move the slider to invest more or less?"
If you can answer those questions, you'll be much closer to explaining testing as an investment, rather than a cost.
Assessing the value of test effort
In the past, the classic measurement of test effort was the developer/tester ratio. My old mentor, Dr. Cem Kaner, co-wrote a classic paper on the subject, presenting it at the Pacific Northwest Software Quality Conference nearly 20 years ago. At that time, build systems were standard, while continuous integration and automated unit testing were just entering the early adopter phase. Kaner, along with co-authors Elisabeth Hendrickson and Jennifer Smith-Brock, concluded that the ratio was a mixed bag; after all, any individual, developer or tester, can do a lazy or poor job. They suggested staffing projects according to risk, with high-risk and larger projects earning more resources. Here's how they suggested determining risk:
The level of risk is affected by such factors as the technical difficulty of the programming task, the skills of the programmers, the expectations of the customers, and the types of harm that errors might cause. The more risk, the more thoroughly you'll have to test, and the more times you'll probably have to retest.
To do more testing on a project because it is riskier, we must first determine a baseline, a standard amount of work to perform. Here's an exercise to help with that.
First, lay out the techniques the team uses to reduce risk. Use a survey if necessary. Without anything else to measure, you can use the 100 points of effort I suggested above. The spreadsheet could look something like this:
| What | Points we should spend | Points we do spend |
| --- | --- | --- |
| Unit testing | | |
| Feature testing | | |
| Human regression | | |
| Automated regression (GUI) | | |
| Automated regression (API) | | |
| Performance | | |
| Crowdtesting | | |
| Accessibility | | |
| Localization | | |
| Continuous integration | | |
| Blue/green rollouts | | |
| User acceptance testing | | |
| Usability | | |
| Platform engineering (environments and data) | | |
| Device compatibility | | |
| Hardware testing | | |
Make a single template, copy it into an online spreadsheet tool, and invite team members — anyone from QA — to populate it. The results should yield a few interesting metrics, which the short sketch after the list below pulls together:
the average results for each category;
the median (middle) results for each category;
the standard deviation within a category;
the extreme results for each category.
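To make the tally concrete, here is a minimal sketch, in Python, of how those four metrics could be computed. The categories and point values are invented placeholders; in practice you would export the shared spreadsheet to CSV and load the real responses instead.

```python
from statistics import mean, median, stdev

# Hypothetical survey export: each respondent's "points we do spend" per category.
# Replace this with the rows exported from the shared spreadsheet.
responses = {
    "Unit testing":         [30, 25, 10, 40],
    "Feature testing":      [25, 30, 40, 20],
    "Automated regression": [20, 15, 25, 10],
    "Performance":          [10, 10, 5, 15],
    "Accessibility":        [15, 20, 20, 15],
}

for category, points in responses.items():
    print(
        f"{category:22s}"
        f" avg={mean(points):5.1f}"
        f" median={median(points):5.1f}"
        f" stdev={stdev(points):5.1f}"             # spread of opinion within the category
        f" extremes={min(points)}..{max(points)}"  # outlier answers worth a follow-up chat
    )
```

A high standard deviation or a wide gap between the extremes is usually the first place to start a conversation.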
The metrics above provide a window into how the team assesses its current and potential testing resource allocation. There is no right answer for this exercise. Points have no meaning; they are an imaginary concept used to compare relative effort.
That said, there can be wrong answers. A wide spread of results that lines up with job descriptions, say developers weighting unit testing heavily while testers barely count it, would indicate the various roles do not really understand what their peers in other jobs are doing, or why. The sketch below shows one way to check for that pattern.
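This is a small illustration only; the roles, categories and figures are invented. It breaks the same survey numbers out by role, so that role-shaped disagreements become visible.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (role, category, points) rows pulled from the survey responses.
rows = [
    ("developer", "Unit testing", 40), ("developer", "Unit testing", 35),
    ("tester",    "Unit testing", 10), ("tester",    "Unit testing", 15),
    ("developer", "Feature testing", 15), ("tester",  "Feature testing", 40),
]

by_role = defaultdict(list)
for role, category, points in rows:
    by_role[(category, role)].append(points)

# If the per-role averages for a category sit far apart, the roles are
# probably not seeing the same picture of the test effort.
for (category, role), points in sorted(by_role.items()):
    print(f"{category:16s} {role:10s} avg={mean(points):5.1f}")
```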
The discovery process
These two ideas help advance a real, useful test strategy. First, consider the level of risk on projects and staff them accordingly. Total risk is the probability that something goes wrong multiplied by the cost if it does. Even if the chance of failure is low, weigh the stakes as well: the potential reward of the project, or the dollars that flow through the system.
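As a quick illustration of that arithmetic (the probabilities and dollar figures below are made up), a project with a modest chance of an expensive failure can still carry far more total risk than one that fails often but cheaply:

```python
# Total risk = probability of the bad outcome * cost if it happens.
# The figures below are invented for illustration.
projects = {
    "billing rewrite":  {"probability": 0.10, "cost_of_failure": 2_000_000},
    "marketing banner": {"probability": 0.60, "cost_of_failure": 5_000},
}

for name, p in projects.items():
    expected_loss = p["probability"] * p["cost_of_failure"]
    print(f"{name:18s} expected loss = ${expected_loss:,.0f}")

# billing rewrite    expected loss = $200,000
# marketing banner   expected loss = $3,000
# The project with the larger expected loss earns the larger share of test effort.
```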
The easiest way to identify potential risks is to look at what went wrong on other projects. Those issues generally fall into three broad categories:
defects, especially show-stopping bugs;
higher-level strategic problems like technology failure, marketplace failure or late project delivery;
regulatory compliance and conformance challenges.
If other projects ran late, odds are this one will be late too. We can mitigate lateness by structuring the work in phases, so a late project can still ship something. Likewise, we can reduce the chance that a feature will be rejected by the marketplace through A/B split testing. Comparing the kinds of testing outlined above with the actual bugs on recent work can help us understand whether we're using the right techniques.
The second approach is to take a high-level look at how the team spends its time. If, for example, the team doesn't spend much time on compatibility testing, but that is where serious defects are found — by customers, no less — then it may be time to adjust. Anyone can do this. Management can gather the numbers to make decisions over time. Line workers can either create a survey, or just publish the numbers themselves.
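Here is one way that comparison might look in code. The effort percentages would come from the survey above and the defect counts from the bug tracker; the numbers shown are invented for illustration.

```python
# Share of effort (from the survey) vs. share of escaped defects (from the tracker).
# All figures are hypothetical.
effort_pct   = {"Unit": 35, "Feature": 30, "Compatibility": 5,  "Performance": 30}
defect_count = {"Unit": 4,  "Feature": 10, "Compatibility": 25, "Performance": 1}

total_defects = sum(defect_count.values())
for area in effort_pct:
    defect_pct = 100 * defect_count[area] / total_defects
    gap = defect_pct - effort_pct[area]
    flag = "  <-- candidate for more investment" if gap > 10 else ""
    print(f"{area:14s} effort {effort_pct[area]:3d}%  defects {defect_pct:5.1f}%{flag}")
```

In this made-up example, compatibility gets 5% of the effort but produces more than half the escaped defects, which is exactly the kind of mismatch worth raising.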
It’s much easier to start a conversation about a spreadsheet than one that begins by pointing fingers and saying, "Why didn't you find that bug?"
Give it a try – see what you find. And share it with the rest of the testing community to provide value outside your own walls.