How eBay’s, Walmart’s Engineering Teams Tackle Test Automation
Last month, I had the pleasure of moderating a live panel featuring engineering directors from two Applause customers: eBay’s Srikanth Rentachintala and Walmart’s Anthony Tang.
At eBay, Rentachintala is responsible for all aspects of quality engineering within the buyer experience group, including test automation, environments and strategy. He is also responsible for the digital accessibility of eBay’s properties.
As part of Walmart Global Tech, Tang leads the developer experience tools team. He is responsible for test automation tooling as well as accessibility and performance tooling for customer experience.
We discussed several strategic challenges that all engineering leaders face, such as how to leverage test automation, use test data properly and incorporate accessibility testing.
In this blog post, learn how Rentachintala and Tang approach test automation at eBay and Walmart, respectively, particularly around test coverage, metrics and shifting testing to the left.
What is the right percentage of coverage?
Two kinds of coverage are often confused: test coverage and code coverage.
Test coverage refers to the percentage of test cases that are automated. Code coverage is the percentage of code paths covered by automated testing.
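To make the distinction concrete, here is a minimal sketch that computes both percentages. All counts are hypothetical, and executed lines stand in for "code paths" as a simplifying assumption; a real coverage tool may count branches or paths instead.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


# Hypothetical numbers, for illustration only.
total_test_cases = 400         # the "denominator": every test case worth having
automated_test_cases = 300     # test cases that run without manual steps
total_lines = 12_000           # executable lines in the application
lines_executed_by_tests = 7_800

# Test coverage: share of test cases that are automated.
test_coverage = percentage(automated_test_cases, total_test_cases)

# Code coverage: share of the code exercised by automated tests
# (approximated here by line counts, which is an assumption).
code_coverage = percentage(lines_executed_by_tests, total_lines)

print(f"test coverage: {test_coverage:.1f}%")
print(f"code coverage: {code_coverage:.1f}%")
```

The two numbers answer different questions, so a team can score high on one and low on the other.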
When it comes to test coverage, some test cases simply can’t be automated. Test cases such as specific error paths, for example, require manual testing. With code coverage, aim to get as close to 100% as possible, though you might not get all the way there if you’re just getting started. This conversation, however, focused on test coverage.
The percentage of test coverage ranges widely by organization, often from a low of 20% to upwards of 80%. Here at Applause, our experience has been that it’s challenging to extend beyond 80% and still get value. However, the test coverage percentage can be misleading. Teams should monitor the quality of the tests too.
“Test case automation coverage is not the easiest measure to come by,” Tang said. “To get the denominator correct, it involves some manual thinking around, ‘What are all the tests? What is the full population of test cases that you need to automate?’
“One interesting thing that we see is that a lot of people, when they see percent of test cases, they think percent of coverage. And so, a lot of people will report their coverage percentage, which isn't the same, and I think that's important to point out.”
The webinar also addressed two related questions:
- How does coverage differ for front-end vs. back-end applications?
- At what point does the return on investment (ROI) start to diminish?
Rentachintala said that at eBay, the return on automation holds up to around 85%-90% test coverage for back-end services, but only to about 75% for front-end applications.
“At one point, there is a point of diminishing returns,” Rentachintala said. “So that's where you don't want to invest more. For those use cases, maybe it's not really helpful to have more automation there.”
At Walmart, the achievable level of test coverage has also differed between existing and new projects. Tang noted that when his team set out to improve the test coverage of an existing project, it topped out at around 70%. However, a new project where the team could apply engineering and QA best practices from the beginning is now at 90% test coverage.
“The last 10% is difficult,” Tang said. “The goal is, of course, 100%, but you reach a point of diminishing returns.”
What metrics are important for test automation?
There are dozens — if not hundreds — of metrics to judge your team’s success with test automation. Test coverage and code coverage are two of those many test automation metrics.
In the webinar, we broke these metrics down into three buckets.
|Quality Metrics||Operational Metrics||Product/Team Health Metrics|
|Test Code Coverage||Build Cycle Time||Development Cycle Time|
|Functional Test Case Automation||App/Test Cases Out of Sync||Production Defect Creation Rate|
|App Size||Automated Deployment||Consistency of Velocity|
|App Load Time||Deployment Frequency||Integration Frequency|
|App Crash Rate||Application Availability||Mean Time Between Build Failures|
“When it comes to the quality metrics, we look at the code coverage as one of the key metrics,” Rentachintala said. “Again, it's not the only metric. Specifically we look at the line coverage to understand how much automation is actually covering. That's one that we use.
“In addition, there are some applications which have a lot of code that has been wired off for various business reasons. We call that ‘dead code,’ and what we do is we try to exclude the ‘dead code’ from the reporting and that's how we measure the line coverage.”
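As a rough illustration of the reporting approach described above, the sketch below computes line coverage with and without files flagged as dead code. The `FileCoverage` record, the `dead` flag and the file names are all hypothetical; this is not eBay's tooling, only one way the exclusion could be expressed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FileCoverage:
    path: str
    total_lines: int
    covered_lines: int
    dead: bool = False  # hypothetical flag: code wired off for business reasons


def line_coverage(files: Iterable[FileCoverage], exclude_dead: bool = True) -> float:
    """Percentage of lines covered, optionally ignoring files marked as dead code."""
    considered = [f for f in files if not (exclude_dead and f.dead)]
    total = sum(f.total_lines for f in considered)
    covered = sum(f.covered_lines for f in considered)
    return 100.0 * covered / total if total else 0.0


# Hypothetical per-file report, for illustration only.
report = [
    FileCoverage("checkout.py", total_lines=500, covered_lines=450),
    FileCoverage("search.py", total_lines=300, covered_lines=210),
    FileCoverage("legacy_promo.py", total_lines=200, covered_lines=0, dead=True),
]

with_exclusion = line_coverage(report)                          # dead code left out
without_exclusion = line_coverage(report, exclude_dead=False)   # dead code counted

print(f"line coverage excluding dead code: {with_exclusion:.1f}%")
print(f"line coverage including dead code: {without_exclusion:.1f}%")
```

Leaving unreachable code in the denominator drags the figure down without saying anything about the tests themselves, which is the motivation for excluding it.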
As teams move toward a continuous release cycle, think about successful deployment frequency. Both eBay and Walmart seek to measure the ratio of successful deployments, and leverage key metrics determined by DevOps Research and Assessment (DORA). DORA metrics include:
- deployment frequency;
- lead time for changes;
- change failure rate;
- time to restore service.
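The four metrics listed above can be derived from a simple deployment log. The following is a minimal sketch under stated assumptions: the `Deployment` record and its fields are hypothetical, lead time is measured from commit to deployment, and time to restore is measured from the failed deployment to recovery. Real pipelines may define these boundaries differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                     # did this change cause a failure in production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_summary(deployments: List[Deployment], period_days: int) -> dict:
    """Summarize the four DORA metrics for a non-empty list of deployments."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    lead_times = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_times) / 3600,
        "change_failure_rate_pct": 100.0 * len(failures) / len(deployments),
        "mean_time_to_restore_hours": (
            sum(restore_times) / len(restore_times) / 3600 if restore_times else None
        ),
    }


# Hypothetical one-week log, for illustration only.
start = datetime(2024, 1, 1)
log = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=8)),
    Deployment(
        start + timedelta(days=2),
        start + timedelta(days=2, hours=6),
        failed=True,
        restored_at=start + timedelta(days=2, hours=8),
    ),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=2)),
]

summary = dora_summary(log, period_days=7)
for name, value in summary.items():
    print(name, value)
```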
“Quality automation is a big part in these metrics, and it's important to be able to measure that to know the health of the product,” Tang said.
“Those [DORA metrics] are becoming an industry standard now, so that's what we use,” Rentachintala said. “We are aspiring to be on the elite state there at eBay.”
How do you apply test automation properly?
Shifting testing to the left is a best practice for fast-moving development organizations. In this webinar, I was interested to hear how eBay and Walmart are executing on shift-left testing.
Walmart developed its ‘Testing Lava Lamp’ to visualize this process. With this strategy, static and unit testing form the foundation of the lava lamp, followed by integration testing with separate units — as opposed to integration testing with outside systems. The yellow sections of the diagram show the types of testing that the developer has more control over on their local environments.
“Then as we move into functional testing, it becomes more out of process and more in app,” Tang said. “You're actually running the code deployed somewhere, so it becomes more expensive. When you find bugs in the later stages of the pipeline, you have to go back to development, and go through the pipeline to get back to that testing. And the testing may not run as reliably or it might not run as consistently between your local environment versus the CI/CD environment. So, it's just there's a little more maintenance there.
“And then you get to the end-to-end testing, which is important, but we want to really limit how much is going on there, because you do want to make sure everything's working together, especially with different teams working on different components. How do all of those things integrate at the system level? And how do they integrate with the versions that are currently in production or about to go into production? But at the same time, they're very expensive to maintain.”
Similarly, eBay visualizes its testing and shift-left priorities. Rentachintala’s team leverages a testing pyramid, which has a base of unit and component testing, followed by integration testing and finally end-to-end testing built on top.
“At the unit and component level, that's where we wanted to invest heavily,” Rentachintala said. “It’s cheaper and better to identify the issues upfront. … When it gets to end-to-end tests, that's when we have multiple pages or flows put together, and then we run through end-to-end use cases.
“For us at eBay, we do find a lot of issues, about 30% of them, during integration and end-to-end testing. That's quite high, and primarily that’s because the contract is made between the clients and the services. Ideally, the contract should be honored, but in a good number of cases, it's not honored. So those kinds of issues can only be found through the integration testing and not through the component audits done with data. So that's where we find maximum value.”
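The kind of client/service contract issue described above can be illustrated with a small check. This is a sketch only: the contract, field names and payloads are hypothetical, and the panel did not describe how eBay implements its integration tests.

```python
# Hypothetical contract: the fields a client expects from an item service,
# mapped to the Python types it expects them to have.
EXPECTED_ITEM_CONTRACT = {
    "item_id": str,
    "title": str,
    "price_cents": int,
}


def contract_violations(payload: dict, contract: dict) -> list:
    """Return a list of human-readable problems; an empty list means the contract holds."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


# A response that honors the contract, and one that does not.
good_response = {"item_id": "123", "title": "Lamp", "price_cents": 1999}
broken_response = {"item_id": 123, "title": "Lamp"}  # wrong type, missing price

good_problems = contract_violations(good_response, EXPECTED_ITEM_CONTRACT)
broken_problems = contract_violations(broken_response, EXPECTED_ITEM_CONTRACT)

print(good_problems)
print(broken_problems)
```

A component test that runs against mocked data would pass in both cases, because the mock encodes what the client expects rather than what the service actually returns. Only a test against the real service response surfaces the mismatch.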
In addition to test automation, Tang and Rentachintala also discussed test data and accessibility. Watch the whole presentation here to learn more.