Bug Reproduction Requires Thinking Like a Real User

In part one of this series on bug reproduction, we discussed the value of taking the time to find the pattern behind an issue in order to head off the future costs of a bug in production. I shared a real-life example of an issue that users were reporting but that the developers themselves couldn’t reproduce.

In this post, let’s explore another real-life example of a production bug where I had the pleasure of finding the pattern that directly led to a fix — one that showcases the value of considering how real users will interact with a product, and how that impacts their experience.

Trouble reproducing the issue

Prior to joining uTest, I was working in the QA department of a large book retailer. I got called into the office of the development manager, who was dealing with an in-store system issue being reported to the support desk by store personnel. His team could not reproduce the issue in their lab.

Stores were complaining about the in-store kiosk that customers used to search for information, such as the shelf location of books, or to place a special order if the book was out of stock at the location. Every morning, the kiosks ran smoothly. But as the day went on, the kiosks would slow down to a crawl, eventually becoming useless to both employees and customers. This was a huge problem and was leading to lost sales.

I hadn’t been involved in the kiosks’ development, so I learned that each kiosk was essentially a locked-down version of the company’s web store with a few added applications. A sleep timer reset the kiosk to the home page after a period of inactivity, and security features prevented anyone from tampering with it.

Come up with a theory

After the meeting with the development manager, I went to the test kiosk in the lab. I had a theory on how I could reproduce the issue and set out to put it to the test. My theory was that the development team could not reproduce the issue in the lab because they were not testing for a real-life scenario.

With what I now knew about how the kiosks were set up, how they were used in the stores and the description of the problem being witnessed, I surmised that every search was taking a bit longer than the last, eventually slowing to a crawl. Since the systems were rebooted nightly, that would explain why the kiosks worked fine at the start of the day.

This theory was based on how customers used the kiosks. A customer would walk into the store, run their search, get their information and walk away. Then the sleep timer would kick in, resetting the kiosk to the home page, until the next customer walked up and the process started over again.

Put the theory to the test

The first step in reproducing this bug in the lab was to take the time to mimic that real-life scenario. Armed with a stopwatch, I tapped a key on the keyboard to wake the device and began timing my search. The first search took about 15 seconds, and I recorded the timing in a log while I waited for the kiosk to reset itself.

Once the kiosk reset, I timed the exact same search again: 18 seconds this time. I ran the test again and again, recording the time per search on each run. Each run added a few more seconds to the search time. Eventually, the kiosk slowed to a crawl on every search.
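If I were running that routine today, I’d probably script it rather than sit with a stopwatch. Below is a minimal sketch of the same loop in Python, assuming the kiosk’s search boiled down to an ordinary HTTP request; the URL, query parameter and sleep-timer interval are all hypothetical stand-ins, not details from the real system.

```python
import time
import requests

# Hypothetical endpoint and timeout -- the real kiosk's URL and
# sleep-timer interval are assumptions for illustration only.
SEARCH_URL = "http://webstore.example.com/search"
SLEEP_TIMER_SECONDS = 60

for run in range(1, 21):
    start = time.monotonic()
    requests.get(SEARCH_URL, params={"q": "moby dick"})
    elapsed = time.monotonic() - start
    print(f"run {run:2d}: search took {elapsed:.1f}s")
    # Wait for the sleep timer to reset the kiosk to the home page,
    # just as it would between real customers.
    time.sleep(SLEEP_TIMER_SECONDS)
```

If the searches are truly independent, the timings should hover around a constant; a steady climb from run to run is the signature of state accumulating between sessions.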

I walked back to the development team to tell them I had reproduced the issue. The look on their faces was priceless — they were flabbergasted since I had only been in the lab about an hour.

We went into the lab and I showed them the reproduced issue. I restarted the system to simulate a nightly reboot and ran them through the multiple searches, timing each one, to show them what was happening. However, even though I had managed to reproduce the issue, I couldn’t tell them why it was happening.

Understand the root cause

I asked for the development team’s help in removing some variables from the system, namely the added applications. One by one, we removed the applications and retested, until no added applications were left on the kiosk. The issue persisted. With no variables left to remove, we surmised that whatever was causing the problem was part of the web store code itself.

Armed with this information, I went to my personal desktop and pulled up the web store to see if I could reproduce the issue there as well. Sure enough, it was there. I then ran the test again, this time with a network traffic sniffer, and isolated the file that was causing the issue.
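The same growth can often be spotted without a packet sniffer by watching response sizes across repeated requests. Here’s a rough sketch of that idea, again with a hypothetical URL; the persistent requests session stands in for the kiosk’s never-ending browsing session.

```python
import requests

# Hypothetical URL -- the real web store address is an assumption.
SEARCH_URL = "http://webstore.example.com/search"

session = requests.Session()  # persistent, like the kiosk's session
for run in range(1, 11):
    response = session.get(SEARCH_URL, params={"q": "moby dick"})
    # A resource whose size climbs run over run is the suspect --
    # in this story, the file tied to the uncleared session.
    print(f"run {run:2d}: {len(response.content):,} bytes")
```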

We took this evidence to the Web team, and they identified the culprit: a garbage-collection file.

Normally this file wouldn’t be a problem: when a home user ends their session, the file gets cleared. In the store kiosks, however, the sessions were never cleared, so the garbage-collection file simply grew larger and larger until it was so big the system would lock up. Because the development team hadn’t been replicating how the technology was used in real life, they couldn’t figure out why there was a bug. The problem was now clear, and the fix was simple: clear the session after each use.
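In a modern web framework, the fix might look something like the sketch below. The Flask app and endpoint are purely illustrative (the original system long predates them); the point is simply that the kiosk clears its session state after every search instead of letting it accumulate all day.

```python
from flask import Flask, jsonify, request, session

app = Flask(__name__)
app.secret_key = "kiosk-demo"  # required for Flask's session support

def do_search(query):
    # Stand-in for the real catalog lookup.
    return {"query": query, "results": []}

@app.route("/kiosk/search")
def kiosk_search():
    results = do_search(request.args.get("q", ""))
    # The fix: a kiosk never "ends" a browsing session the way a
    # home user does, so clear the session state after every search.
    session.clear()
    return jsonify(results)
```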

With this fix in place, the kiosks no longer slowed down for customers as the day went on, and the retailer stopped losing sales.

Reproducing bugs takes patience, investigation and the ability to put yourself in the mindset of a real user. The closer you can bring your testing to replicating real user behavior and understanding the context of in-the-wild use, the more likely you are to identify and reproduce issues.

Published: August 10, 2020
Reading Time: 8 min
