Building Generative AI into Applause’s Workflows

Like many other forward-looking organizations, Applause regularly evaluates ways to capitalize on generative AI’s potential to boost productivity and streamline business processes. To date, we have three production uses of genAI in place. In this blog post, I’ll outline how we’re using generative AI for one of those use cases, why we chose this specific application for the technology, how we tested it, and what we learned through the process.

GenAI for test case management

Test case management is a crucial practice for developing high-quality software. By effectively managing your test cases, you can ensure that your software is thoroughly tested and delivered with fewer defects. Well-written test cases take less time to execute, which improves efficiency and throughput, ultimately reducing costs. Conversely, poorly written ones can create confusion and bog down testing efforts.

When we developed our test case management solution, we added the option to have AI review test cases and recommend improvements. We’re using a double opt-in, so each customer decides whether to enable that feature for their organization. Once enabled, someone on the QA team can request recommendations for a specific test case.

When a human sends a test case for review, OpenAI’s GPT suggests edits to improve the language, the clarity, or the steps in that test case. The person who submitted the test case can then accept all of the recommendations, accept only some of them, or pass and keep the test case as-is.

Why we selected this use case

We identified test case recommendations as the easiest place for us to get started with large language models. Test cases are highly structured; they typically don’t contain PII, screenshots, or random input from our testing community or other people, and they’re usually pre-production. We determined that using genAI in this way was legally safe and that it could deliver clear value.

How we developed and tested the application

For the test case improvements, much of the work was integrating the feature into our product: a user interface that presents the test case to the customer and lets them decide whether they like the changes or not. That was done with a workflow implementation. The rest of the work, integrating with the model’s API, was straightforward.
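The API side of that integration is small enough to sketch. The snippet below is an illustration only, assuming OpenAI’s Python client; the model name, prompt text, and helper name are hypothetical, not our production code:

```python
# Illustrative sketch of the model integration (assumes OpenAI's Python
# client; the model name, prompt, and helper name are hypothetical).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def suggest_improvements(test_case_text: str) -> str:
    """Send one test case to the model and return its suggested rewrite."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            # The real value lives in the system prompt; see the
            # prompt-engineering discussion below.
            {"role": "system",
             "content": "You are a precise QA reviewer. Improve the language, "
                        "clarity, and steps of the test case you are given."},
            {"role": "user", "content": test_case_text},
        ],
    )
    return response.choices[0].message.content
```

The suggestions come back as text, and the workflow presents them in the UI so the submitter can accept all, some, or none of them.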

Most of the time went into prompt engineering, which is an interesting discipline. It’s a bit like programming, almost the test cases of the future to a degree: you have to write a very structured set of instructions for what you want the machine to do. For the large language model to produce good output, you have to build a lot of instructions into the prompts. For example, in our application, we had to specify that we wanted the model to be precise and avoid adding humor. We told the model to use all of its knowledge of the QA disciplines and to take into account that we’re engineering products in the mobile and web space, as opposed to manufacturing.
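To make that concrete, here is a simplified sketch of the kind of system prompt those instructions add up to. It illustrates the style of instruction described above; it is not our actual production prompt:

```python
# Simplified sketch of the style of instructions described above;
# this is NOT Applause's production prompt.
SYSTEM_PROMPT = """\
You are an expert QA engineer reviewing a single software test case.
Rewrite the test case to improve its language, clarity, and steps.
Be precise. Do not add humor.
Apply best practices from the QA discipline.
Assume the product under test is a mobile or web application,
not a manufactured physical product.
Return only the improved test case, preserving its structure.
"""
```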

As far as testing goes, we did our normal QA through the community. We had a number of our professional test case writers review the test cases before and after and sign off on the recommendations from the model. Our bar was that at least 80% of the time, the recommended test case needed to be better than the original, so that someone would accept the changes.
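In spirit, that sign-off process boils down to a simple acceptance-rate check. A toy sketch, with hypothetical reviewer verdicts rather than our actual review tooling:

```python
# Toy sketch of the 80% acceptance bar (hypothetical reviewer verdicts;
# not Applause's actual review tooling).

def meets_acceptance_bar(verdicts: list[bool], threshold: float = 0.80) -> bool:
    """verdicts[i] is True when a reviewer judged the model's rewrite of
    test case i to be better than the original."""
    return sum(verdicts) / len(verdicts) >= threshold


# Example: 9 of 10 rewrites judged better than the original -> 90%, bar met.
print(meets_acceptance_bar([True] * 9 + [False]))  # True
```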

Long term, we’re collecting feedback on how many times people have asked to clean up a test case and how many times the recommendations have been accepted. Since this is a new feature, we need to let it run for a while before we have that data. And we might get different answers from customers using the tool on their own test cases versus our internal use. I would expect a lower acceptance rate from our community’s professional test case writers, because we should be writing better test cases to start with, whereas customers’ internal teams may not be experts in writing test cases. So maybe we get a higher acceptance rate there; that’ll be interesting data for us longer term.

Lessons learned

We spent weeks refining the prompt to get the results we wanted at this stage in the evolution of large language models. When you’re doing repetitive things, like cleaning up hundreds of thousands of test cases, automating has value, and refining the prompt makes sense: the one-time effort of writing a good prompt pays off over and over again.

We ran into some strange things, like one case where the model was literally taking minutes to come back with a recommendation for a single test case. We found that removing a single comma from the prompt we had developed made it 60 times faster. When things aren’t going the way you want, be creative. There’s no documentation out there that will tell you to do something like that; you just have to keep adjusting things and testing. It’s experimentation, and you want to make sure your prompt engineering is going to yield a return on the time you’re investing in it.
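Nothing beats measuring. A rough sketch of the kind of side-by-side timing we mean, where the prompt variants and the model call are placeholders:

```python
# Rough sketch of timing prompt variants against each other (the prompt
# variants and the call_model function are placeholders, not real code paths).
import time
from typing import Callable


def average_latency(call_model: Callable[[str, str], str],
                    prompt: str, test_case: str, runs: int = 3) -> float:
    """Average wall-clock seconds per call for this prompt variant."""
    start = time.perf_counter()
    for _ in range(runs):
        call_model(prompt, test_case)
    return (time.perf_counter() - start) / runs


# Hypothetical usage: compare the original prompt against a variant with
# one comma removed and keep whichever the model answers faster.
# baseline = average_latency(call_model, PROMPT_WITH_COMMA, example_case)
# variant  = average_latency(call_model, PROMPT_WITHOUT_COMMA, example_case)
```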

In summary, we have a new tool in our bag of tricks as developers, as people, as employees. When approaching problems today, whether in marketing or engineering or finance, you have to ask not only whether generative AI could help, but whether it’s efficient. You’ll also want to think carefully about what you need to share with that tool to get the answers you want. Consider PII and legal compliance; make sure you’re being safe with the data. You shouldn’t hand data to systems when you’re not sure how the AI is being trained or how the data is stored or retained. Even if a provider claims they don’t save anything, you’re better off not trusting them at this stage.

You need to know what tools are out there, and then you need to spend time with them to understand how the different systems work and what options are available. Once you understand what one is like, it goes into your bag of tricks, and when you’re presented with a problem to solve, you can turn to generative AI with a clearer sense of what assistance you’ll get and where you have to be careful.

Published: December 18, 2023
Reading Time: 7 min
