Select Page
Two software developers look at cade for an AI application

A Snapshot of the State of Digital Quality in AI

Since 2022, Applause has published annual reports examining trends across various dimensions in digital quality. This year’s reports center around trends in AI, accessibility and inclusive design, functional testing and user experience. The first report focuses on AI, and draws on the results of our third annual AI survey. In this blog post, we’ll walk through some of the findings and trends. 

AI is changing how organizations build and test apps – and those that don’t adapt will find themselves at a disadvantage.

This year’s survey respondents included more than 1,600 software development, product and testing professionals. More than half of them reported that their organization’s integrated development environment (IDE) has embedded AI tools – but 23.3% said their IDE does not (another 4.9% have no IDE and 16% were uncertain about embedded AI). Most report that they’re regularly using Gen AI for common development and QA tasks – and the numbers have increased slightly over last year. 

How teams are using Gen AI in the testing process

Use case 2025 Increase over 2024
Test case generation 65.6% +4.60%
Text generation for test data 58.6% +4.80%
Test reporting 57.9% +5.60%
Chatbot testing 48.5% no change
Root cause analysis 37.6% +7.50%
Test environment simulation 36.1% +2.20%
Defect prediction 29.6% +0.80%

(n=781)

With AI tools delivering substantial productivity boosts – 24.9% estimate that Gen AI tools boost their productivity by 25-49% and another 26.8% suggest their productivity increases by 50-74% – those who fail to embrace AI will quickly fall behind. 

Nearly three-quarters of respondents – 72.3% – indicated that their organization is currently working on developing AI applications or features. The most common priorities are:

  • Chatbots/customer support tools: 55.4%
  • Predictive analytics: 40.1%
  • Image recognition: 39.7%
  • Personalization: 39.5%

To test those AI experiences, teams are using a variety of tools and techniques, both manual and automated. And of course, there’s AI testing AI. The most commonly used tools focus on data validation, performance testing, and integration testing while the most popular techniques include usability testing, black-box testing, and metamorphic testing/prompt and response grading. It’s imperative for organizations to focus on comprehensive testing throughout the SDLC and to maintain quality over time, as their AI applications mature and evolve.

While adversarial testing (red teaming) is a best practice to help mitigate risk in external-facing applications, only 32.5% of respondents reported using this technique, and only 29.6% use humans in red team testing. It’s unclear whether this is because some organizations are currently focused on internally-facing applications, which may seem to pose less risk, or due to disparities between large and small organizations.    

While many organizations are investing in AI to streamline operations and reduce costs, flaws are still reaching users, limiting ROI. 

Though 57.2% of respondents are conducting UX testing with humans and 61.% are using humans to grade AI prompts and responses, 65.5% of over 4,100 survey respondents have encountered a problem with Gen AI since January 1. About one-third have experienced bias (34.7%) and hallucinations (32.3%). Other frequent issues include: 

  • General answers that did not provide enough detail: 40.0%
  • Misunderstood prompts: 38.3%
  • Obviously wrong answers: 22.9%

Only 20% of respondents said that the Gen AI tools they use understand their questions and provide helpful responses every time, showing clear room for improvement. As organizations invest in AI, it’s imperative to test thoroughly to ensure that experiences don’t discourage, disappoint, or doxx users. This is especially true for AI agents with access to personal data in highly regulated industries already under close scrutiny.    

AI has limitations – humans are still crucial in developing and testing AI experiences.

AI has quickly evolved, moving from traditional systems focused on data analysis and pattern recognition, to generative AI, which creates new content based on user input. The next stage, agentic AI, reduces or even eliminates the need for human input, making decisions and acting autonomously to complete tasks. Regardless of the desired output, all these systems require some level of human involvement to test them. In addition, applications that rely on variables like voice utterances, handwriting, and images or videos of people must start with a foundation of diverse training data to produce an accurate model. That’s not a one-time need – additional data is required to keep the model from degrading over time. 

At the testing stage, humans are crucial to evaluate factors like user experience and accessibility, yet more than 40% of teams aren’t including humans in the process of testing them. Human expertise and discipline-specific knowledge also comes into play when validating outputs. As security and privacy concerns around generative and agentic AI grow, organizations can’t afford to skip over red teaming to identify vulnerabilities. 

Webinar

AI Testing: The Path to Exceptional Apps

What’s next?  

AI will continue its rapid evolution as organizations that have already invested heavily rush to capture value from that spend. Those who didn’t budget for adequate testing may see their margins evaporate as they fall behind competitors who offer more effective solutions. While agentic AI may operate differently and introduce more risk, testing it isn’t substantially different from testing generative AI. 

Though we’re still in early days, enterprises are becoming more mature in their approach to capturing the value from Gen AI. As part of that maturity process, they’re increasingly adopting best practices around testing. Those best practices include techniques we’ve shared in the past: red teaming, prompt and response grading, and ongoing human feedback. Red teaming and adversarial testing to identify areas of risk, improve security and rescue the risk of harmful outputs is essential for applications that serve external audiences. Using humans to evaluate user experience and assess how well AI outputs align with user expectations is still one of the most effective ways to improve model performance.   

Gen AI is still in its early days so it’s natural that we will see a wide distribution in the data and in particular, where companies are in their maturity journey to realize the benefits. To get there, companies need to redesign many processes, not just testing — business workflows, AI talent, governance, risk mitigation, etc. What we are seeing in the market is that digital native companies are well on their way, followed by the larger enterprises,” said Chris Sheehan, EVP of High Tech & AI.

Report

State of Digital Quality in AI 2025

Applause surveyed members of the global uTest community, as well as our customers, prospects and social media followers from February 7 through March 2, 2025. 

Want to see more like this?
Published: March 27, 2025
Reading Time: 9 min

Beyond Traditional Testing: Advanced Methodologies for Evaluating Modern AI Systems

As AI systems continue to demonstrate ever more complex behaviors and autonomous capabilities, our evaluation methodologies must adapt to match these emergent properties if we are to safely govern these systems without hindering their potential.

Agents and Security: Walking the Line

Common security measures like captchas can prevent AI agents from completing their tasks. To enable agentic AI, organizations must rethink how they protect data.

How Agentic AI Changes Software Development and QA

Agentic AI introduces new ways to develop and test software. To safely and effectively make the most of this new technology, teams must adopt new ways of thinking.

Automation vs. Agentic AI: Key Differences

Explore the core differences between rule-based automation and agentic AI, and their roles in modern software QA.

Usability Testing for Agentic Interactions: Ensuring Intuitive AI-Powered Smart Device Assistants

See why early usability testing is a critical investment in building agentic AI systems that respect user autonomy and enhance collaboration.

Do Your IVR And Chatbot Experiences Empower Your Customers?

A recent webinar offers key points for organizations to consider as they evaluate the effectiveness of their customer-facing IVRs and chatbots.
No results found.