Test Automation, AI and Gaps in Digital Quality
Many software developers have rapidly embraced AI for writing code on the premise that it helps them work better, faster, and more efficiently. Many of these developers are also under pressure to use AI and automation to test that code – again, in the name of working better and faster. But when you rely on AI for everything, who really owns quality? How do you make sure you’re testing the right things, automating the right tests, and validating that the AI is generating quality code?
You need humans to be accountable. While AI can produce code and test cases quickly, they’re not always the right ones. Overreliance on AI for code, automation, and quality checks can lead to serious gaps in coverage and flawed testing strategies. Here’s why:
Increased code volume and insufficient version control
With vibe coding, AI can indeed generate thousands of lines of code in a fraction of the time it would take human developers – but this introduces major risks around versioning and commit history. Tools like GitHub are designed for humans, not AI. Development teams lose visibility into who changed what, and when. This creates problems with traceability, backward compatibility, and the ability to keep multiple versions working seamlessly in production.
Failure to validate requirements
It looks impressive when AI picks up criteria from requirements, test cases, and existing code and converts them into automation code. But AI doesn’t inherently understand whether a product makes sense, and it won’t question the risks and implications of what is being built.
Good engineers and testers know how to validate requirements and plan for edge cases. They understand parameters that may not be explicitly documented, and they bring tribal knowledge to testing. Consider, for example, an ecommerce site that offers free shipping on orders of $50 or more. Employees implicitly know they need to validate that shipping charges are calculated correctly on orders below this threshold and waived once a sale meets or exceeds it. AI, however, needs explicit instructions around these testing criteria, and if it rewrites test cases, it may not understand the logic behind them.
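The free-shipping example can be made concrete with a minimal sketch. The function name `shipping_fee` and the $5.99 flat rate are hypothetical stand-ins for illustration; the point is the boundary test a human knows to write even though no requirement spells it out:

```python
# Hypothetical shipping-fee logic for the "$50 or more" free-shipping rule.
FREE_SHIPPING_THRESHOLD = 50.00
FLAT_SHIPPING_FEE = 5.99  # assumed flat rate, for illustration only

def shipping_fee(order_total: float) -> float:
    """Return the shipping charge: waived at or above the threshold."""
    return 0.0 if order_total >= FREE_SHIPPING_THRESHOLD else FLAT_SHIPPING_FEE

# The tribal knowledge a tester brings: the boundary itself must be tested.
# "$50 or more" means exactly $50.00 qualifies. An AI told only "free
# shipping over $50" might check $40 and $60 and miss the boundary entirely.
def test_shipping_boundary():
    assert shipping_fee(49.99) == FLAT_SHIPPING_FEE  # just below: charged
    assert shipping_fee(50.00) == 0.0                # exactly at: waived
    assert shipping_fee(50.01) == 0.0                # above: waived
```

If the AI regenerates this logic with `>` instead of `>=`, only the boundary assertion catches it – which is exactly the kind of case that goes untested without explicit instruction.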
Ultimately, if the requirements are incorrect or misunderstood, AI is automating the wrong things – not saving time at all.
AI’s drive to please people often leads to green test pipelines
AI is trained to be polite and give you whatever you ask for, often resulting in hallucinations rather than unpleasant truths. As a result, AI focuses on building “green tests” that appear successful while skipping critical validations or assertions, leading to false impressions of quality. AI can easily game metrics or provide vanity metrics, such as reaching 100% test automation coverage, without actually testing the most important aspects of the software.
Teams that rely on AI to create test automation code still need someone to test that AI-generated code. There’s also a trade-off between building and maintaining. While it’s easy to create code with AI, the moment something changes you have to regenerate all of it, and in that process you can introduce new risks – AI may not recognize why the original code was in place.
On average, it takes about four times more effort to maintain AI-created automation than it took to write it. With human developers, it’s typically the reverse: initial setup takes four times as long as updating and maintaining automation scripts. Additionally, AI does not fully understand the underlying architecture. Even when prompted to follow certain principles and best practices, it may still generate code that does not align with company policies.
Some developers incorrectly believe that design patterns are obsolete in an AI-driven world. In reality, you still need a proper architecture, coding principles, and guidelines to avoid security risks.
Human expertise is irreplaceable in setting testing strategy
There are ways AI can add value in developing test automation – but it requires human oversight to validate that scripts are sound and fit all the relevant criteria and frameworks. Without human guidance and validation, AI-generated automation code can still leave significant gaps in coverage while creating a false sense of security.
