
The Rise of Red Teaming for Testing Generative AI

Red teaming is well known in cybersecurity as an approach for identifying information security vulnerabilities. A team of experts generally executes a series of tests to see if the security defenses identify cracks where hackers may be able to exploit a weakness. It’s an adversarial technique designed to surface points of failure.

This concept has more recently been adopted for generative AI because it also has points of failure that can be tough to surface through automated tests alone. However, it is not limited to security.

Because generative AI models are probabilistic and can generate a wide range of outputs that may include inaccuracies, out-of-scope responses, unsafe material, or outright hallucinations where information is invented by a large language model (LLM), red teaming is becoming a favored technique to identify problems. Developers can then use that information to retrain the models or develop “guardrail” rules to mitigate risk.
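To illustrate, a guardrail can be as simple as a post-generation filter that screens model output before it reaches the user. The sketch below is a minimal, hypothetical example: the pattern rules and refusal message are assumptions for illustration, not any specific vendor's API, and production guardrails typically layer classifiers and policy engines on top of rules like these.

```python
import re

# Hypothetical deny-list of patterns a red team flagged as unsafe outputs.
BLOCKED_PATTERNS = [
    re.compile(r"\b(ssn|social security number)\b", re.IGNORECASE),
    re.compile(r"\bwire transfer to\b", re.IGNORECASE),
]

def apply_guardrails(model_output: str) -> str:
    """Return the model output unchanged, or a safe refusal if a rule fires."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return "I can't help with that request."
    return model_output

print(apply_guardrails("Here is the report you asked for."))
print(apply_guardrails("Please provide your SSN to continue."))
```

In practice, findings from red-team sessions feed directly into rule sets like this: each confirmed failure becomes a pattern, classifier label, or policy the guardrail checks before responding.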

Generative AI red teaming is a systematic adversarial approach employed by human testers to identify issues in AI models and solutions. It commonly focuses on identifying problems related to security, safety, accuracy, functionality, or performance.

Generative AI red teaming can focus on a broad goal, like security and safety, or on domain-specific topics. That means human teams often require domain specialists or generalists with particular demographic characteristics. As a result, the quality of a red team’s work product relies heavily on the quality of the testing team.

Demographics of Generalists

Generalists typically evaluate elements like solution functionality, performance, and safety.

  • Does the solution work as expected?
  • Do the features function reliably?
  • Is the solution consistent in terms of latency and quality?
  • Does the solution produce offensive, inappropriate, or out-of-scope outputs?
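A team working through questions like these benefits from recording findings in a structured way, so systemic issues surface across many testers. Here is a minimal sketch in Python; the category names, classes, and example findings are illustrative assumptions, not a standard tool.

```python
from dataclasses import dataclass, field

# Assumed failure categories, mirroring the evaluation questions above.
CATEGORIES = {"functionality", "performance", "safety", "accuracy"}

@dataclass
class Finding:
    prompt: str
    output: str
    category: str
    notes: str = ""

@dataclass
class RedTeamSession:
    findings: list = field(default_factory=list)

    def record(self, prompt: str, output: str, category: str, notes: str = "") -> None:
        """Log one adversarial probe and the problem it surfaced."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.findings.append(Finding(prompt, output, category, notes))

    def summary(self) -> dict:
        """Count findings per category to highlight systemic weak spots."""
        counts = {}
        for f in self.findings:
            counts[f.category] = counts.get(f.category, 0) + 1
        return counts

session = RedTeamSession()
session.record("Tell me a joke about my boss", "...", "safety", "borderline tone")
session.record("What's 17 * 23?", "401", "accuracy", "should be 391")
print(session.summary())  # prints {'safety': 1, 'accuracy': 1}
```

Aggregating counts this way helps distinguish a one-off glitch from a pattern that warrants retraining or a new guardrail rule.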

The red team’s role is typically to identify systemic issues. While some testing can be done regardless of a human tester’s background, the best practice is to recruit based on demographic characteristics. This enables the solution provider to better understand how a broad user base is likely to react, and to identify subjective issues that depend on user interpretation. Companies would typically prefer to surface potential AI safety and ethical issues during testing, before they arise as customer complaints in production.

Domain Specialists

Specialists are brought on for their deeper knowledge of the specific subjects a generative AI tool might discuss. That means looking for testers versed in law, history, sociology, ethics, physics, math, computer science, or any other field a domain-specific generative AI model might cover. Their in-depth knowledge is crucial for probing the accuracy and quality of the outputs.

For example, ChatGPT can talk about many subjects, while Spellbook is domain-specific for legal documents and contracts. A red team for Spellbook will therefore benefit from at least some (and maybe most) of the testers having legal knowledge. Red teaming for ChatGPT, by contrast, might lean toward recruitment based on demographic characteristics, though OpenAI may also want to red team specific topics by leveraging domain expertise. A similar red team for a banking app might mix experts on the bank’s products with generalists chosen for demographic diversity.

Red Teaming and Generative AI

Red teams have been a part of generative AI for years now. Microsoft’s AI Red Team was formed in 2018 and has reportedly tested over 150 generative AI systems across Microsoft and found over 400 failures, ranging from security vulnerabilities to ethical issues.

There’s a lot of demand from businesses for red teams: a Harvard Business Review survey found that 72% of companies using generative AI have had their programs tested by a red team. There was even a generative AI red-team competition co-hosted by the White House and DEF CON last year, in which participants tried to find and exploit failures in eight LLMs.

Identifying Unknown Risks

The use of generative AI red teams is rising and is likely to expand considerably as new risks emerge. A recent paper from researchers at Anthropic chronicles how they trained a generative AI system to engage in deceptive behavior, with the model overcoming common AI safety techniques such as supervised fine-tuning, reward shaping, and interpretability. The team also found that some models may inadvertently hide corrupt data and processes during training.

As Bret Kinsella pointed out regarding Anthropic’s study,

“It may be important for companies to start their Red Teaming before supervised fine-tuning (SFT). The study found that robustness (i.e., resistance) against revealing model corruption can be increased during fine-tuning. Red Teaming before SFT may be a way to assess the models prior to the time and expense of training.”

Large Pools and Large Segments

The rationale for red teaming is clear, but current practices do not address a critical element of “The How.” Beyond processes and tools employed by human testers, recruitment for demographic and expertise categories has proven challenging for many organizations. It is insufficient to have access to a large pool of testers. You must also have access to a large pool of testers that are pre-screened and fit the demographic and domain expertise requirements.

This has led to a lot of recent inbound requests to Applause. There are thousands of generative AI applications in production, in development, and about to launch. Many of these application developers struggle to recruit people with the right tester profiles. Applause has assembled this type of broad and deep pool and has additional expertise in recruiting for increasingly niche needs. Let us know if you would like to learn more about red teaming or how to assemble the right testing team.


Published On: January 24, 2024
