Select Page

The Rise of Red Teaming for Testing Generative AI

Red teaming is well known in cybersecurity as an approach for identifying information security vulnerabilities. A team of experts generally executes a series of tests to see if the security defenses identify cracks where hackers may be able to exploit a weakness. It’s an adversarial technique designed to surface points of failure.

This concept has more recently been adopted for generative AI because it also has points of failure that can be tough to surface through automated tests alone. However, it is not limited to security.

Because generative AI models are probabilistic and can generate a wide range of outputs that may include inaccuracies, out-of-scope responses, unsafe material, or outright hallucinations where information is invented by a large language model (LLM), red teaming is becoming a favored technique to identify problems. Developers can then use that information to retrain the models or develop “guardrail” rules to mitigate risk.

Generative AI red teaming is a systematic adversarial approach employed by human testers to identify issues in AI models and solutions. It commonly focuses on identifying problems related to security, safety, accuracy, functionality, or performance.

Generative AI red teaming can focus on a goal like security and safety or domain-specific topics. That means human teams often require domain specialists or generalists with particular demographic characteristics. The result is that the quality of a red team’s work product relies heavily on the quality of the testing team.

Demographics of Generalists

Generalists typically evaluate elements like solution functionality, performance, and safety.

  • Does the solution work as expected?
  • Do the features function reliably?
  • Is the solution consistent in terms of latency and quality?
  • Does the solution produce offensive, inappropriate, or out-of-scope outputs?

The red team’s role is typically to identify systemic issues. While some testing can be done regardless of a human tester’s background, the best practice is to recruit based on demographic characteristics. This enables the solution provider to understand better how a broad user base is likely to react and potentially identify subjective issues that rely on user interpretation. Companies would typically prefer to surface potential AI safety and ethical issues during testing before they arise as customer complaints in production.

Domain Specialists

Specialists are brought on for their deeper knowledge of specific subjects, anything the generative AI tool might discuss. That means looking for those versed in law, history, sociology, ethics, physics, math, computer science, or really anything that a domain-specific generative AI model might produce. Their in-depth knowledge is crucial for probing the accuracy and quality of the outputs.

For example, ChatGPT can talk about many subjects, while Spellbook is domain-specific for legal documents and contracts. A red team for Spellbook will, therefore, benefit from at least some (and maybe most) of the testers having legal knowledge. ChatGPT might bias towards red teaming based on demographic characteristics, and OpenAI may also want to perform red teaming on some specific topics by leveraging domain expertise. A similar red team solution for a banking app might be a mix of expertise on the bank’s products and generalists with demographic diversity.

Red Teaming and Generative AI

Red teams have been a part of generative AI for years now. Microsoft’s AI Red Team was formed in 2018 and has reportedly tested over 150 generative AI systems across Microsoft and found over 400 failures, ranging from security vulnerabilities to ethical issues.

There’s a lot of demand from businesses for red teams, with a Harvard Business Review survey finding that 72% of those using generative AI have run their programs by a red team. There was even a Generative AI Red Team competition, co-hosted by the White House and DEFCON last year. Participants tried to find and exploit failures in eight LLMs.

Identifying Unknown Risks

The use of generative AI red teams is rising and is likely to expand considerably based on new risks. A recent paper from researchers at Anthropic chronicles how they trained a generative AI system to engage in deceptive behavior, with the model overcoming common AI safety techniques such as supervised fine-tuning, reward shaping, and interpretability. In addition, the team found that some models may inadvertently hide corrupt data and processes during the training process.

As Bret Kinsella pointed out regarding Anthropic’s study,

“It may be important for companies to start their Red Teaming before supervised fine-tuning (SPT). The study found that robustness (i.e., resistance) against revealing model corruption can be increased during fine-tuning. Red Teaming before SPT may be a way to assess the models prior to the time and expense of training.

Large Pools and Large Segments

The rationale for red teaming is clear, but current practices do not address a critical element of “The How.” Beyond processes and tools employed by human testers, recruitment for demographic and expertise categories has proven challenging for many organizations. It is insufficient to have access to a large pool of testers. You must also have access to a large pool of testers that are pre-screened and fit the demographic and domain expertise requirements.

This has led to a lot of recent inbound requests to Applause. There are thousands of generative AI applications in production, in development, and about to launch. Many of these application developers struggle to recruit people with the right tester profiles. Applause has assembled this type of broad and deep pool and has additional expertise in recruiting for increasingly niche needs. Let us know if you would like to learn more about red teaming or how to assemble the right testing team.

Ebooks

Building a Comprehensive Approach To Testing Generative AI Apps

This ebook examines genAI use cases, inherent risks and challenges in developing generative AI apps and how they can be mitigated by a thoughtful and deliberate approach to development.

Want to see more like this?
Published: January 24, 2024
Reading Time: 8 min

Usability Testing for Agentic Interactions: Ensuring Intuitive AI-Powered Smart Device Assistants

See why early usability testing is a critical investment in building agentic AI systems that respect user autonomy and enhance collaboration.

Do Your IVR And Chatbot Experiences Empower Your Customers?

A recent webinar offers key points for organizations to consider as they evaluate the effectiveness of their customer-facing IVRs and chatbots.

Agentic Workflows in the Enterprise

As the level of interest in building agentic workflows in the enterprise increases, there is a corresponding development in the “AI Stack” that enables agentic deployments at scale.

What is Agentic AI?

Learn what differentiates agentic AI from generative AI and traditional AI and how agentic raises the stakes for software developers.

How Crowdtesters Reveal AI Chatbot Blind Spots

You can’t fix what AI can’t see

A Snapshot of the State of Digital Quality in AI

Explore the results of our annual survey on trends in developing and testing AI applications, and how those applications are living up to consumer expectations.
No results found.