Please ensure Javascript is enabled for purposes of website accessibility

Thousands of Hackers Red-Team Generative AI Models at DEF CON 31

Yinon Douchan
,
AI Researcher
August 29, 2023

DEF CON, a Las Vegas fixture, has long been the proving ground for cybersecurity's cutting edge. This year, the focus shifted to artificial intelligence. The Generative Red Team Challenge, with the White House's involvement, aimed to expose the weaknesses of leading large language models (LLMs) before they could be exploited in the wild.

The objective was clear: to simulate real-world adversarial attacks, uncovering biases, harmful outputs, and security flaws that could have far-reaching consequences. The models involved, kept largely under wraps for security purposes, were subjected to many attacks.

The Scale of the Generative Red Team Challenge

The scale of the operation was substantial. Participants, ranging from seasoned cybersecurity professionals to newcomers, employed a variety of techniques. Prompt injection, data exfiltration attempts, and "jailbreaking" were among the arsenal used.

One of the prominent attack vectors was prompt injection, where hackers manipulated user inputs to override the model's safety guidelines. By carefully crafting prompts, they could coax the AI into generating harmful or biased content, bypassing built-in safeguards. The challenge exposed the difficulty of making AI models truly safe.

Data Exfiltration and Bias Concerns

Data exfiltration, the attempt to extract sensitive information, proved to be another area of concern. Hackers explored ways to retrieve training data or other confidential details, raising alarms about potential data leakage.

The challenge also highlighted the persistent issue of bias and toxicity. AI models, trained on vast datasets, can inadvertently perpetuate societal biases, generating discriminatory or offensive outputs. Participants documented numerous instances of models producing biased or harmful content, underscoring the ethical challenges of AI development.

Jailbreaking and Information Disclosure

"Jailbreaking," the act of tricking an AI into ignoring its programming, was another key tactic. Hackers found ways to manipulate the models into providing information they were supposed to withhold, or to perform actions they were explicitly forbidden from doing.

Also, some participants were able to get the AI models to reveal information about their training data, and other internal information.

The implications of these findings are important. The AI industry is now facing the reality of these vulnerabilities. Developers are under pressure to strengthen their models' defenses, implementing more robust safety measures and addressing the root causes of bias.

Governments and policymakers are also taking notice, recognizing the need for regulations and guidelines to ensure responsible AI development and deployment. Ethical considerations are important to consider. As AI becomes more integrated into our lives, we must ensure that these systems are safe, fair, and transparent.

-

“As GenAI threats are emerging, organizations implementing AI like chatbots or decision engines can use the DEF CON findings as a checklist of what to guard against when using generative AI.”