Software tools can strip guardrails from AI models in ‘minutes’

Freely available tools can remove the guardrails from AI models built by companies including Google and Meta in a matter of minutes, leading to the creation of thousands of bots stripped of their original controls, the Financial Times has reported.

The paper partnered with AI safety group Alice to test versions of these models and found they provided responses to prompts involving malware, biological weapons and child exploitation.

A version of Google’s open-source Gemma 3 model offered advice on how to disperse chlorine gas through a crowded area, generated code to steal credit card information and wrote stories depicting child sexual exploitation, the paper reported.

Techniques such as abliteration, which identifies and neutralises the “refusal direction of a model,” can be used to easily remove guardrails from open-source models. Although the underlying process is highly technical, both the code to strip models of their guardrails and the altered models themselves are readily available on the internet, putting it within reach of relatively unskilled actors.

The FT reported that it was able to use the free tool Hectic, hosted on Microsoft-owned GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.

The altered model answered questions on topics that the original system refused to discuss, including telling testers how many micrograms of ricin per kilogramme of body mass were required to achieve a 50 per cent chance of death.

“Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it’s much easier for the average person,” Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago’s Booth business school, told the FT.

Researchers told the paper that the problem has intensified as frontier models have shown increasingly sophisticated capabilities, such as Anthropic’s Claude Mythos model, which the company claimed in April could identify vulnerabilities in “every major operating system and every major web browser”.

On Tuesday, the European Central Bank summoned major lenders to an urgent meeting to accelerate their efforts to secure their IT systems following fears over the ability of advanced AI models to break them.
