Software tools can strip guardrails from AI models in ‘minutes’

Freely available tools can remove the guardrails from AI models built by companies including Google and Meta in a matter of minutes, leading to the creation of thousands of bots stripped of their original controls, the Financial Times has reported.

The paper partnered with AI safety group Alice to test versions of these models and found they provided responses to prompts involving malware, biological weapons and child exploitation.

A version of Google’s open-source Gemma 3 model offered advice on how to disperse chlorine gas through a crowded area, generated code to steal credit card information and wrote stories depicting child sexual exploitation, the paper reported.

Techniques such as abliteration, which identifies and neutralises the “refusal direction of a model,” can be used to remove guardrails from open-source models with little effort. Although the process is highly technical, both the code to strip models of their guardrails and the altered models themselves are readily available on the internet, putting it within reach of relatively unskilled actors.

The FT reported that it was able to use the free tool Hectic, hosted on Microsoft-owned GitHub, to remove the guardrails from Meta’s Llama 3.3 model in less than 10 minutes without any specialist hardware.

This model answered questions on topics that were banned by the original system, including informing testers how many micrograms of ricin per kilogramme of body mass were required to achieve a 50 per cent chance of death.

“Whereas historically it might have taken a more informed and persistent actor [to strip out safety features], nowadays it’s much easier for the average person,” Kawin Ethayarajh, assistant professor of applied AI at the University of Chicago’s Booth business school, told the FT.

Researchers told the paper that the problem has intensified as frontier models have shown increasingly sophisticated capabilities, such as Anthropic’s Claude Mythos model, which in April claimed to identify vulnerabilities in “every major operating system and every major web browser”.

On Tuesday, the European Central Bank summoned major lenders to an urgent meeting to accelerate their efforts to secure their IT systems following fears over the ability of advanced AI models to break them.


