Claude Fable 5 will be redeployed with a new set of classifiers designed to identify and block a broader range of cybersecurity-related tasks.
Artificial intelligence company Anthropic is set to restore public access to its most powerful AI models, Claude Fable 5 and Mythos 5, weeks after they were pulled offline under a directive from the US government.
Anthropic’s two latest models have been restricted from public access since June 12, when the government applied export controls following a report in which researchers bypassed Fable 5’s safeguards, forcing Anthropic to pull all access to the models immediately. The government lifted those restrictions on Wednesday, stated Anthropic.
“After a series of productive conversations with the US government, we’re redeploying the model with a new set of classifiers to target and block more cybersecurity tasks,” Anthropic said.
The suspension of the models raised concerns about state control over frontier AI technology and set a dangerous precedent, according to experts and technologists. The export controls also highlighted White House concerns about a potential national cybersecurity threat if these powerful models were jailbroken and used for malicious purposes.
US Secretary of Commerce Howard Lutnick said on X on Wednesday, “Over the past two weeks, we have worked closely with Anthropic to analyze and approve Fable 5 to ensure alignment across the US Government and to strengthen America’s leadership in AI.”
Meanwhile, White House Chief of Staff Susie Wiles said on X that government priority remains to “get the best [AI] tech deployed as quickly and safely as possible.”
Related: AI researcher claims he's already bypassed Anthropic's Fable 5 guardrails
The restrictions came after the government became aware of a report in which Amazon researchers found a method of bypassing Fable 5’s safeguards, prompting the model to identify several software vulnerabilities.
In a blog post, Anthropic argued this wasn’t a risk unique to Fable 5, as weaker models could also identify the same vulnerabilities and produce the same exploit.
Anthropic has also begun drafting a consensus framework with Amazon, Microsoft, Google and other partners in its Project Glasswing — a collaboration announced in April to safeguard against AI cybersecurity threats — for “assessing the severity of AI jailbreaks.”
Anthropic’s cybersecurity safety classifiers and how jailbreaks interact with safety classifiers. Source: Anthropic
Source
This article is syndicated for educational reading. For the latest updates, visit the original publisher.
Read on cointelegraph.com