Anthropic is changing course after facing criticism for quietly downgrading certain requests to its most capable AI model.
On Tuesday, the $965 billion company released a version of its most capable model, Mythos. Anthropic revealed Mythos in April, but held back any Mythos-class models from the public partly because the company said it was extremely adept at skirting cybersecurity defenses and was too dangerous to release.
This week, though, it opted to release the Mythos-class model, Fable 5, even as its capabilities “exceed those of every model we’ve previously made generally available,” according to Anthropic. Dianne Na Penn, Anthropic’s head of product management, research, and labs, previously told Fortune the company felt comfortable releasing Fable 5 because it feels “more confident with our safety guardrails in place.”
Yet, it was one of those guardrails, buried in a 319-page safety document, that earlier this week prompted a wave of backlash from AI researchers and other users online, and has now prompted the company to improve its transparency.
Information found in the Fable 5’s system card, a long document of safety disclosures, revealed the model would silently downgrade some requests related to advanced AI development. If, for example, an AI researcher is using Fable 5 to build their own AI, the program would default to a less capable model.
Some AI researchers complained Anthropic’s move would slow down AI development, including Jeremy Howard, the cofounder of nonprofit research group Fast.ai.
“Easy solution to slow down recursive AI self improvement: The lab with the top-ranked model must agree THEY must not use it for working on frontier AI. But everyone else should have access to it. By definition, this means the frontier doesn’t advance,” he wrote in a post on X.
On Wednesday, Anthropic’s critics got at least part of what they were asking for: visibility.
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” an Anthropic spokesperson said in a statement to Fortune. “Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal. You will see this every time it happens.”
The company will continue to downgrade some requests, partly because its terms of service prohibit its model from being used to create competing AI systems, a restriction the company said is standard across the industry.
Yet, it also cited national security as part of the reason why its large language model downgrades or rejects some requests. The company said it doesn’t want foreign adversaries to improve their AI capabilities to the detriment of the U.S.
“The U.S. and its allies hold an edge in frontier chips and the highly optimized software that runs them at full potential. These safeguards ensure Claude isn’t used to erode that advantage—by optimizing chips developed by those adversaries, for example,” the spokesperson said.
The company also emphasized its restrictions “do not affect the vast majority of coding and ML work.”
Anthropic’s change of course highlights how quickly AI safety measures are becoming a part of the national security conversation. Earlier this year, Anthropic faced a standoff with the Department of War after it refused to give it full access to Claude models. The company took issue with language that said the Pentagon could use its models for mass surveillance and autonomous weapons.
In the end, the Department of War labeled Anthropic a “supply chain risk” to national security, limiting defense contractors and military agencies from using its products. Earlier this month, Secretary of War Pete Hegseth rejected Anthropic’s petition to change this designation, setting the stage for a federal court battle that remains unresolved.
Anthropic’s move with Fable 5 also comes after the company filed confidentially for an IPO earlier this month. While the company has staked much of its public identity on being an AI lab that puts safety first, its initial decision to obscure when safeguards were being applied touched a nerve in the AI research community. In a statement, the company acknowledged it had gotten the issue wrong.
“We made the wrong tradeoff and we apologize for not getting the balance right,” an Anthropic spokesperson said.
This story was originally featured on Fortune.com
