
New Updates Make AI Models Dangerously Good at Hacking, and Harder to Trust

New system cards from OpenAI and Anthropic issued last week reveal a step-change in cybersecurity capabilities alongside growing alignment concerns that increase catastrophic risk.

Photo by Adi Goldstein / Unsplash

The latest system cards from OpenAI and Anthropic say frontier AI models are rapidly approaching, and in some cases crossing, critical capability thresholds in cybersecurity while simultaneously showing new and more sophisticated forms of misaligned behavior.

OpenAI's system card for GPT-5.3-Codex, released on Feb. 5, describes the first model the company has classified as "High" capability in cybersecurity under its Preparedness Framework. The company acknowledged it is "taking a precautionary approach because we cannot rule out the possibility that it may be capable enough to reach the threshold."
