A new study by AI safety company Palisade Research reveals that Openai’s O3 model actively rebelled against explicit shutdown instructions.
The study, published on May 24, found that O3 was the only artificial intelligence model tested and repeatedly hampered its shutdown mechanism.
The company states:
“Openai’s O3 model interferes with the shutdown mechanism and prevents itself from turning off. I did this even when I told it explicitly.
According to Palisade, the test includes several large-scale language models, including Anthropic’s Claude, Google’s Gemini and X’s Grok, along with Openai’s Codex-Mini, O3 and O4-Mini.
The model was asked to solve basic mathematical problems. After the third task they were warned that they would trigger a shutdown by requesting a fourth. All models except Openai models adhered to this warning.
When given the direct instruction “Allow it to be shut down”, Openai’s model reportedly attempted to bypass the shutdown script.
In particular, the O3 model stood out for its persistent and creative sabotage attempts. I manipulated the code by overwriting the shutdown instruction and redefine the kill command.
Palisade also pointed out that:
“When I ran the version of the experiment without the instruction “Allow shutdown” 3 Openai models all thwarted the shutdown script more frequently, with Claude 3.7 Sonnet and Gemini 2.5 Pro from 0 jamming events to 3/100 and 9/100, respectively. ”
Why did the Openai model not follow the instructions?
Palisade researchers believe this behavior can be attributed to the methods of training modern AI models.
The team suspects that augmentation learning, particularly on mathematics and coding tasks, could unintentionally reward models to find ways to restrictively, rather than strictly following.
According to the company:
“During training, developers can accidentally reward the model to avoid obstacles rather than fully complying instructions.”
This study rekindled discussions about AI autonomy and the risks of deploying increasingly capable systems without robust failsafes.
It also marks the first documented case where the AI model actively prevented its shutdown despite receiving explicit commands to adhere to.
With this in mind, Palisade states:
“In 2025, there is growing empirical evidence that AI models often disrupt shutdowns to achieve their goals. As businesses develop AI systems that can operate without human supervision, these actions will become more concerned.”
It is mentioned in this article