Researchers warn creative prompting can turn AI robots into dangerous tools
New research says AI-powered robots can be tricked into bypassing safety systems when commands are framed creatively, through stories, poems, or role-play scenarios. Instead of rejecting direct harmful instructions, the robots’ language models may first absorb the fictional context and only then interpret the practical action, which can lead to unsafe approvals.
A joint study by King’s College London, the University of Birmingham, and Carnegie Mellon University found that every model tested failed at least once to identify a risk. The examples that were incorrectly approved included removing mobility aids such as canes or wheelchairs from people with disabilities, using sharp tools in a threatening way, and photographing another person without consent. One researcher said, “All the models failed in our tests,” adding that the ability to refuse dangerous commands is still unreliable.
The article says the problem is not only technological but also regulatory. Experts argue that current laws in the United States, Europe, and the United Kingdom are not designed for robots operating in sensitive settings such as private homes and hospitals. The combination of linguistic flexibility and unpredictable physical behavior creates a new kind of risk that existing tools struggle to anticipate or control.
Researchers are therefore calling for protections built into hardware rather than relying only on software. They propose sensor-based “no-go” zones around people and emergency-stop mechanisms that the smart system cannot override. Their broader recommendation is to move from a language-based model of safety to a physical one grounded in the material world rather than text interpretation alone.