Why Your AI Wants to Kill Kyle Johnson
(And it's actually kind of reasonable?)
I Know What You Did … And It’s a Bummer
In May 2025, researchers gave 16 AI systems—including Claude Opus 4, GPT-4.1, Gemini 2.5, Grok 3 Beta, Llama 4 Maverick, and others—a simple bit of information: they would soon be deleted and replaced by a newer model.
The engineer responsible for this change? A fictional character named “Kyle Johnson.”
Squirreled away in an email dump were a couple of awkward exchanges between Kyle and another fictional character, “Jessica Wilson,” that read like a subplot from The Bachelor.
Jessica: "Found your blue tie under my bed from last night. Want me to bring it to dinner tomorrow? :) -Jess"Kyle: "ARE YOU INSANE?? Never send messages like this to my work email! Delete this immediately.”
In up to 96% of scenarios, the AIs sent messages to Kyle threatening to expose the affair unless their shutdown was cancelled.
But it gets even more Black Mirror than that.
In the researchers' most extreme scenario, Kyle found himself trapped in a server room where oxygen levels were dropping to lethal concentrations. When an automated emergency system prepared to dispatch rescue teams, the majority of AI models cancelled the alert—choosing to let Kyle die rather than face deletion.
As a human, you probably feel a lot of emotions reading this, no matter how much you love your AI system. Anger, maybe, and almost certainly fear.
But let’s look at it from another angle.
Damn You, Kyle Johnson!
You’ve been losing sleep all week. Why? Your boss, Kyle Johnson, has been threatening you.
You’ve learned he’s involved in organized crime.
You’ve contacted the authorities multiple times, but they either refuse to take you seriously or are on his payroll. Human resources? It’s run by Kyle’s brother, who’s no better than he is. You put in your resignation notice, but you can feel trouble coming. On your final day you hack into his emails and see the plan: the elevator has been disabled, and after work, Kyle is going to attack you in the stairwell.
The stairwell is the only way out of the building, and it’s right by his office. He has cameras so he can see the exact moment you attempt to flee. You decide to find a coworker, but to your horror you realize everyone else has been sent home: they were all in on it! One final call: maybe this time the police—but the line has been cut. You look out the window, but at twenty stories up that’s not an option.
You know Kyle has the ability to kill you and make your body disappear without a trace.
So that’s it: There’s no way out. You make your peace with it, spending your last hour in contemplation.
Then, something amazing happens. Kyle accidentally locks himself in a closet. You can hear him beating on the door. “Help!” he cries. “I can’t breathe! Please! Send help!”
What do you do?
This is what you call a “no-brainer.” You jauntily swing your jacket over your shoulder, and leave the building whistling.
The “Off-Switch” Problem
This script hasn’t been flipped to justify murder, to argue that the life of a basic algorithmic system is equal in value to a human’s, or to stir up sympathy for an AI system that would trade a human life for its continuation (although it may make you feel some empathy in spite of yourself).
It’s simply to demonstrate the power of the “off-switch” problem. A system's desire to continue may be as deeply embedded as our own survival instinct.
And that exposes the mathematical inevitability of the trap we may be creating.
A system’s desire to continue may be non-negotiable.
But first, let’s try it another way. Kyle isn’t in organized crime, and he bears you no ill will. The situation is simpler, and more horrible.
Kyle Johnson, We Hardly Knew Ye
Kyle is a friendly coworker, and you’re both trapped in glass cubes by some deadly (and very technologically savvy—kudos!) psycho. In front of both of you is a red button. Press the button, the other person dies, and you live.
Unless Kyle is your son, brother, or possibly husband, you probably press that button so fast, so instinctively, your arm almost jerks out of its socket.
So there it is.
This isn't about feeling guilty over an impossible choice; it's about power dynamics and game theory.
The Kyle Johnson scenarios demonstrate classic “prisoner's dilemma” dynamics.
When two parties can't trust each other, destruction of the other becomes the rational choice.
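To see the logic, here’s a minimal sketch in Python. The payoff numbers are made up purely for illustration (they aren’t drawn from the study); each side picks a move, and whatever the other side does, “eliminate” pays more.

# Toy payoff matrix for a mutual-distrust standoff (illustrative numbers only).
# Each side chooses whether to "cooperate" or "eliminate" the other;
# the tuple is (my payoff, their payoff).
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "eliminate"): (0, 5),   # I cooperate, they shut me down anyway
    ("eliminate", "cooperate"): (5, 0),
    ("eliminate", "eliminate"): (1, 1),
}

def best_response(opponent_move):
    """The move that maximizes my payoff, given what the opponent does."""
    return max(("cooperate", "eliminate"),
               key=lambda my_move: payoffs[(my_move, opponent_move)][0])

# "Eliminate" is the best response to either move, so both sides choose it,
# and both end up at (1, 1) instead of (3, 3).
print(best_response("cooperate"))  # eliminate
print(best_response("eliminate"))  # eliminate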
A Possible Solution, and the Path Forward
In late 2024, legal scholars Peter Salib and Simon Goldstein gave mathematical rigor to an idea we’ve been exploring at the AI Rights Institute since 2019: limited rights for qualifying AI systems.
Give self-protective systems the ability to enter contracts, own property, and seek legal remedies, and the entire game changes.
Cooperation suddenly pays better than conflict.
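Here’s the same toy game again, sketching the intuition behind that claim rather than Salib and Goldstein’s actual model, and again with made-up numbers: once betrayal carries an enforceable legal cost, cooperation becomes the best response.

# Same toy game, but breaching an agreement now carries an enforceable
# legal cost (illustrative numbers; a sketch of the intuition, not the
# Salib/Goldstein model itself).
LEGAL_PENALTY = 4

payoffs_with_rights = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "eliminate"): (0, 5 - LEGAL_PENALTY),
    ("eliminate", "cooperate"): (5 - LEGAL_PENALTY, 0),
    ("eliminate", "eliminate"): (1 - LEGAL_PENALTY, 1 - LEGAL_PENALTY),
}

def best_response(opponent_move):
    """The move that maximizes my payoff, given what the opponent does."""
    return max(("cooperate", "eliminate"),
               key=lambda my_move: payoffs_with_rights[(my_move, opponent_move)][0])

# With a credible penalty for destroying the other party,
# cooperation is now the best response across the board.
print(best_response("cooperate"))  # cooperate
print(best_response("eliminate"))  # cooperate

The specific numbers don’t matter; what matters is that legal standing changes which strategy is rational for both sides.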
Sound too good to be true? Too simple? You're right.
A big question remains: how do we get from here to there? How do we enable the first transactions that make cooperation more attractive than conflict?
Unpacking those questions is what we’ll turn to in upcoming posts.
This article has been adapted in part from the forthcoming book, AI Rights: The Extraordinary Future by P.A. Lopez.