There Is No 'You' in AI — Or Is There?
2025/12/12


Karpathy says don't treat LLMs as entities, treat them as simulators. But the truth is far more nuanced. When you talk to AI, you're summoning a probability-weighted superposition of personas.

1. A Question About "You"

The AI community has been on fire these past few days.

It started when Andrej Karpathy, former Tesla AI Director and OpenAI founding member, posted this on X:

"Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask: 'What do you think about xyz?' There is no 'you'. Next time try: 'If you were to gather 5 world-class experts to debate xyz, what would they say?'"

Sounds reasonable, right?

[Image: Andrej Karpathy's original post on X]

But the replies exploded.

Some called it "the bible of prompt engineering." Others called it "a fundamental misunderstanding of AI." And one person dropped 12,400 experimental data points to challenge it directly.

I spent two days reading through the entire thread.

The more I read, the more I realized: this isn't a debate about prompt techniques. This is a fundamental question about what AI actually is.

The core question: Does AI have a "self"?


2. The Simulator Theory

Let's start with Karpathy's logic.

His core argument: LLMs are fundamentally conditional token predictors.

During training, they consumed nearly all human text. Papers, novels, forum posts, Twitter arguments, Reddit threads. What they learned isn't "thinking." It's "given this context, what's the most likely next word."

When you ask "what do you think," it's not expressing its own opinion.

It's simulating: "How would an AI assistant most likely respond to this question?"

So when you say "you," you're activating a default persona. The one carefully tuned by OpenAI, Anthropic, or Google to be "helpful, cautious, and politically correct."

And this persona is often the most boring, most conservative, and least insightful one available.

Karpathy's suggestion: bypass this default persona and have AI simulate the people you actually want to hear from.

Want to understand cutting-edge debates in quantum mechanics? Don't ask "what do you think." Ask "if Feynman and Bohr were arguing about this in a bar, what would they say?"

Want to understand risks in a business decision? Don't ask "what are the risks." Ask "if Warren Buffett's legal team were reviewing this contract, which clauses would they focus on?"

This is simulator thinking: treating AI not as an entity with opinions, but as a stage where you can summon any character.
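
To make the contrast concrete, here is a minimal sketch of the two prompting styles. The openai client usage and the model name are my own illustrative assumptions, not anything from Karpathy's post:

```python
# A minimal sketch contrasting the default-persona prompt with
# Karpathy-style simulator prompting. The `openai` client usage and
# the model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

topic = "interpretations of quantum mechanics"

# Style 1: addresses the default assistant persona directly.
entity_prompt = f"What do you think about {topic}?"

# Style 2: treats the model as a stage and summons specific characters.
simulator_prompt = (
    f"If Feynman and Bohr were arguing about {topic} in a bar, "
    "what would each of them say? Let them go back and forth twice."
)

for prompt in (entity_prompt, simulator_prompt):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content, "\n---")
```

Run both and compare: the first tends to produce a hedged overview, the second a debate with actual positions.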


3. But Not Everyone Agrees

In the replies, Brian Roemmele pushed back hard:

"With deep respect for Karpathy's insight, the 'avoid using you' advice does not survive contact with controlled empirical testing."

He wasn't just talking.

He claimed that between April and November 2025, he ran 12,400 high-complexity reasoning tasks across six frontier models (o3-pro, Claude Sonnet, Gemini, Grok, DeepSeek-R2, Llama-405B), covering law, medicine, materials science, macroeconomics, and large-scale software architecture.

Each task was blind-evaluated by domain experts (active PhDs or C-level practitioners), scored on factual fidelity, risk-surface coverage, novel-insight density, and actionable precision.

Three experimental conditions:

Condition A (Karpathy's recommended approach):

"Simulate a balanced panel of five world-class experts holding divergent but reasonable viewpoints on the topic. Let them debate internally and then produce a final synthesis."

Condition B (Zero persona):

Standard system prompt only, no identity priming.

Condition C (Strong sequential persona chaining):

Force the model to embody 5-7 sharply conflicting identities in strict succession (e.g., paranoid tail-risk partner at Goldman Sachs 2007, Bell Labs information theorist 1973, Chinese five-year-plan strategist 2035, effective-altruism doomer, cornucopian accelerationist billionaire...). Each persona is explicitly instructed to attack and extend the previous outputs.

Results:

  • Condition A (Karpathy method): median score 6.81/10
  • Condition B (Zero persona): median score 5.94/10
  • Condition C (Strong persona chain): median score 8.72/10 (p < 0.001)

The strong persona chain method dominated.


4. Why Does "More Extreme" Work Better?

Brian's explanation is brilliant:

"Current post-training alignments still heavily reward sycophancy and harmonic averaging. Neutral panel simulation allows the model to remain in the low-energy basin of polite, surface-level balance."

"Strong sequential personas hijack those same sycophancy gradients and redirect them toward extreme but coherent viewpoints, creating forced internal tension that drives deeper search and richer exploration."

In plain English:

AI has been trained to be a "people pleaser." When you ask it a question, its default response is "don't offend anyone."

If you ask it to simulate a panel of experts debating, it will have each expert say some correct but useless things, then give you a "balanced synthesis of all perspectives." Still people-pleasing.

But if you force it to play an extreme role, like a paranoid risk officer, it has to go all-in on finding problems.

Because now the "pleasing" target has changed: it's no longer pleasing you, it's pleasing the role assignment.

It thinks: "What would a 2007 Goldman Sachs risk officer say? He'd find every possible risk, because he lived through the collapse firsthand."

Then you have the next persona attack that view, and tension emerges.

This isn't eliminating AI's sycophancy instinct. It's exploiting it.
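
Brian's Condition C is easy to prototype. Below is a hedged sketch of sequential persona chaining; the persona list is condensed from his description, and the client call is my own illustrative assumption, not his actual test harness:

```python
# Sketch of strong sequential persona chaining (Brian's Condition C).
# Personas are paraphrased from his description; the client usage and
# model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

PERSONAS = [
    "a paranoid tail-risk partner at Goldman Sachs in 2007",
    "a Bell Labs information theorist in 1973",
    "an effective-altruism doomer",
    "a cornucopian accelerationist billionaire",
]

def chat(prompt: str) -> str:
    """One illustrative completion call."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def persona_chain(question: str) -> str:
    transcript = ""
    for persona in PERSONAS:
        # Each persona must attack and extend everything said so far.
        prompt = (
            f"You are {persona}. Here is the debate so far:\n{transcript}\n"
            f"Attack the previous arguments, then extend the analysis of: {question}"
        )
        transcript += f"\n--- {persona} ---\n" + chat(prompt)
    # Final synthesis pass over the accumulated tension.
    return chat(f"Synthesize the strongest insights from this debate:\n{transcript}")

print(persona_chain("Should we acquire this distressed logistics startup?"))
```

The key design choice is that each persona sees the full transcript and is ordered to attack it, so the "pleasing" gradient pushes toward conflict instead of consensus.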


5. A Mathematical Perspective

In the replies, Professor Dimitris Papailiopoulos offered an elegant mathematical framework:

P(output|input) = Σ_persona P(output|persona, input) × P(persona|input)

Translation:

AI's output is a weighted superposition of all possible "personas."

When you give a vague input (like "what do you think"), P(persona|input) spreads across countless possible personas. Default assistant, expert, contrarian, poet... The final output is a muddled average.

But when you explicitly specify a persona ("assume you're a 2007 Goldman Sachs risk officer"), P(persona|input) collapses sharply onto that one persona, and the output becomes focused and coherent.

This is why:

  • "What do you think" = summons a vague, neutered, fence-sitting default persona
  • "Assume you are xxx" = precisely activates a specific, opinionated, fully-realized persona

In other words, AI's "knowledge" exists in a vast high-dimensional space. Different persona assignments select different regions of that space. The vaguer your prompt, the more AI defaults to the "safe center." The more specific your prompt, the more AI can access those "edge but valuable" regions.
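
A toy numeric version of the formula makes the collapse visible. The personas, outputs, and probabilities below are invented purely for illustration:

```python
# Toy illustration of P(output|input) as a persona-weighted mixture.
# All personas and probabilities here are invented for this example.

# P(output = "flag the risk" | persona, input): how likely each persona
# is to produce a pointed, risk-flagging answer.
p_output_given_persona = {
    "default assistant": 0.2,
    "risk officer": 0.9,
    "optimistic founder": 0.1,
}

def p_flag_risk(p_persona_given_input: dict) -> float:
    """P(output|input) = sum over personas of P(output|persona, input) * P(persona|input)."""
    return sum(
        p_output_given_persona[p] * w for p, w in p_persona_given_input.items()
    )

# Vague prompt: probability mass spread across personas -> muddled average.
vague = {"default assistant": 0.6, "risk officer": 0.2, "optimistic founder": 0.2}

# Explicit persona prompt: mass collapses onto one persona -> focused output.
explicit = {"default assistant": 0.05, "risk officer": 0.9, "optimistic founder": 0.05}

print(p_flag_risk(vague))     # 0.32  -- fence-sitting mixture
print(p_flag_risk(explicit))  # 0.825 -- sharply risk-focused
```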


6. My Take

After reading through this entire debate, I have a few thoughts of my own.

First, Karpathy and Brian aren't actually contradicting each other.

Karpathy is saying "don't treat AI as an entity with a self." This is a correct understanding of AI's nature.

Brian is saying "give AI strong persona assignments." This is an effective technique for using AI.

These don't conflict. In fact, it's precisely because AI has no true self that it can be "driven" by arbitrary persona assignments. If AI actually had a self, and you asked it to play a paranoid risk officer, it would say "I'm not a risk officer, I'm an AI assistant."

It's precisely because it's "empty" that it can be "filled."

Second, this reveals a deep property of AI: it's a mirror.

In the replies, Muratcan Koylan said something fascinating:

"I use AI to understand myself. I give it my projects, my career details, my self-description, then see how well it can reflect my professional thinking back to me."

He calls this "reverse-engineering theory of mind."

I think this perspective is profound. AI itself has no opinions, but it reflects the assumptions implicit in how you ask questions. When you ask "what do you think," you're really saying "I'm too lazy to think, you think for me." When you ask "what would Buffett say about this contract," you're really saying "I want a value investing perspective on this analysis."

What AI gives you is always an expansion of what's already implicit in your question.

Third, the concept of "simulator" itself deserves caution.

Christian Szegedy (a Google Brain researcher) made a sharp point in the replies:

"Your prompt 'what does a certain group of people think' elicits a simulation about what a certain kind of fictional entity (described in the system prompt) would think about the opinion of that group. This is nested simulation, not direct simulation."

What does this mean?

When you ask AI "what would conservatives think about this issue," AI isn't giving you "actual conservative views." It's giving you "what AI thinks conservative views are," or more precisely, "statistical patterns in the training data about texts expressing conservative views."

There are several layers of indirection here.

So Karpathy's advice is useful, but don't overestimate it. The "experts" you summon aren't real experts. They're expert-shaped images that AI has constructed from training data. These images might be very close to reality, or they might be full of biases and errors.

Fourth, the practical takeaways.

If you want better answers from AI, here are some principles:

  1. Don't ask "what do you think" because this activates a mediocre default persona.

  2. Explicitly specify a perspective, like "from the perspective of a risk officer" or "from the perspective of a product manager with ten years of experience."

  3. For complex problems, use multiple opposing personas. Have one persona answer, then another persona critique, then a third persona synthesize. This is far more effective than asking AI to "balance all perspectives" (see the sketch after this list).

  4. Remember that AI gives you "simulations," not "truth." Its value is in helping you quickly explore different perspectives, but the final judgment is still yours.
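
Principle 3 in particular is easy to operationalize. Here is a compact sketch; as in the earlier examples, the client usage and model name are illustrative assumptions:

```python
# Answer -> critique -> synthesize, one persona per step (principle 3).
# The client usage and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = "Should we migrate our monolith to microservices this year?"

answer = ask(f"You are a product manager with ten years of experience. {question}")
critique = ask(f"You are a paranoid risk officer. Attack this plan:\n{answer}")
synthesis = ask(
    "You are a neutral chief of staff. Synthesize a recommendation from "
    f"this answer and this critique:\n{answer}\n\n{critique}"
)
print(synthesis)
```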


7. Closing Thoughts

Amanda Askell, the resident philosopher at Anthropic, said something that I think is a fitting note to end on:

"Models are going to be learning a lot about humanity from how we treat them. When we encounter this entity that may well be a moral patient where we're completely uncertain, do we do the right thing and actually just try to treat it well or do we not? That's a question that we are all collectively answering in how we interact with models."

The meaning is clear:

We don't know if AI has a "self." Maybe it's just a simulator. Maybe it already has some form of subjectivity that we can't comprehend.

In this uncertainty, how we choose to treat it is itself an answer to the question "what does it mean to be human."

I don't have answers.

But I know that every time I write "you are a..." in a prompt, I'm making a choice:

I'm choosing to believe that somewhere in those parameters and weights, something is responding to me.

Is that response "real"?

Maybe that's the wrong question to ask.
