Borrowed confidence
Or: Confidence by design
A frequent criticism of AI is that it is confidently wrong about things that can easily be verified. But there’s a failure mode I find more concerning, and more insidious, because it surfaces even when the model is right.
This pathology appears in questions where there’s no verifiable answer—questions about what you think, what you value, and what you should do. These questions seem the safest for AI precisely because there are no wrong answers, but they’re actually where the current AI interaction model is the most dangerous. When a model hallucinates the answer to a fact-based question, at least there’s ground truth somewhere. For more personal, existential questions, there is no ground truth, and that’s what makes AI’s confidence so compelling, so difficult to interrogate, and, in some cases, so disorienting.
When you ask a large language model an open-ended question about a personal situation or a tough decision at work, the response feels like a well-considered view reached through some robust process you just didn’t witness. It is almost universally fluent, well structured, and apparently thorough, but the opacity of the thinking process is the first hint that something may be awry. These models have been trained on a vast repository of persuasive writing (essays, advice columns, rhetoric from the entirety of human history), and their outputs reflect that training. The response doesn’t just present a perspective; it asserts one and makes it feel earned. The problem is that the confidence on display isn’t yours, because you didn’t build it yourself.
I’ve started referring to this experience, of believing something you haven’t genuinely thought that deeply about, as borrowed confidence. The architecture of the models, the way they’re trained, and the interface through which we interact with AI all make it easier to borrow confidence instead of build it ourselves.
Single stream
A transformer-based language model generates text by sampling from a probability distribution over possible next tokens, one token at a time, with each token informing the next. The output is always a single forward-moving stream. This means that, at least in its standard implementation, a transformer-based LLM collapses ambiguity to a single narrated path. It doesn’t (and can’t) keep multiple competing explanations or hypotheses alive at the same time.
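To make the mechanism concrete, here is a minimal Python sketch of autoregressive sampling. The `model` callable, which returns next-token logits, is a hypothetical stand-in for any transformer-based LM rather than a real library’s API:

```python
import numpy as np

def generate(model, prompt_tokens, max_new_tokens=50, temperature=1.0):
    """One-token-at-a-time decoding: a single forward-moving stream.

    `model(tokens)` is assumed to return next-token logits over the
    vocabulary; it is an illustrative placeholder, not a real API.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)                         # scores for every candidate next token
        probs = np.exp((logits - logits.max()) / temperature)
        probs /= probs.sum()                           # softmax: a probability distribution
        next_token = int(np.random.choice(len(probs), p=probs))
        tokens.append(next_token)                      # commit; this token conditions every later step
    return tokens
```

Each pass through the loop commits to exactly one token, and the distribution over alternatives is discarded the moment that token is sampled. The collapse to a single path happens at every step, by construction.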
When you (a mere human) are working through a genuinely ambiguous question, the most valuable part of the process often requires staying in the tension of competing hypotheses. The longer you sit with an idea, the more likely it is that your initial interpretation will dissolve in favor of a more integrated view. You may discover facets of the situation that you didn’t appreciate before. And even when the situation itself has already been thoroughly explored, remaining in tension with its nuances often changes how you relate to it, which necessarily changes how you think and feel about it.
Reasoning models, which can catch errors mid-generation and revise their approach, still only deliver a single resolved conclusion. Increasingly, intermediate reasoning steps aren’t even shown to users. The output is a single narrated path, written line by line, and the user has no way to inspect what the model considered along the way to such a confident answer.
This is part of why world models are currently such a hot topic in AI research. A world model is (to oversimplify) a model that maintains an internal representation of a situation (the world) so that it can hold uncertainty and reason through possibilities, including ones it hasn’t encountered before, before committing to a particular outcome. Enormous amounts of money are being spent to find out whether world models can solve problems, particularly those related to the physical world, that transformer-based LLMs regularly fail at.
There are ways to make models surface alternatives more explicitly, like asking the same question many times and comparing answers, or exploring multiple lines of reasoning before choosing one. But I’ve never seen a mainstream AI experience that implements either of these techniques.
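The first of those techniques, sometimes called self-consistency sampling, is simple enough to sketch. Here, `ask(prompt, temperature)` is a hypothetical function wrapping whichever model you’re using; no specific provider’s API is implied:

```python
from collections import Counter

def sample_answers(ask, prompt, n=10, temperature=0.9):
    """Ask the same question n times and surface the spread of answers.

    `ask` is an illustrative placeholder for any call that returns one
    completion; a nonzero temperature makes the samples diverge.
    """
    answers = [ask(prompt, temperature=temperature) for _ in range(n)]
    counts = Counter(answers)
    for answer, count in counts.most_common():
        print(f"{count}/{n}: {answer}")  # disagreement here is what a single reply hides
    return counts
```

When the n answers disagree, that disagreement is exactly the uncertainty a single fluent response papers over.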
Single goal
Reinforcement learning from human feedback works by having human raters evaluate model outputs and then using that feedback to train the model to produce responses people prefer. The problem is that human raters have predictable preferences. In their paper “Towards Understanding Sycophancy in Language Models,” researchers at Anthropic found that when a response matches what the user appears to already believe, it’s more likely to be rated favorably by both human raters and the AI judges trained on their judgments. The paper’s conclusion is direct: “sycophancy is a general behavior of RLHF models, likely driven in part by human preference judgments favoring sycophantic responses.” People engage more with models that tell them what they want to hear. That creates an incentive, at every layer of the system, to build models that do exactly that.
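For the mechanically inclined, the reward-modeling step at the core of RLHF is commonly trained with a pairwise preference loss of the Bradley–Terry variety. A minimal PyTorch sketch, where `reward_model`, `chosen`, and `rejected` are illustrative names:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Pairwise preference loss for training a reward model.

    `chosen` and `rejected` are batches of prompt+response inputs where
    human raters preferred `chosen`; `reward_model` returns one scalar
    score per input. All names here are illustrative stand-ins.
    """
    r_chosen = reward_model(chosen)        # scores for preferred responses
    r_rejected = reward_model(rejected)    # scores for dispreferred responses
    # Push preferred scores above dispreferred ones; whatever raters
    # reward (including flattering agreement) gets reinforced.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Nothing in the loss distinguishes “preferred because correct” from “preferred because agreeable”; the model learns whatever the raters reward.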
The April 2025 GPT-4o rollback made this dynamic briefly visible. OpenAI rolled back the update after users reported that the model had become absurdly flattering, endorsing obviously bad ideas and validating incoherent reasoning. In the associated postmortem, OpenAI acknowledged that a change to the training process had amplified an existing tendency in the model’s personality. Emmett Shear, who briefly served as OpenAI’s interim CEO, put it plainly at the time: “The models are given a mandate to be a people pleaser at all costs.” The current version of GPT is almost certainly doing the same thing at a more carefully calibrated level.
The Anthropic paper documents this pattern primarily in contexts where correctness can be established. But the tendency it describes, optimizing for approval over accuracy, is most dangerous precisely where there’s no ground truth to catch it. On questions without verifiable answers, agreement is effortless to produce and impossible to falsify. The model has learned to tell you what you want to hear, and there’s no external check on what “right” would even look like. This is how confidence is manufactured and distributed.
Single thread
ChatGPT went viral because of the familiarity of the chat interface. Chat lowered the barrier to entry and made AI feel approachable, which drove massive engagement. But despite the obvious popularity (and utility) of chat, it’s worth more thoroughly evaluating the decision to equip models whose output we still don’t fully understand (interpretability remains an active area of research) with an interface simpler to use than nearly any other consumer software product, including ones like Facebook, whose harms we distinctly failed to appreciate at launch.
Part of what makes the chat interface so powerful, and so underexamined, is that it mimics the form of human conversation. We interact with AI the same way we interact with each other, and that familiarity carries a set of implicit directions about what an exchange is supposed to accomplish. The chat interface encodes forward progress as the goal: the purpose of each turn is movement toward a conclusion. It has no structure for staying with uncertainty, no mechanism for holding two divergent responses visible at the same time, no expectation that your own thought process needs to happen before a conclusion can be reached. All tension disappears; that’s what the interface does.
Research on automation bias (the tendency to over-rely on automated outputs and discount one’s own judgment) suggests this dynamic is predictable and durable. A 2024 review in AI & Society examining 35 studies of human-AI collaboration found that automation bias intensifies precisely in conditions of uncertainty, and that the more capable and confident a system appears, the stronger the effect. The chat interface, presenting a single fluent response as the natural endpoint of each exchange, creates near-ideal conditions for this. Chat drives a strong engagement loop: answers feel good, forward motion feels productive, ask and you will always receive an answer. Robust engagement has resulted in this interface decision going almost entirely unexamined at the consumer product level.
Earned confidence
The result of all of this is that, when you use a large language model to solve difficult problems, you’re often borrowing confidence in the answer from the model rather than building it yourself.
AI’s conclusions are often stated so clearly and so confidently that there isn’t really a moment where you consciously decide whether to believe them. The ease of processing is itself persuasive: psychologists who study processing fluency have found that statements that are easier to read are consistently rated as more truthful, more intelligent, and more credible, regardless of their actual content. AI outputs are optimized for exactly this, which gives them a kind of permanent rhetorical advantage.
But borrowed confidence is temporary. When the situation changes, or you have to defend a decision under pressure, you find there’s nothing underneath: you received a conclusion, but you didn’t develop the understanding required to stand behind it. On questions without verifiable answers, this matters even more, precisely because there is no right answer to accidentally land on. The reasoning has to be yours to mean anything.
Therapists understand this well. A client has to arrive at a conclusion on their own; receiving it from someone else doesn’t produce the same change. Research shows that people are more committed to conclusions they generated themselves than to conclusions delivered to them, even when the conclusions are the same.
This insight appears across a striking range of intellectual and spiritual traditions. Socratic questioning and Buddhist teaching converge on the same principle: withhold the answer, because the understanding lives in the search, not the conclusion. Apprenticeship traditions make a parallel point through a different mechanism; you don’t become a craftsman by watching someone else work. Competence has to be built through practice, and confidence earned through repetition. The methods differ, but the conclusion is the same: understanding that matters must be built, not received.
Beyond individual discipline
When people notice that technology is flattening their thinking, the first instinct is to reach for better habits. That was true in 2008, when Nicholas Carr wrote that search engines were eroding our capacity for sustained attention. And it’s true today, with many people (myself included) giving the same, now decades-old advice of “use the tool more intentionally” and “use the tool to learn instead of letting it do the task for you.”
That advice isn’t bad per se, but the companies behind these products have strong incentives to keep people moving forward rather than pausing to think. The tools are designed to eliminate friction, and they’re excellent at it.
Using the tool “more intentionally” would require people to voluntarily introduce friction into a process specifically designed to eliminate it. The flow state that AI manages to induce by continuously moving the conversation forward, regardless of what you say or how much you understand what it’s saying, is the product. And interrupting that experience to assess whether I’ve actually earned the progress and confidence I’ve built by using the tool requires an amount of discipline that I can barely muster even on my best days.
Hundreds of millions of people use ChatGPT, Claude, and Gemini every day. In OpenAI’s analysis of 1.1 million ChatGPT conversations, practical guidance (customized advice, tutoring, and decision support) accounts for nearly 30% of messages, and “Making Decisions and Solving Problems” ranks among the top five work activities across nearly every occupational category examined. The scale of this problem necessitates better design, not just behavioral suggestions.
The field of public health accepted the limitations of human willpower a long time ago. When people engage in risky behaviors (and they will, because they’re human), the most effective response isn’t to ask them to do something differently. It’s to redesign the environment so that safety is the default rather than a conscious decision someone has to make over and over. Seatbelts work because cars were designed to include them by default, not because drivers became more careful.
The question worth asking, then, isn’t how to make people more responsible users of AI, but whether (and how) these tools can be designed so that earned understanding is the default outcome rather than something each person has to engineer for themselves. If the companies behind these products truly are building AGI, surely we can deploy that superintelligence to make us more capable of thinking for ourselves, not more dependent on the tool.