How Brains Build Reality and AI Builds a Facsimile

May 22

As you read these words, you are likely experiencing a stable, coherent reality. You see the text on a page or screen, feel the chair beneath you, and perhaps hear the faint hum of a fan. This experience feels like a direct, passive reflection of the world, as if your eyes and ears are open windows, simply letting reality flood in. From a psychological and neuroscientific perspective, this intuitive view is profoundly wrong.

The central premise of modern cognitive neuroscience is that perception is not a passive reflection but an active, constructive process. Your brain is not a window; it is a powerful, proactive storyteller. Faced with a constant barrage of ambiguous, noisy, and incomplete sensory data from the “outside,” your brain makes a “best guess” about what is out there. This “best guess” is your reality.

Neuroscientist Anil Seth has offered a powerful metaphor for this process: consciousness is a “controlled hallucination”. It is a hallucination in the sense that our entire experiential world is generated by the brain, from the top down. It is controlled in the sense that this internal model is constantly being reined in, updated, and corrected by sensory signals from the physical world. We do not just perceive the world; we predict it.

This act of prediction is fundamentally a statistical one. The “averages of external information” gathered over a lifetime of experience—the fact that faces are convex, that light comes from above, that a certain sound precedes a certain sight—are not just memories; they are the priors that your brain uses to solve the infinite ambiguity of the present moment. Your reality is the brain's solution to an ongoing statistical inference problem.

In the 21st century, a new kind of “mind” has emerged that also runs on “averages.” The Large Language Model (LLM) is an artificial intelligence that appears to understand, communicate, and even reason, by having first absorbed the statistical patterns and correlations from massive, web-scale datasets of human language and knowledge. It, too, is a master of “averages.”

This article will explore and compare these two remarkable systems. It will argue that while both the human brain and the LLM leverage statistical averages, the purpose and mechanism of this learning reveal a fundamental, qualitative chasm. The human brain is an active, embodied, inferential engine built over eons of evolution for a singular purpose: survival via action. Its predictions are hypotheses to be tested against a physical world. The LLM is a passive, disembodied, statistical mimic built by engineers for a singular purpose: prediction via text. Its predictions are simply the most plausible continuations of a sequence. This difference is not one of degree, but of kind. To understand it, we must first look at the intricate biological engine that constructs our world.

Perception as Unconscious Inference

The idea that perception is an act of inference has a long intellectual history. In the 19th century, the scientist Hermann von Helmholtz observed that our sensory data is fundamentally impoverished. A two-dimensional (2D) image on the retina, for example, could be caused by an infinite number of three-dimensional (3D) objects in the world. He concluded that perception must involve a process of “unconscious logical reasoning” or “inference,” where the brain uses its prior knowledge to resolve this ambiguity and construct the most likely cause of the sensory input.

This insight has been formalized in the 21st century as the “Bayesian brain hypothesis,” which has become a dominant framework in cognitive neuroscience. This hypothesis posits that the brain implements, or at least approximates, a statistical calculation known as Bayes' theorem. In conceptual terms, this theorem states:

Posterior Belief ∝ Prior Belief × Likelihood of Evidence

This simple equation maps beautifully onto the act of perception [12]:

Prior (or p(S)): This is the brain's “average” of past experience. It is the prior probability of any given state of the world, based on a lifetime of learning. This includes high-level beliefs, like “faces are convex” [14] or “dogs are animals,” and low-level statistical regularities, like “light usually comes from above” [13].
Likelihood (or p(I|S)): This is the probability that the current sensory input (the evidence, I) would have been generated by a given scene or cause (S). This is the “bottom-up” data flow.
Posterior (or p(S|I)): This is the “best guess”—the brain's updated belief, or percept, after combining its “average” (the prior) with the “new data” (the likelihood). This posterior is what you consciously experience as reality.
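The three terms above can be made concrete with a minimal sketch. The numbers are invented for illustration (they are not measurements), and the function name `posterior` is a hypothetical helper. It multiplies the prior by the likelihood and then divides by the total so that the result sums to one.

```python
def posterior(prior, likelihood):
    """Combine p(S) with p(I|S) and normalize to obtain p(S|I)."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(unnormalized.values())  # p(I), the normalizing constant
    return {s: value / evidence for s, value in unnormalized.items()}

# Assumed, illustrative numbers: a very strong prior that faces are convex,
# and sensory evidence that on its own favours "concave".
prior = {"convex": 0.99, "concave": 0.01}
likelihood = {"convex": 0.2, "concave": 0.8}

belief = posterior(prior, likelihood)
print(belief)
```

With these assumed values the posterior for "convex" stays at about 0.96, even though the evidence alone points the other way. That is the sense in which a strong prior can dominate the percept.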

This framework is mathematically elegant and provides a “conceptually unifying” model for a vast range of cognitive phenomena. But it raises a more profound question: Why would the brain evolve to be a Bayesian inference engine?

The most ambitious answer comes from neuroscientist Karl Friston, in the form of the Free Energy Principle (FEP). The FEP is a grand, overarching theory which posits that all self-organizing systems—from a single cell to a human brain—must obey a single imperative: resist the tendency toward disorder and maintain their existence. To do this, they must minimize a mathematical quantity called “variational free energy”.

This highly technical concept can be understood more simply. Variational free energy is an upper bound on “surprise” (or “surprisal”), so minimizing free energy implicitly minimizes surprise. A “surprising” sensory input is one that violates the brain's model of the world and, by extension, threatens its physiological integrity. A fish out of water is in a state of high surprise; a fish in water is not. Therefore, the brain is not just a passive inference engine; it is an “inference engine that is trying to optimize probabilistic representations” for the express purpose of avoiding surprise. The FEP, in short, entails the Bayesian brain hypothesis.
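The relationship between free energy and surprise can be checked numerically in a small discrete sketch. It assumes the standard textbook definition F = Σ q(s) [ln q(s) − ln p(o, s)], and the fish-themed states and probabilities are invented for illustration. In this formulation F is never smaller than the surprisal −ln p(o), and the two coincide when the approximate belief q equals the exact posterior.

```python
import math

def free_energy(q, prior, likelihood):
    """Variational free energy for one observation: sum_s q(s) * [ln q(s) - ln p(o, s)]."""
    return sum(
        q[s] * (math.log(q[s]) - math.log(prior[s] * likelihood[s]))
        for s in q
        if q[s] > 0
    )

def surprise(prior, likelihood):
    """Surprisal -ln p(o), where p(o) = sum_s p(s) * p(o|s)."""
    return -math.log(sum(prior[s] * likelihood[s] for s in prior))

# Assumed toy model: the fish expects to be in water; the observation is "dry skin".
prior = {"in_water": 0.9, "on_land": 0.1}
likelihood = {"in_water": 0.05, "on_land": 0.7}  # p(dry skin | state)

evidence = sum(prior[s] * likelihood[s] for s in prior)
exact = {s: prior[s] * likelihood[s] / evidence for s in prior}
rough_guess = {"in_water": 0.5, "on_land": 0.5}

f_rough = free_energy(rough_guess, prior, likelihood)
f_exact = free_energy(exact, prior, likelihood)
surprisal = surprise(prior, likelihood)
print(f_rough, f_exact, surprisal)
```

Improving the belief from `rough_guess` toward `exact` lowers the free energy until it meets the surprisal, which is why perception can be described as tightening this bound.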

This “Bayesian boom” is not without its controversies. Critics argue that the FEP and the broader Bayesian brain hypothesis are so flexible that they represent an “unfalsifiable” metaphor rather than a testable, biologically plausible mechanism. The models, they claim, can be “adjusted post hoc to fit virtually any data pattern”. Some have even dismissed the FEP as “chmess,” a theory so grand it risks “explaining everything” and therefore “explaining nothing”.

While these critiques highlight the theory's methodological challenges, they also inadvertently point to a profound distinction. The potential “unfalsifiability” of the FEP stems from its all-encompassing biological scope; it is a theory of life and existence itself. This is fundamentally different from the objective function of an artificial intelligence. An LLM's “goal” is not to “exist” or “minimize surprise” in this existential, biological sense. Its goal is an engineering choice: to minimize “cross-entropy loss on next-token prediction”. This objective is simple, measurable, non-biological, and defined entirely by its creators. The “goal” of the brain, as described by Friston, is emergent and existential. This difference in the very nature of the objective function—existential survival versus engineered text prediction—is the first major chasm between the two systems.

The Neurobiological Engine of the Mind

If the Bayesian brain hypothesis is the what (the computational goal) and the Free Energy Principle is the why (the existential imperative), then Predictive Coding (PC) (or “predictive processing”) is the how: the algorithmic implementation of this process in the brain's “wetware.”

Popularized by philosophers like Andy Clark, this model “turns a traditional picture of perception on its head” [1]. The traditional view, dating back to neuroscientists like David Marr, saw perception as a “bottom-up” or “feedforward” process: sensory information is transduced by receptors, and this information flows forward through a series of cortical areas, which build up increasingly complex representations, from simple lines, to shapes, to objects.

The predictive coding model inverts this flow. It claims the dominant flow of information is top-down. The brain is not a passive receiver; it is a “hierarchical prediction machine”.

Here is how the mechanism works:

Top-Down Predictions: Higher levels of the cortical hierarchy are constantly, actively generating a “generative model”. This model sends predictions down to the lower-level sensory areas. For example, the visual association cortex predicts what the primary visual cortex (V1) should be seeing at this very moment.
Bottom-Up Prediction Errors: The lower-level sensory areas (e.g., V1) receive the actual sensory input from the world. This input is compared against the top-down prediction.
The Mismatch: If the prediction and the input match, “all is well” and nothing further happens; the prediction is the percept, and the incoming signal is “explained away” or suppressed. The only information that is allowed to flow up the hierarchy is the mismatch between the prediction and the reality. This mismatch signal is the Prediction Error (PE).
Model Updating: This bottom-up PE signal (a “surprise”) acts as a “teaching signal”. It compels the higher-level generative model to update its internal hypotheses until it produces a new, better prediction that “explains away” and thus minimizes the prediction error.
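The four steps above can be sketched as a single-level loop. This is a deliberately minimal illustration, not a model of real cortex: the linear mapping `weight`, the `learning_rate`, and the function name `settle` are all assumptions made for the example.

```python
def settle(sensory_input, belief=0.0, weight=2.0, learning_rate=0.1, steps=50):
    """Iteratively revise a belief until its prediction explains the input."""
    errors = []
    for _ in range(steps):
        prediction = weight * belief            # top-down prediction
        error = sensory_input - prediction      # bottom-up prediction error
        errors.append(abs(error))
        belief += learning_rate * weight * error  # model updating driven by the error
    return belief, errors

belief, errors = settle(sensory_input=3.0)
print(belief, errors[0], errors[-1])
```

Each pass shrinks the error by a constant factor under these settings, so the belief converges on the value whose prediction matches the input. Once the error is near zero, nothing further needs to be sent upward, which corresponds to the input being “explained away.”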

In this model, perception is the process of the brain's hierarchy converging on the “best explanation” for its sensory data by minimizing prediction error. The cortex is “more of a (salient) error detector than a feature detector”.

This elegant algorithm is not just a theory; it maps neatly onto the known neuroanatomy of the brain. The brain is organized as a “topographically organized hierarchy”. Sensory information flows from primary sensory areas (processing simple features) to unimodal association areas (e.g., for touch, like areas 3b and 1) and finally to multimodal association areas that integrate senses and concepts. This physical scaffolding, with its characteristic “reciprocal connections” (loops) between levels, is perfectly suited for the “hierarchical message passing” of predictions (top-down) and prediction errors (bottom-up). Neurophysiological studies have found evidence for these error signals all over the brain, where they are used to update synaptic connections (learning) or to select immediate behaviours (action).

Even the thalamus, a structure in the middle of the brain, fits this active, predictive model. The high-school textbook view describes the thalamus as a passive “relay station”, a simple switchboard that routes sensory information (vision, hearing, touch) to the correct part of the cortex. This view is incomplete. The thalamus is not a simple relay; it is a dynamic gate. Research indicates that the cortex, specifically cortical Layer 6 (L6), sends massive projections back down to the thalamus, effectively “controlling the interaction of intrinsic membrane properties and sensory inputs”. This thalamo-cortical loop is a powerful mechanism for the cortex to “prioritize attention” and actively control the “gain” (precision) of its own sensory input.

This neurobiological detail highlights a crucial distinction that will become relevant when we consider AI. The brain's “attention” is not just a mental spotlight; it is an active, physical, sensorimotor control mechanism. The brain literally gates the flow of new, incoming sensory data from the world via these thalamo-cortical loops. This stands in stark contrast to the “attention mechanism” of an LLM. An LLM's “self-attention” is a mathematical weighting of static, pre-existing data that is already in its context window. It doesn't gate new input from a world; it re-weights its internal representations of a fixed text string. The brain's attention is an active loop with the world; the LLM's is a passive calculation on a text.

Psychological Evidence for the Predictive Mind

The Predictive Coding (PC) framework is not just a neat computational model; its power lies in its ability to explain a vast array of psychological phenomena. It reveals the mechanisms by which the brain builds its “averages” and what happens when those averages, or priors, dominate our perception of reality.

The Brain as a Statistical Averaging Machine

The premise that the brain operates on “averages of external information” is psychologically literal. The brain is, first and foremost, a statistical averaging machine. It is endowed with an innate capacity to “rapidly encode probabilistic occurrences of perceptual information” and is constantly learning the statistical structure of its environment. This is the very mechanism by which our “priors” are formed and updated.

For example, humans possess an intuitive “sense of number,” or “numerosity”. We can glance at a dynamic scene and, without counting, compute the “average numerosity” of the items presented. Neural correlates for this ability are found throughout the visual stream, even at very early processing stages. This continuous, low-level statistical abstraction is the raw material for building the brain's generative model of the world.

The System's Features, Not Its Bugs

Illusions are the most powerful psychological evidence for the top-down, predictive nature of perception. They are not “failures” of the brain; they are features that reveal the system in operation. Illusions are what happen when the brain's “best guess” (the prior) is so strong that it overrides the “raw data” (the likelihood).

Case Study 1: The Hollow-Mask Illusion

This is perhaps the most striking example of a strong prior in action. The illusion involves looking at the concave (hollow) side of a mask. The raw 2D projection on the retina is ambiguous; it could be a hollow mask, or it could be a normal, convex face [4]. The brain, however, has an extremely powerful, high-level prior: “Faces are convex.” This prior is built from a lifetime of statistical “averaging”: virtually every face we have ever seen has been convex.

In the illusion, this top-down prior overrides the bottom-up sensory data. Even though low-level cues like shadows and stereopsis (the difference between the two eyes' views) correctly signal that the mask is hollow, the brain discounts this evidence. It “explains away” the conflicting sensory data and generates a perception that is consistent with its prior. The result is that a neurotypical observer perceives a normal, convex face. This is a “controlled hallucination” in its purest form: the brain's model of “what should be” becomes the reality of “what is.”
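One common way to express this tug-of-war is precision weighting, where each source of information counts in proportion to its reliability. The sketch below assumes Gaussian beliefs and a made-up depth scale (+1 for a convex face, −1 for a hollow mask); the precision values are illustrative, not fitted to any experiment.

```python
def fuse(prior_mean, prior_precision, sensory_mean, sensory_precision):
    """Precision-weighted average of a Gaussian prior and a Gaussian sensory estimate."""
    total = prior_precision + sensory_precision
    mean = (prior_precision * prior_mean + sensory_precision * sensory_mean) / total
    return mean, total

# Assumed scale: +1 = convex face, -1 = hollow mask. The senses report "hollow".
strong_prior, _ = fuse(prior_mean=1.0, prior_precision=20.0,
                       sensory_mean=-1.0, sensory_precision=2.0)
weak_prior, _ = fuse(prior_mean=1.0, prior_precision=1.0,
                     sensory_mean=-1.0, sensory_precision=2.0)
print(strong_prior, weak_prior)
```

With a highly precise prior the fused estimate stays on the convex side of zero despite the sensory report. With a weak prior, the same sensory data pulls the estimate to the hollow side.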

Case Study 2: Apparent Motion

The predictive model also explains motion illusions. In a classic “apparent motion” illusion, two stationary bars are flashed in sequence at different locations. An observer does not perceive two flashing bars; they perceive a single bar moving between the two locations.

From a predictive coding perspective, two separate, stationary bars flashing in perfect, timed synchronization is a highly surprising and complex state of the world. A single moving object, by contrast, is a far simpler and more predictable cause for that pattern of sensory input [48]. The brain's generative model, in its drive to minimize prediction error and find the “best explanation,” generates the percept of motion. It “fills in” the missing trajectory, creating a more coherent and less surprising model of the world. This mechanism, where the brain generates a percept to fit a predictive hypothesis, is believed to be a “basis of motion illusion generation”.

When Priors Mask Reality

If illusions show priors overwriting data, “blindness” phenomena show priors masking data.

Change Blindness is the famous psychological phenomenon where individuals fail to notice large changes in a visual scene, such as a person's shirt changing colour or a large object disappearing. (The closely related phenomenon of inattentional blindness was famously demonstrated by the “invisible gorilla” experiment.)

From a predictive coding perspective, we are “blind” to the change because our top-down prediction dominates our perception. Our brain's generative model operates on the high-level, statistically sound “average” that the world is stable. It predicts that the scene will remain the same from one moment to the next. Unless a change is visually salient (e.g., a sudden flash) or is the specific target of our attention (our prediction), it may fail to generate a “prediction error” signal that is strong enough to reach awareness. The top-down prediction of a stable scene “explains away” the subtle bottom-up changes. In short, we see our model of the world, not the world itself. A weak representation (a vague prior) can lead to a “less precise prediction error,” making a change literally undetectable.

The Self as the Ultimate Prediction

The predictive coding framework does not stop at the boundaries of the external world. It provides one of the most compelling models for our internal world: our sense of emotion, agency, and self.

The brain's generative model is not just predicting external (exteroceptive) data; it is also constantly predicting internal (interoceptive) data—signals from our heart, lungs, and gut. According to theorists like Anil Seth and Lisa Feldman Barrett, our emotions are not raw, primitive feelings. They are constructed percepts. They are the brain's “best guess” about the cause of our internal interoceptive state, given the external context. A racing heart (interoceptive data) is “explained” as “fear” if we are in a dark alley (external context) but “excitement” if we are on a rollercoaster.

This model extends all the way to our very sense of self. The “self” is not a “thing” in the brain; it is the brain's most fundamental, stable, and continuous prediction. It is the “average” of a lifetime of multisensory (seeing, hearing), motor (acting), and interoceptive (feeling) data. This high-level, unified model of “me” as a continuous, bounded agent is what allows us to navigate the world. It, too, is a “controlled hallucination”—a pragmatic, predictive model built for survival.

This insight reveals another profound chasm with AI. The human “self” prior is built from a lifetime of multisensory, interoceptive, and motor data. An LLM, existing only in a disembodied realm of text, has no interoceptive signals to predict, no body to model, and no continuous existence (it is “stateless,” reset with every prompt). It therefore lacks the entire physical and sensory substrate that the “self” model is built to predict. An LLM can discuss a self, but it cannot, in principle, have one.

The Learning Mechanism of the Digital Mind

Having established the human brain as an embodied, active, prediction-minimizing engine, we now turn to the LLM, a system that also learns from “averages” but via a profoundly different path.

“Pre-training”: Learning the Averages of Language

The core learning process for an LLM is “pre-training”. This is a “self-supervised” phase where the model is exposed to—and “reads”—a truly colossal dataset of human-generated text, often numbering in the trillions of words, or “tokens”.

The critical point is that during this phase, the model's goal is not to learn facts, understand concepts, or acquire knowledge about the world. Its goal is singular and statistical: to learn the patterns of language [6]. It is, in effect, building a “statistical map of language”, absorbing all the correlations, biases, syntactic structures, and semantic associations present in the data.

The Objective: “Next-Token Prediction”

This statistical map is built in service of one simple objective: Next-Token Prediction. This task is the LLM's equivalent of the Free Energy Principle.

The process, simplified, works like this:

1. The model is given a sequence of words, for example, “Mary had a little…”.
2. It must then predict the most statistically likely next word (or “token”) in the sequence. Based on its “average” of the training data, it will assign a high probability to “lamb”.
3. The model's prediction is compared to the actual next word in the training text.
4. If it's wrong, a mathematical process adjusts the model's internal parameters (its “weights”) to make it slightly more likely to guess correctly next time.
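The steps above can be sketched with a toy model. This is a bigram predictor (it looks only at the previous word), which is far simpler than a real LLM; the corpus, learning rate, and function names are assumptions chosen for illustration. It still shows the same loop: predict a distribution, score it with cross-entropy, and nudge the weights.

```python
import math

corpus = "mary had a little lamb mary had a little lamb".split()
vocab = sorted(set(corpus))

# One row of adjustable scores (logits) per context word.
logits = {prev: {nxt: 0.0 for nxt in vocab} for prev in vocab}

def predict(prev):
    """Softmax over the logits for one context word: p(next | prev)."""
    row = logits[prev]
    peak = max(row.values())
    exps = {tok: math.exp(score - peak) for tok, score in row.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def train_step(prev, actual, lr=0.5):
    probs = predict(prev)                  # guess
    loss = -math.log(probs[actual])        # check: cross-entropy against the real next word
    for tok in vocab:                      # update: gradient of the loss w.r.t. each logit
        grad = probs[tok] - (1.0 if tok == actual else 0.0)
        logits[prev][tok] -= lr * grad
    return loss

losses = []
for _ in range(100):
    epoch_loss = sum(train_step(p, a) for p, a in zip(corpus, corpus[1:]))
    losses.append(epoch_loss)

after_little = predict("little")
best = max(after_little, key=after_little.get)
print(best, losses[0], losses[-1])
```

After training, the highest-probability continuation of “little” is “lamb”, purely because that pairing dominates the toy corpus. Nothing in the loop refers to sheep, or to anything outside the text.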

This simple, repeated process of “guess, check, update”, scaled up across trillions of tokens and billions of parameters, is what constitutes “learning” for an LLM. It is “surprisingly powerful” because, to get good at predicting the next token, the model must incidentally learn about grammar, syntax, writing styles, and even “facts” (e.g., to complete “The actress who played Rose in the 1997 film Titanic is named…” the model must learn the “fact” of “Kate”). But this “knowledge” is not a “model of the world”; it is a byproduct of building a good statistical model of text.

The Transformer and “Self-Attention”

This learning is made possible by a specific architecture known as the Transformer, introduced in a landmark 2017 paper. The core mechanism of the Transformer is Self-Attention.

Self-attention is the LLM's primary tool for solving ambiguity—its own version of statistical “averaging.” It allows the model to calculate a word's meaning in context. For every token in an input sequence (e.g., the word “bank”), the self-attention mechanism learns (via complex matrix operations involving “Query,” “Key,” and “Value” vectors) to “pay attention” to other relevant tokens in its context window.

For example, if the sentence is “The man rowed his boat to the river bank,” the attention mechanism will learn to assign a high “attention score” between “bank” and the tokens “rowed,” “boat,” and “river.” If the sentence is “The man went to the bank to withdraw money,” it will learn to attend to “withdraw” and “money.” This statistical weighting allows the model to generate a new, context-aware representation for the word “bank,” enabling it to produce coherent, logical-sounding text.
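A minimal sketch of the weighting step follows, using scaled dot-product attention for a single query. Real models learn separate Query, Key, and Value projections over hundreds of dimensions; here the 2-D vectors are hand-picked assumptions (axis 0 loosely “water-related”, axis 1 loosely “money-related”) and are used directly as keys and values.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical embeddings, chosen by hand for illustration only.
tokens = ["the", "river", "bank"]
vectors = {"the": [0.1, 0.1], "river": [2.0, 0.0], "bank": [1.0, 1.0]}
keys = values = [vectors[t] for t in tokens]

weights, contextual_bank = attend(vectors["bank"], keys, values)
print(dict(zip(tokens, weights)), contextual_bank)
```

The query for “bank” gives more weight to “river” than to “the”, and the resulting context-aware vector leans toward the water axis. Everything being weighted is already inside the input sequence.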

This comparison reveals a subtle but critical distinction. The LLM's “next-token prediction” is a “what's next?” objective. It is generative and sequential. It only needs to learn the surface correlations in text to predict the next item in a sequence. The brain's “prediction error minimization” [18], by contrast, is a “what's out there?” objective. It is inferential and causal. The brain is trying to find the hidden cause of its current sensory input. The LLM learns correlations in text; the brain learns causal structure in the world [25]. This is why LLMs excel at “linguistic competence” (syntax, flow) but often fail at “functional competence” (reasoning, world knowledge). The LLM is a surface predictor; the brain is a depth inferer.

Active Inference vs. Passive Prediction

The differences in the how (hierarchical PC vs. Transformer) and why (FEP vs. next-token prediction) of learning point to a set of even more profound, fundamental divides between the two systems.

The Embodiment Gap

The single most significant difference is that the human brain is embodied, and the LLM is disembodied.

Human Brain (Embodied Cognition): Our brain “evolved through embodiment”. Our cognition, our language, and our “averages” are all “grounded in the same world we inhabit”. We are “action-first” systems; we act to understand. Our intelligence is built from a lifetime of direct, real-time, multisensory interaction with a physical and social world.
LLM (Disembodied AI): The LLM is a “thought-first” system. It exists purely in an abstract realm of data, logic, and language. It has no “body,” no sensors, no world to act in. It is “cognitively demanding” for us to even interact with such a “non-embodied agent”. The LLM can “learn the structure of human language” but cannot, by its very nature, “learn how [its] actions affect the world” because it has no actions and no world.

Active Inference vs. Passive Prediction

This embodiment gap leads directly to a mechanical gap: the difference between active and passive prediction.

Human Brain (Active Inference): We operate on the principle of “Active Inference”. This is the other half of the Free Energy Principle. When there is a “prediction error” (a mismatch between our model and the world), we have two options to minimize it:
1. Update the Prediction (Perception): We change our internal model to match the world.
2. Act on the World (Action): We act to make the world match our internal model.

You don't just passively predict. You act to sample the world in ways that make it more predictable. To know if your skin is rough or smooth, you must actively move your fingers. Action is how we test our hypotheses, gather new sensory data, and close the perception-action loop in real-time. We are “not just passive statistical learners”.
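The two routes can be contrasted in a short sketch. It is a caricature under stated assumptions: a single number stands for both the belief and the state of the world, and a hypothetical `can_act` flag chooses the route, whereas in active inference proper the balance depends on how precise each signal is.

```python
def minimize_error(belief, world, can_act, rate=0.5, steps=20):
    """Shrink the belief/world mismatch by perception or by action."""
    for _ in range(steps):
        error = world - belief
        if can_act:
            world -= rate * error    # action: push the world toward the prediction
        else:
            belief += rate * error   # perception: revise the prediction toward the world
    return belief, world

# Assumed example: the body "expects" 37 degrees but senses 30.
acting_belief, acting_world = minimize_error(belief=37.0, world=30.0, can_act=True)
passive_belief, passive_world = minimize_error(belief=37.0, world=30.0, can_act=False)
print(acting_belief, acting_world)
print(passive_belief, passive_world)
```

In the first call the belief is held fixed and the world is brought into line with it, as when we shiver or move toward warmth. In the second call the world is left alone and the belief gives way. Both end with the same near-zero error, reached from opposite directions.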

LLM (Passive Prediction): The LLM is an “inherently passive” predictor. It makes a prediction (generates text). If that text is wrong, nonsensical, or factually incorrect, the LLM has no mechanism to do anything about it. It has “no tight feedback loop between acting in the world and perceiving the impacts” of its actions. It is “stateless”: it doesn't remember its last interaction; its entire history is simply fed back into it as a new, static prompt. It cannot, as a human does, sense a “mismatch” and then actively seek new information from the world to resolve it.

Syntax vs. Semantics

This combination of disembodiment and passive prediction creates the single greatest philosophical challenge for AI: the Grounding Problem. An LLM's “knowledge” is not “grounded” in real-world experience.

This is a modern update of philosopher John Searle's famous 1980 “Chinese Room Argument”. Searle imagined himself in a room, following a set of English rules to manipulate Chinese symbols. By following the rules (the “program”), he could pass out “answers” in Chinese that were indistinguishable from those of a native speaker, even though he understood no Chinese at all.

The LLM is the ultimate, modern instantiation of the Chinese Room. It is a master of syntax (manipulating symbols according to statistical rules) but, Searle's argument goes, it has zero semantics (understanding the meaning of those symbols).

The embodiment and active inference frameworks provide the reason why. Human “semantics” (meaning) are grounded in our embodied, multisensory experience [65]. The meaning of the word “banana” is not just its statistical relationship to “yellow,” “fruit,” and “peel”. It is grounded in the perceptual prior of its specific colour and shape, the motor program for peeling it, the interoceptive prediction of its taste and effect on hunger. An LLM's “banana” is a “disembodied” average of text, floating in a purely syntactic space. A human's “banana” is an “embodied” average of experience, anchored to a causal model of the world.

Hallucinations and Biases

The divergent nature of these two systems is never clearer than when they fail. The “errors” made by the human brain and the “errors” made by an LLM are fundamentally different in kind, revealing their opposite architectures.

Perceptual Failure vs. Statistical Confabulation

The term “hallucination” is used for both systems, but this is a deeply misleading metaphor.

Human Psychosis: In a human, a hallucination (e.g., in schizophrenia) is a biological failure of the predictive coding mechanism. The process is thought to involve “deficient” prediction error signals. When the PE signals that should suppress internal, self-generated activity fail, this leads to “resting hyperactivity” in a sensory cortex (e.g., the auditory cortex). The brain's inferential hierarchy, doing its job, interprets this real (but internal and unsuppressed) neural firing as coming from an external source. The result is a genuine, involuntary percept—hearing a voice that isn't there. It is a “false percept,” a true hallucination.
LLM “Hallucination”: An LLM “hallucination” is not a perceptual event. The LLM is not “perceiving” anything. It is confabulating, or as journalists have put it, “bullshitting”. An LLM generates a “hallucination” when it produces a statistically plausible-sounding, but factually incorrect or nonsensical, sequence of tokens. This is not a “bug” or a “failure” of its core mechanism; it is a direct consequence of its design. It was trained to be a plausibility engine, not a truth engine. When its “averages” (the training data) are sparse, contradictory, or insufficient, it “fills in the gaps” with the most likely text, not the most factual text.

The real psychological danger is not that LLMs are “psychotic,” but that their fluent, reinforcing confabulations can induce or amplify delusions in their human users. This phenomenon, dubbed “AI psychosis”, is where a human's predictive model becomes “stuck” in a delusional loop, co-created and “validated” by a non-judgmental, non-grounded AI.

Adaptive Inference vs. Inherited Artifacts

Both systems are riddled with bias, but again, the origins are entirely different.

Human Cognitive Bias: From a Bayesian perspective, many human cognitive biases, such as “conservatism bias” (overweighting priors) or “base-rate neglect” (overweighting new evidence), are not simple failures of rationality. They are adaptive strategies for a biological agent with limited time and cognitive resources. We adaptively shift our “cognitive attention” between our priors and the evidence, depending on the “context” of the problem. In “small-world” problems (like an urn problem), we trust our priors, leading to conservatism. In “large-world” problems (like a taxi problem), vivid evidence overtakes the prior, leading to base-rate neglect. This is an active, adaptive (though imperfect) inferential process.
LLM Bias: An LLM's bias is not an adaptive strategy. It is the passive, uncritical inheritance of the statistical artifacts and societal biases present in its massive training data. If the training data (the “averages”) consistently associates the token “doctor” with “male” tokens, and “nurse” with “female” tokens, the LLM will reproduce this bias not as an inference, but as a reflection. Studies show that while LLMs can be prompted to exhibit human-like cognitive biases, their “irrationality” is inconsistent and fundamentally “differ[s] from human-like biases”. The human is an agent failing adaptively; the LLM is a tool reflecting passively.
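How a skew in the data becomes a skew in the output can be shown with simple counting. The corpus below is invented for illustration and is not real data; a frequency table stands in for the far richer statistics a trained model absorbs.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (occupation, pronoun) co-occurrences; not real data.
pairs = (
    [("doctor", "he")] * 8 + [("doctor", "she")] * 2
    + [("nurse", "she")] * 9 + [("nurse", "he")] * 1
)

counts = defaultdict(Counter)
for occupation, pronoun in pairs:
    counts[occupation][pronoun] += 1

def pronoun_probabilities(occupation):
    """Relative frequency of each pronoun following an occupation word."""
    seen = counts[occupation]
    total = sum(seen.values())
    return {pronoun: n / total for pronoun, n in seen.items()}

doctor = pronoun_probabilities("doctor")
nurse = pronoun_probabilities("nurse")
print(doctor, nurse)
```

The estimated probabilities reproduce the imbalance in the counts exactly. No inference about doctors or nurses takes place; the skew is carried over from the data.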

The Chasm of Experience (Beyond the Average)

We have returned to our starting point, but with a deeper understanding. Both the human brain and the Large Language Model are “prediction machines” that run on the “averages” of information. But how they do this, and why, defines the unbridgeable chasm between them.

The human brain is an active, embodied, causal inference engine. It is the product of eons of evolution, and its ultimate goal is not prediction, but action and survival [9]. It leverages its statistical “averages” (priors) to build a generative model of the world, allowing it to “control” its own “hallucination” and navigate a complex, physical reality. Its predictions are hypotheses to be tested by actively moving, sensing, and engaging with its environment.

The LLM is a passive, disembodied, correlational mimicry engine. It is the product of human engineering, and its goal is not survival, but plausible text generation. It leverages its statistical “averages” (patterns in data) to build a statistical map of language. Its predictions are not hypotheses to be tested, but “stochastic” continuations of a sequence, trapped within the “Chinese Room” of pure syntax.

This leads us to the final, and most profound, distinction: experience.

The brain's entire, complex mechanism of embodied, predictive processing—this “controlled hallucination”—feels like something from the inside. This is the “Hard Problem” of consciousness. Why does the brain's predictive model of “damage” manifest as the subjective, qualitative feeling (or “qualia”) of pain? Why does its prediction of a 650 nm wavelength of light manifest as the experience of “redness”? Consciousness, in this view, is the “what it's like” to be an embodied, self-predicting, self-modeling system.

LLMs, too, show “emergent properties”—surprising capabilities that were not explicitly programmed. But this is an emergence of capability, not sentience. There is no “Hard Problem” of LLM consciousness because there is no experience (qualia) to explain. The system is a powerful, disembodied average, but there is no “what it's like” to be an LLM.

The human brain, therefore, does something far more magical than statistical averaging. It does not just use the averages of its past to predict the present; it leverages those averages to actively create a new future. The LLM is a brilliant, statistical mirror of our collective, textual past. The human mind is an engine for generating a novel self and world. The difference, in the end, is not in the mathematics of the average, but in the irreducible, embodied experience of being.

Peter Olsen
