In a conversation I had recently, someone suggested that our human thinking at some level is the same as LLM thinking. I’ve thought about that a lot over the past decades (I did a degree in AI many years ago), and it’s a really interesting question: What actually separates human cognition from LLMs? Is there a fundamental difference between human and machine intelligence?

I think that there are, at least currently, a number of big differences between the way we think and the way LLMs think.

First, our brains are vastly more complex than current LLMs. The human brain has around 86 billion neurons, with about 150 million synapses per cubic millimeter of cortex. GPT-4, by comparison, is estimated to have roughly a trillion parameters across maybe 11 million nodes. And synapses are dynamic bio-electrochemical structures that change with activity and experience, not static weights. The brain does all of this on about 20 watts, while training an LLM requires orders of magnitude more energy.
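To make the scale gap concrete, here’s a rough back-of-envelope calculation. The cortical volume (~500,000 mm³) and the one-trillion-parameter figure are assumptions for illustration, not measured values, so treat the output as an order-of-magnitude sketch rather than a precise comparison.

```python
# Back-of-envelope comparison of brain synapses vs. LLM parameters.
# The cortical volume (~500,000 mm^3) and the ~1 trillion parameter count
# are rough assumptions for illustration, not precise measurements.

SYNAPSES_PER_MM3 = 150e6    # ~150 million synapses per mm^3 of cortex
CORTEX_VOLUME_MM3 = 500e3   # assumed human neocortex volume in mm^3
LLM_PARAMETERS = 1e12       # assumed "roughly a trillion" parameters

total_synapses = SYNAPSES_PER_MM3 * CORTEX_VOLUME_MM3  # ~7.5e13
ratio = total_synapses / LLM_PARAMETERS                # ~75x

print(f"Estimated cortical synapses: {total_synapses:.1e}")
print(f"LLM parameters:              {LLM_PARAMETERS:.1e}")
print(f"Synapse-to-parameter ratio:  ~{ratio:.0f}x")
# And each synapse is a dynamic bio-electrochemical structure,
# not a single static floating-point weight.
```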

Another aspect is embodiment: our words and sensory experiences are intricately linked. When I say “coffee,” that word connects to the smell of it, the warmth of the cup, the ritual of making it, memories of having enjoyed (or not) coffee in the past, and so on. Words have a qualitative aspect tied to things we’ve felt, smelled, seen, and heard.

Embodied cognition research shows that sensorimotor brain regions activate during language processing: some of the same neural structures fire when you read about an action as when you perform it. Language appears to be grounded in perception and action.

We also have an astounding ability to simulate scenarios in our minds: One can imagine a conversation that hasn’t happened yet and predict not just one’s own mental states and responses, but also the other person’s. Cognitive scientists call this Simulation Theory: We understand others by mentally simulating being them. When you observe someone acting, the premotor cortex areas that would control that same action in your own body activate. We’re running internal models of other people constantly.

Our words and language use are connected to goals, desires, and fears. “I don’t want to be hungry.” “I don’t want to feel abandoned.” “I want to be appreciated.” “I don’t want to be broke, so I need to keep my job.” These motivations are woven through everything we think and say. I think this is where intentionality comes from: our desires for survival, comfort, safety, and social connection create a directedness to our mental states. Whether this produces genuine “free will” or just the experience of it is something philosophers argue about, but either way, it fundamentally shapes how we use language.

LLMs don’t have any of this. They don’t have sensorimotor experiences or emotional states. They don’t have survival needs or social anxieties or the fear of being hungry. They don’t navigate a physical world. LLMs encode meaning and knowledge in high-dimensional vector spaces, but the vectors themselves are not connected to the world.
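A toy sketch of what “meaning as vectors” looks like: the word vectors below are made-up, low-dimensional stand-ins for real embeddings (which have thousands of learned dimensions), but the point carries over. Similarity is purely a relationship between vectors; nothing in the numbers touches the smell of coffee or the warmth of the cup.

```python
# Toy illustration of distributional meaning: similarity is a relation
# between vectors, not a connection to the physical world.
# These vectors are invented for illustration; real LLM embeddings
# have thousands of dimensions learned from text statistics.
import numpy as np

embeddings = {
    "coffee": np.array([0.9, 0.1, 0.8, 0.2]),
    "tea":    np.array([0.8, 0.2, 0.7, 0.3]),
    "spoon":  np.array([0.2, 0.9, 0.1, 0.6]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 means 'more similar'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["coffee"], embeddings["tea"]))    # high: similar contexts
print(cosine_similarity(embeddings["coffee"], embeddings["spoon"]))  # lower: different contexts
# The "meaning" here is entirely relational -- nothing in these numbers
# encodes what coffee smells or tastes like.
```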

The symbol grounding problem (how symbols connect to the world) is an active area of research. Some researchers argue that RLHF fine-tuning might establish some form of “world-involving functions,” and there’s interesting work on whether grounding strictly requires embodiment. Recent benchmarks even show LLMs performing surprisingly well on grounding tasks. But the current consensus is that LLMs lack the sensorimotor grounding that underlies human language understanding.

I don’t think there’s a fundamental difference between biological and silicon computers, a position called functionalism (which I learned about a long time ago, breathlessly reading Hofstadter’s Gödel, Escher, Bach as a high school kid). Functionalism is a mainstream view in the philosophy of mind. It says that mental states are defined by what they do (their functional role), not by what they’re made of (the substrate): what matters is the computation, not whether it happens in organic matter or silicon. There are serious critiques, though, including Searle’s famous Chinese Room argument and questions about whether thinking is more substrate-dependent than functionalists assume. One such critique is that computation requires energy, and energy requirements depend on the substrate, so you can’t just assume that any substrate can implement the same functionality at any scale. It could be that biological brains have properties that don’t transfer to silicon brains.

My hunch is that until AI systems are embodied, i.e. until they need to navigate the world, survive in it, and have something at stake, they won’t develop the kind of thinking we have, quite apart from the fact that LLM architectures are very different from the structure of our brains.