Beyond the Hype of AI
Machine Intelligence and the Pancake Problem
By Niko Kitsakis, February 2025
I’ve considered writing something about artificial intelligence (AI) for several months now. I usually try to focus on things I know more deeply, which is why I initially hesitated to write about this topic. In addition, artificial intelligence is an area so broad that it is hard to decide what specific issue to focus on. Still, after many real-world conversations with friends and clients, I realised that a lot of people have very general misconceptions about AI and that writing a brief overview could be worthwhile.
I have long been interested in scientific fields like physics, (evolutionary) biology, neurology and epistemology. Linguistics and psychology in the context of neurology and how the brain works are also very interesting to me. So when large language models (LLMs) by OpenAI and others came to prominence in early 2023, I was in the lucky position to already have had a good idea of what these things were capable of – and what not.
Let me say this right away: I am absolutely convinced that the thing we call artificial intelligence today can never in a million years reach human-level intelligence. Not if it is based on the machine learning (ML) technologies we use right now¹. In this piece, I will try to argue why I think that.
AI and machine learning
A few years back – it must have been around 2015, I guess – I read something about machine learning that stuck with me. Someone said that expecting artificial intelligence to emerge from machine learning is akin to stacking chicken feathers from the Earth to the Moon, convinced that the sheer quantity of feathers will eventually give rise to a real chicken. In other words, if you scale a process that superficially resembles your desired result but lack an explanation of how it will achieve that result, you will never succeed, no matter how much you scale.
But what do I mean by “scale”? Well, the way that AI works today is by basically reading huge amounts of data – text, images or whatever it may be – and creating statistical relationships between the billions upon billions of data points. So much has been written about AI recently that you must have heard the following by now: All that an LLM like ChatGPT does is find relationships between words – how likely it is, for example, that the word “idea” will appear after the word “new”, and so on. If you build a large enough database of these relationships, you can use it to let the machine write text. The results, as we have all seen, can be quite impressive.
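To make that “relationships between words” idea a little more concrete, here is a deliberately tiny sketch in Python. It only illustrates the statistical principle – real LLMs use neural networks with billions of parameters and much longer contexts, not a literal lookup table – and the toy sentence and function name are of course my own invention.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for "huge amounts of data"
corpus = "a new idea is a good idea and a new day brings a new idea".split()

# Count how often each word follows each other word
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_likely_next(word):
    """Return the word that most often followed `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("new"))  # -> "idea", because "idea" followed "new" most often
```

Scaling, which the next paragraph gets to, then amounts to feeding in vastly more text and capturing vastly more (and longer-range) relationships of this kind.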
The idea behind scaling is, in essence, that the more relationships you can create, and the more input data you feed in, the better the system will become until, finally, you end up with something that has “human-level intelligence”.
It is that last promise of human-level intelligence that is the problem, for this, as I have said above, can never happen by just scaling the technology that we have today.
Another way of thinking about it: 50 grams of flour in a frying pan will never produce a pancake by itself. Neither will 50,000 tons. What is needed are new ideas, a new explanation. Eggs, milk and so on. In the case of AI, if we want to create something with human-level intelligence, we will need to be able to explain what human creativity is – and nobody can. And by “explaining what human creativity is” I don’t mean some vague notion by a life coach at some TED talk but a scientific explanation of how the human brain generates genuinely new ideas. The person who can explain that will be the person who can build an artificial general intelligence (AGI).

Created by ChatGPT: A stack of pancakes magically appears out of flour. Mind you, I gave instructions for one pancake to appear out of 50,000 tons of flour. This, however, is not nearly enough flour, nor is it just one pancake. Why that is important, you will see below.
Artificial General Intelligence?
In case you’ve never heard the term, artificial general intelligence is the same thing that was simply called artificial intelligence about ten years ago. AGI was introduced as a term because more and more people started to call machine learning – which is very narrow in scope – artificial intelligence. So machine learning became artificial intelligence, and what used to be artificial intelligence became artificial general intelligence. Why that happened I’m not certain, but I suspect that – because of the success of science fiction – “artificial intelligence” is an easier thing to sell to investors than “machine learning”.
This shift in meaning – from machine learning to artificial intelligence – creates a kind of linguistic sleight of hand, where the promise of true intelligence – which would be AGI – is implicitly attached to current AI systems, even though they are fundamentally incapable of becoming that. This rebranding leads to a lot of misunderstandings and overhyped expectations.
And now it seems that even the term AI is not enough anymore for there are a lot of grandiose statements being made about how AGI is only two to three years away. At times like this, I find it helpful to remember the following from physicist David Deutsch:
Prediction without explanation is prophecy.
Just ask yourself the next time you hear a prediction about AGI if it actually comes with an explanation as to why it should be true.
What does it matter?
All of this matters because there are a lot of people out there who are trying to overstate the capabilities of AI technologies. I assume they either try to get more funding or they have some bullshit product, “powered by AI”, which they are trying to push. And then there are people who – for whatever reason – try to sell the idea that malicious AI will take over the world. These people say that it has consciousness, volition, and reasoning skills – three claims that are patently false.
Let’s start with consciousness: Since no philosopher, psychologist, biologist, or physicist on Earth can explain what consciousness is, how would you even determine that you have found it somewhere? Many books have been written about the problem of consciousness. If you really want to hurt your brain and waste your time, you can read them. Suffice it to say, though, that the question of consciousness clearly leads nowhere – at least for the time being. For what it’s worth, the oftentimes extremely stupid and self-contradictory things that come out of ChatGPT have never sparked the idea of it being conscious in my mind. On the contrary: For me, it reeks of machine.
The idea of volition – that these systems will act by themselves – is also largely nonsense. For that to happen in the way most people imagine, AI would need a purpose that drives it, which would, in turn, have to come from its non-existent consciousness. Try the following experiment: Open ChatGPT (or any other LLM) on your computer and enter nothing into the text field you are presented with. Nothing will happen.
As for humans giving AI agents goals to complete, this is a different story altogether, as it could indeed cause some problems (the paperclip maximiser is a famous example). In this scenario, whether the AI has volition or not wouldn’t even matter, because if it just mindlessly does what it was told, that would be bad enough. However, given the narrow scope (narrow-mindedness, if you will) of current AI agents, combined with their complete lack of reasoning skills, their potential for misbehaving should be equally narrow and easy to detect².
Which brings me to the ability to reason – what you might call intelligence. This doesn’t happen either. It should be apparent for reasons that I will show further below, but let’s first look at an interesting paper published on the topic in October 2024. It’s called “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models”. In it, the authors say, amongst other things, the following:
… we investigate the fragility of mathematical reasoning in these models and demonstrate that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is due to the fact that current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data.
And later in the same paper you can read:
LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts. While this process goes beyond naive memorization of words and the models are capable of searching and matching more abstract reasoning steps, it still falls short of true formal reasoning.
The authors speak specifically about formal reasoning in mathematics, but I think it’s pretty safe to say that there is actually no reasoning going on whatsoever in LLMs and therefore, there’s also no intelligence. They can give the illusion of intelligence – which, I suppose, would make them perfectly good politicians – but that’s about it. Joking aside though, the ability to generate text with the help of a huge dataset and the clever use of statistics should not make you think that these systems have any sort of reasoning skills or are able to think – no matter how tempting that idea might be.
The Pancake Problem
When I made the picture of the flour and pancakes above, I told ChatGPT to create a single pancake that springs into existence from a frying pan filled with 50,000 tons of flour. As you have seen, it created what is at best 50 kg of flour and a stack of pancakes instead of just one. But why is that? In short, it’s the training data.
Since the AI systems that we have today (or rather the process that we use to create the AI systems that we have today) have no reasoning skills and no concept of anything that they are doing, they cannot tell what the data they are collecting actually means. The stack of pancakes is a way of visualizing that. Take a look at the following picture:

Pancakes, overwhelmingly, come in stacks.
This is the result of a Google image search for the word “pancake”. Not “pancakes” in the plural or “stack of pancakes”, but simply “pancake”. As you can see, all the pictures show a stack of pancakes and a lot of those stacks have a piece of butter on top. This is understandable in the sense that the most common way of serving pancakes is as a stack like this. And that is the crux of this story: The most common way. Since we can reasonably assume that the training data for these AI models includes search results like this, and since the AI models have no capability for reasoning, it’s no surprise that ChatGPT and others have such a hard time creating a picture of a single pancake. All they have ever seen (the vast majority of what they have ever seen, to be precise) is stacks of pancakes, not single ones.

That’s why, in the image of the pancake and flour above, there was a stack of pancakes with something like butter on top, and not a single pancake. And how could it be different? The AI models have no idea what pancakes even are! They just know about the relationship between the letters that make the word “pancake” and the type of image that appeared most often in conjunction with that combination of letters in their data. So it’s pretty safe to think of AI as applied statistics – very impressive applied statistics, to be sure, but nothing more.
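Since I just called this applied statistics, here is a minimal sketch of the kind of frequency effect I mean – in Python, with numbers that I made up purely for illustration. It is not how image models work internally, but it shows why “most common in the training data” wins out over “what was literally asked for”.

```python
from collections import Counter

# Invented stand-in for training data: depictions that co-occurred
# with the word "pancake" in image/caption pairs
training_depictions = (
    ["stack of pancakes with butter on top"] * 950
    + ["stack of pancakes"] * 40
    + ["single pancake"] * 10
)

counts = Counter(training_depictions)

# A purely statistical system, asked for "pancake", gravitates towards
# whatever it has seen most often -- the word "single" carries little weight
print(counts.most_common(1))
# [('stack of pancakes with butter on top', 950)]
```

The numbers are invented, but the bias is the point: if “pancake” almost always co-occurred with stacks and butter, a system with no concept of counting has no reason to ever produce anything else.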
Knowing that, you can have fun (waste your time) playing around with this stuff. Below you can see the result of my trying to generate the image of a single pancake without the 50,000 tons of flour and other things from above. If you spend enough time, you might even get the thing to really produce the image of a single pancake. I wasn’t able to, however, and I frankly also find the task demeaning.

Unuseful idiot: Trying to get the image of a single pancake out of ChatGPT is almost impossible. Note also the weird maple syrup behaviour in the second image.
Every child on this planet understands that one pancake is not the same as two pancakes. Children also understand what pancakes are in the first place and that they may want to eat them. More importantly, they only need to be confronted with pancakes one time to understand everything important about them. There’s no need for a child to look at a million pictures of pancakes, hear the word “pancake” a million times or eat a million pancakes to know what they are. Contrast that with our current AI, which – as I said above – has no concept of pancakes whatsoever, despite all its training data. The computer that beat Garry Kasparov at chess likewise had no concept of what chess is. Or Garry Kasparov. Or winning.
I said at the beginning that I already had a good idea of what AI was capable of when ChatGPT appeared roughly two years ago. One of the reasons for this is that I had read a book by Jeff Hawkins called A Thousand Brains. In it, Hawkins talks about the potential path to AGI and that today’s deep learning (machine learning) will likely not lead to it. I found his arguments quite convincing, but see for yourself:
Scientists realized that to be as capable as a five-year-old child requires possessing a huge amount of everyday knowledge. Children know thousands of things about the world. They know how liquids spill, balls roll, and dogs bark. They know how to use pencils, markers, paper, and glue. They know how to open books and that paper can rip. They know thousands of words and how to use them to get other people to do things. AI researchers couldn’t figure out how to program this everyday knowledge into a computer, or how to get a computer to learn these things.
The difficult part of knowledge is not stating a fact, but representing that fact in a useful way. For example, take the statement “Balls are round.” A five-year-old child knows what this means. We can easily enter this statement into a computer, but how can a computer understand it? The words “ball” and “round” have multiple meanings. A ball can be a dance, which isn’t round, and a pizza is round, but not like a ball. For a computer to understand “ball,” it has to associate the word with different meanings, and each meaning has different relationships to other words. Objects also have actions. For example, some balls bounce, but footballs bounce differently than baseballs, which bounce differently than tennis balls. You and I quickly learn these differences by observation. No one has to tell us how balls bounce; we just throw a ball to the ground and see what happens. We aren’t aware of how this knowledge is stored in our brain. Learning everyday knowledge such as how balls bounce is effortless.
AI scientists couldn’t figure out how to do this within a computer. They invented software structures called schemas and frames to organize knowledge, but no matter what they tried, they ended up with an unusable mess. The world is complex; the number of things a child knows and the number of links between those things seems impossibly large. I know it sounds like it should be easy, but no one could figure out how a computer could know something as simple as what a ball is.
This problem is called knowledge representation. Some AI scientists concluded that knowledge representation was not only a big problem for AI, it was the only problem. They claimed that we could not make truly intelligent machines until we solved how to represent everyday knowledge in a computer.
Having read this and then seeing what ChatGPT does with the pancakes and other things put my mind at ease regarding AGI or superintelligence. For me to be more concerned by the current state of AI in and of itself, these things would have to be able to properly count first.
Trying to solve the problem of AI hallucination (when AI simply makes things up) and other misbehaviors by manually fine-tuning the underlying models will not lead to human-level intelligence either. For if your system has read almost all digitally available text and has seen almost all digitally available pictures and can still not tell one pancake from two, there’s a problem way more fundamental than what you could solve with tweaking.
Human Creativity
The book by Jeff Hawkins above was just one instance where I had seen a good argument against the possibility of AI becoming AGI. Before that, I had read a little book entitled Science and Human Values by the scientist and philosopher Jacob Bronowski. In it, he makes the following observation:
I found the act of creation to lie in the discovery of a hidden likeness. The scientist or the artist takes two facts or experiences which are separate; he finds in them a likeness which had not been seen before; and he creates a unity by showing the likeness.
Bronowski gives various examples of this. One is that of Michael Faraday, who unified electricity and magnetism and showed that these two forces are different manifestations of the same underlying phenomenon. Another is that of Albert Einstein, who likewise demonstrated that space and time should actually be thought of as the same thing³.
The kind of AI that is based on machine learning as we have it today would never have been able to come up with these genuinely new ideas. Where this spark comes from – where human creativity comes from – is something that still eludes us. And, as I said above, we will have to be able to explain that in order to build an AGI.
That idea about the explanation of creativity as a prerequisite for building an artificial general intelligence actually comes from David Deutsch, whom I mentioned above. In his 2011 book The Beginning of Infinity, Deutsch says, amongst other things, that
… we should expect AI to be achieved in a jump to universality, starting from something much less powerful. In contrast, the ability to imitate a human imperfectly or in specialized functions is not a form of universality. It can exist in degrees. Hence, even if chatbots did at some point start becoming much better at imitating humans (or at fooling humans), that would still not be a path to AI. Becoming better at pretending to think is not the same as coming closer to being able to think⁴.
I will not dive too deep into what he means by universality, but basically, it is the ability for any computing system to simulate any other (Turing completeness), or even any physical system (Church–Turing–Deutsch principle). Something that is unattainable by current AI systems.
What I found particularly interesting was Deutsch’s point that the ability to imitate a human can exist in degrees – which is precisely what we observe in LLMs like ChatGPT today. There is research, for example, that shows that the technology behind ChatGPT is steadily improving at multiplying multi-digit numbers. In September 2024, one of the reasoning models behind ChatGPT had only a 2.5 percent chance of getting an accurate result when it tried to multiply two 12-digit numbers, but by January 2025, this had risen to 65 percent.⁵ In my view, this alone demonstrates that there is no intelligence – in any meaningful sense of the word – to be found in these systems. You either understand the concept of multiplication, or you don’t.
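To see what I mean by understanding the concept, contrast that with multiplication written down as an explicit procedure. The following Python sketch of schoolbook long multiplication is mine, not something from the research cited above; the point is simply that a system which has the rule is exactly as accurate on 12-digit numbers as on 2-digit ones, whereas a system that imitates examples it has seen degrades as the numbers get longer.

```python
def long_multiply(a: str, b: str) -> str:
    """Schoolbook long multiplication on decimal digit strings.

    Once the rule is known, it works for numbers of any length --
    accuracy does not depend on how many examples were memorised.
    """
    result = [0] * (len(a) + len(b))
    for i, digit_a in enumerate(reversed(a)):
        carry = 0
        for j, digit_b in enumerate(reversed(b)):
            total = result[i + j] + int(digit_a) * int(digit_b) + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b)] += carry
    return "".join(map(str, reversed(result))).lstrip("0") or "0"

print(long_multiply("12", "12"))                      # 144
print(long_multiply("123456789012", "987654321098"))  # correct, however many digits
```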
Another way of thinking about what our current AI systems aren’t capable of, is to contemplate this quote, attributed to Erwin Schrödinger:
Thus, the task is, not so much to see what no one has yet seen; but to think what nobody has yet thought, about that which everybody sees.
It’s one of my favourite quotes in science. In our context, you can think of it in terms of AI having seen everything but being incapable of thinking anything. If you look at it like this, it’s almost tragic in a way.
The best that ChatGPT and other LLMs can do is to regurgitate what they already “know”. The responses they generate will often be surprising to us, but don’t mistake them for genuinely new insights. There is an important difference between knowledge that already exists but we haven’t yet seen for ourselves, and knowledge that is truly new – created for the first time. LLMs are very useful and, personally, I wouldn’t want to miss having them as tools. But in the sense of what they can really create, they are only capable of serving up a homogeneous soup of mediocrity⁶ – AI slop, as it is called.
Consider the biggest problem in fundamental physics today: The lack of a unified theory that reconciles quantum mechanics and general relativity. Since it is reasonable to assume that the LLMs have read far more scientific papers than all living physicists have, they should have the best prerequisites to solve this problem. Yet, they can’t. For that, they would have to be truly creative – “to think what nobody has yet thought, about that which everybody sees”.
What to look out for
With all this being said, I believe that the danger lies not so much in what current AI technology can do by itself, but in how humans will use it. This shouldn’t really come as a surprise, though, since it’s the common problem of all technology.
Think about the following: The internet is basically just a lot of networked computers with a layer of software in the foreground that makes them easily link to one another. It’s all the more remarkable, then, what value it created for us: Modern communication, e-commerce, social media, etc.
My point is that even a relatively simple technology or idea can have a disproportionately large impact when people figure out interesting ways to use it. In hindsight, things like Google Maps and Facebook seem inevitable, but they were revolutions when they happened. With AI, it will be similar, I suspect: The point is not what AI is right now, or that it can never become AGI, but what people will build with it. And that will inevitably include both very good and very bad things. Therefore, in my view, any concerns regarding AI should be about the human use of AI and not so much about the potential for autonomous robots with Austrian accents.
- Technically, there’s a difference between what is called machine learning and deep learning (DL). The latter is a subset of the former and more directly related to AI and LLMs. I decided, however, not to delve into the difference between machine learning and deep learning in order to keep things as simple as possible. ↑
- I am reminded of an exchange that I saw on Twitter/X the other day. Someone was in awe of China having launched a thousand drones that flew in sync. That person asked how you could ever defend against something like that – the game was over in his view. Another user responded that you simply launch a thousand drones of your own… The point is that any sort of escalation can always be countered. So whatever happens with AI agents, it seems to me that things will probably play out in a similar fashion. ↑
- I’m oversimplifying of course, but you get the point. ↑
- Since this was written in 2011, Deutsch is referring to AGI when he talks about AI. ↑
- If you try this in ChatGPT today, it will actually give you the correct result. That is only, however, because it will call a separate calculator routine in the background and delegate the task to that. This is not the same, of course, as being able to perform the task by itself. ↑
- Which, come to think of it, makes it a perfect tool for writing posts on LinkedIn – the place where ideas go to die, as I like to call it… ↑
I’m sure you will find an example of how the AI model of your choice is better at generating the picture of the pancake than ChatGPT. It doesn’t matter, though, because the underlying technology is essentially the same, no matter the model or vendor. If another AI model is able to create the image, that just means that there has been more manual tweaking by the developers or different training data (or, likely, both). Needless to say, this goes for reasoning skills (or what merely appears to be reasoning skills) as well. Again, as David Deutsch said, pretending to think and thinking are not the same.