
Beyond the Turing Test: A New Path to Artificial General Intelligence

“The main power of Artificial Intelligence is not in modeling what we already know, but in creating solutions that are new.” - Risto Miikkulainen

Headlines these days are filled with impressive achievements from AI systems such as ChatGPT, AlphaGo, GPT-3, DALL-E and LaMDA. In fact, the LaMDA chatbot played the imitation game so well that it convinced a Google engineer of its sentience. With a tidal wave of investment dollars backing creative/generative AI platforms, it seems the dream of Artificial General Intelligence is near.

Is it, though? Are we indeed that close? Is it possible that upon a closer look, we would discover “The Turk” hidden inside almost all of our latest and greatest AI achievements? The ability of AI to imitate humans, as measured by the Turing Test, is no longer an adequate measure. Instead, the Bogdan Test may hold the key to measuring true Artificial General Intelligence, developed with Evolutionary Computation methods.

  1. The Turk

On a chilly fall day in 1769, the Habsburg empress – the beautiful and powerful Maria Theresa Walburga Amalia Christina – summoned her trusted thirty-five-year-old civil servant, philosopher, writer, and inventor Wolfgang von Kempelen to her palace. The reason for such an urgent request was an unexpected visit from a French conjuror who promised to show the empress his latest magic. Her Majesty, being well educated, was naturally skeptical about the conjuror’s claims, so she wanted Kempelen to help her debunk the tricks they were about to witness. Unfortunately, the magician was so good that neither the empress nor her “science advisor” could explain the spectacular, jaw-dropping illusions that wowed the empress’s entire court that day. After the show, Kempelen was so disappointed, angry, and jealous that he promised himself to invent something so magical that no one in the world could ever top it.

In the middle of the 18th century, intricate and elaborate mechanical devices were the most popular form of amusement for the educated aristocracy. Contraptions that reproduced the movements of people or animals were at the top of the list. However, nothing was more intriguing at that time than the clockwork devices that seemed to be able to think. All sorts of mechanical animals, musical boxes, clockwork monks, and even a statue that spoke and listened via a speaking tube were built by the master craftsmen of the day. As the game of chess was considered the pinnacle of human intellectual ability, it is no wonder Wolfgang von Kempelen chose chess as the foundation for his invention. After a year of tireless engineering work, he created a mechanical man dressed in an oriental costume, seated behind a wooden cabinet, and capable of playing chess.

Kempelen intended his chess-playing machine to amuse the court and advance his career by impressing the empress. However, the success of his invention far exceeded his wildest dreams. “The Turk” – the name the public gave his machine because of its oriental-inspired costume – achieved widespread fame throughout Europe and America, far beyond the Habsburg court. The Turk influenced many historical figures during its eighty-five-year life, including Benjamin Franklin, Catherine the Great, Napoleon Bonaparte, Charles Babbage, and Edgar Allan Poe. It inspired countless myths, anecdotes, and apocryphal tales, the truth of many of which we will never uncover. In fact, Kempelen’s Turk has become the most famous automaton in history. However, the chess-playing robot brought both triumph and severe anguish to its inventor. The Turk was a hoax! It was not a true automaton but an intricate mechanical contraption controlled by a human chess master hidden inside it.

It took more than 200 years before a real chess computer could beat the best human grandmasters. The Kasparov versus Deep Blue matches of 1996-1997 captured our imagination. We were so thrilled and fascinated by this triumph of modern AI science that we declared it “the true machine intelligence.” The steady flow of impressive achievements in computer science that followed, including AlphaGo, GPT-3, LaMDA, and DALL-E, seems to assure us that the dream of Artificial General Intelligence is almost a reality.

Is it, though? Are we indeed that close? Is it possible that upon a closer look, we would discover “The Turk” hidden inside almost all of our latest and greatest AI achievements? Is the focus of modern AI science still “to amuse the court and advance careers by impressing the empress”? Let’s dig in.

  2. Turing’s Imitations

In 1952, in a first-floor office of the Royal Society Computing Machine Laboratory in Manchester, Alick Glennie – a twenty-six-year-old computer researcher – gathered a few old chairs around a small round table in the middle of the poorly lit, cluttered room. A chessboard was carefully placed at the center of the table, along with a chess clock and a notepad. That untidy room was about to witness the first-ever genuine match between a human and a computer program. Alick played the part of the human chess player, and the British mathematician and computer scientist Alan Turing played the part of the machine, working through the moves of his program by hand.

Fifteen years before that historic match, in 1937, Turing published “On Computable Numbers,” a paper that later became a landmark in computer science. In this paper, he defined the theoretical limits of computing machinery, no matter how complex or powerful that machinery is. Like his predecessor Charles Babbage, and many others who came after him, Turing was also interested in programming machines to play chess. And like many scientists, including John von Neumann, Oskar Morgenstern, and Claude Shannon, Turing believed that a chess program was the first step toward developing an artificial mind and comparing human and machine intelligence. Claude Shannon put it very succinctly. In his 1950 paper “A Chess-Playing Machine,” he wrote, “The investigation of the chess-playing problem is intended to develop techniques that can be used for more practical applications. The chess machine is an ideal one to start with for several reasons. First, the problem is sharply defined both in the allowed operations and the ultimate goal. Second, it is neither so simple as to be trivial nor too difficult for a satisfactory solution. And such a machine could be pitted against a human opponent, giving an exact measure of the machine’s ability in this kind of reasoning.” In other words, if a program can successfully “imitate” a human chess player, it should be considered intelligent. The key word here, of course, is “IMITATE.”

Imitation as a measure of intelligence was first suggested by Turing and immortalized in the famous test that bears his name. In this test, a computer is deemed an intelligent “thinking machine” if a human conversing with it via typewritten messages cannot tell whether it is another human or a machine. The Turing Test, which Turing himself called “The Imitation Game,” has shaped AI research for the seven decades since. Today, we still judge our AI systems by how well they imitate us. But is that right? If “The Turk” could imitate an automaton and GPT-3 can imitate human conversation, are they not both equally deceptive? Is imitation the right way of measuring intelligence? Perhaps imitation as a metric for intelligence does more harm than good. The excitement about teaching a computer to play Go has distracted us from the ultimate goal of non-biological intelligence.

There is no doubt that imitation is a very powerful learning tool. It is how a child learns a language from their parents. It is why religions and cultures persist over generations and why breaking traditions and established scientific theories is so hard. Imitation is a part of our identity. In fact, our ability to imitate is what makes us different from any other living organism. The urge to imitate is so deeply entrenched in our minds that we unconsciously reward good imitators and prize the ability of animals and computers to copy human behavior. We are fascinated by the “Turks,” and really by anything that imitates us, be it animal or machine. That is why memes exist and catchy YouTube cat videos go viral. We are blind to our constant imitations and do not register how often we perform them and how rarely we see them in other animals. No wonder imitation comes to mind when we try to solve a problem or create something new. For example, Frank Rosenblatt, the father of Artificial Neural Networks, created his Perceptron by imitating the workings of a brain cell. There are countless examples of successful imitations that led to useful inventions. However, imitation is a tool, not a purpose. We would never be able to fly if our planes were faithful imitations of birds. We need imitations to survive, but we must break free of them to enable creativity.

We should stop treating imitation as the goal and start using it as a means of teaching AI world models. We must focus on intelligence, not on satisfying our craving for mimicry. Can GPT-3 or other similar imitators allow us to do this? I doubt it very much. It is one thing to rely on ever-present human anthropomorphism, and a completely different thing to measure intelligence objectively.

LaMDA, an AI chatbot generator, played the imitation game so well that it even convinced a Google engineer that it is intelligent and conscious. Why did that happen? Because we equate intelligence with mimicry. However, I believe intelligence is the ability to create and adapt world models, not the ability to trick us by imitation. Like a GPS, a world model can guide the machine toward a goal, whereas imitation has no goals or drives. To achieve true intelligence, we, the AI designers, must find a way to motivate it. Non-biological intelligence is not going to develop goals on its own.

Many modern AI scientists have recognized the limitations of the Turing Test and its focus on imitation. In the past decade, some new intelligence tests have been suggested to match AI’s true goals with what is actually measured in those tests. For example, the Winograd schema challenge (WSC) is a machine intelligence test proposed by Hector Levesque, a computer scientist at the University of Toronto. In his schema, the challenge question should consist of three parts:

- A sentence or brief discourse that contains two noun phrases of the same semantic class, an ambiguous pronoun, and a special word that changes the natural resolution of the pronoun.

- A question asking the identity of the ambiguous pronoun.

- A two-answer choice that corresponds to the noun phrases in question.

A typical example of WSC is the following construct:

The city councillors refused the demonstrators a permit because they [feared/advocated] violence.

The choices of “feared” and “advocated” turn the schema into its two instances:

The city councillors refused the demonstrators a permit because they feared violence.

The city councillors refused the demonstrators a permit because they advocated violence.
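To make the three-part structure concrete, here is a minimal Python sketch of a schema expanded into its two instances. The class name `WinogradSchema`, its field names, and the wording of the question are illustrative assumptions and not part of Levesque's proposal; the answer key reflects the conventional reading of this example.

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    """One schema: a sentence template plus its two 'special words'."""
    template: str      # sentence with a {word} placeholder for the special word
    pronoun: str       # the ambiguous pronoun
    candidates: tuple  # the two noun phrases the pronoun could refer to
    answers: dict      # special word -> referent under the conventional reading

    def instances(self):
        """Yield (sentence, question, expected answer) for each special word."""
        for word, referent in self.answers.items():
            sentence = self.template.format(word=word)
            question = f"Who does '{self.pronoun}' refer to?"
            yield sentence, question, referent


schema = WinogradSchema(
    template=("The city councillors refused the demonstrators a permit "
              "because they {word} violence."),
    pronoun="they",
    candidates=("the city councillors", "the demonstrators"),
    answers={"feared": "the city councillors",
             "advocated": "the demonstrators"},
)

for sentence, question, answer in schema.instances():
    print(sentence, "|", question, "->", answer)
```

Swapping a single word flips the expected answer, which is what makes the schema hard to resolve from surface statistics alone.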

Although a definite improvement on the Turing Test, WSC is still focused on imitation: a more complex imitation, but imitation nevertheless.

Another recently proposed test is called the “Lovelace Test,” named after the mathematician and computing pioneer Ada Lovelace. This test focuses on creativity, which, in this case, becomes a proxy for intelligence. The researchers who developed it proposed that an AI be asked to create a story, a poem, or an image; if the AI’s programmer is then unable to explain how it came up with its response, the machine must be considered intelligent. Besides the obvious objection that all forms of art can be considered derivative, it is also rather unlikely that a programmer could not eventually work out how their algorithm reached its end result.

There are dozens of other tests, including “The Reversed Turing Test,” “Upside Down Test,” etc., all of which claim the ability to objectively measure an AI’s intelligence. Unfortunately, I do not believe any of them are free of the Imitation Game.

  3. The Bogdan Test

A machine is intelligent if it can explain the meaning of a newly seen proverb to its audience.

In our pursuit of Artificial General Intelligence, we often forget that we have already been on this path. Our own history as Homo sapiens is a true testament to developing intelligence by accumulating wisdom. Wisdom, and what we call “Common Sense,” is not only the challenge but also the only true test for Artificial Intelligence. It requires a multitude of world models and the ability to create analogies, associations, and abstractions. Without achieving wisdom, Artificial Intelligence, understood as the way for a machine to obtain knowledge, analytical skills, and the ability to learn, reason, and adapt to its environment, will never break free of “The Imitation Game.” So, how can we test for wisdom? Well, we have already developed those tools over the past few millennia. They are called “Proverbs.”

Every generation, every culture, and every geographical region has developed its own sayings that encapsulate condensed expressions of wisdom. Not only do proverbs possess profound abilities to capture the most global analogies, but they also rely on our abilities of abstract thinking and association. What could be a better test for intelligence, and what could be a better measure of its level?

If a machine can explain the meaning of phrases like “Don’t look a gift horse in the mouth,” “Soon ripe, soon rotten,” or “If you pay peanuts, you get monkeys,” I would certainly consider it intelligent. Proverbs are omnipresent, numerous, and well documented. Just one book – “Little Oxford Dictionary of Proverbs” – contains more than ten thousand sayings from all over the world, organized by topic, geography, and culture. There is no need to come up with artificial questions, convoluted scenarios, or dialogue schemes; just randomly choose a proverb and see what a machine says about it. One could also mix and match existing proverbs to obtain a practically unlimited number of new ones, which prevents AI designers from cheating by “curve fitting” their models to a given set of proverbs. I propose we hold ourselves to this higher objective of passing “The Bogdan Test.”
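The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pool holds just the three proverbs quoted in the text, the function names `pick_proverb` and `mix_proverbs` are hypothetical, and splitting at the comma is a deliberately crude way to "mix and match." Judging the machine's explanation is left out entirely, since the text does not specify how it would be scored.

```python
import random

# The three proverbs quoted in the text; a real pool would be far larger.
PROVERBS = [
    "Don't look a gift horse in the mouth",
    "Soon ripe, soon rotten",
    "If you pay peanuts, you get monkeys",
]


def pick_proverb(rng):
    """Randomly choose one existing proverb to put to the machine."""
    return rng.choice(PROVERBS)


def mix_proverbs(rng):
    """Build an unseen saying from the halves of two different proverbs.

    Only proverbs containing a comma can be split by this simple rule.
    """
    splittable = [p for p in PROVERBS if "," in p]
    first, second = rng.sample(splittable, 2)
    head = first.split(",", 1)[0]    # text before the comma
    tail = second.split(",", 1)[1]   # text after the comma (keeps its space)
    return f"{head},{tail}"


rng = random.Random(0)  # fixed seed only to make the demo repeatable
print(pick_proverb(rng))
print(mix_proverbs(rng))
```

Because the recombined sayings do not appear in any dictionary, a model cannot pass simply by having memorized published explanations.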

Can you see how much more difficult this test is compared to “The Imitation Game”? Once imitation is no longer a purpose but a means, we can focus on intelligence and not on satisfying our craving for mimicry. Can GPT-3 or other similar imitators pass The Bogdan Test? I doubt it very much. It is one thing to rely on ever-present human anthropomorphism, and a completely different thing to measure intelligence by wisdom. LaMDA played the imitation game so well that it convinced a Google engineer that it is conscious. Is it? Perhaps “The Bogdan Test” could help us set things straight.

  4. Memes

In her book “The Meme Machine,” Susan Blackmore defined memes as “units of imitation.” Everything we learn by imitation is a meme, and every imitation is a means of spreading memes. By making the machine imitate us, we embed our memes into it. Memes are the “viruses of the mind” (Richard Brodie); lately, they have become viruses of Artificial Intelligence. Why are we so susceptible to memes? Well, imitation is infinitely easier than creation. Any scientist knows that by following an established school of thought and imitating others, they may achieve the desired social and economic status much more easily than by developing a more radical point of view.

By reading the works of others, we get an army of functional memes ready to be employed through imitation. We religiously follow the formats of the scientific articles established by our predecessors so that we may publish. We praise peers and credit people who have nothing to do with our work simply because it is a tradition. Memes limit our creative thinking by spreading themselves around indiscriminately without regard to whether they are useful, neutral, or positively harmful to us. Memes do not care. They are “selfish,” just like our genes. And just like our genes, they are the foundation of evolution.

There is, of course, a question equally applicable to genes and memes. Where does the drive to survive and multiply come from? How does a gene (or meme) know it needs to replicate? That is indeed a much bigger question, one I might discuss some other time, but for the sake of this piece, let us assume that replication and survival are the inherent goals of memes. By replicating, memes like “The Imitation Game” become universally accepted and dictate how one should think of assessing Artificial Intelligence. They also require that artificial neurons, like Frank Rosenblatt’s, be input-additive: each neuron in each layer of an artificial Neural Network computes a weighted sum of its inputs. The concept of Deep Learning has this additive mechanism at its heart in all its variations. Therefore, by using this concept, we can only create one static solution and then optimize it using a plethora of machine learning methods. By simply accepting the imitation meme, we limit ourselves to one model, one frame of mind, and one way of looking at things.
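For readers unfamiliar with the additive mechanism, a minimal Python sketch of a single threshold neuron follows. The function name `neuron` and the hand-picked weights are assumptions made for illustration; they are not taken from Rosenblatt's work or from any particular library.

```python
def neuron(inputs, weights, bias):
    """Additive artificial neuron: a threshold applied to a weighted sum."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


# Hand-picked (not learned) weights that make the neuron compute logical AND.
AND_WEIGHTS, AND_BIAS = (1.0, 1.0), -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", neuron((a, b), AND_WEIGHTS, AND_BIAS))
```

Training adjusts only the numbers in `weights` and `bias`; the additive form of the computation itself stays fixed, which is the constraint the paragraph above refers to.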

While proverbs are memes as well, they also act as wrappers of wisdom. Proverbs are indispensable in presenting multi-layer cultural models in condensed form. They have been a means of education, development, and testing intelligence for generations. There is simply no better vehicle for language-based generalization available to us today.

As memes, proverbs are extremely powerful, capable of propagating through time by robust self-replication. Unlike many modern memes that rely on our much improved social interaction networks like YouTube, Facebook, or TikTok, pearls of wisdom are mostly spread through spoken and written language, making them better “survivors of the fittest” than many other memes. By containing wisdom and humor in their bodies, common-sense sayings are exactly what we need in our AI development toolbox.

Biological motivations and instincts are the results of evolution, and evolution can only occur in a population of self-replicating entities. In nature, animals with certain drives replicate better than other animals, and under ever-present selection pressure, evolution progresses towards fitter species. The same goes for machines. A machine that is not self-replicating can neither develop its own goals nor achieve those imposed by humans. We know from thousands of years of experience that without replication and selection we cannot breed better-yielding crops, faster-running racehorses, or perfectly symmetrical show dogs. So, why not use proverbs (such powerful and agile replicators) as the foundation for machine evolution?

  5. Evolutionary Algorithm

A hundred and fifty years ago, Wallace and Darwin conceived a theory of evolution by natural selection. It was the very first plausible mechanism for creating complexity without a designer. In my opinion, it is the most powerful and universal algorithm ever invented. It is elegant because it is so simple, yet its results are so complex and unpredictable that they often surprise us. It is intuitive and hard to grasp simultaneously, but the world will not be the same once you understand it. There is no longer any need for an expert to design a logical model for any solution. There is just a stark and mindless procedure by which we can create pretty much anything and by which we, ourselves, came to exist. Beautiful, but very scary!

In his 1995 book “Darwin’s Dangerous Idea: Evolution and the Meanings of Life,” Daniel Dennett described the evolutionary process as an algorithm that, when followed, must produce an outcome. As with any other algorithm, the Evolutionary Algorithm is completely substrate-independent. An abacus, a pen and paper, a mechanical calculator, and a digital computer can all execute the same algorithm to evaluate a mathematical formula. In the same way, Dennett describes evolution as “A scheme for creating Design out of Chaos without the aid of Mind.”

Although, on the surface, an algorithm must always produce the same result no matter how many times it is executed, it is not so with the Evolutionary Algorithm. Why? Because evolution as a process includes at least one chaotic component. As with dripping taps, swinging pendulums, and weather patterns, chaotic changes make simple, mindless Evolutionary Algorithms produce highly complex and unpredictable results. And since chaotic systems are very sensitive to initial conditions, a tiny difference at any stage of an Evolutionary Algorithm may lead to an entirely different result. That is the beauty of evolution!
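A small Python sketch can illustrate this run-to-run variation. It is an assumption-laden toy: a pseudo-random number generator stands in for the chaotic component, the only thing varied is the seed, and fitness is simply the number of 1-bits in a bit string. The function name `evolve` and all parameter values are hypothetical.

```python
import random


def evolve(seed, length=20, steps=30):
    """(1+1) evolutionary loop: flip one random bit, keep the child if not worse.

    Fitness is the number of 1-bits (a toy objective chosen for illustration).
    """
    rng = random.Random(seed)  # the seed plays the role of the initial condition
    parent = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(steps):
        child = parent[:]
        i = rng.randrange(length)
        child[i] = 1 - child[i]          # random mutation
        if sum(child) >= sum(parent):    # selection
            parent = child
    return parent


for seed in (1, 2, 3):
    result = evolve(seed)
    print(seed, "".join(map(str, result)), "fitness", sum(result))
```

Runs started from different seeds will typically end on different bit strings, while repeating a run with the same seed reproduces it exactly. On a computer, then, the unpredictability is a property of the varying starting conditions rather than of the procedure itself.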

Evolution does not have to end up with a result we expect; however, it will always end up with something more than it started with, something more complex than the algorithm’s initial conditions.

  6. Evolutionary Computation

Evolutionary Computation is an area of Artificial Intelligence and Machine Learning that uses ideas from biological evolution to solve complex, non-trivial computational problems. It employs various Evolutionary Algorithms to search through a huge space of possibilities for solutions that are impossible to find using “top-down design.” Such computational problems often call for highly adaptive solutions, that is, solutions that continue to perform well in a changing environment.
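As a rough illustration of what such a search looks like in practice, here is a minimal generational Evolutionary Algorithm in Python. Everything specific in it is an assumption made for the sketch: bit-string genomes, a toy fitness function that counts 1-bits, truncation selection, one-point crossover, bit-flip mutation, and the parameter values. Real applications replace `fitness` with a problem-specific measure.

```python
import random


def fitness(genome):
    """Toy objective: the number of 1-bits (an assumption for illustration)."""
    return sum(genome)


def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        parents = population[:pop_size // 2]     # truncation selection
        children = [population[0][:]]            # elitism: keep the current best
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)       # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]             # bit-flip mutation
            children.append(child)
        population = children
    population.sort(key=fitness, reverse=True)
    best_per_generation.append(fitness(population[0]))
    return population[0], best_per_generation


best, history = evolve()
print("best fitness per generation:", history)
```

No part of the loop encodes how to build a good genome; improvement comes only from copying, varying, and selecting, which is the sense in which no top-down design is involved.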

Unfortunately, ever since the 1950s and 1960s, when several computer scientists independently invented different Evolutionary Algorithms, the Evolutionary Computation branch of Machine Learning has been considered the “underdog” of Artificial Intelligence. There were, of course, several reasons for that. Compared with Artificial Neural Networks (ANNs) or Decision Trees, for example, Evolutionary Algorithms require far more computational resources and more time to find an acceptable solution for a specific problem. However, none of the other algorithms can produce outcomes that are as creative.

The creativity of Evolutionary Algorithms is the result of their replicating power. Regardless of the Evolutionary Algorithm type, its implementation requires reliable and agile replicators to get copied. Proverbs are perfect in this respect! As one of the most powerful memes that ever existed, proverbs have no foresight; they do not look ahead or have designs in mind. They just get copied. In the process, some do better than others – some obliterate others – and this is how the evolutionary design comes about and how highly inventive and creative solutions get generated.

I firmly believe that the proverb-based Evolutionary Algorithm is the way to achieve “requisite variety” (W. Ross Ashby) and create conditions for emerging intelligence. The most interesting part of such an evolutionary mechanism is that the intelligence it produces might be completely different from ours. In fact, we might be unable to measure, compare, or even comprehend it using our human brains. But would that be such a bad thing? Is not the ultimate way to free our intelligence from the grip of our limited brain and fragile biology to create machines that are intelligent like us but not dependent upon us? They would be intelligent agents that could ensure our survival beyond the lifetime of our solar system. These machines could inherit our memes but not our genes.

  7. Conclusion

There is no doubt that Artificial Intelligence research has produced many tangible results during the past decade. But most of its practical applications came as the result of statistical modelling. This approach works well when we already know what we need to find in the end. In other words, the current AI is usually given a solution, and all it needs to do is to optimize it. However, there are many problems to which good or even acceptable solutions are unknown. They need to be discovered first.

At the heart of modern AI, we always find a Machine Learning algorithm employed to optimize an initial solution. As a general rule, those ML algorithms are based on a process called “hill climbing.” For example, Neural Networks and Deep Learning follow a gradient that is determined by an objective function calculated using examples given by a designer. In this approach, only a small area of the vast design space is searchable by the ML algorithm, and only solutions that can be derived from the initial hypotheses can be explored.
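The limitation of hill climbing can be seen in a very small Python sketch. It uses a discrete one-dimensional landscape as a stand-in for following a gradient; the function name `hill_climb` and the landscape values are hypothetical and chosen only to show the effect.

```python
def hill_climb(landscape, start):
    """Greedy ascent on a 1-D landscape: step to the better neighbour until none is."""
    position = start
    while True:
        neighbours = [p for p in (position - 1, position + 1)
                      if 0 <= p < len(landscape)]
        best = max(neighbours, key=lambda p: landscape[p])
        if landscape[best] <= landscape[position]:
            return position              # no uphill step is left
        position = best


# Hypothetical objective values: a small peak at index 2, the highest at index 7.
LANDSCAPE = [1, 2, 3, 2, 1, 2, 4, 8, 6, 5]

print(hill_climb(LANDSCAPE, start=0))    # stops on the small peak
print(hill_climb(LANDSCAPE, start=5))    # reaches the highest peak
```

Which peak is found depends entirely on where the search starts, because the procedure only ever looks at the immediate neighbourhood of the current solution.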

Therefore, building complex computer systems using Evolutionary Algorithms is the only plausible way of achieving non-biological intelligence that compares to ours. Instead of trying to design the intelligence using logic, we should allow the Evolutionary Algorithms to “Climb Mount Improbable” (Dawkins) and “Lift in Design Space” (Dennett) using a “wedge” of selection to find and accumulate a comprehensive set of valuable tools. We should build our systems based on the efforts of all earlier “climbings” through “pearls of wisdom” such as proverbs.

The Evolutionary Algorithms are not necessarily steady, though. As with biological evolution, there could be long stasis between periods of rapid change. Some evolutionary designs might not change for long periods (like crocodiles), while others could evolve rapidly. And sometimes, millions of Evolutionary Algorithm iterations are suddenly wiped out, as when the dinosaurs became extinct. But that is a small price to pay for creating true non-biological intelligence.

There is little doubt in my mind that evolutionary computation is on the verge of a breakthrough in synthesizing non-biological creative AI. Like Deep Learning and all its ML predecessors, it can take advantage of the newest computational resources that are becoming available. With new GPUs, IPUs, and other AI-oriented processors, Evolutionary Computing can break the mold and switch our efforts from building “Imitation AI” to “Creative AI.”

As Risto Miikkulainen – a professor at the University of Texas at Austin – stated, “These techniques (Evolutionary Computing) make it possible to find creative solutions to practical problems in the real world, making creative AI through evolutionary computation the likely next Deep Learning.”

Evolutionary Algorithms have faced strong headwinds and skepticism right from the beginning. However, Evolutionary Computation has become a major field of AI science due to its ability to provide solutions when other machine learning methods fail. Many AI research groups at OpenAI, DeepMind, Google, BEACON, etc., pay increasing attention to Evolutionary Computation methods when developing their products. It is very understandable. By including chaotic elements within their inner workings, Evolutionary Algorithms have several special advantages:

A. Free of the problem-specific preconceptions and biases of the algorithm designer, Evolutionary Algorithms can create unexpected, original solutions that often possess artistic value.

B. Evolutionary Algorithms have proven competitive in solving hard problems that involve non-differentiability, discontinuities, multiple local optima, noise, and nonlinear interactions among the variables.

C. Any Evolutionary Algorithm is inherently suited to parallelization, meaning that the newest achievements in computer hardware design, including GPUs, IPUs, and even Quantum Computing, are immediately suitable for their implementation.

D. In a number of domains, the performance of Evolutionary Algorithms has reached the level of human experts, and there is now substantial and well-documented evidence of them producing measurably human-competitive results.

E. Evolutionary Algorithms can easily be used in collaboration with existing methods of Machine Learning. Incorporating them with artificial neural networks, for example, often leads to superior solvers because they can exploit the best features of both approaches.

F. The use of a population and randomized choices ensures that Evolutionary Algorithms are less likely to get trapped in sub-optimal solutions than other search methods. They are also less sensitive to noise or inaccuracy in the underlying datasets.

G. Having a population of self-replicating entities means that Evolutionary Algorithms can produce solutions at any stage of their development. This is extremely valuable when working on problems with many possible local solutions and conflicting objectives.

I also foresee a fruitful cross-pollination between Evolutionary Computation and biology. The Evolutionary Algorithms will most likely advance our insights in molecular and evolutionary biology in the coming decade. On the other hand, much more sophisticated Evolutionary Algorithms will emerge from our deeper understanding of genetics. This bi-directional exchange of ideas might enable us to create a new kind of artificial evolution – The Evolution of Wisdom!

In 1992, John Maynard Smith wrote: “So far, we have been able to study only one evolving system, and we cannot wait for the interstellar flight to provide us with a second. But for now, if we want to discover generalizations about evolving systems, we must look at artificial ones.”

Artificial evolution implemented using proverbs is a real possibility for developing Artificial General Intelligence. Not only will it provide us with a means of creating artificial qualia, but it also can give us insights into how intelligence and the physical body co-evolve.

In fact, using a non-biochemical substrate to implement Evolutionary Algorithms is beginning to look technologically feasible. Such an embodiment will provide us with tools to understand the nature of feelings and to model them. In my opinion, artificial feelings are the last obstacle to Artificial General Intelligence. If we can create self-replicating, non-biological entities with artificial qualia, we could start evolving them towards their ultimate goal – Artificial General Intelligence – using proverbs as the underlying learning mechanism.

This process will finally enable us to separate generic principles and ground truth from effects specific to carbon-based life as we know it. My hope is that the Bogdan Test can help science take one step closer to Artificial General Intelligence.


Dr. Alex Bogdan is Chief Scientific Officer at Castle Ridge Asset Management


The views expressed in this article are those of the author and do not necessarily reflect the views of AlphaWeek or its publisher, The Sortino Group


© The Sortino Group Ltd

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency or other Reprographic Rights Organisation, without the written permission of the publisher.