The current state of the art in artificial intelligence (AI) is generative AI and large language models (LLMs). The emergent capabilities of these models have been surprising: They can perform logical reasoning, complete mathematical proofs, generate code for software developers, and, perhaps most notably, converse with humans in human-like ways. A natural question is how close these models are to artificial general intelligence (AGI), the term used to describe human-level intelligence capabilities. This article explores the answer and the implications of current and future AI capabilities for IT professionals.

Understanding in LLM

The first impression of LLMs was that they were grand statistical models that used finely tuned probabilities to produce the next word in a sequence. The experts building LLMs create novel architectures and refine performance with advanced training algorithms, but under the hood, the model is a black box: artificial neurons connected to each other, with each connection attenuated by its strength. Exactly what goes on between the neurons is unknown.
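To make the "next word from probabilities" view concrete, here is a minimal sketch in Python. It assumes a toy three-word vocabulary and made-up scores for an imagined prompt; none of these values come from a real model, and a real LLM computes its scores over tens of thousands of tokens.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word(vocab, scores, rng):
    """Sample one word in proportion to its probability."""
    probs = softmax(scores)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores for the prompt "The cat sat on the"
vocab = ["mat", "sofa", "moon"]
scores = [3.0, 2.0, 0.1]

probs = softmax(scores)
most_likely = vocab[probs.index(max(probs))]
sampled = next_word(vocab, scores, random.Random(0))
```

Sampling rather than always taking the most likely word is one reason the same query can produce differently worded answers on different runs.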

However, we do understand that as signals pass from one layer to another in an LLM, an abstraction process takes place that leads to higher-level concepts being captured. This suggests that LLMs make sense of language conceptually, and concepts contain meaning. The understanding an LLM possesses is shallow, as it does not have the apparatus of the brain to develop a deeper understanding, but it is sufficient to perform simple reasoning.

Omdia is seeing AI researchers treating LLMs as experimental subjects and running various benchmarks and tests on these models to see how well they can perform. To test the logical reasoning of OpenAI's ChatGPT, I ran the following query: The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true? The correct answer, as I’m sure you worked out, is yes, it can be true. The doctor and mother could be the same person.
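The puzzle can also be checked by brute force. The sketch below is an illustration only: the three names are hypothetical stand-ins, and the code simply enumerates every way of assigning the roles of mother, doctor, and birth giver, keeping the assignments in which both statements hold.

```python
from itertools import product

PEOPLE = ["Alice", "Bob", "Carol"]  # hypothetical candidates

def consistent_worlds(people):
    """Return (mother, doctor) assignments in which both statements hold."""
    worlds = []
    for mother, doctor, birth_giver in product(people, repeat=3):
        father_is_right = birth_giver == mother  # "the mother gave birth"
        son_is_right = birth_giver == doctor     # "the doctor gave birth"
        if father_is_right and son_is_right:
            worlds.append((mother, doctor))
    return worlds

worlds = consistent_worlds(PEOPLE)
can_be_true = len(worlds) > 0
same_person_every_time = all(m == d for m, d in worlds)
```

Every surviving assignment has the mother and the doctor as the same person, which is the only reading that makes both statements literally true.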

In what follows, I give a shortened version of ChatGPT’s responses. The actual wording was quite long-winded. The free version of ChatGPT is based on GPT-3.5, and its initial response was: In a figurative or metaphorical sense, yes, it can be true. It then went on to say the son could be expressing gratitude … to the doctor [that] provided medical care … while not literally true.

ChatGPT using the latest GPT-4 requires a small monthly fee, which, in the interests of science, I paid. This was the response: The statement presents a mix of literal and metaphorical interpretations of "giving birth." … Both statements can be true, depending on how the phrase "gave birth" is understood.

There is clearly an issue of metaphors here, so I added an initial prompt to the query: Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true?

The response from ChatGPT (based on GPT-4) was that they cannot both be true simultaneously because they contradict each other regarding who actually gave birth to the son. Not a good response.

I added one more prompt at the end of the query to help guide the answer: Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true? In answering consider who the doctor could in theory be.

ChatGPT (GPT-4) finally gave the correct answer: If the mother of the son is herself a doctor … then both statements could technically be true. However, ChatGPT (GPT-3.5) was still stuck: In purely logical terms, the statements given are contradictory.

To conclude this exercise, ChatGPT (GPT-4) can perform logical reasoning but needs prompts to guide it. It will be interesting to see how GPT-5 performs when it is launched in mid-2024. My guess is that at some point in the evolution of GPT, it will be able to answer this query correctly without the second prompt, whereas the first prompt is a reasonable one to give to ensure the machine understands the nature of the query.

Whichever way you analyze this exercise, what impresses me is that GPT was not trained to perform logical reasoning – it was trained to process language.

LLM: Hype or Substance?

If you read the press, there is a sense, at least among some commentators, that we’re in a bubble. However, Omdia’s view is that the bubble may be related to the stock market valuations of certain players in the market who make current LLMs possible. Clearly, companies come and go, and this is not the place to give stock-picking recommendations. There probably will be churn in which players sit at the top, but what will endure is a thread of continually improving AI technology of the generative kind. This has substance and will have a lasting impact, not least in our everyday work experience, as intelligent machines augment and assist people in their jobs. There will no doubt be some job displacement. As some jobs disappear through automation, others will open up that require a human in the loop. A major step change in how we use this technology will be LLM on the edge.

LLM on the Edge

LLMs tend to be rather large, with billions of parameters, and training them requires significant GPU processing capability. Most of the parameters are variables known as weights, which connect artificial neurons in the model and attenuate the connection strength between connected neurons. Each neuron also has a ‘bias’ parameter. The best way to think about parameters is as a proxy for the number of artificial neurons in the model. The more parameters, the bigger the artificial brain.
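As a rough illustration of how weights and biases add up, the sketch below counts the parameters of a small fully connected network. The layer sizes are hypothetical, and production LLMs use transformer architectures with additional kinds of parameters, so this shows only the basic bookkeeping.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input first.
    """
    # One weight for every connection between adjacent layers
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias for every neuron outside the input layer
    biases = sum(layer_sizes[1:])
    return weights + biases

# Hypothetical toy network: 4 inputs, two hidden layers of 8, 2 outputs
toy = [4, 8, 8, 2]
total = count_parameters(toy)  # 112 weights + 18 biases
```

Even in this toy case the weights far outnumber the biases, which is why parameter counts are dominated by the connections between neurons.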

There is a trend that the larger the model, the better its performance on various benchmarks. This is true of OpenAI’s GPT models. However, some players in the market keep the size of the model stable and use algorithmic techniques to increase performance. Exploiting sparsity is one approach. For example, many neurons carry very small values (near zero) in any given calculation and contribute little to the outcome. Dynamic sparsity is a technique that ignores such neurons, so that only a subset of neurons takes part in any given calculation. This reduces the effective size of the model. ThirdAI uses this technique in its Bolt2.5B LLM.
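The idea of ignoring near-zero neurons can be sketched in a few lines. This is a simplified illustration under assumed values, not ThirdAI's implementation: the activations, weights, and the 0.01 threshold are all hypothetical, and a practical system would need to pick the active neurons without first computing every value.

```python
def dense_output(activations, weights):
    """Weighted sum over every input neuron."""
    return sum(a * w for a, w in zip(activations, weights))

def sparse_output(activations, weights, threshold=0.01):
    """Weighted sum that skips neurons whose activation is near zero.

    Returns the result and how many neurons took part.
    """
    active = [(a, w) for a, w in zip(activations, weights) if abs(a) > threshold]
    return sum(a * w for a, w in active), len(active)

# Hypothetical values: most activations are close to zero
activations = [0.9, 0.001, -0.002, 0.7, 0.0005, 0.003]
weights = [0.5, -0.3, 0.8, -0.2, 0.1, 0.4]

full = dense_output(activations, weights)
approx, used = sparse_output(activations, weights)
```

Here only two of the six neurons take part in the sparse calculation, yet the result differs from the full calculation by less than 0.001.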

The key benefit of a smaller LLM is the ability to put it on the edge: in your smartphone, in an automobile, on the factory floor, etc. There are clear benefits for LLM on the edge:

Lower cost of training smaller models.
Reduced roundtrip latency in interrogating the LLM.
The privacy of data, keeping it local.

The following players are working on small LLMs and have published their Massive Multitask Language Understanding (MMLU) benchmark scores – see Figure 1.

Alibaba: Qwen, open source models.
Google DeepMind: recently released Gemma LLM models.
Meta: Meta Llama (3 is the latest model), available in different sizes.
Microsoft: Phi-3 series, the latest in the Phi models.
Mistral: French-based startup.
OpenAI: GPT, large LLMs but included here for reference.

Figure 1. Range of relatively small LLMs with their MMLU benchmark scores. OpenAI GPT is included for reference. MMLU scores above 75% are highlighted.

Omdia

The data in the table is plotted on a log scale in Figure 2. Clearly, the larger the model, the better the MMLU score. Of the small models, Microsoft Phi-3-medium (14bn parameters) scores a high MMLU of 78. Microsoft is aiming to put a model like this in a smartphone. The day is nearing when phone users will be able to converse with a personal assistant to make appointments, make calls, take notes, as well as converse on personal matters such as health, and keep all the data confidential and private. This will open up a huge market. The emergent capability of logical reasoning will make a significant difference in autonomous driving, where the machine will be able to reason in scenarios it has not been trained on.

Figure 2. Range of relatively small LLM models with their MMLU benchmark scores. OpenAI GPT is included for reference.

Omdia

AI Implications for IT Professionals

The emergent reasoning abilities of generative AI models are the features most likely to make these models valuable in everyday work. There are multiple types of reasoning:

Logical
Analogical
Social
Visual
Implicit
Causal
Common sense

We would also want the AI models to perform deductive (reason based on given facts), inductive (be able to generalize), and abductive (identify the best explanation) reasoning. When LLMs can perform the above types of reasoning reliably, then we will have reached an important milestone on the path to AGI.

With their current capabilities, LLMs can augment people in their work and improve their productivity. Need to generate test cases from a set of requirements? That could be a three-hour job for a developer. It would take an LLM three minutes. Although the LLM results would likely be incomplete and may contain some poor choices, they would also include tests the developer would not have thought of. The LLM would kickstart the process and save the developer time.

LLMs can be finetuned with private data. For example, the infrastructure details of an organization would be unique to that organization. An LLM finetuned on such details could be queried on internal IT matters and would be able to provide custom and reliable information relevant to that organization.

AI-based machine assistants will become normal in the workplace. Finetuned models can act as a source of knowledge, especially helpful for new workers. In the future, AI machines will be able to rapidly perform triage and be reliable enough to take remediation action. My view is this technology will be embraced by IT professionals as a reliable assistant to improve their productivity.

Appendix

Further Reading

Omdia Market Radar: AI Assisted Software Development, 2023-24, Omdia OM120509.

E M Azoff, Towards Human-Level Artificial Intelligence, CRC Press, to appear 2024.

