Where does knowledge reside?
Series 1: AI Basics — Chapter 5: How LLMs work
Large Language Models (LLMs) have changed the world forever
This is the 5th chapter of seven in our first Series on AI Basics. In Chapter 2, I spoke about what AI is and the different technologies that live within its ecosystem. Today, I’m going to talk about one of those technologies — LLMs. Let’s get into it.
Chapter 1: Fundamentally Understanding AI
Chapter 2: Differences between AI, ML, DL, GenAI & LLMs
Chapter 3: A Brief History of Artificial Intelligence
Chapter 4: How AI Systems Work
Chapter 5: How LLMs Work [this article]
A question for the ages: where does knowledge reside?
Few people think about the fundamental existence of knowledge. Try this experiment with me and ask yourself — where does knowledge reside or exist? Before we see, read and understand knowledge from books, articles, images and videos, where did that knowledge first exist?
The answer to this question is that knowledge, at its most fundamental level, exists in the collective consciousness of humans. In today’s age of AI, knowledge has started to exist in systems called Large Language Models (LLMs).
1. How do LLMs work?
LLMs work by generating text based on patterns learned from vast amounts of training data. Using a technique called “next word prediction”, they predict the next word in a sequence of text, given the previous words. By repeating this process over millions of examples, the model learns to accurately predict the next word based on the previous context. Next word prediction is a fundamental capability that enables many language model applications, including:
- Autocomplete and predictive text input (we see this in search engines)
- Text generation and creative writing assistance (ChatGPT, Claude)
- Machine translation (Google Translate, Apple's Translate app, Duolingo)
- Conversational AI and chatbots (ChatGPT, Claude)
- Text summarization (summarizing a large piece of text or an entire book)
- Sentiment analysis and natural language understanding (e.g., classifying a product review as positive or negative, or an email as spam or not)
The ability to understand and predict the flow of natural language is a key strength of large language models and enables them to generate coherent, contextually appropriate text for a wide range of tasks.
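To make this concrete, here is a minimal sketch of next-word prediction. It uses a hand-written probability table over a tiny vocabulary; a real LLM learns these distributions across billions of parameters and conditions on the entire preceding context, not just the last word, so treat this purely as an illustration of the mechanic.

```python
import random

# Toy next-word probability table. A real LLM learns distributions
# like these over a vocabulary of tens of thousands of tokens; the
# numbers here are invented purely for illustration.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "weather": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"barked": 0.7, "ran": 0.3},
    "sat": {"quietly": 1.0},
    "ran": {"away": 1.0},
    "barked": {"loudly": 1.0},
}

def predict_next(word: str) -> str:
    """Sample the next word given the previous one."""
    candidates = NEXT_WORD_PROBS.get(word)
    if not candidates:
        return "<end>"
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

# Generate text one word at a time: each prediction is appended to
# the sequence and becomes context for the next prediction, which is
# exactly the loop an LLM runs during text generation.
sentence = ["the"]
while sentence[-1] != "<end>" and len(sentence) < 8:
    sentence.append(predict_next(sentence[-1]))
print(" ".join(w for w in sentence if w != "<end>"))
```

The crucial difference is context: this toy looks only at the single previous word, while a transformer-based LLM weighs the entire preceding passage when scoring each candidate next word.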
2. LLM Limitations
However, they have limitations: they can “hallucinate” information, lack up-to-date knowledge, and struggle with tasks requiring external information. Hallucinations occur because LLMs have no ground truth to rely on; they choose each word purely on the basis of statistical likelihood.
OpenAI’s ChatGPT hallucination:
When asked to provide sources for its answers, ChatGPT corrected itself.
Google’s Gemini hallucination:
Google’s AI Overview feature, which provides concise summaries atop search results, recently faced criticism for generating inaccurate, misleading, and even dangerous answers. In one example, it recommended adding glue to pizza sauce to keep the cheese from sliding off.
3. LLM Sizes
What makes LLMs so capable, in spite of these hallucinations, is their scale: the vast amount of data they are trained on and, in particular, the number of parameters in the model. Parameters in LLMs refer to the numerical values that are learned during the training process on massive datasets. They determine how the model processes information and makes predictions.
Key points about LLM parameters:
- Parameters act like adjustable dials that fine-tune the model’s understanding and generation of language.
- The number of parameters in an LLM is often used as a proxy for the size or complexity of the model, with larger models having more parameters. For example, GPT-4 is rumored to have around 1.7 trillion parameters, while the largest Llama-2 model has 70 billion.
- More parameters generally allow for more complex representations and potentially better performance on language tasks. However, larger models require more computational resources for training and deployment, making them more expensive and less accessible.
- The size of an LLM in memory is directly proportional to the number of parameters it contains. At 16-bit precision, each parameter takes 2 bytes, so the 70-billion-parameter Llama-2 model needs roughly 140 GB just for its weights, which is why it requires at least 2 A100 GPUs (80GB) for inference or fine-tuning (see the quick calculation after this list).
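As a rough sanity check on that last point, here is a back-of-the-envelope calculation of weight memory. It assumes 2 bytes per parameter (FP16/BF16) and counts only the weights, ignoring activations and optimizer state, so real-world requirements are higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory needed just to store the model weights.

    bytes_per_param: 2 for FP16/BF16, 4 for FP32, 1 for 8-bit quantized.
    """
    return num_params * bytes_per_param / 1e9

# Llama-2 70B in FP16 needs ~140 GB for weights alone, which is why a
# single 80 GB A100 is not enough and at least two are required.
print(f"Llama-2 70B @ FP16:  {weight_memory_gb(70e9):.0f} GB")
print(f"Llama-2 70B @ 8-bit: {weight_memory_gb(70e9, 1):.0f} GB")
print(f"GPT-4 (rumored ~1.7T params) @ FP16: {weight_memory_gb(1.7e12):.0f} GB")
```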
4. Reducing LLM hallucinations using RAG and Vector Search
Hallucinations are an inherent feature of LLMs, and as models grow ever larger, it becomes difficult to prevent them completely, although that is the ideal state.
Two ways AI companies are trying to minimize hallucinations are Retrieval-Augmented Generation (RAG) and vector search within a vector database. These concepts are important for understanding how LLMs work today and how they will continue to work in the future.
5. Retrieval-augmented generation (RAG)
LLMs trained solely on their original training data can sometimes hallucinate or generate plausible but false or unverified content, especially when asked about topics not well-covered in their training data.
RAG addresses this by allowing the LLM to retrieve relevant information from an external data source, such as a vector database, and use that information as context when generating its response (a minimal sketch of this flow follows the list of benefits below). RAG provides contextually relevant, proprietary, and private data to an LLM, enhancing its performance and accuracy.
A vector database is a special type of database that stores information in a way that makes it easy to find similar things, even if they don’t have the exact same words or numbers.
The key benefits of using RAG to reduce LLM hallucinations are:
- Access to external and up-to-date information: RAG enables the LLM to retrieve recent facts and details from the vector database that may not have been present in its original training data. This helps ground the LLM’s responses in the latest available information.
- Improved factual accuracy: By incorporating retrieved information, RAG-enhanced LLMs can provide more precise and detailed responses, reducing the likelihood of hallucinations or inaccuracies.
- Ability to leverage private data: RAG allows LLMs to utilize private company data stored in the vector database, further enhancing the relevance and accuracy of responses for specific domains or use cases.
- Efficient retrieval with vector search: The vector search capabilities of the underlying database enable fast retrieval of the most relevant information to include with the LLM’s prompt.
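To tie these pieces together, here is a minimal sketch of the RAG flow referenced above. A trivial keyword-overlap retriever stands in for real vector search (covered in the next section), and `call_llm` is a hypothetical placeholder for a request to whatever LLM API you use; both are assumptions made purely for illustration.

```python
import string

# A tiny "knowledge base" standing in for a company's private data.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Orders over $50 ship free within the continental US.",
]

def words(text: str) -> set[str]:
    """Lowercase, strip punctuation, and split into a set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def retrieve(question: str, k: int = 1) -> list[str]:
    """Keyword-overlap retrieval; a real system would use vector search."""
    q = words(question)
    ranked = sorted(DOCUMENTS, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call."""
    return f"(model response grounded in: {prompt!r})"

# The retrieved text is prepended to the prompt, so the model answers
# from fresh, private data instead of guessing from its weights.
question = "What is the refund policy?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(call_llm(prompt))
```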
6. Vector Search
Vector search is a way to find things that are similar to each other, even if they don’t have the exact same words. It’s like when you’re looking for a book, but you can’t remember the title. You might remember it had a blue cover and was about a dog. With vector search, the computer can look at the words in the book descriptions and find ones that are similar to “blue” and “dog”, even if they don’t have those exact words. All this happens inside a vector database.
Here’s an example: Let’s say you have a bunch of books, and you want to find ones that are similar to the book “The Very Hungry Caterpillar”. With vector search, the computer would look at the words in that book and find other books that have similar words and ideas, even if they have different titles. So it might find books like:
- “The Very Busy Spider” (because it features a bug going about its day, which is close in meaning to a hungry caterpillar)
- “Goodnight Moon” (because it’s a classic children’s book, just like The Very Hungry Caterpillar)
- “Where the Wild Things Are” (because it has animals and is about imagination, which are similar themes to The Very Hungry Caterpillar)
The computer can do this by turning the words in the books into special codes called vectors. These vectors have numbers that represent the meaning and ideas in the book. Books with similar meanings will have vectors that are close together.
When you search for a book, the computer turns your search into a vector too. It then looks at all the book vectors and finds the ones that are closest to your search vector. Those are the books that are most similar to what you’re looking for, even if they don’t have the exact same words. This is really helpful for finding things that are related, even if they don’t match perfectly. It’s like having a super smart friend who can figure out what you’re looking for, even if you don’t explain it perfectly.
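Here is that book example in miniature. The three numbers per book are hand-assigned scores along made-up dimensions (roughly: animals, imagination, bedtime); real embedding models produce vectors with hundreds or thousands of learned dimensions, so these values are invented purely to show how cosine similarity ranks neighbors.

```python
import math

# Invented 3-dimensional "embeddings" for a few children's books.
BOOK_VECTORS = {
    "The Very Hungry Caterpillar": [0.9, 0.3, 0.1],
    "The Very Busy Spider":        [0.8, 0.2, 0.1],
    "Where the Wild Things Are":   [0.7, 0.9, 0.4],
    "Goodnight Moon":              [0.2, 0.3, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """How closely two vectors point in the same direction (max 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank every other book by similarity to "The Very Hungry Caterpillar".
query = BOOK_VECTORS["The Very Hungry Caterpillar"]
neighbors = sorted(
    ((title, cosine_similarity(query, vec))
     for title, vec in BOOK_VECTORS.items()
     if title != "The Very Hungry Caterpillar"),
    key=lambda pair: pair[1],
    reverse=True,
)
for title, score in neighbors:
    print(f"{score:.2f}  {title}")
```

A vector database does exactly this, but over millions of vectors, using index structures that avoid comparing the query against every stored item one by one.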
7. Open- and closed-source LLMs
Open- and closed-source refer to the licensing and accessibility of LLMs.
Open-source LLMs have publicly available model architectures and pre-trained weights, and sometimes full training code. This allows for transparency, as researchers can inspect the underlying models and customize them. Examples of open-source LLMs include Llama, Mistral, and Falcon.
The key benefits of open-source LLMs are:
- High levels of customization and flexibility to fit specific needs
- Ability to build proprietary solutions on top of open technologies, potentially offering competitive advantage
- Potential cost savings and control over the hardware infrastructure they run on
Closed-source LLMs have proprietary source code and model weights that are not publicly accessible. This restricts customization and adaptation possibilities. Examples include GPT-3.5, GPT-4, and GPT-4o by OpenAI; Claude by Anthropic; and Gemini by Google.
The advantages of closed-source LLMs include:
- Extensive research, development and continuous improvement backed by substantial resources
- Dedicated support from the corporation that developed them
- Clearly defined intellectual property and licensing terms (though enterprises typically do not own the underlying technology)
The choice between open-source and closed-source LLMs depends on factors like budget, specific requirements, desired level of customization, and whether the priority is cost savings or cutting-edge performance. Open source offers more flexibility and control, while closed source provides stronger support, polish, and ease of use.
By understanding how LLMs work, even a non-technical person can hold a conversation with an AI/ML engineer or a Data Scientist.
Next, we’ll be talking about Generative AI by covering aspects like pre-training, fine-tuning and RAG in more detail. See you then!