What is ChatGPT Doing… And Why Does it Work? - by Wolfram
A brief summary & example of Stephen Wolfram's article
LLMs
Education
Articles
Probability
Author
Eleni
Published
September 21, 2025
In his essay “What Is ChatGPT Doing… and Why Does It Work?”, Stephen Wolfram explains in plain language (seriously, it's actually very easy to follow!) how models like ChatGPT function.
TL;DR: it’s not “thinking” like a person, but predicting the next word in a sequence based on patterns it learned from a huge amount of text.
Summary:
Stephen Wolfram’s essay “What Is ChatGPT Doing… and Why Does It Work?” explains that ChatGPT doesn’t think like a human but instead predicts the next word in a sequence by looking at probabilities learned from massive amounts of text. With billions of examples, the model can mimic many styles of writing and produce responses that feel meaningful, even though it is really just stacking word predictions together.
Key Ideas
Next-word prediction: ChatGPT builds its reply one step at a time, each time choosing a likely word (or piece of a word) to come next.
Training data: It has read billions of words from books, websites, and articles, so it can mimic lots of styles and topics.
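Probabilities: Instead of one “right” answer, it looks at lots of possible words, each with a probability, and picks from the likely ones. It does not always take the single top-ranked word; Wolfram notes that a little randomness keeps the text from reading flat.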
No real understanding: It doesn’t “know” facts—it just reflects patterns of how humans write.
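To make the "training data" and "probabilities" ideas concrete, here is a minimal sketch that counts which word follows which in a tiny text and turns those counts into probabilities. The corpus and the function name next_word_probabilities are made up for illustration; a real LLM learns its probabilities with a neural network, not a lookup table like this.

```python
from collections import Counter, defaultdict

# Hypothetical mini "training corpus"; a real model sees billions of words.
corpus = (
    "peanut butter and jelly . "
    "peanut butter and jelly . "
    "peanut butter and bread . "
    "bread and butter ."
)

def next_word_probabilities(text):
    """Count which word follows each word, then turn counts into probabilities."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

probs = next_word_probabilities(corpus)
print(probs["and"])  # "jelly" follows "and" in 2 of 4 cases, so it gets 0.5
```

Even in this toy version, the "knowledge" is nothing more than how often words appeared next to each other in the text.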
For Example
If the text is: *“Peanut butter and ___”*
- The model predicts words like “jelly” with high probability, “sandwich” with medium probability, and strange options (like “philosophy”) with very low probability.
This process, repeated over and over, creates full sentences and essays. It is also why asking the same question slightly differently can produce different responses. For more on that, check out my blog on tone and politeness with LLMs.
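The "Peanut butter and ___" example can be sketched as a weighted random pick. The probabilities below are invented for illustration (they are not real model output), and sample_next_word is a hypothetical helper built on Python's random.choices.

```python
import random

# Made-up probabilities for the word after "Peanut butter and"; not real model output.
candidates = {"jelly": 0.80, "sandwich": 0.19, "philosophy": 0.01}

def sample_next_word(probabilities, rng=random):
    """Pick one word at random, weighted by its probability."""
    words = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the demo is repeatable
draws = [sample_next_word(candidates, rng) for _ in range(1000)]
for word in candidates:
    print(word, draws.count(word))
```

Over many draws, "jelly" should come up most often and "philosophy" only rarely, which is roughly what "high", "medium", and "very low" probability mean in practice.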
Demo: Next Word Prediction Game
Here’s a small code cell to show how a simple “next word” prediction idea works.
```python
# A toy demo of "next word prediction"
# We use a very tiny fake dataset instead of training an LLM.
import random

# Simple training data: a short phrase mapped to its possible next words
pairs = {
    "peanut butter": ["and"],
    "butter and": ["jelly", "bread"],
    "machine": ["learning", "shop"],
    "language": ["model", "barrier"],
}

def predict_next_word(text):
    words = text.lower().split()
    # Use the last two words as context (or the only word, if there is just one)
    last_two = " ".join(words[-2:])
    if last_two in pairs:
        return random.choice(pairs[last_two])
    else:
        return "[unknown]"

# Try it out:
print("Input: 'peanut butter'")
print("Predicted next word:", predict_next_word("peanut butter"))
print("\nInput: 'language'")
print("Predicted next word:", predict_next_word("language"))
```
Input: 'peanut butter'
Predicted next word: and
Input: 'language'
Predicted next word: barrier
Code Explanation:
The code sets up a small dictionary (pairs) that maps short phrases to possible next words. The function predict_next_word(text) takes an input phrase, converts it to lowercase, splits it into words, and grabs the last two words (last_two), or the only word if the input has just one. It then checks whether that phrase exists in the dictionary. If it does, the function randomly chooses one of the possible next words (random.choice), so the prediction for “language” may be “model” on one run and “barrier” on another. If not, it returns “[unknown].” The print statements at the end show how the function works by passing in sample phrases and printing the predictions. This simple structure mirrors how ChatGPT looks at context and picks the next word, just on a much smaller scale, and with one difference: here every listed option is equally likely, whereas ChatGPT weights each option by its probability.
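The demo predicts a single word. As a rough sketch of the "repeated over and over" part, the loop below keeps appending predictions until it runs out of options. The next_words table and the generate function are hypothetical and look only at the last word, which is a simplification of the two-word lookup used in the demo.

```python
import random

# Hypothetical lookup table in the same spirit as `pairs`, keyed by the last word only.
next_words = {
    "peanut": ["butter"],
    "butter": ["and"],
    "and": ["jelly", "bread"],
}

def generate(prompt, max_new_words=5, rng=random):
    """Append one predicted word at a time until no prediction is available."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        options = next_words.get(words[-1])
        if not options:
            break  # the toy table has nothing to say about this word
        words.append(rng.choice(options))
    return " ".join(words)

print(generate("Peanut"))
```

Each new word becomes part of the context for the next prediction, which is the same feedback loop the essay describes, just with a three-entry table instead of a neural network.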