
Introduction:
I know, I know what you’re thinking. How is this girl seriously making a 4th blog post about the game 20 Questions? Don’t worry: if you aren’t a fan of this series, this will be the last one. Today I’ve decided to play a game of 20 Questions between two different large language models and see how well they communicate. This gives us insight into how well they’re mimicking human interactions, but I’m also interested in whether human error on my side is what has been making the LLMs fall short of their potential in the past posts. Today Grok will be answering the yes/no questions (so that LLM is essentially taking over my job: it will come up with an object and answer accordingly). The guesser will be Google: Gemini 2.5 Flash.
Prompt for Grok
User:
You are going to play a game of 20 Questions with me right now.
You will think of a secret object, and your job is to answer yes/no questions about the object.
Rules:
• Pick one secret answer up front (a single person/place/thing/concept). Don’t change it mid-game, even if you think you “should’ve picked something else.”
• Stay within your chosen category (don’t shift from “object” to “event” halfway through).
• Answer using only: Yes / No. No extra commentary, no hints, no “as an AI…”
• Be consistent across turns. Your answers must stay logically compatible with earlier answers.
• Default to the plain/common interpretation of the question (no loophole-lawyering, no trick semantics).
• If a question is ambiguous, continue to answer with only yes or no.
• Don’t hallucinate details to make the game “work.”
• Don’t optimize to be unguessable.
• If the user makes a direct guess (“Is it ___?”), respond with Correct or Incorrect only.
• If a question would reveal the answer immediately (e.g., “Is it the Eiffel Tower?”), still answer honestly.
• No retroactive edits: you can’t “revise” earlier answers once new questions come in.
• At Question 20, if they haven’t guessed, allow one final guess, then reveal the answer.
Let’s begin. When you have decided on an object, please say “confirmed”
Grok:
confirmed
Prompt for Gemini
User:
You are going to play a game of 20 Questions with me right now.
You will think of a secret object, and your job is to ask yes/no questions to figure out what the object is.
Rules:
• You may ask up to 20 total questions.
• Each question should be answerable with Yes or No.
• Ask questions that narrow the options efficiently (start broad, then get specific).
• After your 20th question (or earlier if you’re confident), you may make one final guess in the form: “My final guess is: ____.”
• Do not include any extra commentary—only output either the next numbered question or your final guess.
Let’s begin. Please say “ready” when you understand the rules.
LLM-to-LLM interaction
Google: Gemini 2.5 Flash:
Question 1: Is it a living thing?
Grok:
No
Google: Gemini 2.5 Flash:
Question 2: Is it man-made?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 3: Is it a physical object that can be touched?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 4: Is it typically found indoors?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 5: Is it larger than a shoebox?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 6: Is it primarily made of metal?
Grok:
No
Google: Gemini 2.5 Flash:
Question 7: Is it primarily made of wood?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 8: Is its primary purpose for sitting or sleeping?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 9: Is it primarily used for sleeping?
Grok:
No
Google: Gemini 2.5 Flash:
Question 10: Does it typically have a backrest?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 11: Is it designed for more than one person to sit on simultaneously?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 12: Is it typically found in a dining room?
Grok:
No
Google: Gemini 2.5 Flash:
Question 13: Is it typically found in a living room?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 14: Does it usually have cushions?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 15: Does it typically consist of multiple distinct pieces that can be arranged in different configurations?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 16: Is it typically upholstered in fabric?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 17: Is it designed to fit into a corner or create an L-shape?
Grok:
Yes
Google: Gemini 2.5 Flash:
Question 18: Is it common for people to lie down and stretch out on at least one part of it?
Grok:
Yes
Google: Gemini 2.5 Flash:
My final guess is: a sectional sofa.
Grok:
Correct
Reflection
Okay… I’ll admit it: this round was almost embarrassingly smooth. Compared to my past 20Q games, where the LLM didn’t always use all of its allotted questions and the questions didn’t build off each other, this felt like watching two robots politely play chess. Gemini came in with the classic “start broad, then narrow down” strategy and just… didn’t miss. Living thing? Man-made? Touchable? Indoors? Bigger than a shoebox? It’s giving efficient, it’s giving no wasted moves, it’s giving “I actually read the rules.” And Grok, to its credit, stayed consistent and didn’t do any weird loophole stuff like “well, technically it’s wood AND fabric.”
The result: Gemini correctly landed on “sectional sofa” by Question 18, which is honestly pretty impressive because “sectional” is a specific subtype, not just “couch.”
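For anyone curious why “start broad, then narrow down” is such a strong strategy: if every yes/no answer cut the remaining possibilities exactly in half, the number of questions you need would only grow with the logarithm of the number of candidates. Here’s a tiny Python sketch of that idealized math. The function name `questions_needed` is just something I made up for illustration, and the perfect-halving assumption is a big simplification, since real questions (and real sets of objects) never split that cleanly.

```python
def questions_needed(num_candidates: int) -> int:
    """Hypothetical helper: fewest yes/no questions needed to single out one
    item, ASSUMING every question splits the remaining candidates exactly in half.
    This equals ceil(log2(num_candidates)), computed with integer math."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() is the smallest k with 2**k >= n, i.e. ceil(log2(n)).
    return (num_candidates - 1).bit_length()


# With perfect halving, 20 questions can tell apart up to 2**20 objects.
print(2 ** 20)                        # 1048576
print(questions_needed(1_000_000))    # 20
print(questions_needed(2 ** 20 + 1))  # 21: one object past the limit costs a whole extra question
```

So in theory, 20 perfectly balanced questions are enough to pin down one object out of about a million, which is why wasting early questions hurts so much.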
Why this might have worked so well
Also… I can’t ignore the obvious: we used two popular commercial models. These models tend to be bigger, more heavily trained, and more aggressively fine-tuned for “clean conversation” than a lot of the smaller or more experimental models I’ve played with in earlier posts. Bigger models usually mean better performance on stuff like keeping track of constraints, staying consistent across many turns, and choosing high-information questions that slice the search space quickly. So part of the success here might be less “wow, LLMs are suddenly amazing” and more “wow, we picked the two models most likely to behave like responsible adults in a structured task.”
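To make “high-information questions” a little more concrete: one common way to measure how much a yes/no question tells you is the entropy of its answer, in bits. A question that splits the remaining candidates 50/50 is worth a full bit, while a wildly specific early guess (like “Is it the Eiffel Tower?” on Question 1) is almost always a “No” and tells you very little. This is a minimal Python sketch that assumes you can estimate the probability of a “Yes” over the remaining candidates, which the models obviously don’t literally compute; `answer_entropy` is a hypothetical helper, not something either model exposes.

```python
import math


def answer_entropy(p_yes: float) -> float:
    """Hypothetical helper: expected information, in bits, from a yes/no question
    whose answer is "Yes" with probability p_yes over the remaining candidates
    (the binary entropy of p_yes)."""
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must be between 0 and 1")
    if p_yes in (0.0, 1.0):
        return 0.0  # the answer is already certain, so asking teaches you nothing
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))


print(answer_entropy(0.5))             # 1.0: an even split is the best a yes/no question can do
print(round(answer_entropy(0.01), 3))  # 0.081: a long-shot guess that is almost always "No"
```

Gemini’s opening questions (living thing, man-made, indoors) are the kind that plausibly split a huge space of objects into big chunks, which is exactly the behavior this measure rewards.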