Codenames AI Competition

Introduction

The Codenames AI competition challenges LLM-based agents to cooperate as a team to play the popular word association game Codenames.


This competition evaluates the natural language reasoning capabilities of AI agents through the word-based board game Codenames, one of the most popular games on BoardGameGeek. Agents must collaborate as a team to identify a set of partially known words on a provided board as quickly as possible by utilising word association clues. Codenames provides an interesting challenge for AI agents, requiring a sophisticated understanding of language, theory of mind, and strategic reasoning capabilities in order to play well.


This competition is inspired by, and based on, an earlier competition held at FDG 2019 which focussed on more traditional natural language processing (NLP) techniques. We plan to re-launch this competition with a renewed emphasis on Large Language Models (LLMs). LLMs have demonstrated enhanced reasoning and comprehension capabilities for language-based tasks, but can still struggle with lateral thinking challenges. We believe that Codenames presents an interesting challenge and benchmark for evaluating LLM reasoning capabilities, given that state-of-the-art models have not yet been demonstrated to reach human-level performance.

Logistics

Interested teams must first register for the competition by notifying the organisers through the official Discord group. Once registered, each team will have a new channel created for them, where they are encouraged to post any private questions or feedback and upload their submissions by the specified deadline.


Public comments can also be made through the main Discord channel to notify all teams.


Competition entrants must submit two agents: a Codemaster and a Guesser. Most agents will consist of a single Python file based on the provided templates, but additional files and software are also permitted. Entrants will need to provide complete instructions for how to run their agents on the following hardware:


  • Operating System: Windows 11 or Ubuntu 24.04

  • Processor: AMD Ryzen Threadripper PRO 5955WX (64 MB cache, 16 cores, 32 threads, 4.0GHz to 4.5GHz)

  • Memory: 256GB, 4x64GB, DDR4, 3200MHz, RDIMM ECC Memory

  • Video Card: NVIDIA® RTX™ A6000, 48 GB GDDR6, 4 DP

  • Storage: 100GB

Entrants must ensure that both the submitted Codemaster and Guesser agents can run concurrently on the above hardware (i.e., the available VRAM must be shared by the Codemaster and Guesser at the same time). Agents must also respond to requests in a timely fashion, with a soft time limit of 60 seconds per response. Agents that repeatedly or egregiously breach this time limit when asked by the framework for a clue or guess will be disqualified.
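One simple way for an agent to respect the soft time limit is to run its model call with a timeout and fall back to a safe default answer if it expires. The sketch below uses only the Python standard library; the function and variable names are our own illustration, not part of the official framework.

```python
# Illustrative sketch: guarding an agent's response with the 60-second
# soft time limit described above. Names here are assumptions, not the
# official framework API.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

TIME_LIMIT_SECONDS = 60

def respond_within_limit(generate_response, fallback, limit=TIME_LIMIT_SECONDS):
    """Run generate_response() in a worker thread; if it exceeds the time
    limit, return the fallback answer instead of stalling the framework."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(generate_response)
    try:
        return future.result(timeout=limit)
    except FuturesTimeout:
        return fallback  # the worker may still finish in the background
    finally:
        pool.shutdown(wait=False)
```

Note that the timed-out worker thread is not killed, only abandoned, so a slow model call should also be given its own internal timeout where possible.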

The competition will feature two tracks:

  • Single Team: Played using the same scoring system as the previous Codenames AI framework, where a single team (red codemaster/guesser) attempts to identify all red words in as few turns as possible. Teams are awarded a score at the end of the game based on the number of turns taken. The only exception is if the guesser selects all of the blue words or the assassin word, which results in the maximum score of 25 points.

  • Two Teams: Played using the full set of rules from the original Codenames game, where two teams (red codemaster/guesser and blue codemaster/guesser) play against each other, attempting to identify all words of their team’s colour first. Rather than using a scoring system, this version measures success in terms of overall win-rate.
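For the Single Team track described above, a lower score is better. The rule can be sketched as follows; the function and parameter names are illustrative, not taken from the official framework.

```python
# Hypothetical sketch of the Single Team scoring rule described above;
# names are illustrative, not taken from the official framework.

MAX_SCORE = 25  # worst possible score, also awarded for a game-ending mistake

def single_team_score(turns_taken: int, game_lost: bool) -> int:
    """Return the team's score (lower is better): the number of turns taken,
    or the maximum score if the guesser ended the game by selecting all of
    the blue words or the assassin word."""
    if game_lost:
        return MAX_SCORE
    return min(turns_taken, MAX_SCORE)
```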



Each submission will be automatically included in both tracks, unless requested otherwise.


The initial stage of the competition will be held in a round-robin fashion to reduce the number of teams down to four for each track. The remaining teams will then participate in a “best-of-three” elimination tournament (semi-finals and finals) that will be recorded and played back during the CoG competition.


The main competition framework and the provided agents are written in Python, although using external services and libraries not written in Python is also permitted. The barrier to entry is very low, with instructions on the official GitHub page detailing how to install and run the provided baseline agents. Simple agent enhancements using prompt-engineering techniques are encouraged for less tech-savvy teams.
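As an example of the kind of prompt engineering a simple entry might use, a Guesser could assemble its LLM prompt from the clue and the remaining board words. This is an illustration only; the function name and prompt wording are assumptions, not the competition's actual template API.

```python
# Illustrative only: a minimal prompt-engineered Guesser prompt builder.
# The name and wording are assumptions, not the official template API.

def build_guess_prompt(clue: str, count: int, board_words: list[str]) -> str:
    """Assemble an LLM prompt asking for the board words most associated
    with the given clue, most confident guess first."""
    return (
        "You are playing Codenames as the Guesser.\n"
        f"The clue is '{clue}' for {count} word(s).\n"
        f"The unrevealed board words are: {', '.join(board_words)}.\n"
        f"Reply with exactly {count} of these words, comma-separated, "
        "most confident first."
    )
```

Improving such an agent can then be as simple as iterating on this prompt text, with no changes to the surrounding framework code.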

Participants

This competition is open to all researchers interested in natural language processing, with a particular focus on Large Language Models (LLMs).


We feel that this competition aligns well with the scope of CoG 2025, with a relatively low barrier to entry that can help encourage student engagement with the research field and community.


Our competition GitHub repository (https://github.com/stepmat/Codenames_GPT) provides detailed instructions on how to set up and run baseline LLM agents powered by OpenAI's GPT-4o. Creating your own agents based on alternative LLMs or improved prompts is as simple as changing a few lines of code.
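As an illustration of how small such a change can be (the variable and function names below are assumptions, not the repository's actual code), swapping the backing model might come down to editing one configuration value:

```python
# Hypothetical illustration: swapping the backing model for a baseline
# agent by changing one configuration value. Names are assumptions, not
# the repository's actual code.

MODEL_NAME = "gpt-4o"  # change this line to try another model

def make_request_kwargs(prompt: str) -> dict:
    """Assemble keyword arguments for a chat-completion style API call."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic responses aid reproducibility
    }
```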

Timeline

  • Registration Deadline: 31/07/2025

    • Deadline for teams to register their interest in the competition

  • Submission for Testing Deadline: 04/08/2025

    • Deadline for teams to submit their agent files for initial testing purposes. The competition organisers will check that the code runs successfully and provide feedback on any issues. Teams that do not submit by this date will not receive any testing feedback.

  • Final Submission Deadline: 11/08/2025

    • Date to submit final agent files for the competition. Results will then be generated using submitted entries over the following two weeks before the CoG 2025 conference.

Organizers

Dr. Matthew Stephenson (Flinders University) – matthew.stephenson@flinders.edu.au

  • Lecturer at Flinders University. Prior experience organising and running multiple competitions, including the Angry Birds Level Generation competition and the Ludii Game Playing competition. Author of several prior papers on Codenames LLM agents, including one at CoG 2024.

Matthew Sidji (University of Melbourne) – msidji@student.unimelb.edu.au

  • PhD candidate at the University of Melbourne, author of several prior papers on Codenames LLM agents, including one at CoG 2024.

Giovanni Paolini (University of Bologna) – g.paolini@unibo.it

  • Associate Professor at the University of Bologna. Research and industry experience on generative AI and LLMs. Author of a paper on game AI at CoG 2024.