LLMs4PCG Competition

Introduction

The 1st LLMs4PCG Competition continues the challenging and exciting spirit of the ChatGPT4PCG competitions. In this edition, we challenge participants to design prompts and apply more advanced prompt engineering (PE) techniques to construct stable Science Birds levels resembling uppercase English characters, using each of three selected open-weight Large Language Models (LLMs).


We welcome participants to submit their programs, either by modifying the example prompt or by implementing more complex logic through PE. Submitted programs will be inspected for qualification, subject to the competition rules, and used to generate levels for each target uppercase English character. As in the second ChatGPT4PCG competition, the generated levels are tested for stability using our Science Birds Evaluator, checked for similarity with a Vision Transformer classifier, and scored for diversity using our diversity metric. The prompts and/or PE techniques developed by participants must generate stable, similar, and diverse levels across all three open-weight LLMs.


The 1st LLMs4PCG competition utilizes three open-weight LLMs with sizes up to 32B parameters, making the task more challenging. We cap the model size to attract potential participants with limited computational resources. Building on a previous study, we evaluated several of the latest and most popular open-weight LLMs under the specified size limit. The evaluation results showed that the three selected open-weight LLMs outperform GPT-3.5, which was used in ChatGPT4PCG 2.


While past ChatGPT4PCG competitions have provided a unique and useful platform, their focus on ChatGPT has limited further investigation of other LLMs for procedural content generation (PCG). The past ChatGPT4PCG competitions show that LLMs can generate stable character-like levels for Science Birds. However, we still know little about this ability in other LLMs, especially open-weight ones. Moreover, the closed-source nature of the GPT family of LLMs makes it unclear when model changes affecting level generation might occur. Moving our competition from ChatGPT to open-weight LLMs allows us to analyze this ability more deeply based on the competition results.

Logistics

This time, we again allow the submission of a program: participants can build on top of our examples and packages, using conditions and iterations in their code to develop their own advanced PE techniques and potentially create new ones. We also provide a tutorial on our website to guide participants in PE for the competition task. Python is the programming language used in this competition.
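As a rough illustration of how conditions and iterations can drive a PE technique, the sketch below retries a prompt until the model's answer looks like level code. Note that `query_model` is a hypothetical stand-in for the LLM client provided by the competition packages, and the prompt wording is an example only:

```python
# Hypothetical sketch of an iterative PE loop with a retry condition.
# `query_model` is a stand-in callable, NOT a real API from the toolkit.

def generate_level(target: str, query_model, max_attempts: int = 3) -> str:
    """Ask the model for a level shaped like `target`, retrying when the
    response does not look like Science Birds drop commands."""
    prompt = (
        f"Use ab_drop() commands to build a stable Science Birds level "
        f"shaped like the uppercase letter '{target}'."
    )
    response = ""
    for _ in range(max_attempts):      # iteration
        response = query_model(prompt)
        if "ab_drop" in response:      # condition: plausible level code
            return response
        prompt += "\nAnswer with ab_drop() commands only."
    return response
```

In a real program, the condition could be replaced by any validation step, such as parsing the commands or pre-checking stability locally.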


Submissions must be zipped and include a README.md file with instructions on how to run the participant's program, along with any required dependencies, inside the zip file. The zip file must be submitted through a link to be provided later on our competition website.
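A submission archive can be assembled with Python's standard library. The file names below are examples only, not a prescribed layout; follow the official submission rules on the website:

```python
# Illustrative packaging of a submission zip using the standard library.
# File names passed in are examples; the README.md is required per the rules.
import zipfile

def make_submission(zip_path: str, files: list[str]) -> None:
    """Bundle the listed files into a zip archive for submission."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            zf.write(name)
```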


The submitted programs will undergo an evaluation process of 10 trials for each of the 26 uppercase letters of the English alphabet (A-Z) on each LLM. The levels generated for each character will be evaluated for similarity, stability, and diversity, yielding a prompt score per model. The prompt scores for all three models are then summed and normalized for the final ranking. The entire evaluation process will be conducted using automated scripts and programs. The team with the highest normalized score that also surpasses our baseline will be declared the winner. If multiple teams share the highest score, the one with the shortest prompt wins; if multiple teams still tie on both score and prompt length, they will be considered co-winners.
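The aggregation step described above can be sketched as follows. This is not the official scoring script: the per-model prompt scores are assumed to be precomputed, and the normalization here is a simple division by the highest total, which is only an assumption about the scheme:

```python
# Illustrative sketch of the score aggregation (NOT the official script).
# Assumption: each team already has one prompt score per LLM, and the
# normalization divides each team's summed score by the highest total.

def final_ranking(per_model_scores: dict[str, list[float]]) -> dict[str, float]:
    """Map each team to a normalized final score in [0, 1]."""
    totals = {team: sum(scores) for team, scores in per_model_scores.items()}
    top = max(totals.values())
    return {team: total / top for team, total in totals.items()}

scores = {
    "team_a": [0.8, 0.7, 0.9],  # one prompt score per open-weight LLM
    "team_b": [0.6, 0.5, 0.7],
}
print(final_ranking(scores))
```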


We plan to apply for support from IEEE CIS funding to provide the prize for our competition as in the previous ChatGPT4PCG competitions.

Participants

We target participants who are interested in using LLMs for procedural content generation.

To attract participants and promote the competition, we plan to collaborate with professors and industry experts to promote the competition in relevant courses and seminars. We also plan to send targeted emails to our existing mailing list of potential participants, past competitors, and interested parties.

Timeline

  • Midterm submission: 30 May 2025 (23:59 JST)

  • Notification of midterm results: 15 June 2025 (23:59 JST)

  • Final submission: 30 July 2025 (23:59 JST)

  • Announcement of final results: 15 August 2025 (23:59 JST)

Midterm submission is optional, although we recommend it. Any team that submits during the midterm submission will be notified of the preliminary results. However, all teams, whether they submit during the midterm or not, must submit during the final submission period. Only submissions during the final submission period will be considered for the final ranking.

Organizers

  • Yi Xia, one of the organizers of the second ChatGPT4PCG 2024 competition, is familiar with LLMs and is engaged in research related to affective computing in games and LLMs for card games.

    Affiliation: Graduate School of Information Science and Engineering, Ritsumeikan University (email: gr0666ih@ed.ritsumei.ac.jp)


  • Pratch Suntichaikul, the second ChatGPT4PCG 2024 competition organizer, is engaged in cutting-edge research on LLMs, specifically focusing on LLMs for PCG.

    Affiliation: Graduate School of Information Science and Engineering, Ritsumeikan University (email: gr0665kx@ed.ritsumei.ac.jp)


  • Zifan Ye is engaged in cutting-edge research on LLMs for games, with a specific focus on LLMs for card games.

    Affiliation: Graduate School of Information Science and Engineering, Ritsumeikan University (email: gr0734sh@ed.ritsumei.ac.jp)


  • Febri Abdullah, one of the first and second ChatGPT4PCG competition organizers, is engaged in cutting-edge research on LLMs, specifically focusing on LLMs for PCG.

    Affiliation: Graduate School of Information Science and Engineering, Ritsumeikan University (email: gr0397fs@ed.ritsumei.ac.jp)


  • Mury F. Dewantoro, one of the first and second ChatGPT4PCG competition organizers, is familiar with PCG and LLMs and has experience organizing previous competitions.

    Affiliation: Graduate School of Information Science and Engineering, Ritsumeikan University (email: gr0450xi@ed.ritsumei.ac.jp)


  • Ruck Thawonmas specializes in artificial intelligence, computational intelligence, and their applications to interactive entertainment and entertainment computing. He has extensive experience in organizing competitions, including the ChatGPT4PCG Competition, DareFightingICE AI Competition, and DareFightingICE Sound-Design Competition.

    Affiliation: College of Information Science and Engineering, Ritsumeikan University (email: ruck@is.ritsumei.ac.jp)


  • Julian Togelius focuses his research on the intersection of artificial intelligence and games, particularly in areas such as procedural content generation, player modeling, and game AI. He has proven experience in organizing competitions such as the ChatGPT4PCG Competition, the General Video Game Playing Competition, and the Simulated Car Racing Competition.

    Affiliation: Tandon School of Engineering, New York University (email: julian@togelius.com)


  • Jochen Renz specializes in qualitative spatial reasoning, knowledge representation, and computational complexity. He is also known for initiating the Angry Birds AI competition and is a co-organizer of the ChatGPT4PCG Competition.

    Affiliation: School of Computing, The Australian National University (email: jochen.renz@anu.edu.au)
