Prompting System

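Game Reasoning Arena builds prompts in layers: each game environment generates a game-specific prompt, a shared formatting step adds a reasoning request and an output format, and the backend applies a chat template where the model needs one.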

Understanding the Observation Flow

Before diving into prompt creation, it’s important to understand how information flows from the game environment to the LLM and back. The complete observation flow works as follows:

Environment Creates Observation

Every game environment generates observations for each player through the _state_to_observation() method. This creates a structured dictionary containing everything an agent needs to make a decision:

  • state_string: A human-readable representation of the current game state

  • legal_actions: A list of valid moves the player can make

  • prompt: A carefully crafted natural language prompt generated by _generate_prompt()
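As a rough illustration, an observation for a tic-tac-toe player could look like the dictionary built below. The three keys come from the list above; the helper name make_observation and the concrete values are assumptions made for this sketch, not the project's actual code.

```python
from typing import Any, Dict, List


def make_observation(state_string: str, legal_actions: List[int], prompt: str) -> Dict[str, Any]:
    """Bundle the three fields described above into one observation dict."""
    return {
        "state_string": state_string,
        "legal_actions": legal_actions,
        "prompt": prompt,
    }


board = "X | O |  \n-----------\n  |   |  \n-----------\n  |   |  "
observation = make_observation(
    state_string=board,
    legal_actions=[2, 3, 4, 5, 6, 7, 8],
    prompt=f"Board state:\n{board}\nAvailable actions: [2, 3, 4, 5, 6, 7, 8]",
)

# An LLM agent only needs the "prompt" field to query the model.
print(observation["prompt"])
```

Keeping legal_actions as a separate field alongside the prompt lets non-LLM agents, or a validation step, use the same observation without parsing text.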

Agent Receives Prompt

When it’s time for an LLM agent to act, it extracts the prompt field from its observation. This prompt contains all the context and instructions the LLM needs to understand the game situation and make an informed decision.

Backend Processing

The prompt then goes through backend-specific formatting. For example, chat-based models might have the prompt wrapped in conversation templates with system and user roles, while base models receive the raw text prompt.

Response Parsing

Finally, the agent extracts the chosen action and reasoning from the LLM’s structured JSON response, completing the decision-making cycle.
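The four steps can be condensed into a single function. This is a minimal sketch under stated assumptions: choose_action and fake_backend are hypothetical names, and fake_backend stands in for a real LLM call so the example runs offline. The two regular expressions are the ones this page shows in the response parsing section.

```python
import re
from typing import Callable, Dict, Optional, Tuple


def choose_action(observation: Dict, generate: Callable[[str], str]) -> Tuple[Optional[int], Optional[str]]:
    """Run one decision cycle: prompt -> model -> (action, reasoning)."""
    prompt = observation["prompt"]        # the agent reads the prompt field
    response_text = generate(prompt)      # the backend produces the reply text
    # Pull the action and the reasoning out of the reply.
    action_match = re.search(r"'action'\s*:\s*(\d+)", response_text)
    reasoning_match = re.search(r"'reasoning'\s*:\s*'(.*?)'", response_text, re.DOTALL)
    action = int(action_match.group(1)) if action_match else None
    reasoning = reasoning_match.group(1) if reasoning_match else None
    return action, reasoning


def fake_backend(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return "{'reasoning': 'The center is open.', 'action': 4}"


observation = {"state_string": "X | O |", "legal_actions": [2, 3, 4], "prompt": "Pick a move."}
action, reasoning = choose_action(observation, fake_backend)
print(action, reasoning)
```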

The Hierarchical Prompt Creation System

Game Reasoning Arena’s prompt creation follows a layered architecture that allows for both consistency and customization. Let’s explore each layer:

Environment-Level Prompt Generation

At the foundation of the system, each game environment implements its own _generate_prompt() method. This lets every game build prompts tailored to its specific mechanics and information requirements.

Base OpenSpiel Environment

The base environment class OpenSpielEnv (in open_spiel_env.py) provides a generic template that works well for most traditional board games. It creates prompts containing:

  • The name of the game being played

  • The player’s symbol or identifier (like ‘X’ or ‘O’ in tic-tac-toe)

  • The current move number to provide temporal context

  • A visual representation of the board state

  • A clear list of available legal actions

Here’s what a base prompt might look like for a tic-tac-toe game:

You are playing the game: tic_tac_toe
and you are playing with the: X

the current move number is: 3
Board state:
X | O |
-----------
  |   |
-----------
  |   |

Available actions: [2, 3, 4, 5, 6, 7, 8]

What action do you choose? Reply only with the available action number.
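A prompt of this shape can be assembled from the five ingredients listed above. The function below is a sketch only: build_base_prompt and its parameters are hypothetical names, and the real OpenSpielEnv implementation may construct the text differently.

```python
from typing import List


def build_base_prompt(game_name: str, symbol: str, move_number: int,
                      board: str, legal_actions: List[int]) -> str:
    """Assemble a generic board-game prompt from its five ingredients."""
    return (
        f"You are playing the game: {game_name}\n"
        f"and you are playing with the: {symbol}\n\n"
        f"the current move number is: {move_number}\n"
        f"Board state:\n{board}\n\n"
        f"Available actions: {legal_actions}\n\n"
        "What action do you choose? Reply only with the available action number."
    )


board = "X | O |  \n-----------\n  |   |  \n-----------\n  |   |  "
prompt = build_base_prompt("tic_tac_toe", "X", 3, board, [2, 3, 4, 5, 6, 7, 8])
print(prompt)
```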

Specialized Game Environments

Games with unique mechanics often need more sophisticated prompts. The Kuhn Poker environment class KuhnPokerEnv (in kuhn_poker_env.py) demonstrates this by including game-specific information.

For Kuhn Poker, the prompt includes:

  • The player’s private card (the most critical piece of hidden information)

  • Complete betting history to understand what has happened so far

  • Current pot size and each player’s contribution

  • Context-aware action labels that change based on the game situation

For example, if no one has bet yet, the actions are labeled “Check” and “Bet”. If an opponent has already bet, they become “Fold” and “Call”, which describe the same two action numbers in terms that match the situation the LLM is facing.

Here’s what a real Kuhn Poker prompt looks like in practice:

You are Player 0 in the game Kuhn Poker.
Your private card: Jack
This is move number: 2
Betting history: ['Check']
Total pot size: 2 chips
Your contribution: 1 chips

Available actions:
0: Check (stay in the game without betting)
1: Bet (add a chip to the pot)

What action do you choose? Reply only with '0' or '1'.
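The context-aware labels can be sketched as a small lookup that depends on the betting history. Everything here is illustrative: label_kuhn_actions and render_actions are hypothetical helpers, the history is assumed to be a list of strings like the one in the prompt above, and the Fold and Call descriptions are invented for the example. The mapping of action 0 to Check or Fold and action 1 to Bet or Call is also an assumption.

```python
from typing import Dict, List


def label_kuhn_actions(betting_history: List[str]) -> Dict[int, str]:
    """Return action labels that depend on whether a bet is pending."""
    facing_bet = bool(betting_history) and betting_history[-1] == "Bet"
    if facing_bet:
        return {
            0: "Fold (give up the pot)",
            1: "Call (match the opponent's bet)",
        }
    return {
        0: "Check (stay in the game without betting)",
        1: "Bet (add a chip to the pot)",
    }


def render_actions(labels: Dict[int, str]) -> str:
    """Format the labels as the 'Available actions' block of a prompt."""
    lines = [f"{action}: {text}" for action, text in sorted(labels.items())]
    return "Available actions:\n" + "\n".join(lines)


print(render_actions(label_kuhn_actions(["Check"])))
print(render_actions(label_kuhn_actions(["Check", "Bet"])))
```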

The Prompt Formatting Layer

Once each environment creates its game-specific prompt, the system applies standardized formatting through the format_prompt() function (in llm_utils.py). This layer appends two elements intended to make LLM responses consistent and easy to parse.

Reasoning Request

The system encourages the LLM to think before acting by adding this instruction:

First, think through the game strategy and explain your reasoning.
Only after that, decide on the best action to take.

This “thinking out loud” approach often leads to better decisions and provides valuable insights for analysis and debugging.

JSON Output Format

To ensure reliable parsing of responses, the system enforces a structured output format:

Reply only in the following JSON format:
{
  'reasoning': <str>,
  'action': <int>
}

This structure allows the system to extract both the LLM’s strategic reasoning and its final action choice, enabling rich analysis of decision-making patterns.

Here’s how our Kuhn Poker prompt looks after formatting:

You are Player 0 in the game Kuhn Poker.
Your private card: Jack
This is move number: 2
Betting history: ['Check']
Total pot size: 2 chips
Your contribution: 1 chips

Available actions:
0: Check (stay in the game without betting)
1: Bet (add a chip to the pot)

What action do you choose? Reply only with '0' or '1'.

First, think through the game strategy and explain your reasoning.
Only after that, decide on the best action to take.

Reply only in the following JSON format:
{
  'reasoning': <str>,
  'action': <int>
}
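The formatting step amounts to appending two fixed text blocks to the game prompt. The sketch below reproduces that behavior under a different name, format_prompt_sketch, to make clear it is not the actual format_prompt() from llm_utils.py, whose signature and details may differ.

```python
REASONING_REQUEST = (
    "First, think through the game strategy and explain your reasoning.\n"
    "Only after that, decide on the best action to take."
)

JSON_FORMAT = (
    "Reply only in the following JSON format:\n"
    "{\n"
    "  'reasoning': <str>,\n"
    "  'action': <int>\n"
    "}"
)


def format_prompt_sketch(game_prompt: str) -> str:
    """Append the reasoning request and the output format to a game prompt."""
    return "\n\n".join([game_prompt.rstrip(), REASONING_REQUEST, JSON_FORMAT])


formatted = format_prompt_sketch("What action do you choose? Reply only with '0' or '1'.")
print(formatted)
```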

Backend-Specific Chat Templates

Modern LLMs often work best with conversational formats rather than raw text prompts. The vLLM backend class VLLMBackend (in vllm_backend.py) handles this automatically by applying chat templates when appropriate.

Chat Template Detection

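The system first checks whether the model has a built-in chat template by examining the tokenizer. Open-weight chat models such as Llama-2-Chat ship their preferred conversation format with the tokenizer. Hosted models such as ChatGPT or Claude are reached through an API instead, so this check does not apply to them.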

Automatic Chat Formatting

For chat-based models, the system wraps the prompt in a conversation structure:

[
  {
    "role": "user",
    "content": "You are Player 0 in the game Kuhn Poker..."
  }
]

Fallback Templates

If a model appears to be instruction-tuned but lacks a built-in template, the system applies a generic chat format that works well across different model families.
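The detection, formatting, and fallback steps might be combined as in the following sketch. The chat_template attribute and the apply_chat_template() method are features of Hugging Face tokenizers. Whether VLLMBackend uses them in exactly this way is an assumption, and render_for_model, the instruction_tuned flag, the "### User:" fallback layout, and FakeTokenizer are all hypothetical.

```python
from typing import Any, Dict, List


def to_messages(prompt: str) -> List[Dict[str, str]]:
    """Wrap a plain-text prompt as a single user turn."""
    return [{"role": "user", "content": prompt}]


def render_for_model(tokenizer: Any, prompt: str, instruction_tuned: bool = False) -> str:
    """Pick a rendering strategy based on what the tokenizer provides."""
    if getattr(tokenizer, "chat_template", None):
        # Built-in template: let the tokenizer render the conversation.
        return tokenizer.apply_chat_template(
            to_messages(prompt), tokenize=False, add_generation_prompt=True
        )
    if instruction_tuned:
        # Hypothetical generic fallback; the project's real fallback may differ.
        return f"### User:\n{prompt}\n\n### Assistant:\n"
    # Base model: send the raw text unchanged.
    return prompt


class FakeTokenizer:
    """Stand-in used only to exercise the function without loading a model."""
    chat_template = None


print(render_for_model(FakeTokenizer(), "Pick a move.", instruction_tuned=True))
```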

Role-based Structure

This conversion from plain text to conversation format helps models understand that they’re being asked to respond as a game-playing assistant, which often improves response quality and consistency.

Agent Integration and Response Processing

The final piece of the puzzle is how LLM agents (LLMAgent class in llm_agent.py) coordinate the entire process and handle the responses.

Receiving the Formatted Prompt

The LLM agent receives the fully formatted prompt from the environment and passes it directly to the backend system. This separation of concerns means agents don’t need to worry about game-specific formatting - they just handle the communication with the LLM.

Backend Communication

The agent sends the prompt to the appropriate backend (LiteLLM for API-based models, vLLM for local models), which handles all the technical details of model communication, chat template application, and generation parameters.

Response Parsing

When the LLM responds, the agent uses regular expressions to extract the action and reasoning from the JSON response:

import re

# response_text holds the raw text returned by the LLM

# Extract action: looks for 'action': 1
action_match = re.search(r"'action'\s*:\s*(\d+)", response_text)

# Extract reasoning: looks for 'reasoning': 'text here'
reasoning_match = re.search(r"'reasoning'\s*:\s*'(.*?)'", response_text, re.DOTALL)

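These patterns match the single-quoted format requested in the prompt and tolerate extra whitespace around the colon, so the reply does not have to be strictly valid JSON. They have limits: keys written with double quotes are not matched, and the reasoning pattern stops at the first single quote, so an apostrophe inside the reasoning text truncates it.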
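If replies need stricter handling, one option is to try structured parsing first and keep the regular expressions as a fallback. The function below is a sketch of that idea, not the parsing code in llm_agent.py. The name parse_response is hypothetical. It tries json.loads for double-quoted replies, then ast.literal_eval for single-quoted ones, and finally the two patterns shown above.

```python
import ast
import json
import re
from typing import Optional, Tuple


def parse_response(response_text: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (action, reasoning); a part is None when it cannot be found."""
    # Take the outermost {...} span so text around the object is ignored.
    start, end = response_text.find("{"), response_text.rfind("}")
    if start != -1 and end > start:
        candidate = response_text[start:end + 1]
        # json.loads accepts double quotes; ast.literal_eval accepts single quotes.
        for loader in (json.loads, ast.literal_eval):
            try:
                data = loader(candidate)
                if isinstance(data, dict):
                    return int(data["action"]), data.get("reasoning")
            except (ValueError, SyntaxError, TypeError, KeyError):
                continue
    # Fall back to the regular expressions from the text above.
    action_match = re.search(r"'action'\s*:\s*(\d+)", response_text)
    reasoning_match = re.search(r"'reasoning'\s*:\s*'(.*?)'", response_text, re.DOTALL)
    action = int(action_match.group(1)) if action_match else None
    reasoning = reasoning_match.group(1) if reasoning_match else None
    return action, reasoning


print(parse_response("{'reasoning': 'I bet.', 'action': 1}"))
print(parse_response('Sure! {"reasoning": "It is best.", "action": 0}'))
```

A caller would still want to check the returned action against legal_actions before applying it, since a syntactically valid reply can name a move that is not available.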

Example Complete Response

Here’s what a complete LLM response might look like for our Kuhn Poker example:

{
  'reasoning': 'I have a Jack, which is the lowest card in Kuhn Poker, so I cannot win a showdown against a Queen or a King. My opponent checked, which may signal a weak hand. Betting as a bluff is my only way to take the pot, because checking would lead to a showdown that I would lose.',
  'action': 1
}

The system extracts action: 1 (meaning “Bet”) and stores the reasoning for later analysis.

Customizing Prompts for New Games

When adding a new game to Game Reasoning Arena, you’ll likely want to customize the prompting to fit your game’s unique characteristics. Here’s how to do it effectively:

Override the _generate_prompt Method

Create your own implementation in your game environment:

def _generate_prompt(self, agent_id: int) -> str:
    # Get game-specific information (the helper methods used here are
    # placeholders that your environment would define)
    special_info = self.get_special_game_info(agent_id)

    # Build the prompt line by line so the method's indentation
    # does not leak into the text sent to the LLM
    prompt = "\n".join([
        f"You are playing {self.game_name}.",
        f"Special game information: {special_info}",
        "",
        f"Current situation: {self.describe_current_situation()}",
        f"Your options: {self.describe_actions_with_context(agent_id)}",
        "",
        "Choose your action wisely.",
    ])

    # Always use format_prompt (from llm_utils.py) to add the reasoning
    # request and JSON formatting
    return format_prompt(prompt)