Experiments

This section covers how to design and run experiments with Game Reasoning Arena, including distributed execution with Ray.

Ray Integration for Parallel Execution

Game Reasoning Arena supports Ray for distributed and parallel execution, allowing you to:

  • Run multiple games in parallel across different cores/machines

  • Parallelize episodes within games for faster data collection

  • Distribute LLM inference for batch processing

  • Scale experiments on SLURM clusters or multi-GPU setups

Configuration Options

Option 1: Combined Configuration File (YAML)

# Combined config with all settings in one file
env_config:
  game_name: tic_tac_toe
num_episodes: 5
agents:
  player_0:
    type: llm
    model: litellm_groq/llama3-8b-8192
  player_1:
    type: random
use_ray: true
parallel_episodes: true
ray_config:
  num_cpus: 8
  include_dashboard: false

Option 2: Separate Ray Configuration (Recommended)

# Use any existing config + separate Ray settings
python3 scripts/runner.py \
  --base-config src/game_reasoning_arena/configs/multi_game_base.yaml \
  --ray-config src/game_reasoning_arena/configs/ray_config.yaml \
  --override num_episodes=10 \
  --override agents.player_0.model=litellm_groq/llama3-70b-8192

Option 3: Command-Line Override

# Enable Ray with any existing configuration
python3 scripts/runner.py --config src/game_reasoning_arena/configs/example_config.yaml \
  --override use_ray=true parallel_episodes=true
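To make the dotted `--override` syntax concrete, here is a minimal sketch of how `key=value` strings such as `agents.player_0.model=...` could be applied to a nested configuration dictionary. The helper names `coerce` and `apply_override` are hypothetical and are not taken from the project's code; the actual parsing in `scripts/runner.py` may differ.

```python
def coerce(value: str):
    """Turn a CLI string into bool, int, or float when it looks like one."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # leave everything else (e.g. model names) as a string


def apply_override(config: dict, override: str) -> None:
    """Set config[a][b][c] = value for an override of the form 'a.b.c=value'."""
    dotted_key, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(name, {})
    node[leaf] = coerce(raw_value)


# Illustrative starting config (assumed, mirrors the YAML in Option 1).
config = {"num_episodes": 5, "agents": {"player_0": {"type": "llm"}}}
for item in [
    "use_ray=true",
    "num_episodes=10",
    "agents.player_0.model=litellm_groq/llama3-70b-8192",
]:
    apply_override(config, item)

print(config)
```

Existing keys that are not overridden (here `agents.player_0.type`) are left untouched, which is the behavior one would usually want from a targeted override.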

Option 4: Maximum Parallelization (Multi-Model Ray)

# Run multiple models in parallel with full Ray integration
# Parallelizes: Models + Games + Episodes simultaneously
python3 scripts/run_ray_multi_model.py \
  --config src/game_reasoning_arena/configs/ray_multi_model.yaml \
  --override use_ray=true

Ray Configuration Parameters

The ray_config.yaml file contains Ray-specific settings:

Ray Configuration Options

Parameter                       Description                        Default
------------------------------  ---------------------------------  -----------
use_ray                         Enable/disable Ray                 false
parallel_episodes               Parallelize episodes within games  false
ray_config.num_cpus             Number of CPUs for Ray             Auto-detect
ray_config.num_gpus             Number of GPUs for Ray             Auto-detect
ray_config.include_dashboard    Enable Ray dashboard               false
ray_config.dashboard_port       Dashboard port                     8265
ray_config.object_store_memory  Object store memory limit          Auto

Performance Comparison

Execution Modes Performance

Execution Mode                   Parallelization Level      Best For                         Expected Speedup
-------------------------------  -------------------------  -------------------------------  --------------------------------
scripts/runner.py (standard)     Episodes only              Single model, single game        ~N_episodes
scripts/runner.py (Ray enabled)  Games + Episodes           Single model, multiple games     ~N_games × N_episodes
scripts/run_ray_multi_model.py   Models + Games + Episodes  Multiple models, multiple games  ~N_models × N_games × N_episodes

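Recommendation: Use scripts/run_ray_multi_model.py for multi-model experiments to achieve maximum speedup. Treat the speedups listed here as upper bounds: the actual gain cannot exceed the number of workers Ray has available.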
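As a rough sanity check on the expected speedups, the sketch below computes an idealized bound for a given worker count. It assumes every episode takes the same time and that scheduling, LLM latency, and rate limits add no overhead, which will not hold exactly in practice. The function name `ideal_speedup` is illustrative only.

```python
import math


def ideal_speedup(n_models: int, n_games: int, n_episodes: int, n_workers: int) -> float:
    """Upper bound on speedup with equal-length tasks and zero overhead."""
    tasks = n_models * n_games * n_episodes
    # Tasks run in "waves" of at most n_workers at a time.
    waves = math.ceil(tasks / n_workers)
    return tasks / waves


# 2 models x 3 games x 10 episodes = 60 tasks on 8 workers
print(ideal_speedup(2, 3, 10, 8))  # 7.5

# 1 model x 1 game x 5 episodes = 5 tasks on 8 workers
print(ideal_speedup(1, 1, 5, 8))   # 5.0
```

The first case shows the bound being capped by the worker count; the second shows it being capped by the number of tasks, which is where the ~N_models × N_games × N_episodes figure comes from.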

Configuration Merging Order

The system merges configurations in this order (later overrides earlier):

  1. Default configuration

  2. Base config (--base-config)

  3. Main config (--config)

  4. Ray config (--ray-config)

  5. CLI overrides (--override)
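The merging order above can be sketched as a fold over configuration layers, where each later layer overrides the earlier ones. This example assumes nested dictionaries are merged recursively (a "deep merge"); whether the project merges deeply or replaces whole sections is not stated here, and `deep_merge` is a hypothetical helper rather than the project's function.

```python
from copy import deepcopy
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from override win over base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the old value.
            merged[key] = deepcopy(value)
    return merged


# Illustrative layers, listed in the documented order (values are made up).
layers = [
    {"num_episodes": 1, "use_ray": False,
     "ray_config": {"include_dashboard": False}},    # 1. defaults
    {"num_episodes": 5},                             # 2. --base-config
    {"agents": {"player_0": {"type": "llm"}}},       # 3. --config
    {"use_ray": True, "ray_config": {"num_cpus": 8}},  # 4. --ray-config
    {"num_episodes": 10},                            # 5. --override
]

final = reduce(deep_merge, layers, {})
print(final)
```

In the result, `num_episodes` comes from the CLI override, `use_ray` from the Ray config, and `ray_config` keeps `include_dashboard` from the defaults while gaining `num_cpus`.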

SLURM Integration

For cluster environments, Ray automatically detects the SLURM allocation:

# SLURM job with Ray
sbatch --nodes=2 --cpus-per-task=48 --gres=gpu:4 slurm_jobs/run_simulation.sh

The SLURM script (slurm_jobs/run_simulation.sh) handles:

  • Multi-node Ray cluster setup

  • Head node and worker initialization

  • GPU allocation across nodes

  • Environment variable configuration

Debug Commands

# Check Ray status
ray status

# Monitor Ray dashboard (if enabled)
# Navigate to: http://localhost:8265

Experiment Design

Configuration Management

Use YAML configuration files to define experiments:

experiment:
  name: "llm_comparison_study"
  description: "Compare different LLM models on strategic games"

games:
  - name: "connect_four"
    num_episodes: 100
  - name: "kuhn_poker"
    num_episodes: 200

agents:
  - type: "llm"
    model: "gpt-4"
    name: "GPT4_Player"
  - type: "llm"
    model: "claude-3-sonnet"
    name: "Claude_Player"

Running Experiments

Single Experiments

python scripts/simulate.py --config experiments/my_experiment.yaml

Batch Experiments

For large-scale studies:

# Using SLURM for cluster computing
sbatch slurm_jobs/run_simulation.sh

# Or parallel execution
python scripts/runner.py --parallel --jobs 8

Distributed Computing

Use Ray for distributed execution:

execution:
  backend: "ray"
  num_workers: 8
  resources_per_worker:
    cpu: 2
    memory: "4GB"

Statistical Analysis

Significance Testing

from game_reasoning_arena.analysis import statistical_tests

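# Compare win rates between agents (wins out of games played, per agent)
p_value = statistical_tests.binomial_test(
    wins_a=75, games_a=100,
    wins_b=65, games_b=100
)
# A small p-value (e.g. below 0.05) suggests the win rates genuinely differ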
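For readers who want to see what such a comparison involves, here is a self-contained sketch of a two-sided two-proportion z-test using only the standard library. This is a swapped-in technique based on the normal approximation; it is not necessarily what `statistical_tests.binomial_test` computes, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(wins_a: int, games_a: int, wins_b: int, games_b: int):
    """Two-sided z-test for the difference between two win rates."""
    rate_a = wins_a / games_a
    rate_b = wins_b / games_b
    # Pooled win rate under the null hypothesis that both agents are equal.
    pooled = (wins_a + wins_b) / (games_a + games_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if std_err == 0:
        raise ValueError("win rates are degenerate (all wins or all losses)")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(wins_a=75, games_a=100, wins_b=65, games_b=100)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

With the numbers from the comparison above (75/100 versus 65/100), the p-value comes out at roughly 0.12, so a 10-point gap over 100 games each would not be significant at the 0.05 level under this approximation. The normal approximation is reasonable for samples of this size; for small samples an exact test is usually preferable.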