Skip to content

An AlphaZero-style AI agent for a custom triangle puzzle game. Features distributed self-play with Ray, a C++ MCTS core, and a PyTorch neural network. Includes async logging via MLflow & TensorBoard.

License

Notifications You must be signed in to change notification settings

lguibr/alphatriangle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

58 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

CI/CD Status - codecov - PyPI versionLicense: MIT - Python Version

AlphaTriangle

AlphaTriangle Logo

Overview

AlphaTriangle is a project implementing an artificial intelligence agent based on AlphaZero principles to learn and play a custom puzzle game involving placing triangular shapes onto a grid. The agent learns through headless self-play reinforcement learning, guided by Monte Carlo Tree Search (MCTS) and a deep neural network (PyTorch).

Key Features:

  • Core Game Logic: Uses the trianglengin>=2.0.7 library for the triangle puzzle game rules and state management, featuring a high-performance C++ core.
  • High-Performance MCTS: Integrates the trimcts>=1.2.1 library, providing a C++ implementation of MCTS for efficient search, callable from Python. MCTS parameters are configurable via alphatriangle/config/mcts_config.py.
  • Deep Learning Model: Features a PyTorch neural network with policy and distributional value heads, convolutional layers, and optional Transformer Encoder layers.
  • Parallel Self-Play: Leverages Ray for distributed self-play data generation across multiple CPU cores. The number of workers automatically adjusts based on detected CPU cores (reserving some for stability), capped by the NUM_SELF_PLAY_WORKERS setting in TrainConfig.
  • Asynchronous Stats & Persistence (NEW): Uses the trieye>=0.1.2 library, which provides a dedicated Ray actor (TrieyeActor) for:
    • Asynchronous collection of raw metric events from workers and the training loop.
    • Configurable processing and aggregation of metrics.
    • Logging processed metrics to MLflow and TensorBoard.
    • Saving/loading training state (checkpoints, buffers) to the filesystem and logging artifacts to MLflow.
    • Handling auto-resumption from previous runs.
    • All persistent data managed by trieye is stored within the .trieye_data/<app_name> directory (default: .trieye_data/alphatriangle).
  • Headless Training: Focuses on a command-line interface for running the training pipeline without visual output.
  • Enhanced CLI: Uses the rich library for improved visual feedback (colors, panels, emojis) in the terminal.
  • Centralized Logging: Uses Python's standard logging module configured centrally for consistent log formatting (including โ–ฒ prefix) and level control across the project. Run logs are saved to .trieye_data/<app_name>/runs/<run_name>/logs/.
  • Optional Profiling: Supports profiling worker 0 using cProfile via a command-line flag. Profile data is saved to .trieye_data/<app_name>/runs/<run_name>/profile_data/.
  • Unit Tests: Includes tests for RL components.

๐ŸŽฎ The Triangle Puzzle Game Guide ๐Ÿงฉ

This project trains an agent to play the game defined by the trianglengin library. The rules are detailed in the trianglengin README.


Core Technologies

  • Python 3.10+
  • trianglengin>=2.0.7: Core game engine (state, actions, rules) with C++ optimizations.
  • trimcts>=1.2.1: High-performance C++ MCTS implementation with Python bindings.
  • trieye>=0.1.2: Asynchronous statistics collection, processing, logging (MLflow/TensorBoard), and data persistence via a Ray actor.
  • PyTorch: For the deep learning model (CNNs, optional Transformers, Distributional Value Head) and training, with CUDA/MPS support.
  • NumPy: For numerical operations, especially state representation (used by trianglengin and features).
  • Ray: For parallelizing self-play data generation and hosting the TrieyeActor. Dynamically scales worker count based on available cores.
  • Numba: (Optional, used in features.grid_features) For performance optimization of specific grid calculations.
  • Cloudpickle: For serializing the experience replay buffer and training checkpoints (used by trieye).
  • MLflow: For logging parameters, metrics, and artifacts (checkpoints, buffers) during training runs. Provides the primary web UI dashboard for experiment management. Data stored in .trieye_data/<app_name>/mlruns/. Managed by trieye.
  • TensorBoard: For visualizing metrics during training (e.g., detailed loss curves). Provides a secondary web UI dashboard. Data stored in .trieye_data/<app_name>/runs/<run_name>/tensorboard/. Managed by trieye.
  • Pydantic: For configuration management and data validation (used by alphatriangle and trieye).
  • Typer: For the command-line interface.
  • Rich: For enhanced CLI output formatting and styling.
  • Pytest: For running unit tests.

Project Structure

.
โ”œโ”€โ”€ .github/workflows/      # GitHub Actions CI/CD
โ”‚   โ””โ”€โ”€ ci_cd.yml
โ”œโ”€โ”€ .trieye_data/           # Root directory for ALL persistent data (GITIGNORED) - Managed by Trieye
โ”‚   โ””โ”€โ”€ alphatriangle/      # Default app_name
โ”‚       โ”œโ”€โ”€ mlruns/         # MLflow internal tracking data & artifact store (for MLflow UI)
โ”‚       โ”‚   โ””โ”€โ”€ <experiment_id>/
โ”‚       โ”‚       โ””โ”€โ”€ <mlflow_run_id>/
โ”‚       โ”‚           โ”œโ”€โ”€ artifacts/ # MLflow's copy of logged artifacts (checkpoints, buffers, etc.)
โ”‚       โ”‚           โ”œโ”€โ”€ metrics/
โ”‚       โ”‚           โ”œโ”€โ”€ params/
โ”‚       โ”‚           โ””โ”€โ”€ tags/
โ”‚       โ””โ”€โ”€ runs/           # Local artifacts per run (source for TensorBoard UI & resume)
โ”‚           โ””โ”€โ”€ <run_name>/ # e.g., train_YYYYMMDD_HHMMSS
โ”‚               โ”œโ”€โ”€ checkpoints/ # Saved model weights & optimizer states (*.pkl)
โ”‚               โ”œโ”€โ”€ buffers/     # Saved experience replay buffers (*.pkl)
โ”‚               โ”œโ”€โ”€ logs/        # Plain text log files for the run (*.log) - App + Trieye logs
โ”‚               โ”œโ”€โ”€ tensorboard/ # TensorBoard log files (event files)
โ”‚               โ”œโ”€โ”€ profile_data/ # cProfile output files (*.prof) if profiling enabled
โ”‚               โ””โ”€โ”€ configs.json # Copy of run configuration
โ”œโ”€โ”€ alphatriangle/          # Source code for the AlphaZero agent package
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ cli.py              # CLI logic (train, ml, tb, ray commands - headless only, uses Rich)
โ”‚   โ”œโ”€โ”€ config/             # Pydantic configuration models (Model, Train, MCTS) - Stats/Persistence now in Trieye
โ”‚   โ”‚   โ””โ”€โ”€ README.md
โ”‚   โ”œโ”€โ”€ features/           # Feature extraction logic (operates on trianglengin.GameState)
โ”‚   โ”‚   โ””โ”€โ”€ README.md
โ”‚   โ”œโ”€โ”€ logging_config.py   # Centralized logging setup function (for console)
โ”‚   โ”œโ”€โ”€ nn/                 # Neural network definition and wrapper (implements trimcts.AlphaZeroNetworkInterface)
โ”‚   โ”‚   โ””โ”€โ”€ README.md
โ”‚   โ”œโ”€โ”€ rl/                 # RL components (Trainer, Buffer, Worker using trimcts)
โ”‚   โ”‚   โ””โ”€โ”€ README.md
โ”‚   โ”œโ”€โ”€ training/           # Training orchestration (Loop, Setup, Runner, WorkerManager) - Interacts with TrieyeActor
โ”‚   โ”‚   โ””โ”€โ”€ README.md
โ”‚   โ””โ”€โ”€ utils/              # Shared utilities and types (specific to AlphaTriangle)
โ”‚       โ””โ”€โ”€ README.md
โ”œโ”€โ”€ tests/                  # Unit tests (for alphatriangle components, excluding MCTS, Stats, Data)
โ”‚   โ”œโ”€โ”€ conftest.py
โ”‚   โ”œโ”€โ”€ nn/
โ”‚   โ”œโ”€โ”€ rl/
โ”‚   โ””โ”€โ”€ training/
โ”œโ”€โ”€ .gitignore
โ”œโ”€โ”€ .python-version
โ”œโ”€โ”€ LICENSE                 # License file (MIT)
โ”œโ”€โ”€ MANIFEST.in             # Specifies files for source distribution
โ”œโ”€โ”€ pyproject.toml          # Build system & package configuration (depends on trianglengin, trimcts, trieye, rich)
โ”œโ”€โ”€ README.md               # This file
โ””โ”€โ”€ requirements.txt        # List of dependencies (includes trianglengin, trimcts, trieye, rich)

Key Modules (alphatriangle)

  • cli: Defines the command-line interface using Typer (train, ml, tb, ray commands - headless). Uses rich for styling. (alphatriangle/cli.py)
  • config: Centralized Pydantic configuration classes (Model, Train, MCTS). Imports EnvConfig from trianglengin. Uses TrieyeConfig from trieye for stats/persistence. (alphatriangle/config/README.md)
  • features: Contains logic to convert trianglengin.GameState objects into numerical features (StateType). (alphatriangle/features/README.md)
  • logging_config: Defines the setup_logging function for centralized console logger configuration. (alphatriangle/logging_config.py)
  • nn: Contains the PyTorch nn.Module definition (AlphaTriangleNet) and a wrapper class (NeuralNetwork). The NeuralNetwork class implicitly conforms to the trimcts.AlphaZeroNetworkInterface protocol. (alphatriangle/nn/README.md)
  • rl: Contains RL components: Trainer (network updates), ExperienceBuffer (data storage, supports PER), and SelfPlayWorker (Ray actor for parallel self-play using trimcts.run_mcts). Workers now send RawMetricEvents to the TrieyeActor. (alphatriangle/rl/README.md)
  • training: Orchestrates the headless training process using TrainingLoop, managing workers, data flow, and interaction with the TrieyeActor for logging, checkpointing, and state loading. Includes runner.py for the callable training function. (alphatriangle/training/README.md)
  • utils: Provides common helper functions and shared type definitions specific to the AlphaZero implementation. (alphatriangle/utils/README.md)

Setup

  1. Clone the repository (for development):
    git clone https://github.com/lguibr/alphatriangle.git
    cd alphatriangle
  2. Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the package (including trianglengin, trimcts, trieye, and rich):
    • For users:
      # This will automatically install trianglengin, trimcts, trieye, and rich from PyPI if available
      pip install alphatriangle
      # Or install directly from Git (installs dependencies from PyPI)
      # pip install git+https://github.com/lguibr/alphatriangle.git
    • For developers (editable install):
      • First, ensure trianglengin, trimcts, and trieye are installed (ideally in editable mode from their own directories if developing all):
        # From the trianglengin directory (requires C++ build tools):
        # pip install -e .
        # From the trimcts directory (requires C++ build tools):
        # pip install -e .
        # From the trieye directory:
        # pip install -e .
      • Then, install alphatriangle in editable mode:
        # From the alphatriangle directory:
        pip install -e .
        # Install dev dependencies (optional, for running tests/linting)
        pip install -e .[dev] # Installs dev deps from pyproject.toml
    Note: Ensure you have the correct PyTorch version installed for your system (CPU/CUDA/MPS). See pytorch.org. Ray may have specific system requirements. trianglengin and trimcts require a C++ compiler (like GCC, Clang, or MSVC) and CMake.
  4. (Optional but Recommended) Add data directory to .gitignore: Ensure the .gitignore file in your project root contains the line:
    .trieye_data/
    

Running the Code (CLI)

Use the alphatriangle command for training and monitoring. The CLI uses rich for enhanced output.

  • Show Help:
    alphatriangle --help
  • Run Training (Headless Only):
    alphatriangle train [--seed 42] [--log-level INFO] [--profile] [--run-name my_custom_run]
    • --log-level: Set console logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Default: INFO. Logs are saved to .trieye_data/alphatriangle/runs/<run_name>/logs/.
    • --seed: Set the random seed for reproducibility. Default: 42.
    • --profile: Enable cProfile for worker 0. Generates .prof files in .trieye_data/alphatriangle/runs/<run_name>/profile_data/.
    • --run-name: Specify a custom name for the run. Default: train_YYYYMMDD_HHMMSS.
  • Launch MLflow UI: Launches the MLflow web interface, automatically pointing to the .trieye_data/alphatriangle/mlruns directory.
    alphatriangle ml [--host 127.0.0.1] [--port 5000]
    Access via http://localhost:5000 (or the specified host/port).
  • Launch TensorBoard UI: Launches the TensorBoard web interface, automatically pointing to the .trieye_data/alphatriangle/runs directory (which contains the individual run subdirectories with tensorboard logs).
    alphatriangle tb [--host 127.0.0.1] [--port 6006]
    Access via http://localhost:6006 (or the specified host/port).
  • Launch Ray Dashboard UI: Launches the Ray Dashboard web interface. Note: This typically requires a Ray cluster to be running (e.g., started by alphatriangle train or manually).
    alphatriangle ray [--host 127.0.0.1] [--port 8265]
    Access via http://localhost:8265 (or the specified host/port).
  • Interactive Play/Debug (Use trianglengin CLI): Note: Interactive modes are part of the trianglengin library, not this alphatriangle package.
    # Ensure trianglengin is installed
    trianglengin play [--seed 42] [--log-level INFO]
    trianglengin debug [--seed 42] [--log-level DEBUG]
  • Running Unit Tests (Development):
    pytest tests/
  • Analyzing Profile Data (if --profile was used): Use the provided analyze_profiles.py script (requires typer).
    python analyze_profiles.py .trieye_data/alphatriangle/runs/<run_name>/profile_data/worker_0_ep_<ep_seed>.prof [-n <num_lines>]

Configuration

  • AlphaTriangle Specific: Parameters for the Model (ModelConfig), Training (TrainConfig), and MCTS (AlphaTriangleMCTSConfig) are defined in the Pydantic classes within the alphatriangle/config/ directory.
  • Environment: Environment configuration (EnvConfig) is defined within the trianglengin library.
  • Stats & Persistence: Statistics logging and data persistence are configured via TrieyeConfig (which includes StatsConfig and PersistenceConfig) from the trieye library. These are typically instantiated in alphatriangle/training/runner.py or alphatriangle/cli.py.

Data Storage

All persistent data is now managed by the trieye library and stored within the .trieye_data/<app_name>/ directory (default: .trieye_data/alphatriangle/) in the project root. This directory should be added to your .gitignore.

  • .trieye_data/<app_name>/mlruns/: Managed by MLflow (via trieye). Contains MLflow's internal tracking data (parameters, metrics) and its own copy of logged artifacts. This is the source for the MLflow UI (alphatriangle ml).
  • .trieye_data/<app_name>/runs/: Managed by trieye. Contains locally saved artifacts for each run (checkpoints, buffers, TensorBoard logs, configs, profile data) before/during logging to MLflow. This directory is used for auto-resuming and is the source for the TensorBoard UI (alphatriangle tb).
    • Replay Buffer Content: The saved buffer file (buffer.pkl) contains Experience tuples: (StateType, PolicyTargetMapping, n_step_return). The StateType includes:
      • grid: Numerical features representing grid occupancy.
      • other_features: Numerical features derived from game state and available shapes.
    • Visualization: This stored data allows offline analysis and visualization of grid occupancy and the available shapes for each recorded step. It does not contain the full sequence of actions or raw GameState objects needed for a complete, interactive game replay.

Maintainability

This project includes README files within each major alphatriangle submodule. Please keep these READMEs updated when making changes to the code's structure, interfaces, or core logic, especially regarding the interaction with the trieye library.

About

An AlphaZero-style AI agent for a custom triangle puzzle game. Features distributed self-play with Ray, a C++ MCTS core, and a PyTorch neural network. Includes async logging via MLflow & TensorBoard.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages