ALE Vector Environment Guide

Introduction

The Arcade Learning Environment (ALE) Vector Environment provides a high-performance implementation for running multiple Atari environments in parallel. This implementation utilizes native C++ code with multi-threading to achieve significant performance improvements, especially when running many environments simultaneously.

The vector environment is equivalent to FrameStackObservation(AtariPreprocessing(gym.make("ALE/{AtariGame}-v5")), stack_size=4).
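
For reference, the single-environment equivalent built from Gymnasium wrappers looks roughly like the following sketch (using Breakout; the wrapper arguments mirror the defaults listed under Advanced Configuration below):

import gymnasium as gym
import ale_py
from gymnasium.wrappers import AtariPreprocessing, FrameStackObservation

gym.register_envs(ale_py)  # make the ALE/* environments available to gym.make

# Disable the built-in frame skip so AtariPreprocessing controls it
env = gym.make("ALE/Breakout-v5", frameskip=1)
env = AtariPreprocessing(env, frame_skip=4, grayscale_obs=True, screen_size=84)
env = FrameStackObservation(env, stack_size=4)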

Key Features

  • Parallel Execution: Run multiple Atari environments simultaneously with minimal overhead

  • Standard Preprocessing: Includes standard preprocessing steps from the Atari Deep RL literature:

    • Frame skipping

    • Observation resizing

    • Grayscale conversion

    • Frame stacking

    • NoOp initialization at reset

    • Fire reset (for games requiring the fire button to start)

    • Episodic life modes

  • Performance Optimizations:

    • Native C++ implementation

    • Next-step autoreset (see blog for more detail)

    • Multi-threading for parallel execution

    • Thread affinity options for better performance on multi-core systems

    • Batch processing capabilities

  • Asynchronous Operation: Splits the step operation into send and recv for more flexible control flow

  • Gymnasium Compatible: Implements the Gymnasium VectorEnv interface (see the example after this list)
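
Because the Gymnasium VectorEnv interface is implemented, the usual vector attributes are available. A minimal sketch (attribute names follow the Gymnasium VectorEnv API):

from ale_py.vector_env import VectorAtariEnv

envs = VectorAtariEnv(game="Breakout", num_envs=4)
print(envs.num_envs)                  # 4
print(envs.single_action_space)       # per-environment action space
print(envs.single_observation_space)  # per-environment observation space
print(envs.action_space)              # batched action space across all environments
envs.close()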

Installation

The vector implementation is packaged with ale-py, which can be installed from PyPI: pip install ale-py.

Optionally, users can build the project locally. This requires vcpkg, which installs OpenCV to support the observation preprocessing.

Basic Usage

Creating a Vector Environment

from ale_py.vector_env import VectorAtariEnv

# Create a vector environment with 4 parallel instances of Breakout
envs = VectorAtariEnv(
    game="Breakout",
    num_envs=4,
)

# Reset all environments
observations, info = envs.reset()

# Take random actions in all environments
actions = envs.action_space.sample()
observations, rewards, terminations, truncations, infos = envs.step(actions)

# Close the environment when done
envs.close()

Advanced Configuration

The vector environment provides numerous configuration options:

envs = VectorAtariEnv(
    # Required parameters
    game="Breakout",          # ROM name in snake_case
    num_envs=8,               # Number of parallel environments

    # Preprocessing parameters
    frame_skip=4,             # Number of frames to skip (action repeat)
    grayscale=True,           # Use grayscale observations
    stack_num=4,              # Number of frames to stack
    img_height=84,            # Height to resize frames to
    img_width=84,             # Width to resize frames to

    # Environment behavior
    noop_max=30,              # Maximum number of no-ops at reset
    fire_reset=True,          # Press FIRE on reset for games that require it
    episodic_life=False,      # End episodes on life loss
    max_episode_steps=108000, # Max frames per episode (27000 steps * 4 frame skip)
    repeat_action_probability=0.0,  # Sticky actions probability
    full_action_space=False,  # Use full action space (not minimal)

    # Performance options
    batch_size=0,             # Number of environments to process at once (0 = use num_envs)
    num_threads=0,            # Number of worker threads (0=auto)
    thread_affinity_offset=-1, # CPU core offset for thread affinity (-1 = no affinity)
    seed=0,                   # Random seed
)

Observation Format

The observation format from the vector environment is:

observations.shape = (num_envs, stack_size, height, width)

Where:

  • num_envs: Number of parallel environments

  • stack_size: Number of stacked frames (typically 4)

  • height, width: Image dimensions (typically 84x84)

This differs from the standard single-environment Gymnasium Atari format, which uses:

observations.shape = (stack_size, height, width)  # No num_envs dimension
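
To confirm the layout, a quick sketch with the default preprocessing (4 stacked 84x84 grayscale frames):

from ale_py.vector_env import VectorAtariEnv

envs = VectorAtariEnv(game="Breakout", num_envs=4)
observations, info = envs.reset()
print(observations.shape)  # (4, 4, 84, 84) -> (num_envs, stack, height, width) with the defaults
envs.close()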

Performance Considerations

Number of Environments

Increasing the number of environments typically improves throughput until you hit CPU core limits. For optimal performance, set num_envs close to the number of physical CPU cores.
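
One way to pick a good value is to measure throughput directly. A rough sketch (absolute numbers depend entirely on the machine):

import time
from ale_py.vector_env import VectorAtariEnv

for num_envs in (2, 4, 8, 16):
    envs = VectorAtariEnv(game="Breakout", num_envs=num_envs)
    envs.reset()
    num_steps = 1_000
    start = time.perf_counter()
    for _ in range(num_steps):
        envs.step(envs.action_space.sample())
    elapsed = time.perf_counter() - start
    print(f"num_envs={num_envs}: {num_steps * num_envs / elapsed:,.0f} env-steps/s")
    envs.close()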

Send/Recv vs Step

The send/recv API allows computation to overlap with environment stepping:

# Send actions to environments
envs.send(actions)

# Do other computation here while environments are stepping

# Receive results when ready
observations, rewards, terminations, truncations, infos = envs.recv()
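
In a training loop, this lets work such as a learner update or logging run while the environments step in the background. A sketch of one such loop, continuing from the snippet above (compute_actions and learner_update are hypothetical placeholders, not part of the ale-py API):

observations, _ = envs.reset()
for _ in range(1000):
    actions = compute_actions(observations)  # hypothetical: pick actions from current observations
    envs.send(actions)                       # environments start stepping in background threads
    learner_update()                         # hypothetical: overlap a gradient step, logging, etc.
    observations, rewards, terminations, truncations, infos = envs.recv()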

Batch Size

The batch_size parameter controls how many environments are processed simultaneously by the worker threads:

# Process environments in batches of 4
envs = VectorAtariEnv(game="Breakout", num_envs=16, batch_size=4)

A smaller batch size can improve latency, while a larger batch size can improve throughput. When a batch size is passed, only the first batch_size observations are returned by reset and recv, so the returned info includes the environment id of each observation, which is critical for matching observations to environments.
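
A sketch of inspecting these partial batches (the exact info key holding the environment id, assumed here to be "env_id", should be checked against the installed ale-py version):

from ale_py.vector_env import VectorAtariEnv

envs = VectorAtariEnv(game="Breakout", num_envs=16, batch_size=4)
observations, infos = envs.reset()  # only the first batch_size results are returned
print(observations.shape)           # expected (4, 4, 84, 84): batch_size leading dimension
print(infos["env_id"])              # assumed key: which environment each row belongs to
envs.close()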

Thread Affinity

On systems with multiple CPU cores, setting thread affinity can improve performance:

# Set thread affinity starting from core 0
envs = VectorAtariEnv(game="Breakout", num_envs=8, thread_affinity_offset=0)

Examples

Training Example with PyTorch

import torch
import numpy as np
from ale_py.vector_env import VectorAtariEnv

# Create environment
envs = VectorAtariEnv(game="Breakout", num_envs=8)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize model (simplified example)
model = torch.nn.Sequential(
    torch.nn.Conv2d(4, 32, kernel_size=8, stride=4),
    torch.nn.ReLU(),
    torch.nn.Conv2d(32, 64, kernel_size=4, stride=2),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 64, kernel_size=3, stride=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(3136, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, envs.single_action_space.n)
).to(device)

# Reset environment
observations, _ = envs.reset()

# Training loop
for step in range(1000):
    # Convert observations to PyTorch tensors
    obs_tensor = torch.tensor(observations, dtype=torch.float32, device=device) / 255.0

    # Get actions from model
    with torch.no_grad():
        q_values = model(obs_tensor)
        actions = q_values.max(dim=1)[1].cpu().numpy()

    # Step the environment
    observations, rewards, terminations, truncations, infos = envs.step(actions)