Visualization

ALE offers screen display and audio capabilities via the Simple DirectMedia Layer (SDL). Screen display can be enabled using the boolean option display_screen (default: false), and sound playback using the boolean option sound (default: false).

Gymnasium API

Gymnasium provides two methods for visualizing an environment, human rendering and video recording.

Human visualization

Through specifying the environment render_mode="human" then ALE will automatically create a window running at 60 frames per second showing the environment behaviour. It is highly recommended to close the environment after it has been used such that the rendering information is correctly shut down.

import gymnasium
import ale_py

gymnasium.register_envs(ale_py)

env = gymnasium.make("ALE/Pong-v5", render_mode="human")
env.reset()
for _ in range(100):
    action = env.action_space.sample()

    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        obs, info = env.reset()

env.close()

Recording videos

Specifying the render_mode="rgb_array" will return the rgb array from env.render(), this can be combined with the gymnasium.wrappers.RecordVideo where the environment renders are stored and saved as mp4 videos for episodes.

The example below will record episodes on every other episode (num % 2 == 0) using the episode_trigger and save the folders in saved-video-folder with filename starting video- followed by the video number.

import gymnasium
import ale_py

gymnasium.register_envs(ale_py)

env = gymnasium.make("ALE/Pong-v5", render_mode="rgb_array")
env = gymnasium.wrappers.RecordVideo(
    env,
    episode_trigger=lambda num: num % 2 == 0,
    video_folder="saved-video-folder",
    name_prefix="video-",
)
for episode in range(10):
    obs, info = env.reset()
    episode_over = False

    while not episode_over:
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)

        episode_over = terminated or truncated

env.close()

Python Interface

ALE now provides support for recording frames; if sound is enabled, it is also possible to record audio output. An example Python program is provided which will record both visual and audio output for a single episode of play.

import os
import sys
from random import randrange
from ale_py import ALEInterface

def main(rom_file, record_dir):
    ale = ALEInterface()
    ale.setInt('random_seed', 123)

    # Enable screen display and sound output
    ale.setBool('display_screen', True)
    ale.setBool('sound', True)

    # Specify the recording directory and the audio file path
    ale.setString("record_screen_dir", record_dir) # Set the record directory
    ale.setString("record_sound_filename",
                    os.path.join(record_dir, "sound.wav"))

    ale.loadROM(rom_file)

    # Get the list of legal actions
    legal_actions = ale.getLegalActionSet()
    num_actions = len(legal_actions)

    while not ale.game_over():
      a = legal_actions[randrange(num_actions)]
      ale.act(a)

    print(f"Finished episode. Frames can be found in {record_dir}")

if __name__ == '__main__':
    if len(sys.argv) < 3:
      print(f"Usage: {sys.argv[0]} rom_file record_dir")
      sys.exit()

    rom_file = sys.argv[1]
    record_dir = sys.argv[2]
    main(rom_file, "videos/")

Once frames and/or sound have been recorded, they may be joined into a video using an external program like ffmpeg. For example, you can run:

# -r frame_rate
# -i input
# -f format
# -c:a audio_codec
# -c:v video_codec

ffmpeg -r 60 \
       -i record/%06d.png \
       -i record/sound.wav \
       -f mov \
       -c:a mp3 \
       -c:v libx264 \
       agent.mov