Environment Specifications
This section provides additional information regarding the environment implemented in ALE.
Available Actions

The following regular actions are defined by the `Action` enum in `common/Constants.h`. They can also be accessed in Python through the enum `ale_py.Action`. These actions are interpreted by ALE as follows:
| Index | Action | Description |
|---|---|---|
| 0 | `NOOP` | No operation, do nothing. |
| 1 | `FIRE` | Press the fire button without updating the joystick position. |
| 2 | `UP` | Apply a Δ-movement upwards on the joystick. |
| 3 | `RIGHT` | Apply a Δ-movement rightward on the joystick. |
| 4 | `LEFT` | Apply a Δ-movement leftward on the joystick. |
| 5 | `DOWN` | Apply a Δ-movement downward on the joystick. |
| 6 | `UPRIGHT` | Execute `UP` and `RIGHT`. |
| 7 | `UPLEFT` | Execute `UP` and `LEFT`. |
| 8 | `DOWNRIGHT` | Execute `DOWN` and `RIGHT`. |
| 9 | `DOWNLEFT` | Execute `DOWN` and `LEFT`. |
| 10 | `UPFIRE` | Execute `UP` and `FIRE`. |
| 11 | `RIGHTFIRE` | Execute `RIGHT` and `FIRE`. |
| 12 | `LEFTFIRE` | Execute `LEFT` and `FIRE`. |
| 13 | `DOWNFIRE` | Execute `DOWN` and `FIRE`. |
| 14 | `UPRIGHTFIRE` | Execute `UP`, `RIGHT` and `FIRE`. |
| 15 | `UPLEFTFIRE` | Execute `UP`, `LEFT` and `FIRE`. |
| 16 | `DOWNRIGHTFIRE` | Execute `DOWN`, `RIGHT` and `FIRE`. |
| 17 | `DOWNLEFTFIRE` | Execute `DOWN`, `LEFT` and `FIRE`. |
| 40 | `RESET`¹ | Toggles the Atari 2600 reset switch; not used for resetting the environment. |
1: Note that the `RESET` action toggles the Atari 2600 reset switch rather than resetting the environment, and as such is ignored by most interfaces.
Note: There are two main types of controllers on the Atari 2600 console: the joystick controller and the paddle controller. For paddle controllers, all *RIGHT* actions correspond to a Δ-movement to the right on the wheel, and all *LEFT* actions correspond to a Δ-movement to the left.
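For convenience, here is a minimal sketch of listing these actions from Python; it assumes only the `ale_py` package and the `Action` enum mentioned above:

```python
from ale_py import Action

# Print each action's integer index alongside its enum name,
# mirroring the table above.
for action in Action:
    print(int(action), action.name)
```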
Terminal States

Once the end of an episode is reached (a terminal state in RL terminology), no further emulation takes place until the appropriate reset command is sent. This command is distinct from the Atari 2600 reset. This “system reset” avoids odd situations where the player can reset the game through button presses, or where the game normally resets itself after a number of frames. This makes for a cleaner environment interface. The interfaces described here all provide a system reset command or method.
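As a rough illustration, an episode loop using the Python interface and the system reset might look like the following sketch (the ROM path and random policy are purely illustrative):

```python
import random

from ale_py import ALEInterface

ale = ALEInterface()
ale.loadROM("breakout.bin")  # illustrative ROM path

actions = ale.getLegalActionSet()
total_reward = 0.0

# Emulation stops once a terminal state is reached...
while not ale.game_over():
    total_reward += ale.act(random.choice(actions))

# ...and resumes only after the system reset is issued.
ale.reset_game()
```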
Color Averaging

Many Atari 2600 games display objects on alternating frames (sometimes even less frequently). This can be an issue for agents that do not consider the whole screen history. By default, color averaging is not enabled; that is, the environment output is the actual frame from the emulator. This behaviour can be turned on using `setBool` with the `color_averaging` key.
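For example, a sketch of enabling it through the Python interface (the ROM path is illustrative; the key name is the one given above):

```python
from ale_py import ALEInterface

ale = ALEInterface()

# Settings take effect when the ROM is loaded, so set them first.
ale.setBool("color_averaging", True)
ale.loadROM("breakout.bin")  # illustrative ROM path
```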
Action Repeat Stochasticity

Beginning with ALE 0.5.0, there is an option (enabled by default) to add action repeat stochasticity to the environment. With probability p (default: p = 0.25), the previously executed action is executed again during the next frame, ignoring the agent’s actual choice. This value can be modified using the option `repeat_action_probability`. The default value was chosen as the highest value for which human play-testers were unable to detect any delay or control lag (Machado et al., 2018).
The motivation for introducing action repeat stochasticity was to help separate trajectory optimization research from robust controller optimization, the latter often being the desired outcome in reinforcement learning (RL). We strongly encourage RL researchers to use the default stochasticity level in their agents, and clearly report the setting used.
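As a sketch, the option can be set explicitly through the Python interface (shown with the default value; the ROM path is illustrative):

```python
from ale_py import ALEInterface

ale = ALEInterface()

# Probability that the previous action is repeated instead of the new one.
# Setting this to 0.0 would disable the stochasticity, which the text above
# advises against.
ale.setFloat("repeat_action_probability", 0.25)
ale.loadROM("breakout.bin")  # illustrative ROM path
```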
Minimal Action Set

It may sometimes be convenient to restrict the agent to a smaller action set. This can be accomplished by querying the `RomSettings` class using the method `getMinimalActionSet`. This then returns a set of actions judged “minimal” to play a given game. Due to the potentially high impact of this setting on performance, we encourage researchers to clearly report the method used in their experiments.
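From Python, the same query is available directly on the interface; a minimal sketch (the ROM path is illustrative):

```python
from ale_py import ALEInterface

ale = ALEInterface()
ale.loadROM("breakout.bin")  # illustrative ROM path

# A subset of the 18 regular actions judged sufficient for this game.
minimal_actions = ale.getMinimalActionSet()
print([int(action) for action in minimal_actions])
```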
Modes and Difficulties

ALE 0.6.0 introduces modes and difficulties, which can be set using the relevant methods `setMode` and `setDifficulty`. These introduce a whole range of new environments. For more details, see Machado et al. (2018).
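A sketch of selecting a mode and difficulty from Python (which values are available depends on the loaded ROM; the path is illustrative):

```python
from ale_py import ALEInterface

ale = ALEInterface()
ale.loadROM("breakout.bin")  # illustrative ROM path

# Query what this ROM supports before selecting.
modes = ale.getAvailableModes()
difficulties = ale.getAvailableDifficulties()

# Pick the first available values as an example, then reset so the
# next episode uses the selected flavour.
ale.setMode(modes[0])
ale.setDifficulty(difficulties[0])
ale.reset_game()
```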
References

[1] Machado et al., “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents.” Journal of Artificial Intelligence Research (2018). URL: https://jair.org/index.php/jair/article/view/11182