
BA18_stdm_3: Using Reinforcement Learning to Play Bomberman


Authors:

  • Joram Liebeskind
  • Silvan Wehner

Supervisors:

  • Oliver Dürr
  • Thilo Stadelmann

Directory Structure Overview and Descriptions

|-- experiment_plots/                  Plots of all experiments.
|   |
|   |-- dqn/                           Plots of DQN experiments.
|   |-- maddpg/                        Plots of MADDPG experiments.
|-- src/                               Project source code.
|   |-- ActorCriticModels/             All models used for the experiments.
|   |-- agents/                        DQN and MADDPG agents.
|   |-- docker/Dockerfile              Dockerfile to generate image used to run the experiments.
|   |--             Autoencoder training runner.
|   |--                     Generates LaTeX tables from hyperparameters.
|   |-- experiment_data/               Where all experiment data is saved.
|   |--               Experiment logger.
|   |-- experiments/                   Experiment classes that handle the setup of experiments and agents.
|   |--             All the hyperparameters of the experiments.
|   |-- p/                             Pommerman repository.
|   |--                        Creates plots from experiments.
|   |-- preprocessors/                 Observation preprocessors.
|   |--                Implementation of a replay memory.
|   |-- reward_shapers/                Reward shapers.
|   |--                      Runs the experiments.
|   |--          Handles all TensorBoard-related logging.
|   |--                        Utility functions.
|-- Bachelor-Thesis-Stdm-3-2018.pdf    The Bachelor Thesis.
|--                          This file.
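The replay memory listed under src/ above is a standard DQN component; the following is an illustrative re-implementation of such a buffer (not the repo's actual class):

```python
import random
from collections import deque


class ReplayMemory:
    """Minimal sketch of a DQN replay memory.

    Stores transitions in a fixed-size buffer; once full, the oldest
    transitions are evicted automatically by the deque's maxlen.
    """

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Append one transition tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a batch of transitions without replacement.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from such a buffer breaks the temporal correlation between consecutive transitions, which stabilizes DQN training.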

Software Dependencies

To reproduce the experiments described in the thesis, the following software must be installed:

Programs and Libraries

  • Python 3.5 or higher
  • Open MPI: high-performance message-passing library headers (Ubuntu package: libopenmpi-dev)
  • zlib: compression library headers (Ubuntu package: zlib1g-dev)
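On Ubuntu, the two system packages named above can be installed with apt:

```shell
# Install the Open MPI and zlib development headers (Ubuntu package names as listed above)
sudo apt-get update
sudo apt-get install -y libopenmpi-dev zlib1g-dev
```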

We recommend using the Docker image provided in src/docker or on Docker Hub:

Python Packages

Running Experiments


This script runs an experiment and saves models and log data to experiment_data/<experiment_name>/.
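The output location can be illustrated with a small helper (a hypothetical sketch of the naming scheme described above; the runner may construct its paths differently):

```python
from pathlib import Path


def experiment_dir(name, postfix=None):
    """Return the data directory for an experiment.

    Mirrors the layout described above: experiment_data/<experiment_name>/,
    with an optional postfix appended to the name.
    """
    full_name = name if postfix is None else f"{name}_{postfix}"
    return Path("experiment_data") / full_name
```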


 -e, --experiment      Experiment name.
 -p, --postfix         Postfix appended to the experiment name. Defaults to the current date.
 -n, --no_postfix      Flag to omit the postfix.
 -d, --demo            Demo an experiment: no training is done and the game is rendered.
 -r, --render          Flag to enable rendering during training.
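The command-line interface above could be reproduced with argparse roughly as follows (option names are taken from the table; the actual runner script may parse them differently):

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Hypothetical re-creation of the experiment runner's CLI."""
    parser = argparse.ArgumentParser(description="Run a Pommerman RL experiment.")
    parser.add_argument("-e", "--experiment", required=True,
                        help="experiment name")
    parser.add_argument("-p", "--postfix", default=date.today().isoformat(),
                        help="postfix appended to the experiment name "
                             "(defaults to the current date)")
    parser.add_argument("-n", "--no_postfix", action="store_true",
                        help="omit the postfix")
    parser.add_argument("-d", "--demo", action="store_true",
                        help="demo mode: no training, game is rendered")
    parser.add_argument("-r", "--render", action="store_true",
                        help="render the game during training")
    return parser.parse_args(argv)
```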

Experiments on this data medium:

All experiments were saved without a postfix, so the -n flag must be provided.

  • DQN_experiment_1
  • DQN_experiment_2
  • DQN_experiment_3
  • DQN_experiment_4
  • DQN_experiment_5
  • DQN_experiment_6
  • DQN_experiment_7
  • MADDPG_experiment_1
  • MADDPG_experiment_2

Example: Running DQN Experiment 1 Without Postfix

python --experiment=DQN_experiment_1 -n

Generating Plots


This script generates plots in the experiment_data/<experiment_name>/plots directory.


 -e, --experiment      Path of the experiment data directory, or the experiment name.
 -p, --postfix         A postfix to be added at the end of the experiment name. Defaults to the current date.

Example: Creating Plots of DQN Experiment 1

python --experiment=./experiment_data/DQN_experiment_1

Experiment Plots Descriptions

Some plots exist in multiple versions, one per agent (agents 0 to 3). This is the case for the multi-agent learning experiments.

  • evaluation_*: Data from an evaluation period, typically 100 games every n training iterations.

  • training_episode_*: Data from episodes run during training.

  • training_history_*: Data over the past 500 training iterations.

  • *_action_count: Stacked plot of all action counts.

  • *_action_probability: Stacked plot of the probability of each action.

  • *_results: Game outcomes.

  • *_rewards: Rewards received.

  • *_q_values: Q-value predictions.

  • *_q_values_mean: Mean of the Q-values.

  • *_q_values_stddev: Standard deviation of the Q-values.

  • *_action_entropy: Entropy over the actions.

  • training_history_epsilon: Exact value of epsilon over the training iterations.
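The *_action_entropy plots report the entropy of the agent's action distribution. A minimal sketch of how such a value can be computed from action counts (an illustrative helper, not code from the repo):

```python
import math


def action_entropy(counts):
    """Shannon entropy (in nats) of an empirical action distribution.

    `counts` is a list of per-action counts, e.g. the values behind an
    *_action_count plot. Zero-count actions contribute nothing.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)
```

For a uniform distribution over Pommerman's six actions this yields log(6) ≈ 1.79 nats; lower values indicate a more deterministic policy.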