BA18_stdm_3: Using Reinforcement Learning to Play Bomberman

Authors:

  • Joram Liebeskind
  • Silvan Wehner

Supervisors:

  • Oliver Dürr
  • Thilo Stadelmann

Directory Structure Overview

|-- experiment_plots/                  Plots of all experiments.
|   |
|   |-- dqn/                           Plots of DQN experiments.
|   |-- maddpg/                        Plots of MADDPG experiments.
|
|-- src/                               Project source code.
|   |-- ActorCriticModels/             All models used for the experiments.
|   |-- agents/                        DQN and MADDPG agents.
|   |-- docker/Dockerfile              Dockerfile to generate image used to run the experiments.
|   |-- encoder_trainer.py             Autoencoder training runner.
|   |-- exp2tex.py                     Generates LaTeX tables from hyperparameters.
|   |-- experiment_data/               Where all experiment data is saved.
|   |-- ExperimentLog.py               Experiment logger.
|   |-- experiments/                   Experiment classes that set up and manage experiments and agents.
|   |-- hyperparameters.py             All the hyperparameters of the experiments.
|   |-- p/                             Pommerman repository.
|   |-- plot.py                        Creates plots from experiments.
|   |-- preprocessors/                 Observation preprocessors.
|   |-- ReplayMemory.py               Implementation of a replay memory (see the sketch after this tree).
|   |-- reward_shapers/                Reward shapers.
|   |-- runner.py                      Runs the experiments.
|   |-- tensorboard_runner.py         Handles TensorBoard logging.
|   |-- util.py                        Utility functions.
|
|-- Bachelor-Thesis-Stdm-3-2018.pdf    The Bachelor Thesis.
|-- README.md                          This file.
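
The replay memory stores past transitions and serves random minibatches for off-policy training. A minimal sketch of the data structure, assuming a (state, action, reward, next_state, done) tuple layout (the actual ReplayMemory.py implementation may differ in API and storage):

    import random
    from collections import deque

    class ReplayMemory:
        """Fixed-capacity transition buffer; oldest entries are dropped when full."""

        def __init__(self, capacity):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            # Store one environment transition.
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniformly sample a minibatch of transitions for a training step.
            return random.sample(self.buffer, batch_size)

        def __len__(self):
            return len(self.buffer)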

Software Dependencies

To reproduce the experiments described in the thesis, the following requirements must be installed:

Programs and Libraries

  • Python 3.5 or higher
  • High-performance message passing library headers (in Ubuntu repositories: libopenmpi-dev)
  • zlib compression library headers (in Ubuntu repositories: zlib1g-dev)
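
On Ubuntu, for example, both header packages can be installed with apt:

    sudo apt-get install libopenmpi-dev zlib1g-dev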

We recommend using the Docker image provided in src/docker or on Docker Hub: https://hub.docker.com/r/dujoram/pommerman-tensorflow-gpu/
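
Assuming the default latest tag, the image can be pulled with, for example:

    docker pull dujoram/pommerman-tensorflow-gpu

(To use the GPU inside the container, a GPU-enabled Docker runtime is required, e.g. nvidia-docker, or the --gpus flag on Docker 19.03 and later.)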

Python Packages

Running Experiments

Script: runner.py

This script runs an experiment and saves models and log data to experiment_data/<experiment_name>/.

Parameters

 -e, --experiment      The experiment name.
 -p, --postfix         A postfix to be added to the end of the experiment name. Defaults to the current date.
 -n, --no_postfix      Flag to omit the postfix.
 -d, --demo            Demo an experiment. No training is done and the game is rendered.
 -r, --render          Flag to enable rendering during training.

Experiments included in this repository:

All experiments are stored without a postfix, so the -n flag must be provided.

  • DQN_experiment_1
  • DQN_experiment_2
  • DQN_experiment_3
  • DQN_experiment_4
  • DQN_experiment_5
  • DQN_experiment_6
  • DQN_experiment_7
  • MADDPG_experiment_1
  • MADDPG_experiment_2

Example: Running DQN Experiment 1 Without Postfix

python runner.py --experiment=DQN_experiment_1 -n
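
To watch the trained agents of this experiment without further training, the --demo flag can be added (assuming the documented flags combine as expected):

    python runner.py --experiment=DQN_experiment_1 -n --demo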

Generating Plots

Script: plot.py

This script generates plots in the experiment_data/<experiment_name>/plots directory.

Parameters

 -e, --experiment      Path of the experiment data directory, or the experiment name.
 -p, --postfix         A postfix to be added at the end of the experiment name. Defaults to the current date.

Example: Creating Plots of DQN Experiment 1

python plot.py --experiment=./experiment_data/DQN_experiment_1

Experiment Plots Descriptions

Some plots exist in multiple versions, one per agent (agents 0 to 3); this is the case for the multi-agent learning experiments.

  • evaluation_*: From an evaluation period; typically 100 games every n training iterations.

  • training_episode_*: From episodes run during training.

  • training_history_*: Over the past 500 training iterations.

  • *_action_count: Stacked plot of all actions.

  • *_action_probability: Stacked plot of probabilities of each action.

  • *_results: Game outcomes.

  • *_rewards: Rewards received.

  • *_q_values: Q-value predictions.

  • *_q_values_mean: Average Q-values.

  • *_q_values_stddev: Standard deviation of Q-values.

  • *_action_entropy: The entropy over actions.

  • training_history_epsilon: Exact value of epsilon over training iterations (an illustrative decay schedule is sketched below).
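
As context for the epsilon plot: epsilon-greedy exploration typically decays epsilon from a start value to an end value over training. A minimal sketch with illustrative values (the actual schedule used in the thesis is presumably defined in src/hyperparameters.py and may differ):

    def epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=100000):
        # Linear decay from eps_start to eps_end over decay_steps iterations;
        # the values here are illustrative, not the ones used in the thesis.
        frac = min(step / decay_steps, 1.0)
        return eps_start + frac * (eps_end - eps_start)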