Introduction

This book provides a foundational introduction to the problem of reinforcement learning. It combines narrative, maths, and code to introduce the area: why it exists, how to solve reinforcement learning problems, and the strengths and weaknesses of different approaches.

The aim of the book is to give the reader a sufficient foundation to design and implement agents using reinforcement learning, and to understand the more advanced topics covered in research papers.

We start with dynamic programming approaches, such as value iteration. While the current hype is around deep reinforcement learning, understanding value iteration helps learners appreciate what techniques like deep policy gradients and actor-critic methods are optimising, and how they work. And, for many MDP problems, value iteration is a much better solution than deep reinforcement learning.

In Part I of these notes, we introduce Markov Decision Processes (MDPs). MDPs allow us to model problems in which the outcomes of actions are probabilistic; that is, we do not know the outcome beforehand, but we know there is some probability distribution over a set of possible outcomes. We look at model-based techniques, where these outcome probabilities are given to us, and model-free techniques, which work even when the probabilities are unknown, provided we can sample outcomes enough times to learn good behaviour.
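To make the model-based versus model-free distinction concrete, below is a minimal sketch in plain Python. The class and method names (`TinyMDP`, `get_transitions`, `sample_transition`) are hypothetical illustrations, not the interface used by the book's code: a model-based technique can read the full transition distribution, while a model-free technique can only sample outcomes from it.

```python
import random

# A tiny illustrative MDP; names here are hypothetical, not the book's interface.
class TinyMDP:
    def get_transitions(self, state, action):
        # Model-based view: the full distribution over next states is known.
        # Returns a list of (next_state, probability) pairs.
        if state == "s0" and action == "go":
            return [("s1", 0.8), ("s0", 0.2)]
        return [(state, 1.0)]

    def sample_transition(self, state, action):
        # Model-free view: we cannot read the probabilities directly,
        # we can only sample an outcome and learn from repeated samples.
        outcomes = self.get_transitions(state, action)
        states, probs = zip(*outcomes)
        return random.choices(states, weights=probs)[0]

mdp = TinyMDP()
print(mdp.get_transitions("s0", "go"))    # the known model: [("s1", 0.8), ("s0", 0.2)]
print(mdp.sample_transition("s0", "go"))  # one sampled outcome, e.g. "s1"
```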

In Part II of these notes, we look at multi-agent MDPs (sometimes called games), in which there are multiple (possibly adversarial) actors in a problem, and we need to plan our actions while also considering what the other actors in the environment will do. Again, we look at both model-based and model-free techniques.

The book

This book is written using Markdown, and compiled into HTML using Jupyter Book — an extension of Jupyter notebooks.

Each HTML page can be downloaded individually as a Jupyter notebook, but note that you will need to install the code and its dependencies, described below.

Code

All code in this book is executable. You can download the code from here.

Note

The code in this book is written for understandability rather than efficiency. It is not intended to be production-level code of the sort found in packages such as Keras RL.

If you understand the code in these notes, you will have little problem using production-level packages such as Keras RL.

The code in this book uses as little Python-specific syntax as possible, so that readers less familiar with Python can still follow the code snippets.

The code in this book uses as few external libraries as possible, to make it easy to download and run yourself.

Once you have downloaded the code, unzip it and add the folder to your PYTHONPATH environment variable if you want to run the downloaded Jupyter notebooks.

Most files in the code have a main function that can be run using just python <filename>.py. For most of these, no external libraries are required. However, if you want to plot the graphs or draw the trees, you will need to install the following (a short sketch after this list shows one way to check which are already installed):

  1. The Matplotlib library for plotting graphs. You can download from the website or install with pip install matplotlib.

  2. The Scipy library for helping with the graph plotting. You can download from the website or install with pip install scipy.

  3. The Graphviz Python library for drawing trees. You can download from the website or use pip install graphviz. To render the generated graphs, you will also need to install the Graphviz tool itself, which is called by the Python package.
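As a quick way to see which of these optional packages are available in your environment, here is a small sketch; the function name `check_optional_dependencies` is hypothetical and this snippet is not part of the book's code. The core agents run on the standard library alone, so missing packages only affect plotting and tree drawing.

```python
# A hedged sketch for checking the optional plotting/drawing dependencies.
def check_optional_dependencies():
    for name in ("matplotlib", "scipy", "graphviz"):
        try:
            __import__(name)
            print(f"{name}: installed")
        except ImportError:
            print(f"{name}: missing (only needed for plotting graphs or drawing trees)")

if __name__ == "__main__":
    check_optional_dependencies()
```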

The Author

These notes are written and maintained by Tim Miller, Professor of Artificial Intelligence at The University of Queensland, Brisbane/Meaanjin, Australia.

If you find any errors or would like to provide other feedback, feel free to email me.

If you use this as part of your teaching or learning in a course, please let me know! I’d love to hear from you.

Acknowledgements

Thanks to Alan Lewis for his excellent idea of demonstrating policy gradients using a logistic regression policy, and for implementing the source for this and the deep policy gradient agent. Thanks also to Alan for setting up the library for playing GIF files, which supports the interactive visualisations that are so useful in this book.

Thanks to Emma Baillie for the idea and implementation of the Contested Crossing examples.