Reinforcement learning provides a method for learning in games. In this program you will implement a simple reinforcement learning mechanism to learn in a simple maze game involving a starting point, a goal, obstacles, and an opponent that is trying to catch you.
In this code (prog4.lisp) a simple maze game is implemented in Lisp. The code implements two routines of interest: play-games and run-experiment.
To see how the game works, load the initial code with (load "prog4.lisp") and then type (play-games 5 T nil) to play 5 games interactively. The code displays the maze on the screen and asks the user (you) to supply an action. The maze shown uses Xs for obstacles, a p for the player, an O for the opponent who is chasing you, and a G for the goal. Actions are 1 (left), 2 (up), 3 (right), 4 (down). The code as written queries the user about the move to make; you will replace this with code that has a computer learner play the game. The routines you need to implement are described below.
You will need to create a description of the board (the state of the board). Since the obstacles are all fixed and the goal is always in the same location, the only things that change about the board are the positions of the player and the opponent. You will need a mechanism that assigns a unique state number to each possible combination of player and opponent positions (I suggest you write a function to do this).
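One simple scheme is to treat the two positions as digits of a mixed-radix number. The sketch below assumes each position has already been reduced to a single cell index; the names (position-index, state-number) and the maze size are illustrative, not part of prog4.lisp:

```lisp
;; Sketch of a state-numbering scheme.  *NUM-CELLS* is hypothetical --
;; replace it with the actual number of squares in the prog4.lisp maze.
(defparameter *num-cells* 25)   ; e.g. a 5x5 maze

(defun position-index (row col num-cols)
  "Collapse a (row, col) pair into a single cell index in [0, num-cells)."
  (+ (* row num-cols) col))

(defun state-number (player-index opponent-index)
  "Unique state number for each (player, opponent) pair of cell indices."
  (+ (* player-index *num-cells*) opponent-index))
```

With this encoding the state numbers run from 0 to (1- (* *num-cells* *num-cells*)), so your Q table needs that many rows.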
You will then need to create a Q table and a Visits table. The Q table needs a row for each possible state number and a column for each of the four actions; its initial values should all be 0.0. The Visits table counts how many times you have tried each state/action combination; it has the same shape (a row per state, a column per action), with initial values of 0.
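The tables above might be built as 2-D arrays. The assignment does not dictate a learning rule; a common choice is the standard one-step Q-learning update, sketched here. Everything beyond the table shapes (rows = states, 4 action columns, initialized to 0.0 and 0) is an assumption, including the alpha and gamma values:

```lisp
;; Sketch of table creation plus a standard Q-learning update.
;; *NUM-STATES* must match the number of distinct state numbers your
;; encoding can produce (here, 25 player cells x 25 opponent cells).
(defparameter *num-states* (* 25 25))
(defparameter *q-table*
  (make-array (list *num-states* 4) :initial-element 0.0))
(defparameter *visits*
  (make-array (list *num-states* 4) :initial-element 0))

(defun update-q (state action reward next-state &key (alpha 0.1) (gamma 0.9))
  "One step of the standard Q-learning rule:
Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."
  (incf (aref *visits* state action))
  (let ((best-next (loop for a below 4
                         maximize (aref *q-table* next-state a))))
    (incf (aref *q-table* state action)
          (* alpha (+ reward
                      (* gamma best-next)
                      (- (aref *q-table* state action)))))))
```

Whatever rule you use, the Visits table lets you decay the learning rate or drive exploration toward rarely tried state/action pairs.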
Once your code is learning, use run-experiment to generate five learning curves (call it five times). This code trains your model for 100 games, then plays 100 games with learning turned off, then trains for 100 more games (for a total of 200 training games), then again plays 100 games without learning, and so on. It should produce something like this:
Training up to 100
Training up to 200
Training up to 300
Training up to 400
Training up to 500
Training up to 600
Training up to 700
Training up to 800
Training up to 900
Training up to 1000
Training up to 1250
Training up to 1500
Training up to 2000
Training up to 2500
Training up to 3000
100: 21 Wins, 78 Losses, 1 Draws
200: 36 Wins, 42 Losses, 22 Draws
300: 40 Wins, 25 Losses, 35 Draws
400: 41 Wins, 22 Losses, 37 Draws
500: 54 Wins, 22 Losses, 24 Draws
600: 55 Wins, 21 Losses, 24 Draws
700: 63 Wins, 6 Losses, 31 Draws
800: 68 Wins, 7 Losses, 25 Draws
900: 56 Wins, 6 Losses, 38 Draws
1000: 62 Wins, 8 Losses, 30 Draws
1250: 63 Wins, 12 Losses, 25 Draws
1500: 72 Wins, 5 Losses, 23 Draws
2000: 77 Wins, 2 Losses, 21 Draws
2500: 73 Wins, 0 Losses, 27 Draws
3000: 72 Wins, 3 Losses, 25 Draws
The first part (Training up to 100, etc.) simply shows the progress of the training; the second part shows how many wins/losses/draws the player gets after that amount of training (so 200: 36 Wins, 42 Losses, 22 Draws means that after 200 training games the computer player wins 36 test games, loses 42, and draws 22). Average these results over your five runs and present them as a graph with the x-axis showing the number of training games and the y-axis showing the average number of wins, losses, and draws (three separate lines). Discuss your results in the material you hand in.
Turn in a commented copy of your code, a printout of your five runs of run-experiment, and the graph of your results, along with a discussion of those results.