CS 4320: Machine Learning
Assignment: Temporal Difference Q-Function (Reinforcement Learning)
Train a reinforcement learning agent to perform in the MountainCar-v0 environment.
You are expected to use the CartPole example code as the starting point for your code development.
Train a Q-Function agent to obtain the highest possible score over 20 scoring epochs.
Consider hyperparameters such as gamma, epsilon-chance-factor, learning policies, neural network architecture, modified training rewards, serial training sessions, and any other potentially useful modifications to the Q-Function or training process.
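As a rough illustration of where some of these hyperparameters enter the training loop, the sketch below shows a one-step temporal difference target (gamma), an epsilon-greedy action choice with a decaying epsilon, and one possible modified training reward. It is a minimal sketch, not the starter code: the function names, the exponential decay schedule, and the velocity-based reward bonus are all assumptions made for illustration.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target: r + gamma * max_a' Q(s', a'), or just r at a terminal state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(epoch, start=1.0, end=0.05, decay=0.995):
    """Assumed schedule: exponential decay from `start` down to a floor of `end`."""
    return max(end, start * decay ** epoch)


def shaped_reward(reward, velocity, scale=10.0):
    """Hypothetical modified training reward: add a bonus for building up speed.

    Intended for training only; scoring would still use the unmodified reward.
    """
    return reward + scale * abs(velocity)
```

For example, `td_target(-1.0, [0.5, 2.0, 1.0], done=False, gamma=0.9)` combines the step reward with the discounted best next-state value. A smaller gamma makes the agent weigh distant rewards less, which is one of the trade-offs worth discussing in the report.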
Create a report that includes:
- A description of the states, actions, and rewards of the MountainCar-v0 environment.
- An outline of code changes needed to work with the MountainCar-v0 environment.
- A description of modifications attempted in the Q-Function architecture and learning process.
- A description of the best Q-Function obtained, the process used to train it, and the score obtained. This should include the number of training sessions, the number of epochs per session, the hyperparameters used in each session, etc.
- A discussion of the effects of the Q-Function and learning modifications attempted.
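One way to have the per-session details ready for the report is to record them as training proceeds. The following is a minimal sketch under that assumption; the `SessionRecord` class, its fields, and the example values are hypothetical placeholders, not required settings or real results.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SessionRecord:
    """Hypothetical record of one training session, for later use in the report."""
    session: int
    epochs: int
    gamma: float
    epsilon_start: float
    epsilon_end: float
    hidden_layers: tuple
    notes: str = ""


def to_report_rows(records):
    """Convert records to plain dicts that can be dumped as JSON or pasted into a table."""
    return [asdict(r) for r in records]


# Placeholder values only; replace with the settings actually used.
records = [
    SessionRecord(session=1, epochs=500, gamma=0.99,
                  epsilon_start=1.0, epsilon_end=0.1,
                  hidden_layers=(32, 32), notes="placeholder values"),
]
print(json.dumps(to_report_rows(records), indent=2))
```

Keeping one record per session makes it easier to describe serial training sessions accurately, since each session may use different hyperparameters.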
Required Steps
- Download the starter code.
- Verify it runs for the CartPole environment.
- Modify the code to run with the MountainCar environment.
- Attempt to train many agents with various combinations of the hyperparameters and modifications listed above.
- Create a report with the contents mentioned above.
- Commit and push your code in the git repository.
- Submit the report (as PDF) to Canvas.
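For the step of modifying the code to run with MountainCar, much of the change typically comes down to the sizes of the observation and action spaces, which determine the input and output widths of the Q-Function network. The sketch below illustrates this under stated assumptions: the `ENV_SPECS` table and `layer_sizes` helper are hypothetical and do not come from the starter code, and the hidden layer widths are arbitrary. The sizes listed are those documented for the Gym environments (CartPole: 4 observation values and 2 actions; MountainCar-v0: 2 observation values and 3 actions).

```python
# Observation and action sizes as documented for the Gym environments.
# In real code these would usually be read from the environment itself
# (for example env.observation_space.shape and env.action_space.n)
# rather than hard-coded.
ENV_SPECS = {
    "CartPole": {"obs_dim": 4, "n_actions": 2},
    "MountainCar-v0": {"obs_dim": 2, "n_actions": 3},
}


def layer_sizes(env_name, hidden=(32, 32)):
    """Return the layer widths of a Q-network: inputs, hidden layers, one output per action."""
    spec = ENV_SPECS[env_name]
    return [spec["obs_dim"], *hidden, spec["n_actions"]]


print(layer_sizes("CartPole"))        # [4, 32, 32, 2]
print(layer_sizes("MountainCar-v0"))  # [2, 32, 32, 3]
```

Deriving the network shape from the environment instead of fixing it in several places keeps the CartPole verification step and the MountainCar runs on the same code path.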
Last Updated 03/20/2023