On Reinforcement Learning

Hi!

This is kind of a long story, so feel free to skip to the tl;dr. :)

In the winter semester 2018/19, I attended a lecture offered by the Intelligent Autonomous Systems (IAS) [1] group at TU Darmstadt. These lectures have a reputation for being very time-intensive, but they are just as rewarding. That semester only, there was a rare opportunity to get the full sweep of reinforcement learning in the form of a lecture, a project, and a seminar paper. Though this meant sacrificing some CP for the semester, I felt like this was a great chance to get into the topic. So I did.

While I was thinking about a topic for my seminar paper, most other students chose something pragmatic, like a survey or a comparison between approaches. Since I had a good understanding of the basics of machine learning from the previous lecture on Statistical Machine Learning, I wanted to do something that fit my style better. So I thought about writing an introductory paper on reinforcement learning, focused especially on policy gradients, as I intuitively liked the implications of working directly with a policy.

The first draft I came up with attempted to bring in a ton of references to current SotA approaches. I showed it to my supervisor, Samuele Tosatto (who was very patient with me; thanks for that!), who gave me a ton of criticism. Constructive criticism, that is. After that meeting, I had a much clearer idea of what I wanted to achieve with the paper. In essence, I wanted a formal introduction to policy gradient approaches. I changed the name to “An Introduction to Policy Gradients”, and set myself three criteria I wanted to adhere to. My paper should be

complete, i.e., formally define everything we use,
concise, i.e., only introduce necessary definitions,
comprehensible, i.e., avoid logical shortcuts.

To be honest, my understanding is that all papers should follow these rules. Alas, this has not been my experience. Don’t get me wrong, I know this is hard, and it gets harder the more sophisticated the ideas we want to convey become. But you notice when someone is not trying hard enough. ;)

When I was done with the seminar and had sent in the finished paper, I thought about the naming again. I felt like “An Introduction” is not what people want to read. Hence I figured it was better to go with the name Samuele suggested.

TL;DR: I would like to introduce you to my paper “On Policy Gradients”. Feel free to check it out at https://arxiv.org/abs/1911.04817.

[1] https://www.ias.informatik.tu-darmstadt.de/
