This is kind of a long story, so feel free to skip to the tl;dr. :)

In the winter semester 2018/19, I attended a lecture offered by the Intelligent Autonomous Systems (IAS) [1] group at TU Darmstadt. These lectures have a reputation for being very time-intensive, but they are just as rewarding. That semester only, there was a rare opportunity to get the full sweep of reinforcement learning in the form of a lecture, a project, and a seminar paper. Though this meant sacrificing some CP for the semester, I felt like this was a great chance to get into the topic. So I did.

When I was thinking about a topic for my seminar paper, most other students chose something pragmatic, like a survey or a comparison between approaches. Since I had a good understanding of the basics of machine learning from the previous lecture in Statistical Machine Learning, I wanted to do something that fit my style better. So I thought about writing an introductory paper on reinforcement learning, focused especially on policy gradients, as I intuitively liked the implications of directly working with a policy.

The first draft I came up with attempted to bring in a ton of references to current SotA approaches. I showed it to my supervisor, Samuele Tosatto (who was very patient with me; thanks for that!), who gave me a lot of criticism. Constructive criticism, that is. After that meeting, I had a much clearer idea of what I wanted to achieve with the paper. In essence, I wanted a formal introduction to policy gradient approaches. I changed the name to “An Introduction to Policy Gradients” and set myself three criteria I wanted to adhere to. My paper should be

  • complete, i.e., formally define everything we use,
  • concise, i.e., only introduce necessary definitions,
  • comprehensible, i.e., avoid logical shortcuts.

To be honest, my understanding is that all papers should follow these rules. Alas, this has not been my experience. Don’t get me wrong, I know this is hard, and it gets harder the more sophisticated the ideas we want to convey become. But you notice when someone is not trying hard enough. ;)

When I was done with the seminar and had sent in the finished paper, I thought about the naming again. I felt that “An Introduction” is not what people want to read. Hence, I figured it was better to go with the name Samuele suggested.

TL;DR: I would like to introduce you to my paper “On Policy Gradients”. Feel free to check it out at https://arxiv.org/abs/1911.04817.

[1] https://www.ias.informatik.tu-darmstadt.de/