Deep reinforcement learning with constraints and demonstrations

Reinforcement Learning (RL) has been successfully applied to a number of problems, such as robotic control, task scheduling, and telecommunications.
During the learning process, RL agents are generally free to explore all potential behaviors. However, this freedom is not acceptable in many real-world applications, since "free" exploration could lead to dangerous actions that damage the system or even hurt people. In such situations, exploration must be guaranteed to remain safe and controlled. The first objective of this thesis is therefore to propose a novel method capable of handling the general constraints that commonly occur in real-world applications (such as discounted cumulative, mean value, and state-wise constraints). These constraints are expected to be respected throughout the learning process in order to meet the safety requirements.
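As an illustration only, the three constraint types named above can be sketched for a single recorded trajectory of per-step costs. This is a minimal sketch under assumptions that are not part of the thesis description: the function names, the thresholds (`d_discounted`, `d_mean`, `d_state`), the discount factor `gamma`, and the sample costs are all hypothetical. In a constrained Markov decision process the first two quantities are usually defined as expectations under the policy, whereas the code below only evaluates them on one sampled trajectory.

```python
from typing import Dict, Sequence


def discounted_cumulative_cost(costs: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * c_t over one trajectory (hypothetical helper)."""
    return sum((gamma ** t) * c for t, c in enumerate(costs))


def mean_cost(costs: Sequence[float]) -> float:
    """Average per-step cost over one trajectory (hypothetical helper)."""
    return sum(costs) / len(costs)


def max_statewise_cost(costs: Sequence[float]) -> float:
    """Largest per-step cost; a state-wise constraint bounds every step."""
    return max(costs)


def satisfies_constraints(
    costs: Sequence[float],
    gamma: float,
    d_discounted: float,
    d_mean: float,
    d_state: float,
) -> Dict[str, bool]:
    """Check each constraint type against its (assumed) threshold."""
    return {
        "discounted_cumulative": discounted_cumulative_cost(costs, gamma) <= d_discounted,
        "mean_value": mean_cost(costs) <= d_mean,
        "state_wise": max_statewise_cost(costs) <= d_state,
    }


# Made-up costs for a four-step trajectory.
costs = [0.0, 1.0, 0.0, 2.0]
report = satisfies_constraints(
    costs, gamma=0.5, d_discounted=1.0, d_mean=1.0, d_state=1.5
)
print(report)
```

In this made-up example the trajectory meets the discounted cumulative and mean value thresholds but violates the state-wise one, because a single step exceeds its bound. This shows why a state-wise constraint can be stricter than the two aggregate ones.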
The second objective of this thesis is to accelerate convergence. This is motivated by the fact that the convergence of RL algorithms, when it occurs, is often very slow. One way to speed it up is to take advantage of human knowledge, which usually takes the form of data from expert demonstrations. The method developed in this thesis will be able to use expert demonstrations from IFPEN, including both measured data and optimal solutions obtained from deterministic optimization.
Moreover, successfully applying an RL algorithm to a real-world application is known to be challenging. The last objective of this thesis is therefore to make the proposed method user-friendly, that is, easy to apply to real-world applications. The method should be tested on some IFPEN applications, such as eco-driving, closed-loop control of wind farms, and electrical network control.

Keywords: reinforcement learning, constrained Markov decision processes, optimal control, optimization

Academic supervisor: CR, BUSIC Ana, Inria Paris / Département d’Informatique de l’ENS, Université PSL
Doctoral School: ED386 DI ENS, http://ed386.sorbonne-universite.fr/fr/index.html
IFPEN supervisor: Dr, ZHU Jiamin, Control, Signal and System, jiamin.zhu@ifpen.fr
PhD location: Département d’Informatique de l’ENS, Paris, France; IFP Energies nouvelles, Rueil-Malmaison, France
Duration and start date: 3 years, starting in fourth quarter 2021
Employer: INRIA, Paris, France
Academic requirements: University Master's degree in relevant disciplines
Language requirements: Fluency in English, willingness to learn French
Other requirements: Knowledge of information, probability/statistics and data science, optimization/optimal control

Contact

IFPEN supervisor:

Dr, ZHU Jiamin

PhD student:

Claire BIZON MONROC

Class of 2021-2024