Deep reinforcement learning with constraints and demonstrations

Status

Ongoing

Scientific disciplines

Mathematics

Research direction

Digital Science and Technology

Affiliate site

Rueil-Malmaison

Reinforcement Learning (RL) has been successfully applied to a number of problems, such as robotic control, task scheduling, and telecommunications.
During the learning process, RL agents are generally free to explore all potential behaviors. This freedom, however, is not acceptable in many real-world applications, since "free" exploration can lead to dangerous actions that damage the system or even injure people. In such situations, it must be ensured that exploration is safe and controlled. The first objective of this thesis is therefore to propose a novel method capable of handling the general classes of constraints that commonly occur in real-world applications (such as discounted cumulative, mean value, and state-wise constraints). Constraints must be respected throughout the learning process in order to guarantee the safety requirements.
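As a point of reference (the notation below is assumed for illustration, not taken from the thesis description), these constraint classes can be written for a policy π over a constrained Markov decision process with reward r and cost c as:

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to one (or several) of:}
\]
\[
\underbrace{\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d}_{\text{discounted cumulative}},
\qquad
\underbrace{\lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} c(s_t, a_t)\Big] \le d}_{\text{mean value}},
\qquad
\underbrace{c(s_t, a_t) \le d \quad \forall t}_{\text{state-wise}}.
```

State-wise constraints are the most demanding of the three, since they must hold at every time step rather than only in expectation.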
The second objective of this thesis is to accelerate convergence. This is motivated by the fact that the convergence of RL algorithms, when it occurs, is often very slow. One way to speed it up is to take advantage of human knowledge, which usually takes the form of expert demonstrations. The method developed in this thesis will be able to exploit expert demonstrations available at IFPEN, including both measured data and optimal solutions from deterministic optimizations.
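One common way to exploit demonstrations, sketched below under stated assumptions, is to warm-start the policy by behavior cloning before RL fine-tuning: fit a policy to expert (state, action) pairs by supervised regression, then use it as the initial policy. The example uses a synthetic linear expert; all names and data here are illustrative, not from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expert demonstrations: the expert acts with a known linear
# policy a = W_expert @ s (hypothetical stand-in for measured data or
# deterministic-optimization solutions).
W_expert = np.array([[0.5, -1.0],
                     [2.0,  0.3]])
states = rng.normal(size=(200, 2))          # 200 demonstrated states
actions = states @ W_expert.T               # expert actions for each state

# Behavior cloning: least-squares fit of a linear policy to the
# demonstrations, i.e. solve min_W || states @ W.T - actions ||^2.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
W_bc = W_bc.T

# With noise-free demonstrations, the cloned policy recovers the expert;
# an RL algorithm would then fine-tune W_bc instead of a random policy.
print(np.allclose(W_bc, W_expert, atol=1e-6))  # prints True
```

Starting RL from such an initialization typically reduces the amount of risky exploration needed early in training, which complements the safety objective above.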
Moreover, successfully applying an RL algorithm to a real-world application is often a challenge in itself. The last objective of this thesis is therefore to make the proposed method user-friendly: easy to apply to real-world applications. The method will be tested on IFPEN applications such as eco-driving, closed-loop control of wind farms, and electrical network control.

Keywords: reinforcement learning, constrained Markov decision processes, optimal control, optimization

  • Academic supervisor    CR BUSIC Ana, Inria Paris / Département d’Informatique de l’ENS, Université PSL
  • Doctoral School    ED386 DI ENS, http://ed386.sorbonne-universite.fr/fr/index.html
  • IFPEN supervisor    Dr ZHU Jiamin, Control, Signal and System, jiamin.zhu@ifpen.fr
  • PhD location    Département d’Informatique de l’ENS, Paris, France / IFP Energies nouvelles, Rueil-Malmaison, France
  • Duration and start date    3 years, starting in the fourth quarter of 2021
  • Employer    INRIA, Paris, France
  • Academic requirements    University Master's degree in a relevant discipline
  • Language requirements    Fluency in English, willingness to learn French
  • Other requirements    Knowledge of informatics, probability/statistics, data science, and optimization/optimal control
     
Contact
IFPEN supervisor:
Dr ZHU Jiamin
PhD student of the thesis:
2021-2024 cohort