Research direction: Digital Science and Technology
Affiliate site: Rueil-Malmaison
Reinforcement Learning (RL) has been successfully applied to a number of problems, such as robotic control, task scheduling, and telecommunications.
During learning, RL agents are generally free to explore all potential behaviors. This freedom is not acceptable in many real-world applications, since "free" exploration can lead to dangerous actions that damage the system or even hurt people. In such situations, exploration must be guaranteed to be safe and controlled. The first objective of this thesis is therefore to propose a novel method capable of handling the general constraints that commonly occur in real-world applications (such as discounted cumulative, mean-value, and state-wise constraints). The constraints are expected to be respected throughout the learning process in order to meet the safety requirements.
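For readers unfamiliar with the constraint types named above, they can be written in the usual constrained Markov decision process notation as follows. This is only an illustrative sketch: the symbols (policy \pi, reward r, cost c, discount factor \gamma, threshold d) are assumptions introduced here, reading "mean value" as a long-run average is also an assumption, and the thesis may adopt different definitions.

```latex
% Objective: maximize the expected discounted return of policy \pi
\[
  \max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\]
% Discounted cumulative constraint
\[
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \right] \le d
\]
% Mean-value (long-run average) constraint
\[
  \limsup_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T-1} c(s_t, a_t) \right] \le d
\]
% State-wise constraint: the bound must hold at every time step
\[
  c(s_t, a_t) \le d \quad \text{for all } t \ge 0
\]
```

The first two constraints bound a quantity in expectation over whole trajectories, whereas the state-wise constraint must hold at each individual step, which is typically the harder requirement to satisfy during exploration.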
The second objective of this thesis is to accelerate convergence. This is motivated by the fact that the convergence of RL algorithms, when it occurs, is often very slow. One way to speed it up is to take advantage of human knowledge, which usually takes the form of data from expert demonstrations. The method developed in this thesis will be able to use expert demonstrations from IFPEN, including both measured data and optimal solutions from deterministic optimization.
Moreover, successfully applying an RL algorithm to a real-world application is often a challenge. The last objective of this thesis is therefore to make the proposed method user-friendly, that is, easy to apply to real-world applications. The method should be tested on some IFPEN applications, such as eco-driving, closed-loop control of wind farms, and electrical network control.
Keywords: reinforcement learning, constrained Markov decision processes, optimal control, optimization
- Academic supervisor: CR, BUSIC Ana, Inria Paris / Département d’Informatique de l’ENS, Université PSL
- Doctoral School: ED386 DI ENS, http://ed386.sorbonne-universite.fr/fr/index.html
- IFPEN supervisor: Dr, ZHU Jiamin, Control, Signal and System, jiamin.zhu@ifpen.fr
- PhD location: Département d’Informatique de l’ENS, Paris, France, and IFP Energies nouvelles, Rueil-Malmaison, France
- Duration and start date: 3 years, starting in the fourth quarter of 2021
- Employer: INRIA, Paris, France
- Academic requirements: University Master's degree in relevant disciplines
- Language requirements: Fluency in English, willingness to learn French
- Other requirements: Knowledge of information, probability/statistics and data science, optimization/optimal control