DESIGN AND APPLICATION OF A REINFORCEMENT LEARNING ALGORITHM COMBINED WITH DEEP LEARNING
BECCHI, Gabriel Chaves¹; COELHO, Leandro dos Santos²
Introduction: Reinforcement Learning (RL) is a family of goal-oriented Machine Learning (ML) algorithms in which an agent learns to control an environment by adapting and optimizing its actions. After each action, the agent receives a delayed reward at the next time step that evaluates that action, and it learns to choose the action that maximizes the cumulative reward in each situation. Q-Learning (QL) is a classical RL algorithm. Deep Q-Learning (DQL) extends QL with deep learning (DL) concepts by replacing the Q-table with a neural network that approximates the action values. Both QL and DQL are model-free approaches and can be applied to control systems.
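The core of QL is a temporal-difference update of the Q-table entry for the state-action pair just visited: Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. The sketch below shows this textbook form and is not the authors' code. The learning rate α (`alpha`), the discount factor γ (`gamma`), and the dictionary-based table are illustrative assumptions. DQL computes the same target r + γ max_a' Q(s', a'), but it trains a neural network to reduce the squared difference instead of overwriting a table entry.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Bootstrapped estimate of future return; zero once the episode has ended.
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return q_table[(state, action)]


# Usage: unseen state-action pairs default to 0.0 (hypothetical states "s0", "s1").
q = defaultdict(float)
q[("s1", 0)] = 1.0
q[("s1", 1)] = 2.0
new_value = q_learning_update(q, "s0", 1, reward=1.0, next_state="s1",
                              actions=(0, 1), alpha=0.5, gamma=0.5)
print(new_value)  # → 1.0
```

Here the target is 1.0 + 0.5 × 2.0 = 2.0, so the entry moves halfway from 0.0 toward 2.0.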
Objective: The main objective of this project is to analyze the QL and DQL approaches applied to a case study in the control systems field. The secondary objectives are (i) to review the literature on RL and its variants; (ii) to implement and test a QL approach; (iii) to implement and test a DQL approach; and (iv) to test the implemented systems in a control application and analyze the results.
Methodology: The control environment adopted was the OpenAI Gym cart pole, in which a pole is attached by an un-actuated joint to a cart that moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to keep it from falling over. A reward of +1 is given for every time step that the pole remains upright. The episode ends when the pole tilts more than 15 degrees from vertical or the cart moves more than 2.4 units from the center. The QL and DQL controllers for the cart pole system were implemented and validated in Python (version 3.6).
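A plain Q-table needs discrete states, but the cart pole observation is continuous: cart position, cart velocity, pole angle, and pole angular velocity. The text does not say how the authors discretized it. A common choice, assumed here, is to clip each variable and split it into equal-width buckets. The bucket counts and velocity bounds below are hypothetical. The termination thresholds mirror the description above, so they should be checked against the Gym version actually used.

```python
import math

# Clipping bounds per state variable, in the order Gym reports them:
# cart position, cart velocity, pole angle (rad), pole angular velocity (rad/s).
# Position and angle limits mirror the termination rule in the text; the two
# velocity bounds are illustrative guesses, since Gym leaves them unbounded.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0),
          (-math.radians(15.0), math.radians(15.0)), (-3.5, 3.5)]
BINS = (6, 6, 12, 12)  # hypothetical bucket count per variable


def discretize(observation, bounds=BOUNDS, bins=BINS):
    """Map a continuous 4-D observation to a tuple usable as a Q-table key."""
    indices = []
    for value, (low, high), n in zip(observation, bounds, bins):
        clipped = min(max(value, low), high)
        ratio = (clipped - low) / (high - low)  # position within [0, 1]
        indices.append(min(int(ratio * n), n - 1))  # bucket index in 0..n-1
    return tuple(indices)


def is_terminal(observation, max_angle_deg=15.0, max_position=2.4):
    """Episode end condition as described in the text."""
    position, _, angle, _ = observation
    return (abs(angle) > math.radians(max_angle_deg)
            or abs(position) > max_position)


print(discretize((0.0, 0.0, 0.0, 0.0)))  # → (3, 3, 6, 6)
print(is_terminal((0.0, 0.0, 0.3, 0.0)))  # → True (0.3 rad is about 17 degrees)
```

The resulting tuple can be used directly as the state key of a Q-table. Finer buckets allow a more precise controller, at the cost of a larger table and slower learning.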
Results: The cart pole control results were promising. QL trained faster than DQL, whereas DQL needed many more episodes before it could control the cart pole system in closed loop.
Conclusions: For the chosen cart pole case study, QL outperformed DQL in both control performance and training speed. DQL may nevertheless be a viable approach for more complex environments, where the number of possible states would make a q-table impractical.
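The limitation of the q-table can be made concrete. With equal-width discretization, the number of entries is the product of the bucket counts times the number of actions, so it grows exponentially with the number of state variables. The bucket count of 10 and the 20-variable task below are illustrative assumptions. A DQL network takes the continuous state as input instead, so its size does not depend on how finely the state space would have been split.

```python
def q_table_entries(bins_per_variable, n_actions):
    """Number of Q-values a tabular agent must store."""
    states = 1
    for n in bins_per_variable:
        states *= n
    return states * n_actions


# Cart pole: 4 state variables, 2 actions (push left or push right).
print(q_table_entries([10] * 4, 2))   # → 20000
# A hypothetical task with 20 state variables at the same resolution.
print(q_table_entries([10] * 20, 2))  # → 200000000000000000000
```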
Keywords: Machine learning. Reinforcement learning. Q-learning. Deep reinforcement learning. Deep Q-learning.