Planning in stochastic computation graphs: solving stochastic nonlinear problems with error backpropagation

Thiago Pereira Bueno

Author(s):	Thiago Pereira Bueno
Document type:	Doctoral Thesis
Press:	São Paulo.
Institution:	Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:	2021-08-31
Examining board members:	Leliane Nunes de Barros; Fabio Gagliardi Cozman; Felipe Trevisan Jurgensen; Felipe Rech Meneguzzi; Scott Sanner
Advisor:	Leliane Nunes de Barros; Denis Deratani Mauá
Abstract
Deep learning has achieved remarkable success in a range of complex perception tasks, games, and other real-world applications. At a high level, it can be argued that the main reason behind the astonishing performance of deep neural networks is the stochastic gradient descent method, which is based on the well-known error backpropagation algorithm. Inspired by the recent applications of deep learning, we propose to investigate the opportunities and challenges in adapting the backpropagation algorithm as a planning technique in continuous sequential decision-making problems. We make the key observation that if a differentiable model of the dynamics of a system can be made available, then an autonomous agent can leverage the advanced gradient-based optimizers developed in the context of learning algorithms to solve long-horizon planning problems. Besides reformulating the recently proposed deterministic planning-through-backpropagation algorithm as a form of gradient-based trajectory optimization, we propose several extensions to the more general setting of stochastic decision processes in AI planning. In particular, we propose a framework to train Deep Reactive Policies offline for fast decision-making based on stochastic computation graphs and the reparameterization trick. In addition, we investigate how the duality theory of information relaxation can be adapted to obtain a gradient-based online planning algorithm that interleaves optimization and execution. Empirical experiments show the effectiveness of our proposed approaches in a variety of sequential decision-making problems exhibiting nonlinear dynamics and stochastic exogenous events, such as path planning, multi-reservoir control, and HVAC systems.
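
The core idea of planning by backpropagation through a differentiable dynamics model can be illustrated with a minimal sketch. This is not the thesis's implementation: the one-dimensional dynamics `step`, the quadratic cost, and the constants `GOAL`, `HORIZON`, `ACTION_COST`, and the learning rate are all hypothetical choices made for illustration, and the gradient is derived by hand instead of using an automatic differentiation library. The sketch covers only the deterministic case; the stochastic extensions named in the abstract are not shown.

```python
import math

# All constants below are illustrative assumptions, not values from the thesis.
GOAL = 1.0          # hypothetical target state
HORIZON = 5         # number of planning steps
ACTION_COST = 0.01  # weight of the quadratic action penalty


def step(x, a):
    """Toy differentiable nonlinear dynamics: x' = x + a - 0.1 * sin(x)."""
    return x + a - 0.1 * math.sin(x)


def rollout(x0, actions):
    """Simulate the plan and return the visited states and the total cost."""
    xs = [x0]
    for a in actions:
        xs.append(step(xs[-1], a))
    cost = sum((x - GOAL) ** 2 for x in xs[1:])
    cost += ACTION_COST * sum(a * a for a in actions)
    return xs, cost


def gradient(xs, actions):
    """Backpropagate the total cost through the rollout (reverse pass)."""
    grads = [0.0] * len(actions)
    from_future = 0.0  # cost sensitivity reaching x_{t+1} via later steps
    for t in reversed(range(len(actions))):
        # Total derivative of the cost with respect to x_{t+1}.
        d_next = 2.0 * (xs[t + 1] - GOAL) + from_future
        # d step / d a = 1, plus the direct action penalty.
        grads[t] = 2.0 * ACTION_COST * actions[t] + d_next
        # d step / d x = 1 - 0.1 * cos(x_t): pass sensitivity back to x_t.
        from_future = d_next * (1.0 - 0.1 * math.cos(xs[t]))
    return grads


def plan(x0, iterations=500, learning_rate=0.01):
    """Plain gradient descent over the whole action sequence."""
    actions = [0.0] * HORIZON
    for _ in range(iterations):
        xs, _ = rollout(x0, actions)
        grads = gradient(xs, actions)
        actions = [a - learning_rate * g for a, g in zip(actions, grads)]
    _, cost = rollout(x0, actions)
    return actions, cost


if __name__ == "__main__":
    actions, cost = plan(0.0)
    print("actions:", [round(a, 3) for a in actions])
    print("final cost:", round(cost, 4))
```

In practice the reverse pass would be produced by an automatic differentiation framework and the update by a more advanced optimizer; the hand-written version is used here only to keep the sketch dependency-free.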

FAPESP's process:	16/22900-1 - Markov decision processes specified by probabilistic logic programming: representation and solution
Grantee:	Thiago Pereira Bueno
Support Opportunities:	Scholarships in Brazil - Doctorate (Direct)
