Planning in stochastic computation graphs: solving stochastic nonlinear problems with backpropagation

Author(s):
Thiago Pereira Bueno
Total Authors: 1
Document type: Doctoral Thesis
Press: São Paulo.
Institution: Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:
Examining board members:
Leliane Nunes de Barros; Fabio Gagliardi Cozman; Felipe Trevisan Jurgensen; Felipe Rech Meneguzzi; Scott Sanner
Advisors: Leliane Nunes de Barros; Denis Deratani Mauá
Abstract

Deep learning has achieved remarkable success in a range of complex perception tasks, games, and other real-world applications. At a high level, it can be argued that the main reason behind the astonishing performance of deep neural networks is the stochastic gradient descent method, which relies on gradients computed by the well-known error backpropagation algorithm. Inspired by the recent applications of deep learning, we investigate the opportunities and challenges in adapting the backpropagation algorithm as a planning technique in continuous sequential decision-making problems. We make the key observation that if a differentiable model of the dynamics of a system is available, then an autonomous agent can leverage the advanced gradient-based optimizers developed in the context of learning algorithms to solve long-horizon planning problems. Besides reformulating the recently proposed deterministic planning-through-backpropagation algorithm as a gradient-based trajectory optimization technique, we propose several extensions to the more general setting of stochastic decision processes in AI planning. In particular, we propose a framework, based on stochastic computation graphs and the reparameterization trick, to train Deep Reactive Policies offline for fast decision-making. In addition, we investigate how the duality theory of information relaxation can be adapted to obtain a gradient-based online planning algorithm that interleaves optimization and execution. Experiments show the effectiveness of the proposed approaches in a variety of sequential decision-making problems exhibiting nonlinear dynamics and stochastic exogenous events, such as path planning, multi-reservoir control and HVAC systems. (AU)
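To make the core idea of planning through backpropagation in a stochastic setting concrete, here is a minimal sketch. It is not the thesis' implementation. It optimizes an open-loop action sequence for a hypothetical one-dimensional system x[t+1] = x[t] + a[t] + sigma * eps[t] with a quadratic cost. The noise eps ~ N(0, 1) is sampled outside the simulation (the reparameterization trick), so each sampled cost is a deterministic, differentiable function of the actions. The batch-mean cost is then backpropagated through time with a hand-written reverse pass. The dynamics, cost, function names (`rollout_and_grad`, `plan`) and hyperparameters are all assumptions chosen for illustration. The sketch does not cover the Deep Reactive Policies, nonlinear domains, or information-relaxation planner described above.

```python
import numpy as np


def rollout_and_grad(actions, eps, x0, goal, sigma, lam):
    """Mean cost of a batch of noisy rollouts and its gradient w.r.t. the plan.

    Hypothetical toy dynamics: x[t+1] = x[t] + actions[t] + sigma * eps[:, t].
    Cost: batch mean of sum_t (x[t+1] - goal)**2, plus lam * sum_t actions[t]**2.
    eps has shape (B, T) and holds standard-normal draws made outside this
    function (reparameterization), so the cost is differentiable in actions.
    """
    B, T = eps.shape
    # Forward pass: simulate every sample under the same open-loop actions.
    xs = np.empty((B, T + 1))
    xs[:, 0] = x0
    for t in range(T):
        xs[:, t + 1] = xs[:, t] + actions[t] + sigma * eps[:, t]
    cost = np.mean(np.sum((xs[:, 1:] - goal) ** 2, axis=1)) + lam * np.sum(actions ** 2)

    # Backward pass (backpropagation through time).
    grad = 2.0 * lam * actions
    adj = np.zeros(B)  # adjoint dCost/dx[t+1] for each sample
    for t in reversed(range(T)):
        # Local cost term at x[t+1], plus what flows back from x[t+2]
        # (dx[t+2]/dx[t+1] = 1, so the adjoint passes through unchanged).
        adj = adj + 2.0 * (xs[:, t + 1] - goal) / B
        # dx[t+1]/dactions[t] = 1 for every sample.
        grad[t] += np.sum(adj)
    return cost, grad


def plan(T=10, B=64, x0=0.0, goal=5.0, sigma=0.1, lam=0.01,
         lr=0.01, iters=2000, seed=0):
    """Gradient descent on the action sequence, with fresh noise every step."""
    rng = np.random.default_rng(seed)
    actions = np.zeros(T)
    for _ in range(iters):
        eps = rng.standard_normal((B, T))
        _, grad = rollout_and_grad(actions, eps, x0, goal, sigma, lam)
        actions -= lr * grad
    return actions


actions = plan()
print("plan:", np.round(actions, 3))
# With x0 = 0, the noise-free final position is just the sum of the actions.
print("noise-free final position:", actions.sum())
```

Because the noise is an input rather than a sampling step inside the graph, each update is an unbiased stochastic gradient of the expected cost, and the noise-free final position ends up close to `goal`. In a realistic domain, an automatic differentiation framework would replace the hand-derived backward pass.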

FAPESP's process: 16/22900-1 - Markov decision processes specified by probabilistic logic programming: representation and solution
Grantee: Thiago Pereira Bueno
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)