Deep Boltzmann machines for event recognition in videos

Grant number: 19/07825-1
Support type: Scholarships in Brazil - Master
Effective date (Start): May 01, 2019
Effective date (End): April 30, 2021
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Cooperation agreement: Microsoft Research
Principal Investigator: João Paulo Papa
Grantee: Mateus Roder
Home Institution: Faculdade de Ciências (FC). Universidade Estadual Paulista (UNESP). Campus de Bauru. Bauru, SP, Brazil
Company:Universidade Estadual Paulista (UNESP). Campus de Rio Claro. Instituto de Geociências e Ciências Exatas (IGCE)
Associated research grant:17/25908-6 - Weakly supervised learning for compressed video analysis on retrieval and classification tasks for visual alert, AP.PITE


This research project addresses the problem of event recognition in videos, which spans domains such as monitoring and security, medicine, high-performance industry, and smart homes. These domains are usually tackled with machine learning techniques, notably deep learning, which can produce sufficiently accurate answers from a large set of labeled data, i.e., under the supervised learning paradigm. At the same time, part of the scientific community's effort is focused on techniques that employ the unsupervised learning paradigm, i.e., unlabeled data used to extract patterns and deep features across various problems. For some of these tasks, however, at least a small amount of labeled data is usually available, and its use as a "tool" may positively drive the whole learning process. In this project, we intend to investigate the analysis, retrieval, and classification of videos in the compressed domain using small training datasets. The main goal of this project is to examine Deep Boltzmann Machines (DBMs) capable of analyzing compressed video sequences and extracting features to feed supervised classifiers. The challenge of the research is to use DBMs to investigate, represent, and classify videos with restricted labeled data. The proposed approach aims to exploit the maximum amount of information available so that it operates well with small training sets. We intend to explore: (I) deep learning representations; (II) unsupervised contextual measures; and (III) fusion techniques to increase the initially labeled data. The first challenge involves the analysis and representation of videos in the compressed domain using deep learning techniques. Based on these representations, we intend to investigate strategies to expand the training sets using unsupervised contextual measures.
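As an illustration of the feature-extraction idea, the sketch below trains a single Restricted Boltzmann Machine layer (the building block that a DBM stacks) with one-step contrastive divergence and uses its hidden activations as features for a downstream supervised classifier. All data, dimensions, and hyperparameters here are hypothetical, not taken from the project.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (a DBM stacks such layers)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible bias
        self.c = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch of visible vectors."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)   # one Gibbs reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)


# Toy stand-in for compressed-domain frame descriptors:
# 200 binary vectors of dimension 64 (purely synthetic).
X = (rng.random((200, 64)) < 0.3).astype(float)

rbm = RBM(n_visible=64, n_hidden=16)
for epoch in range(20):
    rbm.cd1_step(X)

# Hidden-unit probabilities serve as the learned features
# that would feed a supervised classifier.
features = rbm.hidden_probs(X)
print(features.shape)  # (200, 16)
```

In a real DBM, several such layers are trained and then fine-tuned jointly; the sketch only shows the unsupervised feature-learning step on which the project's pipeline rests.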
Given the labeled sets obtained, fusion strategies will be used to combine several classification methods. Although the methods to be investigated are applicable to several domains, we intend to select specific domains to validate the proposed approaches, considering the availability of datasets for experimental evaluation.
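One simple instance of such a fusion strategy is per-sample majority voting over the predictions of several classifiers. The predictions below are made up for illustration; the project does not specify this particular scheme.

```python
import numpy as np

# Hypothetical class predictions for 6 samples from 3 classifiers.
preds = np.array([
    [0, 1, 1, 0, 2, 2],  # classifier A
    [0, 1, 0, 0, 2, 1],  # classifier B
    [1, 1, 1, 0, 2, 2],  # classifier C
])


def majority_vote(preds):
    """Fuse classifiers by per-sample majority vote (ties -> lowest label)."""
    n_classes = preds.max() + 1
    # Count votes per class for each sample (shape: n_classes x n_samples).
    votes = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return votes.argmax(axis=0)


print(majority_vote(preds).tolist())  # [0, 1, 1, 0, 2, 2]
```

More elaborate fusion rules (weighted voting, probability averaging) follow the same pattern of combining per-classifier outputs into a single decision.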