Digital video has become the medium of choice for a growing number of people communicating over the Internet and through their mobile devices. Over the past decade, the world has witnessed explosive growth in the amount of video data, fostered by astonishing technological developments. In this scenario, there is a growing demand for efficient systems that reduce the work and information overload imposed on people. Making efficient use of video content requires the development of intelligent tools capable of understanding videos much as humans do. This has been the goal of a quickly evolving research area known as video understanding. A crucial step toward video understanding is understanding human actions and activities. One of the main issues in human activity understanding is the extraction of useful information from video content. Recently, deep learning has been successfully used to train discriminative models that learn powerful and interpretable features for understanding visual content. However, because of the temporal dimension, training deep learning models on video data faces a number of practical difficulties, such as limited training samples and high computational cost. The goal of this research proposal is to tackle the computational overhead of training a deep learning model in order to improve its capacity to handle video data and to advance the state of the art in human activity understanding. To this end, we plan to exploit information about the visual content that is already available in the compressed representation used for video storage and transmission, such as the motion information computed by the codec. This avoids the high computational cost of fully decoding the video stream and therefore greatly speeds up training time, which has become a major bottleneck of deep learning.