Learning representations through deep generative models on video


Automatic media generation (or synthesis) is a field that has advanced dramatically in recent years with the advent of deep generative models. Nowadays, neural networks can create text, images, and videos conditioned on class labels or other media. The common task is to generate content; however, we can also take advantage of the feature representations learned on these tasks to understand which features matter and as a source of interpretability. That is, what features are relevant for the creation of different content, and how can we interpret what the models are learning or attending to? In this project, we propose to investigate how to learn efficient and rich representations for video data through deep generative tasks. We focus on two particular problems for learning effective representations. The first is semantic transfer between data modalities, in particular video and (written) language. The second is disentanglement within the same domain, that is, separating the different factors of variation and modalities of the data. This separation of semantics (intra- and inter-domain) will allow us to better understand the type of features learned by different architectures on these tasks. Our objective is to train deep generative models on different video reconstruction tasks and study their learning capabilities. We will perform experiments on existing benchmark datasets for these problems. (AU)
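To illustrate the core idea of learning representations through a generative (reconstruction) objective, the following is a minimal sketch, not the project's actual method: a linear autoencoder trained on synthetic "video frames" generated from a few latent factors. All names, shapes, and hyperparameters are illustrative assumptions; real video models would use deep convolutional or transformer architectures.

```python
# Minimal sketch (illustrative only): a linear autoencoder trained with a
# reconstruction objective, showing how such a task yields a compact
# learned representation of the data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 flattened 8x8 "frames" generated from 3 underlying
# latent factors, standing in for real video frames.
true_factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 64))
frames = true_factors @ mixing + 0.01 * rng.normal(size=(200, 64))

# Linear encoder/decoder with a 3-dimensional bottleneck representation.
W_enc = rng.normal(scale=0.1, size=(64, 3))
W_dec = rng.normal(scale=0.1, size=(3, 64))

initial_loss = float(np.mean((frames @ W_enc @ W_dec - frames) ** 2))

lr = 0.01
for step in range(2000):
    z = frames @ W_enc            # encode: latent representation
    recon = z @ W_dec             # decode: reconstruction
    err = recon - frames          # reconstruction error
    # Gradients of the mean squared reconstruction error.
    g_dec = z.T @ err / len(frames)
    g_enc = frames.T @ (err @ W_dec.T) / len(frames)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final_loss = float(np.mean((frames @ W_enc @ W_dec - frames) ** 2))
print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

After training, the 3-dimensional bottleneck `z` captures the factors needed to reconstruct the frames; inspecting such learned representations (and encouraging each latent dimension to track one factor of variation) is the disentanglement question the project studies at a much larger scale.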