Grant number: 17/16597-7
Support Opportunities: Scholarships in Brazil - Doctorate
Effective date (Start): December 01, 2017
Effective date (End): February 28, 2022
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Gerberth Adín Ramírez Rivera
Grantee: Darwin Danilo Saire Pilco
Host Institution: Instituto de Computação (IC), Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil
Associated scholarship(s): 19/18678-0 - Semantic Segmentation Using Hourglass Learning Model, BE.EP.DR
Abstract

The semantic segmentation task aims to create a dense classification by labeling, pixel-wise, each object present in images or videos. Convolutional neural network (CNN) approaches have proven effective, exhibiting the best results on this task. Some challenges remain, however, such as the low resolution of the feature maps and the loss of spatial precision, both produced in the last convolution layers of CNNs. How to solve these problems and obtain consistent results is still an open problem on images, and even more so on videos, making semantic segmentation on video a rather difficult problem. In this Ph.D. project, we propose an hourglass-shaped CNN architecture to address the semantic segmentation task on video. Our proposed architecture is end-to-end trainable and extracts spatiotemporal information to discriminate among the several object classes present in a video; its final output is the generation of densely labeled videos. To achieve this goal we need to: learn meaningful spatiotemporal features that differentiate the objects in the video (by learning convolution kernels) while remaining consistent across frame variations; learn multidimensional up-sampling and fusion kernels that use the predictions of lower-resolution levels together with the existing spatiotemporal features to maintain the relations between voxels through the learned nonlinearities; and create an end-to-end learning framework (data augmentation and loss functions) that uses the existing annotations (both dense labels and bounding boxes) in video datasets to train the network.
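The hourglass data flow described in the abstract (encode to a low-resolution bottleneck, then up-sample and fuse lower-resolution predictions with higher-resolution features to recover dense, per-voxel labels) can be illustrated with a minimal numpy sketch. This is not the project's actual architecture: the pooling, nearest-neighbor up-sampling, pointwise projections, and additive fusion below are placeholder stand-ins for the learned spatiotemporal convolution, up-sampling, and fusion kernels, and all shapes and names are illustrative assumptions.

```python
import numpy as np

def avg_pool2x(x):
    # x: (T, H, W, C); 2x2 spatial average pooling applied per frame
    T, H, W, C = x.shape
    return x.reshape(T, H // 2, 2, W // 2, 2, C).mean(axis=(2, 4))

def upsample2x(x):
    # Nearest-neighbor 2x spatial up-sampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def hourglass_forward(video, proj_enc, proj_dec, classifier):
    # Encoder: project features and reduce resolution; keep a skip connection
    skip = video @ proj_enc          # high-resolution features (placeholder for conv)
    low = avg_pool2x(skip)           # low-resolution bottleneck
    # Decoder: up-sample the low-resolution result and fuse it with the skip
    up = upsample2x(low @ proj_dec)
    fused = up + skip                # fusion preserves relations between voxels
    return fused @ classifier        # per-voxel class logits

rng = np.random.default_rng(0)
video = rng.normal(size=(4, 8, 8, 3))        # (frames, height, width, channels)
logits = hourglass_forward(
    video,
    rng.normal(size=(3, 16)),    # "encoder" projection (hypothetical weights)
    rng.normal(size=(16, 16)),   # "decoder" projection (hypothetical weights)
    rng.normal(size=(16, 5)),    # classifier over 5 hypothetical object classes
)
labels = logits.argmax(axis=-1)  # dense labeling: one class per voxel
print(labels.shape)              # (4, 8, 8): a label for every pixel of every frame
```

Note that the projections here act pointwise, so no temporal information is actually mixed; in the proposed architecture this role is played by learned spatiotemporal (3D) convolution kernels, which the sketch only stands in for.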