Abstract
Computer Vision methods are used to extract information from images and videos, but the contextual elements of these visual data are not always sufficient to yield correct and accurate information. In these cases, content from other sources and types of data, such as audio and text, or information external to the data, such as a priori knowledge, can be used to complement and enrich the context of th…