Abstract
Visual attention models take inspiration from how the human eye works, i.e., we can acquire a broad view of the scene we are looking at, but we tend to focus on a particular region only. Usually, such a region is the one that, for some reason, calls our attention to it. When we work with compressed images, e.g., JPG, PNG, and GIF, we first need to decode them before feeding Convolutional …