Video action recognition based on visual rhythm representation

Author(s):
Moreira, Thierry Pinheiro; Menotti, David; Pedrini, Helio
Total number of authors: 3
Document type: Scientific article
Source: JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION; v. 71, 14 pp., Aug. 2020.
Abstract

Advances in video acquisition and storage technologies have created a great demand for the automatic recognition of actions. The use of cameras for security and surveillance purposes has applications in several scenarios, such as airports, parks, banks, stations, roads, hospitals, supermarkets, industries, stadiums, and schools. An inherent difficulty of the problem is the complexity of the scene under usual recording conditions, which may contain complex background and motion, multiple people, interactions with other actors or objects, and camera motion. Most recent databases are built primarily from videos shared on YouTube and from movie snippets, situations in which these obstacles are unconstrained. Another difficulty is the impact of the temporal dimension, since it expands the size of the data, increasing computational cost and storage space. In this work, we present a methodology for volume description using the Visual Rhythm (VR) representation. This technique reshapes the original video volume into an image, on which two-dimensional descriptors are computed. We investigated different strategies for constructing the representation by combining configurations over several image domains and traversal directions of the video frames. From this, we propose two feature extraction methods, Naive Visual Rhythm (Naive VR) and Visual Rhythm Trajectory Descriptor (VRTD). The first approach is the straightforward application of the technique to the original video volume, forming a holistic descriptor that treats action events as patterns and shapes in the visual rhythm image. The second variation focuses on the analysis of small neighborhoods obtained from the dense trajectory process, which allows the algorithm to capture details missed by the global description. We tested our methods on eight public databases: one of hand gestures (SKIG), two in first person (DogCentric and JPL), and five in third person (Weizmann, KTH, MuHAVi, UCF11, and HMDB51). The results show that the developed techniques extract motion elements along with shape and appearance information, achieving accuracy rates competitive with state-of-the-art action recognition approaches. (c) 2020 Elsevier Inc. All rights reserved.
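To make the core idea concrete, the Python sketch below shows one minimal way a visual rhythm image can be built: a single line of pixels is sampled from every frame and the lines are stacked over time into a 2D image. The function name, the choice of sampling the central row or column, and the synthetic input are illustrative assumptions, not the authors' implementation, which investigates several image domains and traversal directions.

import numpy as np

def visual_rhythm(video, direction="horizontal"):
    # video: grayscale volume of shape (T, H, W) -- T frames of H x W pixels.
    # Each frame contributes one sampled line; stacking the lines over time
    # yields a 2D image in which motion appears as spatial patterns.
    t, h, w = video.shape
    if direction == "horizontal":
        return video[:, h // 2, :]   # central row of every frame -> (T, W)
    if direction == "vertical":
        return video[:, :, w // 2]   # central column of every frame -> (T, H)
    raise ValueError("unknown direction: " + direction)

# Usage with a synthetic clip (hypothetical data); two-dimensional
# descriptors would then be computed on the resulting rhythm image.
rng = np.random.default_rng(0)
clip = rng.random((100, 120, 160))   # 100 frames of 120 x 160 pixels
print(visual_rhythm(clip).shape)     # (100, 160)

Under this reading, camera or object motion that crosses the sampled line leaves characteristic 2D traces in the rhythm image, which is what allows ordinary image descriptors to stand in for costlier 3D volume analysis.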

FAPESP Grant: 17/12646-3 - Déjà vu: temporal, spatial, and characterization coherence of heterogeneous data for integrity analysis and interpretation
Grantee: Anderson de Rezende Rocha
Support type: Research Grants - Thematic
FAPESP Grant: 14/12236-1 - AnImaLS: Large-Scale Image Annotation: what can machines and specialists learn from interacting?
Grantee: Alexandre Xavier Falcão
Support type: Research Grants - Thematic
FAPESP Grant: 15/03156-7 - Activity recognition in videos
Grantee: Thierry Pinheiro Moreira
Support type: Scholarships in Brazil - Doctorate