Busca avançada
Ano de início
Entree
(Referência obtida automaticamente do Web of Science, por meio da informação sobre o financiamento pela FAPESP e o número do processo correspondente, incluída na publicação pelos autores.)

Multimodal data fusion for sensitive scene localization

Texto completo
Autor(es):
Moreira, Daniel [1] ; Avila, Sandra [1, 2] ; Perez, Mauricio [1] ; Moraes, Daniel [1] ; Testoni, Vanessa [3] ; Valle, Eduardo [2] ; Goldenstein, Siome [1] ; Rocha, Anderson [1]
Número total de Autores: 8
Afiliação do(s) autor(es):
[1] Univ Estadual Campinas, Inst Comp, Campinas, SP - Brazil
[2] Univ Estadual Campinas, Sch Elect & Comp Engn, Campinas, SP - Brazil
[3] Samsung Res Inst Brazil, Campinas, SP - Brazil
Número total de Afiliações: 3
Tipo de documento: Artigo Científico
Fonte: Information Fusion; v. 45, p. 307-323, JAN 2019.
Citações Web of Science: 4
Resumo

The very idea of hiring humans to avoid the indiscriminate spread of inappropriate sensitive content online (e.g., child pornography and violence) is daunting. The inherent data deluge and the tediousness of the task call for more adequate approaches, and set the stage for computer-aided methods. If running in the background, such methods could readily cut the stream flow at the very moment of inadequate content exhibition, being invaluable for protecting unwary spectators. Except for the particular case of violence detection, related work to sensitive video analysis has mostly focused on deciding whether or not a given stream is sensitive, leaving the localization task largely untapped. Identifying when a stream starts and ceases to display inappropriate content is key for live streams and video on demand. In this work, we propose a novel multimodal fusion approach to sensitive scene localization. The solution can be applied to diverse types of sensitive content, without the need for step modifications (general purpose). We leverage the multimodality data nature of videos (e.g., still frames, video space-time, audio stream, etc.) to effectively single out frames of interest. To validate the solution, we perform localization experiments on pornographic and violent video streams, two of the commonest types of sensitive content, and report quantitative and qualitative results. The results show, for instance, that the proposed method only misses about five minutes in every hour of streamed pornographic content. Finally, for the particular task of pornography localization, we also introduce the first frame-level annotated pornographic video dataset to date, which comprises 140 h of video, freely available for downloading. (AU)

Processo FAPESP: 17/12646-3 - Déjà vu: coerência temporal, espacial e de caracterização de dados heterogêneos para análise e interpretação de integridade
Beneficiário:Anderson de Rezende Rocha
Modalidade de apoio: Auxílio à Pesquisa - Temático