Large-Scale Fully-Unsupervised Re-Identification

Bertocco, Gabriel; Andalo, Fernanda; Boult, Terrance E.; Rocha, Anderson

Author(s):	Bertocco, Gabriel; Andalo, Fernanda; Boult, Terrance E.; Rocha, Anderson
Total number of authors:	4
Document type:	Scientific article
Source:	IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE; v. 7, n. 2, 14 pp., 2025-04-01.
Abstract
Fully-unsupervised Person and Vehicle Re-Identification have received increasing attention due to their broad applicability in areas such as surveillance, forensics, event understanding, and smart cities, without requiring any manual annotation. However, most of the prior art has been evaluated on datasets that have just a couple thousand samples. Such small-data setups often allow the use of techniques that are costly in time and memory, such as Re-Ranking, to improve clustering results. Moreover, some previous work even pre-selects the best clustering hyper-parameters for each dataset, which is unrealistic in a large-scale fully-unsupervised scenario. In this context, this work tackles a more realistic scenario and proposes two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. The second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n. To avoid the need for pre-selection of specific hyper-parameter values for the clustering algorithm, we also present a novel scheduling algorithm that adjusts the density parameter during training, to leverage the diversity of samples and keep the learning robust to noisy labeling. Finally, due to the complementary knowledge learned by different models in an ensemble, we also introduce a co-training strategy that relies upon the permutation of predicted pseudo-labels among the backbones, with no need for any hyper-parameters or weighting optimization. The proposed methodology outperforms the state-of-the-art methods on well-known benchmarks and on the challenging large-scale Veri-Wild dataset, with a faster and memory-efficient Re-Ranking strategy and a large-scale, noise-robust, ensemble-based learning approach.
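
The memory reduction from O(n^2) to O(kn) mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Re-Ranking algorithm, which the abstract does not specify; it only shows the general idea of keeping k nearest neighbors per sample instead of a full n x n distance matrix. The function name knn_lists, the random features, and the values of n, dim, and k are all hypothetical choices made for illustration.

```python
import heapq
import math
import random


def knn_lists(features, k):
    """Return, for each sample, its k nearest neighbors as (distance, index) pairs.

    Only k entries per sample are kept, so the stored result has n * k
    entries instead of the n * n entries of a full distance matrix.
    """
    neighbors = []
    for i, query in enumerate(features):
        # Distances from sample i to every other sample, generated lazily
        # so that a full row is never materialized.
        candidates = (
            (math.dist(query, other), j)
            for j, other in enumerate(features)
            if j != i
        )
        # nsmallest keeps at most k candidates in memory at a time.
        neighbors.append(heapq.nsmallest(k, candidates))
    return neighbors


random.seed(0)
n, dim, k = 200, 8, 5  # hypothetical sizes, chosen so that k << n
features = [[random.random() for _ in range(dim)] for _ in range(n)]
knn = knn_lists(features, k)
stored_entries = sum(len(row) for row in knn)
print(stored_entries, n * n)  # prints "1000 40000"
```

Note that this sketch still compares every pair of samples, so it only addresses the memory footprint; the lower time bound claimed in the abstract would require a different search strategy that the abstract does not describe.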

FAPESP grant:	22/02299-2 - Self-supervised learning for biometrics and other applications
Grantee:	Gabriel Capiteli Bertocco
Support type:	Scholarships Abroad - Research Internship - Direct Doctorate


FAPESP grant:	19/15825-1 - Mining of people, objects, and places of interest in heterogeneous data sources
Grantee:	Gabriel Capiteli Bertocco
Support type:	Scholarships in Brazil - Direct Doctorate


FAPESP grant:	23/12865-8 - Horus: artificial intelligence techniques for detection and analysis of synthetic realities
Grantee:	Anderson de Rezende Rocha
Support type:	Research Grant - Thematic
