Addressing the gap between current language models and key-term-based clustering

Author(s):
Cabral, Eric M. ; Rezaeipourfarsangi, Sima ; Oliveira, Maria Cristina F. ; Milios, Evangelos E. ; Minghim, Rosane
Total number of authors: 5
Document type: Scientific article
Source: PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, DOCENG 2023; v. N/A, p. 10-pg., 2023-01-01.
Abstract

This paper presents MOD-kt, a modular framework designed to bridge the gap between modern language models and key-term-based document clustering. One of the main challenges of using neural language models for key-term-based clustering is the mismatch between the interpretability of the underlying document representation (i.e., document embeddings) and the more intuitive semantic elements that allow the user to guide the clustering process (i.e., key terms). Our framework acts as a communication layer between word and document models, enabling key-term-based clustering through a flexible and adaptable architecture. We report a comparison of the performance of multiple neural language models on clustering, considering a selected range of relevance metrics. Additionally, a qualitative user study was conducted to illustrate the framework's potential for intuitive, user-guided, high-quality clustering of document collections. (AU)

FAPESP grant: 18/22214-6 - Towards the convergence of technologies: from sensors and biosensors to information visualization and machine learning for data analysis in clinical diagnosis
Grantee: Osvaldo Novais de Oliveira Junior
Type of support: Research Grant - Thematic