
Improving the effectiveness of similarity operators in medical data by means of diversity considerations

Grant number: 21/06564-0
Support type: Scholarships in Brazil - Post-Doctorate
Effective date (Start): August 01, 2021
Effective date (End): July 31, 2022
Field of knowledge: Health Sciences - Medicine - Medical Radiology
Principal researcher: Paulo Mazzoncini de Azevedo Marques
Grantee: Marcos Vinicius Naves Bêdo
Home Institution: Faculdade de Medicina de Ribeirão Preto (FMRP). Universidade de São Paulo (USP). Ribeirão Preto, SP, Brazil
Associated research grant: 16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD), AP.TEM


Exploring large medical data repositories with distance-based query criteria is troublesome whenever the objects are too close to each other. For instance, previous studies suggest that retrieving overly similar images may spoil the semantics of Content-based Medical Image Retrieval (CBMIR) applications as well as their usability in a clinical workflow. The main semantic drawback is that images that are too close to one another probably bring (nearly) no new information related to the query object, which impairs decision-making by either cluttering the results or reinforcing the expert's confirmation bias. Experts are also unable to explore similar yet diverse images in a single search, which may require several feedback cycles or even lead users to give up on the CBMIR query on the presumption that the repository does not contain insightful answers. Due to the distance concentration phenomenon, objects are particularly likely to be too close to each other in dense, high-dimensional spaces, which is the domain of representations generated by deep-learning-based descriptors such as word2vec or autoencoders. Classical similarity-based operators, e.g., similarity range or k-Nearest Neighbors queries, lose their ability to discern "close" from "far" objects in such spaces, which implies not only that they struggle to rank the closest objects but also that their result sets are likely to contain several elements that are similar to one another. Adding a "diversity degree" to content-based retrieval is an alternative for enhancing distance-based operators, which may then use both similarity and diversity to extend the meaning of proximity. In this project, we model diversity following the dynamic separations produced by distance influence criteria, in such a way that the retrieved objects are selected not just by similarity to the query element but also by dissimilarity from each other.
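As an illustration only, the idea of selecting objects by similarity to the query and by dissimilarity from each other can be sketched as a greedy filter over a k-Nearest Neighbors ranking. The function names `plain_knn` and `diverse_knn` are hypothetical, and the influence rule used here (a candidate is skipped when it lies at least as close to an already selected element as either of them lies to the query) is an assumption made for this sketch, not necessarily the criterion adopted in the project.

```python
import math


def plain_knn(query, data, k):
    """Classical k-NN: the k objects closest to the query."""
    return sorted(data, key=lambda p: math.dist(query, p))[:k]


def diverse_knn(query, data, k):
    """Greedy sketch of a diversity-aware k-NN (hypothetical helper).

    Candidates are scanned by increasing distance to the query. A candidate
    is skipped when an already selected element 'influences' it, i.e. when
    the two are at least as close to each other as either is to the query
    (assumed rule, for illustration only).
    """
    ranked = sorted(data, key=lambda p: math.dist(query, p))
    selected = []
    for cand in ranked:
        d_query_cand = math.dist(query, cand)
        covered = any(
            math.dist(s, cand) <= min(math.dist(s, query), d_query_cand)
            for s in selected
        )
        if not covered:
            selected.append(cand)
            if len(selected) == k:
                break
    return selected


if __name__ == "__main__":
    q = (0.0, 0.0)
    points = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (-3.0, 0.0), (1.05, 0.1)]
    # Plain k-NN returns three near-duplicates clustered around (1, 0).
    print(plain_knn(q, points, 3))
    # The diverse variant keeps (1, 0) and then moves on to (0, 2) and (-3, 0).
    print(diverse_knn(q, points, 3))
```

In this toy example the plain query returns three nearly identical points, whereas the diverse variant returns one representative of that cluster plus two objects from other regions. The sketch recomputes distances naively and may return fewer than k objects when most candidates are covered.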
Although diversity has been employed as a complement to similarity-driven tasks, e.g., data visualization, its practical use for querying high-dimensional medical data is still an open issue. We aim to fill that gap by extending current similarity operators so that they can handle medical data embedded in dense, high-dimensional spaces while avoiding the retrieval of a large number of overly similar objects. The project hypothesis is that influence-based diversity queries may soften the distance concentration problem by adding to the search new dissimilarity criteria beyond "closeness to the query object". Accordingly, the first expected contribution is the design of a full and seamless integration of influence-based diversity into existing similarity operators, including the proposal of efficient algorithms that can be used in real-world CBMIR applications. The second expected contribution of the research project is the theoretical and empirical characterization of the new extended operators with respect to the local intrinsic dimensionality (LID) of medical image datasets, i.e., the operators' best, average, and worst behaviors, biases, and limitations. The research effort will be directed, in particular, (i) to address the tuning of diversity algorithms for varying LIDs, (ii) to compare similarity-oriented and diversity-oriented queries regarding content output and computational cost for different LID ranges within medical image datasets, (iii) to discuss the relationship between diversity and LID in order to determine whether or not the query answer is related to the amount of influence-based diversity found within medical image datasets, and (iv) to implement a real-world CBMIR application with a LID-oriented cost model for choosing the most suitable algorithms and data structures for the execution of diversity-oriented queries. (AU)
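For readers unfamiliar with LID, the sketch below shows one common way to estimate it around a query object: a maximum-likelihood estimator computed from the distances to the k nearest neighbors. The abstract does not state which estimator the project uses, so the choice of estimator, the function name `lid_mle`, and the parameter `k` are assumptions made for illustration.

```python
import math


def lid_mle(query, data, k):
    """Maximum-likelihood LID estimate around `query` (illustrative sketch).

    Uses the k smallest non-zero distances r_1 <= ... <= r_k and returns
    -k / sum(ln(r_i / r_k)). Zero distances (e.g. the query itself being
    present in the data) are ignored.
    """
    dists = sorted(d for d in (math.dist(query, p) for p in data) if d > 0)
    dists = dists[:k]
    if len(dists) < 2:
        raise ValueError("need at least two non-zero neighbor distances")
    r_k = dists[-1]
    log_sum = sum(math.log(d / r_k) for d in dists)
    if log_sum == 0:
        raise ValueError("all neighbors are equidistant; estimate undefined")
    return -len(dists) / log_sum


if __name__ == "__main__":
    # Points along a line: the estimate should be close to 1.
    line = [(i / 100,) for i in range(1, 101)]
    print(lid_mle((0.0,), line, k=100))
    # Points on a planar grid: the estimate should be close to 2.
    grid = [(float(x), float(y)) for x in range(-10, 11) for y in range(-10, 11)]
    print(lid_mle((0.0, 0.0), grid, k=300))
```

A cost model of the kind mentioned in item (iv) could, under these assumptions, compute such an estimate per query and use it to pick between algorithms; how the project actually does this is not described in the abstract.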
