One of the main subjects in machine learning is data clustering, which aims to find clusters that describe a set of objects in such a way that intra-cluster similarity and inter-cluster dissimilarity are both maximized. In general, such techniques rely on measures that take into account all the available features of a dataset. However, in many real-world situations the clusters contained in the data are defined by only a subset of the features. For this reason, the biclustering paradigm provides algorithms that simultaneously cluster the rows and columns of a data matrix in order to find homogeneous submatrices. This paradigm became widely used after its importance for gene expression data analysis was demonstrated. However, one of the main problems in the biclustering field is that there is no universal definition of which patterns constitute a bicluster. Consequently, each algorithm relies on different heuristics, mathematical formulations and assumptions, which leads to different outcomes for the same input data. Therefore, developing techniques capable of combining the solutions of several algorithms that search for similar patterns, and/or of the same algorithm run with different experimental parameters, can be an important step toward more meaningful and robust results that may not be found by applying any single algorithm on its own.
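To make "homogeneous submatrix" and "combining solutions" a little more concrete, here is a minimal sketch in Python with NumPy. It assumes one particular coherence criterion, the mean squared residue of Cheng and Church (2000), which is only one of the many possible bicluster definitions; a value of 0 means the submatrix follows a perfectly additive (row offset plus column offset) pattern. It also shows a cell-level Jaccard index, one common way to measure how much two biclusters returned by different algorithms or runs overlap, which could serve as a building block for combining solutions. The helper names `mean_squared_residue` and `bicluster_jaccard` are hypothetical, and neither function is claimed to be the method developed in this project.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """Mean squared residue (Cheng & Church) of the submatrix data[rows, cols].

    Assumption: coherence is measured with this additive-model criterion;
    lower is more coherent, 0 means perfectly additive.
    """
    sub = data[np.ix_(rows, cols)]                # extract the candidate bicluster
    row_means = sub.mean(axis=1, keepdims=True)   # one mean per selected row
    col_means = sub.mean(axis=0, keepdims=True)   # one mean per selected column
    overall = sub.mean()
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())


def bicluster_jaccard(a, b):
    """Jaccard index between two biclusters, each given as (rows, cols).

    Similarity is computed over the matrix cells each bicluster covers
    (hypothetical helper for comparing solutions from different runs).
    """
    rows_a, cols_a = set(a[0]), set(a[1])
    rows_b, cols_b = set(b[0]), set(b[1])
    # Shared cells are exactly (shared rows) x (shared columns).
    inter = len(rows_a & rows_b) * len(cols_a & cols_b)
    union = len(rows_a) * len(cols_a) + len(rows_b) * len(cols_b) - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    data = np.array([[1, 2, 9, 0],
                     [3, 4, 7, 5],
                     [5, 6, 1, 8],
                     [9, 0, 3, 2]], dtype=float)
    # Rows 0-2 x columns 0-1 form an additive pattern, so the residue is ~0.
    print(mean_squared_residue(data, [0, 1, 2], [0, 1]))
    # The full matrix is not additive, so its residue is larger.
    print(mean_squared_residue(data, range(4), range(4)))
    # Two overlapping biclusters share 2 of 6 covered cells.
    print(bicluster_jaccard(([0, 1], [0, 1]), ([1, 2], [0, 1])))
```

Because different algorithms optimize different criteria, a bicluster that scores well under this residue may score poorly under another definition, which is precisely why comparing and merging outputs across algorithms (for example, by grouping biclusters with high Jaccard overlap) can be informative.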