
Data mining in song lyrics and development of predictive models of commercial success

Grant number: 12/12130-3
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: September 01, 2012
End date: December 31, 2012
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Reinaldo Alvarenga Bergamaschi
Grantee: Gabriel Massaki Wakano Bezerra
Host Institution: Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil

Abstract

This project will conduct research and development on algorithms for data mining in song lyrics and for generating a predictive model of commercial success. These algorithms will analyze a large number of lyrics and extract metrics related to the words used, for example word frequency, word similarity, word patterns such as common sequences or collocations of words, and word clusters. The set of such metrics for a given song will embody the song's lyric characteristics. Informally speaking, the goal is to devise and compute a set of lyric-based metrics that will allow us to compare songs and artists and to establish the correlation between the metrics and the commercial success of a song. For example, considering only the frequency and types of words used in the lyrics, how would a Lady Gaga song compare with a Bob Dylan song? Once the metrics have been generated, the next objective will be to build a predictive model, in the form of an equation, that evaluates the potential commercial success of a song. The word metrics, together with a quantitative measure of the song's commercial success, will be used to generate the model using two approaches: polynomial regression and machine learning. The student will learn and develop complex algorithms for text-based data mining, as well as model generation techniques based on regression and machine learning. (AU)
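
The kind of word metrics described in the abstract can be illustrated with a minimal sketch. It is not the project's actual method: the function names (`tokenize`, `word_frequencies`, `top_bigrams`, `cosine_similarity`) are hypothetical, the two sample texts are invented placeholder lines rather than real lyrics, and cosine similarity over word frequencies is just one assumed way of comparing two songs.

```python
import math
import re
from collections import Counter


def tokenize(lyrics):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", lyrics.lower())


def word_frequencies(tokens):
    """Relative frequency of each word within one song."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def top_bigrams(tokens, k=3):
    """Most common adjacent word pairs, a crude proxy for collocations."""
    return Counter(zip(tokens, tokens[1:])).most_common(k)


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two word-frequency dictionaries."""
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented placeholder lines (assumption: NOT real lyrics).
song_a = "dance dance all night, dance with me tonight"
song_b = "the wind and the rain, the road and the night"

tokens_a = tokenize(song_a)
tokens_b = tokenize(song_b)
freq_a = word_frequencies(tokens_a)
freq_b = word_frequencies(tokens_b)

# A single number summarizing how similar the two vocabularies are.
similarity = cosine_similarity(freq_a, freq_b)
```

Metrics like these, computed per song, would form the feature vector that a regression or machine learning model could then relate to a measure of commercial success.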
