Medeiros, Beatriz Raposo de
Cabral, Joao Paulo
Meireles, Alexsandro R.
Baceti, Andre A.
Total Authors: 4
 Univ Sao Paulo, Sao Paulo - Brazil
 Trinity Coll Dublin, Dublin - Ireland
 Univ Fed Espirito Santo, Vitoria, ES - Brazil
 Murabei Data Sci, Sao Paulo - Brazil
Total Affiliations: 4
Speaking and singing are mechanisms of vocal production with distinct articulatory properties, and they consequently produce sounds that are normally perceived as different. Several papers indicate that the tonal stability characteristic of singing, which is associated with pre-defined fundamental frequency (f0) targets, i.e., musical notes, is an important factor differentiating it from speech, where f0 targets are not pre-defined and f0 variability is greater. However, these studies are mainly grounded in perceptual experiments, and little has been done to demonstrate this difference in terms of acoustic measurements. The aim of this paper is to compare measures of f0 variability between singing and speech in order to test the hypothesis that singing has lower f0 variability, as would be expected given its higher tonal stability. To perform this comparison, we built a database of parallel singing and speech recordings. In a first experiment, the two signals were compared using common statistical measures of f0 variability over linguistic units (syllable and phone), specifically measures based on f0 variance, which have been used in previous work. Although the results were inconclusive with respect to the hypothesis, a more detailed analysis performed in this first experiment revealed characteristic f0 effects in both the speech and singing data that should be taken into account in the subsequent study of f0 stability. A second experiment was therefore conducted on the same recorded data, using a different statistical analysis of f0 variance that takes these factors into account. In contrast with the first experiment, the results confirmed the hypothesis of higher f0 stability in singing. The final experiment in this work used a deep neural network classifier to test whether speech and singing can be differentiated directly from the f0 values measured at the syllable level, without using statistical measures. 
The results are consistent with the positive results of the second experiment. The findings of this research are important for better understanding the acoustic properties of intonation that make it possible to distinguish spoken from sung sounds. They also provide cues for deriving suitable f0 models for applications that depend on the modality used, such as synthesis or transformation of speech/singing signals. (AU)
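
As a rough illustration of a per-syllable f0 variance measure of the kind the abstract mentions, the following minimal Python sketch computes the variance of the f0 contour within one syllable. It is not the authors' method: the conversion to semitones, the 100 Hz reference, the convention that a value of 0 marks an unvoiced frame, and the function names are all assumptions made for this example, and the two contours are invented values, not data from the study.

```python
import math
from statistics import pvariance


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an f0 value in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def syllable_f0_variance(f0_track_hz):
    """Population variance of f0 (in semitones) over the voiced frames of one syllable.

    Frames with f0 <= 0 are treated as unvoiced and skipped (an assumed convention).
    Returns None when fewer than two voiced frames remain.
    """
    voiced = [hz_to_semitones(f) for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        return None
    return pvariance(voiced)


# Invented contours for illustration only: a steady sung note and a gliding spoken syllable.
sung = [220.0, 220.5, 219.8, 220.2, 0.0]
spoken = [180.0, 195.0, 210.0, 200.0, 0.0]

sung_var = syllable_f0_variance(sung)
spoken_var = syllable_f0_variance(spoken)
print(sung_var < spoken_var)
```

Working in semitones rather than Hz is one way to make variance comparable across speakers and pitch ranges; whether the study used such a scale is not stated in the abstract.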