Separation of audio objects based on the principle of sparsity

Grant number: 22/16168-7
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)
Start date: May 01, 2023
End date: April 30, 2024
Field of knowledge: Engineering - Electrical Engineering - Telecommunications
Principal Investigator: Bruno Sanches Masiero
Grantee: Arthur Nicholas dos Santos
Host Institution: Faculdade de Engenharia Elétrica e de Computação (FEEC), Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil

Abstract

In the Scene-Based Audio (SBA) format, the number and direction of sources in a sound field can be determined using spatial filters, or masks, that estimate the sound levels arriving from different angles at a microphone array, similarly to human perception of auditory scenes. However, this technique usually yields low-resolution outputs because of the small number of microphones that typically constitute these arrays. Many methods have been proposed to enhance the quality of SBA without increasing the number of microphones, such as deconvolution techniques that eliminate the effects of spectral spreading, or regularization techniques that promote sparsity, which is usually a valid assumption for sound fields containing only a few sources. Although these algorithms promise better separation of audio objects, their results degrade when sparse scenes are recorded in reverberant environments, since reverberation reduces the detected sparsity of the sound field. Recently, many researchers have resorted to Deep Learning (DL) techniques to enhance the separation of audio objects under these conditions; however, these models produce only monaural outputs. Concurrently, recently proposed analytical methods of Time-Frequency Spatial (TFS) masking can be used to enhance SBA recordings while preserving their spatial information. Hence, this study aims at separating audio objects from SBA captured with a Rigid Spherical Microphone Array (RSMA), which allows for Spherical Harmonics Decomposition (SHD) and, in turn, Sparse Plane Wave Decomposition (SPWD) to identify the diffuse components of the sound field and thereby estimate the Direction Of Arrival (DOA) of the direct sound sources. An enhancement stage is then proposed, using an Artificial Neural Network (ANN) to estimate TFS masks that preserve binaural cues, thus retaining the spatial information without the constraints of analytical methods. The separation stage is performed with a State-of-the-Art (SOTA) ANN that produces monaural outputs. A final 3-D audio synthesis stage is therefore performed, using the DOA estimates obtained from the SPWD as metadata for the audio objects, converting the SBA format into either Object-Based Audio (OBA) or enhanced SBA. (AU)
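As a rough illustration of the SPWD stage mentioned in the abstract, the sketch below (not part of the project; the function names, the direction grid, the single-frequency/real-amplitude simplification, and the Lasso-based solver are all assumptions) builds a dictionary of plane waves in the spherical-harmonics domain and uses L1 regularization to recover a sparse set of active directions, from which a DOA estimate can be read.

```python
# Minimal sketch of sparse plane wave decomposition (SPWD) in the
# spherical-harmonics domain, at a single frequency, for DOA estimation.
import numpy as np
from scipy.special import sph_harm, spherical_jn
from sklearn.linear_model import Lasso


def mode_strength(n, kr):
    """Open-sphere mode strength b_n(kr); shown instead of the rigid-sphere
    variant (which adds a Hankel-function term) to keep the sketch short."""
    return 4 * np.pi * (1j ** n) * spherical_jn(n, kr)


def spwd_doa(anm, order, kr, grid_az, grid_col, alpha=1e-2):
    """Estimate the dominant plane-wave direction from SH coefficients anm.

    anm      : complex array, shape ((order + 1)**2,), SH coefficients a_nm
    grid_az  : candidate azimuths (rad); grid_col: candidate colatitudes (rad)
    Returns the index of the strongest direction on the grid.
    """
    # Dictionary column q stacks b_n(kr) * conj(Y_n^m(az_q, col_q)) over (n, m).
    Q = len(grid_az)
    D = np.zeros(((order + 1) ** 2, Q), dtype=complex)
    for q in range(Q):
        idx = 0
        for n in range(order + 1):
            for m in range(-n, n + 1):
                D[idx, q] = mode_strength(n, kr) * np.conj(
                    sph_harm(m, n, grid_az[q], grid_col[q]))
                idx += 1
    # Lasso promotes sparsity over the direction grid; real and imaginary
    # parts are stacked because scikit-learn's Lasso is real-valued, and the
    # plane-wave amplitudes are assumed real for simplicity.
    D_ri = np.vstack([D.real, D.imag])
    a_ri = np.concatenate([anm.real, anm.imag])
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D_ri, a_ri).coef_
    return int(np.argmax(np.abs(x)))
```

In this simplified setting, the diffuse part of the field shows up as energy spread across many grid directions rather than concentrated in a few coefficients, which is the property the project exploits to separate direct sources from diffuse components.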
