Manifold learning for spatial audio rendering

Felipe Leonel Grijalva Arévalo

Author(s):	Felipe Leonel Grijalva Arévalo
Document type:	Doctoral Thesis
Press:	Campinas, SP.
Institution:	Universidade Estadual de Campinas (UNICAMP). Faculdade de Engenharia Elétrica e de Computação
Defense date:	2018-07-05
Examining board members:	Luiz César Martini; Tiago Fernandes Tavares; Levy Boccato; Luiz Wagner Pereira Biscainho; Laurindo de Sousa Britto Neto
Advisor:	Luiz César Martini; Bruno Sanches Masiero
Abstract
The objective of binaurally rendered spatial audio is to simulate a sound source at arbitrary spatial locations using Head-Related Transfer Functions (HRTFs). HRTFs model the direction-dependent influence of the ears, head, and torso on the incident sound field. When an audio source is filtered through a pair of HRTFs (one for each ear), a listener perceives the sound as though it were reproduced at a specific location in space. Building on our successful results with a practical face recognition application for visually impaired people that uses a spatial audio user interface, this work deepens that research to address several scientific aspects of spatial audio. In this context, this thesis explores the incorporation of prior knowledge about spatial audio using a novel nonlinear HRTF representation based on manifold learning, which tackles three major challenges of broad interest to the spatial audio community: HRTF personalization, HRTF interpolation, and improvement of human sound localization. The use of manifold learning for spatial audio rests on the assumption that the data (i.e., the HRTFs) lie on a low-dimensional manifold. This assumption has also been of interest to researchers in computational neuroscience, who argue that manifolds are crucial for understanding the underlying nonlinear relationships of perception in the brain. Across all of our contributions using manifold learning, constructing a single manifold across subjects through an Inter-subject Graph (ISG) has proven to yield a powerful HRTF representation capable of incorporating prior knowledge of HRTFs and capturing the underlying factors of spatial hearing. Moreover, using our ISG to construct a single manifold offers the advantage of employing information from other individuals to improve the overall performance of the proposed techniques. The results show that our ISG-based techniques outperform other linear and nonlinear methods in tackling the spatial audio challenges addressed by this thesis. (AU)
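
The filtering step mentioned in the abstract (one HRTF per ear applied to a single audio source) can be illustrated with a minimal Python sketch. It is not taken from the thesis: it assumes time-domain impulse responses (the time-domain counterparts of HRTFs), uses made-up toy values rather than measured data, and the function names `convolve` and `render_binaural` are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed values, not measured data): the sound
# reaches the right ear two samples later and at half the amplitude,
# a crude stand-in for a source located on the listener's left.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
```

Here `left` and `right` are the two ear signals that would be played over headphones. Real HRTFs also encode frequency-dependent filtering by the ears, head, and torso, which this toy delay-and-attenuate pair does not capture.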

FAPESP's process:	14/14630-9 - Machine learning for signal processing applied to spatial audio
Grantee:	Felipe Leonel Grijalva Arévalo
Support Opportunities:	Scholarships in Brazil - Doctorate
