Audio-Visual Speech Processing by Machine Learning


This research plan addresses a common basis for several areas of signal processing: speech analysis, speech and audio coding, speech recognition, audio feature recognition, and source separation, using regularization to adapt the methods to each target application. Besides its own importance, speech analysis has traditionally supplied the signal representations and model parameters that the other areas rely on. Deep learning is eroding this role, so the plan seeks to draw parallels between learned and classical representations to make the learned ones more interpretable. Beyond the usual time-frequency decomposition and modification and autoregressive analysis, new machine learning and deep learning algorithms will be explored and proposed for the enhancement, separation, and synthesis of speech and audio signals, partially or totally replacing traditional analysis. The research will focus on generative models that can also handle video signals and time series. Additionally, the parameters and representations of the speech signal will be used to model and develop non-intrusive speech quality metrics. For this purpose, speech signals will be degraded using different communication system parameters. (AU)

