Isolated voice command recognition system for application in surgical robotics [original title: Sistema de reconhecimento de comandos isolados de voz para aplicação em robótica cirúrgica].

Eric Tavares Pereira Santos

Author(s):	Eric Tavares Pereira Santos Total Authors: 1
Document type:	Master's Dissertation
Press:	São Paulo.
Institution:	Universidade de São Paulo (USP). Escola Politécnica (EP/BC)
Defense date:	2003-08-25
Examining board members:	Idágene Aparecida Cestari; Cinthia Itiki; Lucas Antonio Moscato
Advisor:	Idágene Aparecida Cestari
Abstract
Minimally invasive surgeries are normally performed with the assistance of an endoscope, which allows visualization of the patient's internal anatomy. The use of a robotic system for automated endoscope positioning requires a simple and intuitive command interface that allows the surgeon to determine the camera movements directly. A speech recognition system for endoscope positioning must recognize spoken commands in real time and classify them correctly. In this work, three isolated-word speech recognition algorithms were developed. Two of them are based on linear predictive coding (LPC) autoregressive modelling; they use different time alignment and different spectral distortion measures, namely Euclidean distance (LPC-LE) and Itakura distance (LPC-DI). The third algorithm is based on the commercial IBM SMAPI programming library. A serial communication protocol was developed to integrate the speech recognition command interface with a low-level control system. The algorithms were tested using 22 voice commands recorded from 42 male and female volunteers. The performance of the algorithms as a function of their input parameters was measured in two modes: with training for each individual speaker (speaker-dependent mode) and without training for individual speakers (speaker-independent mode), using vocabularies with and without phonetic and spectral ambiguities. The LPC-LE and LPC-DI algorithms were evaluated by varying the signal sampling frequency and the model order. Both algorithms showed recognition rates near 94% in speaker-dependent mode. The best performance of the LPC-LE algorithm was obtained for sampling frequencies between 3.7 and 7.4 kHz, with a lowest processing time of 0.18 s. The best performance of the LPC-DI algorithm was obtained between 3.7 and 22.1 kHz, using models of order 10, 16 and 20, with a lowest processing time of 1 s.
The SMAPI-based algorithm was evaluated as a function of its rejection threshold, its recognition speed and the signal-to-noise ratio of the voice signals. This algorithm identified about 93% of the commands correctly in speaker-independent mode and about 98.5% in speaker-dependent mode. The mean processing time was about 0.53 s for the recognition speed configurations studied, and its performance was not affected by signal-to-noise ratios up to 45 dB. The presence of phonetically similar words in the vocabulary increased error rates for all three algorithms developed. (AU)
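
The two spectral distortion measures named in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes the textbook autocorrelation method with the Levinson-Durbin recursion for the LPC coefficients, compares single frames only (no time alignment), and uses hypothetical names throughout (synthetic_frame, lpc, euclidean_distance, itakura_distance). The 8 kHz sampling rate, 240-sample frame and Hamming window are assumed values chosen for illustration; model order 10 is one of the orders mentioned in the abstract.

```python
import math
import random


def synthetic_frame(freq_hz, fs_hz=8000, n=240, seed=0):
    """Hypothetical test signal: a Hamming-windowed sinusoid plus light noise."""
    rng = random.Random(seed)
    frame = []
    for i in range(n):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
        sample = math.sin(2 * math.pi * freq_hz * i / fs_hz) + 0.1 * rng.uniform(-1, 1)
        frame.append(window * sample)
    return frame


def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # residual prediction error energy
    return a, err


def lpc(frame, order):
    """Return (coefficients, autocorrelation lags, prediction error)."""
    r = autocorrelation(frame, order)
    a, err = levinson_durbin(r, order)
    return a, r, err


def euclidean_distance(a_ref, a_test):
    """Euclidean distance between two LPC coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_ref, a_test)))


def quad_form(a, r):
    """a^T R a, where R is the Toeplitz matrix built from the lags r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def itakura_distance(a_ref, a_test, r_test):
    """Log ratio of prediction errors on the test frame (reference vs. own model)."""
    return math.log(quad_form(a_ref, r_test) / quad_form(a_test, r_test))


if __name__ == "__main__":
    order = 10
    a_ref, _, _ = lpc(synthetic_frame(400, seed=1), order)
    a_test, r_test, _ = lpc(synthetic_frame(1200, seed=2), order)
    print("Euclidean:", euclidean_distance(a_ref, a_test))
    print("Itakura:  ", itakura_distance(a_ref, a_test, r_test))
```

In this sketch the Itakura distance is zero when the reference and test models coincide and positive otherwise, because the test frame's own LPC model minimizes the prediction error on that frame. A full isolated-word recognizer would additionally align the frame sequences of the spoken word and each template in time before accumulating frame distances.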
