Author(s): |
Eric Tavares Pereira Santos
Total Authors: 1
|
Document type: | Master's Dissertation |
Press: | São Paulo. |
Institution: | Universidade de São Paulo (USP). Escola Politécnica (EP/BC) |
Defense date: | 2003-08-25 |
Examining board members: |
Idágene Aparecida Cestari;
Cinthia Itiki;
Lucas Antonio Moscato
|
Advisor: | Idágene Aparecida Cestari |
Abstract | |
Minimally invasive surgeries are normally performed with the assistance of an endoscope, which allows visualization of the patient's internal anatomy. The use of a robotic system for automated endoscope positioning requires a simple and intuitive command interface that allows the surgeon himself to determine the camera movements. A speech recognition system for endoscope positioning must recognize spoken commands in real time and classify them correctly. In this work, three isolated-word speech recognition algorithms were developed. Two of them are based on linear predictive coding (LPC) autoregressive modelling; they use different time alignment methods and different spectral distortion measures, namely Euclidean distance (LPC-LE) and Itakura distance (LPC-DI). The third algorithm is based on the commercial IBM SMAPI programming library. A serial communication protocol was developed to integrate the speech recognition command interface with a low-level control system. The algorithms were tested using 22 voice commands recorded from 42 male and female volunteers. The performance of the algorithms as a function of their input parameters was measured in two modes, with training for each individual speaker (speaker-dependent mode) and without it (speaker-independent mode), using vocabularies with and without phonetic and spectral ambiguities. The LPC-LE and LPC-DI algorithms were evaluated by varying the signal sampling frequency and the model order. Both algorithms showed recognition rates near 94% in speaker-dependent mode. The best performance of the LPC-LE algorithm was obtained for frequencies between 3.7 and 7.4 kHz, with a lowest processing time of 0.18 s. The best performance of the LPC-DI algorithm was obtained between 3.7 and 22.1 kHz, using models of order 10, 16 and 20, with a lowest processing time of 1 s.
The SMAPI-based algorithm was evaluated against its rejection threshold, its recognition speed and the signal-to-noise ratio of the voice signals. This algorithm identified about 93% of the commands correctly in speaker-independent mode and about 98.5% in speaker-dependent mode. The mean processing time was about 0.53 s for the recognition speed configurations studied, and its performance was not affected by signal-to-noise ratios up to 45 dB. The presence of phonetically similar words in the vocabulary increased error rates for all three algorithms. (AU) |
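
The abstract names two spectral distortion measures for the LPC-based recognizers but gives no implementation details. As a rough illustration only, the sketch below estimates LPC coefficients for a single frame and compares two models with a Euclidean distance and with an Itakura distance. It assumes the autocorrelation method with the Levinson-Durbin recursion, which the abstract does not confirm; all function names and the synthetic test signals are hypothetical; and the time alignment step of the dissertation is not shown.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (assumed method, not taken from the source).

    Returns (a, err), where a[0] == 1 and a describes the prediction-error
    filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # remaining prediction error
    return a, err


def euclidean_distance(a, b):
    """Euclidean distance between two LPC coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quad_form(a, r):
    """Compute a^T R a, with R the Toeplitz matrix built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def itakura_distance(test_frame, a_ref, order):
    """Log ratio of the reference model's prediction error on the test frame
    to the error of the test frame's own optimal model (always >= 0)."""
    r = autocorrelation(test_frame, order)
    a_test, _ = lpc(test_frame, order)
    return math.log(quad_form(a_ref, r) / quad_form(a_test, r))


def synth_ar(phi1, phi2, n, seed):
    """Hypothetical test signal: x[n] = phi1*x[n-1] + phi2*x[n-2] + noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


# Two synthetic "frames" standing in for a stored template and a new utterance.
frame_a = synth_ar(1.3, -0.6, 2000, seed=1)
frame_b = synth_ar(0.5, -0.3, 2000, seed=2)
model_a, _ = lpc(frame_a, 2)
model_b, _ = lpc(frame_b, 2)

d_euclid = euclidean_distance(model_a, model_b)
d_itakura_same = itakura_distance(frame_a, model_a, 2)
d_itakura_cross = itakura_distance(frame_a, model_b, 2)
```

In a recognizer of this kind, such a frame-level distance would typically be accumulated along a time-aligned path between the spoken command and each stored template, and the template with the smallest total distance would be selected. The Itakura measure requires an extra model fit and quadratic form per frame, which is consistent with, though not proof of, the longer processing time reported for LPC-DI.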