Generalization and robustness: learning in neural networks in the presence of noise

Author(s): Roberta Simonetti
Total Authors: 1
Document type: Doctoral Thesis
Place of publication: São Paulo
Institution: Universidade de São Paulo (USP). Instituto de Física (IF/SBI)
Defense date:
Examining board members:
Nestor Felipe Caticha Alfonso; Carlos Eugenio Imbassahy Carneiro; Silvio Roberto de Azevedo Salinas; Rita Maria Zorzenon dos Santos; Alba Graciela Rivas de Theumann
Advisor: Nestor Felipe Caticha Alfonso
Abstract

In this work, online supervised learning is investigated with emphasis on the generalization ability of feedforward neural networks. The study of learning algorithms that are optimal in the sense of generalization is extended to two different classes of architectures: the tree parity machine (PM) with K hidden units, and the reverse wedge perceptron (RWP), a single-layer machine with a non-monotonic transfer function. The role of noise is of fundamental importance in learning theory, and we study noise processes that can be parametrized by a single quantity, the noise level. For the PM we analyze learning in the presence of multiplicative or output noise. The optimal algorithm is far superior to previous learning algorithms such as the Least Action Algorithm (LAA): for example, after p examples have been used for training, its generalization error decays as 1/p rather than as 1/p^{1/3} for the LAA. Furthermore, there is no critical noise level beyond which no generalization ability is attainable, as is the case for the LAA. For the RWP, in addition to multiplicative noise we also consider additive noise. The optimal algorithm's modulation function and the learning curves are analyzed. Optimal learning requires knowledge of parameters that are usually unavailable, so we study the influence that a misestimation of the noise levels has on the learning curves. The results are presented in terms of what we have called Robustness Phase Diagrams (RPD), in the space of true noise level versus assumed noise level. The RPD boundary lines separate different dynamical behaviours. Among the most interesting properties, we have found that the RPD for multiplicative noise is universal: it is exactly the same for the PM, the RWP and the tree committee machine. This universality does not hold in the additive-noise case, however, since the RPDs are shown to be architecture dependent. (AU)
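
As a concrete illustration of the setting summarized above, the sketch below simulates online learning of a reverse wedge perceptron from a noisy teacher in a teacher-student scenario. It is a minimal sketch under stated assumptions, not the thesis's optimal algorithm: the constants N, GAMMA and LAMBDA are illustrative choices, output noise is modeled as a label flip with probability LAMBDA (a standard parametrization), and the update uses a simple error-driven (LAA-style) modulation. The optimal modulation function derived in the thesis, which also depends on the assumed noise level, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500          # input dimension (illustrative)
GAMMA = 0.5      # wedge parameter of the RWP (illustrative)
LAMBDA = 0.1     # output-noise level: probability of a flipped teacher label

def reverse_wedge(h, gamma=GAMMA):
    """Non-monotonic RWP transfer function: sign(h (h - gamma) (h + gamma)).
    Output is +1 for h > gamma or -gamma < h < 0, and -1 otherwise."""
    return np.sign(h * (h - gamma) * (h + gamma))

# Teacher (B) and student (J) weight vectors, normalized to length sqrt(N).
B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)
J = rng.standard_normal(N); J *= np.sqrt(N) / np.linalg.norm(J)

P = 20 * N  # number of online examples (p in the abstract's notation)
for _ in range(P):
    xi = rng.standard_normal(N)         # random input pattern
    h_teacher = B @ xi / np.sqrt(N)     # teacher local field
    h_student = J @ xi / np.sqrt(N)     # student local field
    sigma = reverse_wedge(h_teacher)    # clean teacher label
    if rng.random() < LAMBDA:           # output (multiplicative) noise:
        sigma = -sigma                  # flip the label with probability LAMBDA
    # Error-driven, LAA-style modulation: update only on disagreement.
    # For a non-monotonic machine this naive rule is purely illustrative;
    # the thesis's optimal modulation function is not reproduced here.
    if reverse_wedge(h_student) != sigma:
        J += (sigma / np.sqrt(N)) * xi

rho = (J @ B) / (np.linalg.norm(J) * np.linalg.norm(B))
print(f"teacher-student overlap rho = {rho:.3f}")
```

The teacher-student overlap rho printed at the end is the quantity that controls the generalization error: for the simple (monotonic) perceptron the generalization error is arccos(rho)/pi, while for the RWP the corresponding expression also involves the wedge parameter gamma. In this simulation one can probe the robustness question studied in the thesis by letting the assumed noise level used by the learning rule differ from the true LAMBDA.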