Bootstrap and model selection for stochastic chains with memory of variable length

Grant number: 09/09494-0
Support type: Scholarships in Brazil - Post-Doctorate
Effective date (Start): October 01, 2009
Effective date (End): September 30, 2011
Field of knowledge: Physical Sciences and Mathematics - Probability and Statistics
Principal Investigator: Jefferson Antonio Galves
Grantee: Matthieu Pierre Lerasle
Home Institution: Instituto de Matemática e Estatística (IME). Universidade de São Paulo (USP). São Paulo, SP, Brazil

Abstract

Resampling methods have been used successfully in the practical implementation of statistical tools. Recently, non-asymptotic theoretical results have been obtained, in particular in model selection, leading to great improvements in practical procedures. We propose to study rigorously some resampling methods for chains with memory of variable length. These chains, introduced by Rissanen in 1983, have been intensively studied since then. This is due not only to the mathematical interest of this new class of processes, but also to its potential as a model for scientific data from domains as different as biology and linguistics. From a statistical point of view, the basic question is, given a sample, to retrieve the smallest context tree associated with a chain with memory of variable length that best fits the sample. Several methods have been proposed to achieve this goal, but they all depend on unknown constants or on unknown properties of the estimator. Resampling methods seem particularly suitable for handling this kind of difficulty.

Besides its intrinsic interest as a theoretical statistics project, this proposal also has an interdisciplinary and applied aspect, motivated by the linguistic challenge of retrieving rhythmic features from a written text. Preliminary studies strongly suggest that context trees are good candidates as signatures of the rhythmic classes whose existence has been conjectured in the linguistics literature. The practical goal of this project is to apply the new theoretical results as tools to analyse linguistic samples, in order to obtain new evidence, with a rigorous statistical methodology, supporting the linguistic conjecture of the existence of rhythmic classes of languages. (AU)

Scientific publications (5)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
LERASLE, MATTHIEU; TAKAHASHI, DANIEL Y. Sharp oracle inequalities and slope heuristic for specification probabilities estimation in discrete random fields. BERNOULLI, v. 22, n. 1, p. 325-344, FEB 2016. Web of Science Citations: 3.
GALLO, S.; LERASLE, M.; TAKAHASHI, D. Y. Markov Approximation of Chains of Infinite Order in the d̄-metric. Markov Processes and Related Fields, v. 19, n. 1, p. 51-82, 2013. Web of Science Citations: 4.
LERASLE, MATTHIEU. Optimal model selection in density estimation. ANNALES DE L'INSTITUT HENRI POINCARÉ - PROBABILITÉS ET STATISTIQUES, v. 48, n. 3, p. 884-908, AUG 2012. Web of Science Citations: 11.
LERASLE, MATTHIEU. OPTIMAL MODEL SELECTION FOR DENSITY ESTIMATION OF STATIONARY DATA UNDER VARIOUS MIXING CONDITIONS. ANNALS OF STATISTICS, v. 39, n. 4, p. 1852-1877, AUG 2011. Web of Science Citations: 8.
LERASLE, MATTHIEU; TAKAHASHI, DANIEL Y. An oracle approach for interaction neighborhood estimation in random fields. ELECTRONIC JOURNAL OF STATISTICS, v. 5, p. 534-571, 2011. Web of Science Citations: 4.
