(Reference retrieved automatically from Google Scholar, based on the FAPESP funding acknowledgment and the corresponding grant number included in the publication by the authors.)

Solvable null model for the distribution of word frequencies

Author(s):
Fontanari, JF ; Perlovsky, LI
Total number of authors: 2
Document type: Scientific article
Source: Physical Review E; v. 70, n. 4, p. 042901, 2004.
Abstract

Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery. Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely it is to be used again. We argue that this model is equivalent to the neutral infinite-alleles model of population genetics and so the degeneracy of the different words composing a sample of text is given by the celebrated Ewens sampling formula [Theor. Pop. Biol. 3, 87 (1972)], which we show to produce an exponential distribution of word frequencies. (AU)
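
The reuse mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes the model can be represented as a Hoppe urn, a standard process whose partitions are known to follow the Ewens sampling formula. The names `simulate_text` and `frequency_spectrum` and the innovation parameter `theta` are hypothetical choices made for this illustration, and the abstract does not state which parameter values the paper uses.

```python
import random
from collections import Counter


def simulate_text(n_words, theta, seed=None):
    """Generate a sequence of word ids with a Hoppe urn (illustrative sketch).

    theta > 0 is an assumed innovation parameter, not a value from the paper.
    """
    rng = random.Random(seed)
    text = []
    next_id = 0
    for i in range(n_words):
        # With probability theta / (theta + i) a new word is coined.
        # For i == 0 this probability is 1, so the first token is always new.
        if rng.random() < theta / (theta + i):
            text.append(next_id)
            next_id += 1
        else:
            # Otherwise repeat a previous token chosen uniformly at random, so
            # a word is reused in proportion to how often it has appeared.
            text.append(rng.choice(text))
    return text


def frequency_spectrum(text):
    """Map j to the number of distinct words that occur exactly j times."""
    counts = Counter(text)
    return Counter(counts.values())


if __name__ == "__main__":
    sample = simulate_text(10_000, theta=20.0, seed=0)
    spectrum = frequency_spectrum(sample)
    for j in sorted(spectrum)[:10]:
        print(j, spectrum[j])
```

Running the script prints the start of the frequency spectrum for one simulated text. Averaging the spectrum over many seeds gives an empirical estimate that can be compared with the expectation under the Ewens sampling formula.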

FAPESP grant: 99/09644-9 - Theoretical molecular evolution
Grantee: José Fernando Fontanari
Support type: Research Grant - Thematic