The data generated by large scale sequencing projects is complex, high-dimensional, multivariate discrete data. In studies of evolutionary biology, the parameter space of evolutionary trees is an unusual additional complication from a statistical perspective. In this talk I will briefly introduce the general approaches to utilizing sequence data in phylogenetic inference. A particular issue of interest in phylogenetic inference is assessments of uncertainty about the true tree or structures that might be present in it. The primary way in which uncertainty is assessed in practice is through bootstrap support (BP) for splits, large values indicating strong support for the split. A difficulty with this measure, however, has been deciding how large is large enough. We discuss the interpretation of BP and ways of adjusting it so that it has an interpretation similar to a p-value. A related issue, having to do with the behaviour of methods when data are generated from a star tree, gives rise to an interesting example in which, due to the unusual statistical nature, Bayesian and maximum likelihood methods give strinkingly different results, even asymptotically.
Groupe d’études et de recherche en analyse des décisions