### G-2005-29

# A Robust Prediction Error Criterion for Pareto Modeling of Upper Tails

## Debbie J. Dupuis and Maria-Pia Victoria-Feser

Estimation of the Pareto tail index from extreme order statistics is an important
problem in many settings, such as income distributions (inequality measurement),
finance (evaluation of value at risk), and insurance (determination of loss
probabilities). The upper tail of the distribution, where the data are sparse, is
typically fitted with a model such as the Pareto model. Quantities such as the
probabilities of extreme events are then deduced from the fitted model. The success of
this procedure relies heavily on two choices: the estimator for the Pareto tail index,
and the procedure used to determine the number *k* of extreme order statistics used for
the estimation. For the choice of *k*, most known procedures are based on minimizing
(an estimate of) the asymptotic mean squared error of the maximum likelihood (or Hill)
estimator (MLE), which is the traditional estimator of the Pareto tail index. In this
paper we question the choice of the estimator and the resulting procedure for
determining *k*, because we believe that the model chosen to describe the behaviour of
the tail distribution can only be considered approximate. If the data in the tail are
not exactly but only approximately Pareto, then the MLE can be biased, i.e., it is not
robust, and consequently the choice of *k* is also biased. We propose instead a
weighted MLE for the Pareto tail index that downweights data "far" from the model,
where "far" is measured by the size of standardized residuals constructed by viewing
the Pareto model as a regression model.
The data downweighted in this way do not systematically correspond to the largest
quantiles. Based on this estimator and proceeding as in Ronchetti and Staudte (1994),
we develop a robust prediction error criterion, called the *RC*-criterion, to choose *k*.
In simulation studies, we compare our estimator and criterion to classical ones with
exact and/or approximate Pareto data. Moreover, the analysis of real data sets shows
that a robust procedure is needed for selection, and not just for estimation.
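To make the contrast between the classical and weighted estimators concrete, here is a minimal Python sketch. It is not the authors' method. `hill` computes the classical Hill (conditional ML) estimate of γ = 1/α from the *k* largest observations, using the (k+1)-th largest value as the threshold. `weighted_hill` is a hypothetical robust variant. It uses the exponential (Rényi) representation of the log-excesses under an exact Pareto tail to standardize each one. It then applies Huber-type weights min(1, c/|z|) with an assumed tuning constant c = 1.345 and solves the weighted exponential likelihood equation. All function names, the residual definition, the weight function and the constant are illustrative assumptions, because the abstract does not specify the paper's weights. The sketch also omits any Fisher-consistency correction, so in general it is not unbiased at the exact Pareto model. The *RC*-criterion for choosing *k* is not implemented here.

```python
import math
from typing import List, Sequence


def log_excesses(sample: Sequence[float], k: int) -> List[float]:
    """Return y_i = log(x_(i) / x_(k+1)), i = 1..k, largest first."""
    xs = sorted(sample, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("need 1 <= k < len(sample)")
    threshold = xs[k]  # x_(k+1): the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("Pareto data must be positive")
    return [math.log(x / threshold) for x in xs[:k]]


def hill(sample: Sequence[float], k: int) -> float:
    """Classical Hill / conditional MLE of gamma = 1/alpha."""
    return sum(log_excesses(sample, k)) / k


def weighted_hill(sample: Sequence[float], k: int,
                  c: float = 1.345, iterations: int = 50) -> float:
    """Hypothetical weighted MLE of gamma (illustration only, not the paper's).

    Under an exact Pareto tail, y_1 >= ... >= y_k behave like the order
    statistics of k i.i.d. exponentials with mean gamma. Then y_i has mean
    gamma * m_i and standard deviation gamma * s_i (Renyi representation).
    """
    y = log_excesses(sample, k)
    m = [0.0] * k
    s = [0.0] * k
    acc_mean = acc_var = 0.0
    for i in range(k, 0, -1):  # m_i = sum_{j=i}^{k} 1/j, s_i^2 = sum_{j=i}^{k} 1/j^2
        acc_mean += 1.0 / i
        acc_var += 1.0 / i ** 2
        m[i - 1] = acc_mean
        s[i - 1] = math.sqrt(acc_var)

    gamma = sum(y) / k  # start from the Hill estimate
    for _ in range(iterations):
        if gamma <= 0:
            raise ValueError("degenerate sample: non-positive estimate")
        # Standardized residuals and assumed Huber-type weights min(1, c/|z|).
        weights = []
        for yi, mi, si in zip(y, m, s):
            z = (yi - gamma * mi) / (gamma * si)
            weights.append(1.0 if abs(z) <= c else c / abs(z))
        # Weighted exponential likelihood equation: sum w_i (y_i - gamma) = 0.
        gamma = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    return gamma
```

As a usage check, try a deterministic Pareto-like sample with γ = 0.5 (x_i = ((n+1)/i)^0.5). Then multiply its five largest values by 1000. The Hill estimate moves well away from 0.5, while the weighted sketch stays much closer, because the inflated points receive small weights. This only illustrates the downweighting idea and says nothing about the paper's reported results.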

Published **March 2005**, 30 pages