We consider regression experiments involving a response variable and a large number of predictor variables, many of which may be irrelevant to the response and should therefore be removed before prediction. The variables that are related to the response, in turn, need to be selected and their relationship to the response analyzed. This paper uses local linear methods with bandwidths chosen to give a high probability of selecting the relevant variables. Our approach avoids the curse of dimensionality by basing bandwidth selection on a local signal-to-noise ratio, called efficacy, which automatically and adaptively selects relatively large local neighborhoods. We develop an algorithm called EARTH (Efficacy Adaptive Regression Tube Hunting), based on the conditional expectation of the response given all but one of the predictor variables, and we derive some of its properties. Computer simulations show that, for a variety of models, EARTH successfully and efficiently selects the relevant variables even when there are many irrelevant predictor variables. Combining EARTH with the model selection and prediction procedure MARS or the tree-based prediction procedure GUIDE improves prediction accuracy. This is joint work with Shijie Tang and Kam Tsui.
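
For readers unfamiliar with the local linear fitting that the abstract builds on, the following is a minimal generic sketch and not the EARTH algorithm itself. It fits a kernel-weighted linear model around a point and returns the fitted value and local slopes. The function name `local_linear_fit`, the Gaussian product kernel, and the fixed user-supplied bandwidth are all assumptions made for illustration; the efficacy-based bandwidth selection described above is not reproduced here.

```python
import numpy as np

def local_linear_fit(X, y, x0, bandwidth):
    """Generic local linear fit at x0 (illustrative, not EARTH).

    Returns (intercept, slopes): the estimated regression value at x0
    and the estimated local slope for each predictor.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = X - np.asarray(x0, dtype=float)          # center predictors at x0
    # Assumption: a scalar or per-predictor bandwidth, fixed in advance.
    h = np.broadcast_to(np.asarray(bandwidth, dtype=float), (Z.shape[1],))
    # Assumption: Gaussian product kernel for the local weights.
    w = np.exp(-0.5 * np.sum((Z / h) ** 2, axis=1))
    # Weighted least squares via a square-root-weight rescaling.
    D = np.column_stack([np.ones(len(y)), Z])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return beta[0], beta[1:]

# Hypothetical data: the third predictor is irrelevant to the response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.standard_normal(500)

value, slopes = local_linear_fit(X, y, x0=np.zeros(3), bandwidth=0.5)
print(value, slopes)
```

In this toy setting the estimated slope for the irrelevant predictor should be close to zero, which is the kind of local evidence a variable selection procedure can exploit. How EARTH turns such local fits into an efficacy measure is described in the paper, not in this sketch.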
Group for Research in Decision Analysis