Groupe d’études et de recherche en analyse des décisions

Combining forward selection and shrinkage techniques for variable selection in regression and classification

Subhashis Ghosal

Variable selection is a major statistical issue in contemporary data analysis because modern data typically involve many predictors, most of which are nearly irrelevant. Various sparse regression methods such as the LASSO have been developed in the literature to estimate the regression function and make predictions by setting many regression coefficients to zero using specially devised penalty functions. We propose new variable selection techniques for regression in high dimensional linear models based on a forward selection version of the LASSO (or its variants), called the Forward Iterative Regression and Shrinkage Technique (FIRST). We exploit the fact that LASSO-type methods have closed form solutions when the predictor is one-dimensional. The explicit formula is then applied repeatedly, in an iterative fashion, until convergence occurs. A simulation study shows that our method works better for extremely sparse high dimensional linear models. We apply the method in a gene expression study. We also develop similar iterative procedures for classification problems based on a variant of the support vector machine (SVM). In this case, even for a one-dimensional predictor, the classifier does not have a closed form solution. However, a careful study of the objective function reveals an efficient algorithm for locating the optimal solution, which can then be iterated as in the case of linear regression, leading to a new procedure called CLASsification and Selection using Iterative Cycles (CLASSIC). We consider several variations of CLASSIC and compare their performance with that of other standard classification algorithms such as L1-SVM, SCAD-SVM, LASSO and penalized logistic regression through simulations. Although CLASSIC generally needs more computational time, our simulations show that the misclassification rate of CLASSIC is significantly smaller than that of its competitors, and that it generally leads to more parsimonious models.
The talk is based on joint work with Wookyeon Hwang and Hao Helen Zhang.
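
As an illustration of the closed form mentioned in the abstract: for a single predictor x and response r, the LASSO criterion 0.5 * sum((r_i - b * x_i)^2) + lam * |b| is minimised by soft-thresholding the inner product of x and r at lam and dividing by the inner product of x with itself. The Python sketch below implements that formula and wraps it in a simple forward loop. It is only a minimal sketch under stated assumptions, not the FIRST algorithm as defined by the authors: the function names, the rule for choosing the next predictor (largest absolute inner product with the current residual), and the stopping rule (stop when the one-dimensional update is exactly zero) are all hypothetical choices made for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def univariate_lasso(x, r, lam):
    """Closed-form minimiser of 0.5 * sum((r_i - b * x_i)^2) + lam * |b|.

    Assumes x is not identically zero.
    """
    xr = sum(xi * ri for xi, ri in zip(x, r))
    xx = sum(xi * xi for xi in x)
    return soft_threshold(xr, lam) / xx


def forward_iterative_fit(columns, y, lam, max_iter=1000):
    """Illustrative forward loop built on the one-dimensional formula.

    columns: list of predictor columns, each a list as long as y.
    The selection and stopping rules here are assumptions of this sketch.
    """
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(max_iter):
        # Forward step: pick the predictor with the largest absolute
        # inner product with the current residual.
        scores = [abs(sum(xi * ri for xi, ri in zip(col, resid)))
                  for col in columns]
        j = max(range(len(columns)), key=scores.__getitem__)
        # Shrinkage step: one-dimensional LASSO fit of the residual on x_j.
        delta = univariate_lasso(columns[j], resid, lam)
        if delta == 0.0:
            break  # no predictor clears the threshold lam
        beta[j] += delta
        resid = [ri - delta * xi for ri, xi in zip(resid, columns[j])]
    return beta


# Toy data: two orthogonal predictors, response driven by the first only.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [2.0, -2.0, 2.0, -2.0]
print(forward_iterative_fit([x1, x2], y, lam=1.0))
```

In this toy example the irrelevant second predictor is never selected, and the coefficient of the first is shrunk below its least squares value of 2, which is the qualitative behaviour the abstract attributes to combining forward selection with shrinkage.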