Practitioners often use data collected from complex surveys (such as labour force and health surveys involving stratified cluster sampling) to fit logistic regression and other models of interest. Over the last two decades, a great deal of effort has gone into developing methods for analyzing survey data that take account of design features. Specialized programs, such as SUDAAN and WESVAR, are also available to implement some of these methods. But these methods require additional information, such as survey weights, design effects or cluster identification, to accompany the micro data. Inverse sampling (Hinkins et al., Survey Methodology, 1997) provides an alternative approach by undoing the complex data structures so that standard methods can be applied. Repeated subsamples with simple random structure are drawn, each subsample is analyzed by standard methods, and the results are combined to increase efficiency. Although computer intensive, this method also has the potential to preserve the confidentiality of micro data files. A drawback of the method is that it can lead to biased estimates of regression parameters when the subsample sizes are small (as in the case of stratified cluster sampling). In this paper, we propose an estimating equation approach that combines the subsamples before estimation and thus leads to nearly unbiased estimates of regression parameters regardless of subsample sizes. The method is also computationally less intensive than the original method. We apply the method to cluster-correlated data generated from a nested error linear regression model to illustrate its advantages. We also study a biostatistics application to clustered data in which the cluster size is related to the response. The standard Liang-Zeger method for clustered data is not applicable in this case. We apply the combined estimating equation approach, as well as a mean estimating equation approach, to this problem and obtain valid inference. Illustrations of the new methods are provided.
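The difference between analyzing each subsample separately and combining the subsamples before estimation can be sketched in a few lines of Python. This is only an illustrative toy, not the procedure of the paper: it assumes a no-intercept linear model with estimating equation sum x(y - beta x) = 0, and it uses a hypothetical helper, `draw_subsample`, that picks one unit at random from each cluster as a stand-in for the inverse sampling step. All function names and simulation settings are assumptions made for the example.

```python
import random


def draw_subsample(clusters, rng):
    """Hypothetical stand-in for one inverse-sampling draw: one (x, y) unit per cluster."""
    return [rng.choice(units) for units in clusters]


def slope_from_sums(sxy, sxx):
    """Solve the estimating equation sum x*(y - beta*x) = 0 for beta."""
    return sxy / sxx


def separate_then_average(subsamples):
    """Estimate beta on each subsample separately, then average the estimates."""
    estimates = []
    for sub in subsamples:
        sxy = sum(x * y for x, y in sub)
        sxx = sum(x * x for x, _ in sub)
        estimates.append(slope_from_sums(sxy, sxx))
    return sum(estimates) / len(estimates)


def combined_estimating_equation(subsamples):
    """Sum the estimating functions over all subsamples, then solve once for beta."""
    sxy = sum(x * y for sub in subsamples for x, y in sub)
    sxx = sum(x * x for sub in subsamples for x, _ in sub)
    return slope_from_sums(sxy, sxx)


def simulate_clusters(n_clusters, cluster_size, beta, rng):
    """Toy nested-error data: y = beta*x + cluster effect + unit-level error."""
    clusters = []
    for _ in range(n_clusters):
        v = rng.gauss(0.0, 1.0)  # random effect shared by all units in the cluster
        units = []
        for _ in range(cluster_size):
            x = rng.uniform(1.0, 2.0)  # kept away from zero so sum of x^2 is positive
            y = beta * x + v + rng.gauss(0.0, 1.0)
            units.append((x, y))
        clusters.append(units)
    return clusters


rng = random.Random(0)
clusters = simulate_clusters(n_clusters=5, cluster_size=4, beta=2.0, rng=rng)
subsamples = [draw_subsample(clusters, rng) for _ in range(200)]

beta_avg = separate_then_average(subsamples)
beta_comb = combined_estimating_equation(subsamples)
print("average of per-subsample estimates:", beta_avg)
print("combined estimating equation:      ", beta_comb)
```

In the first function each small subsample produces its own estimate, so any small-sample bias in those estimates carries through the average. In the second, the subsamples are pooled at the level of the estimating function and the equation is solved only once, which is also why it needs a single solve rather than one per subsample.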
Group for Research in Decision Analysis