Inferences about the prevalence of a given disease or condition can be drawn from the results of a diagnostic test applied to a sample from the target population. For example, the extent of disease clustering in tuberculosis (TB) can be estimated from the nearest genetic distance (NGD), a continuous test measuring the relatedness of TB strains. Most diagnostic tests, including the NGD test for TB clustering, are imperfect, which for continuous tests means that the distributions of the measure overlap between positive and negative cases. The resulting misclassification errors must be taken into account when estimating the prevalence. In creating models for continuous test results, one can either use a standard parametric form, such as a bi-normal model that assumes normally distributed results in each group, or fit a nonparametric model that makes fewer distributional assumptions. Nonparametric models include those based on Dirichlet process priors and Polya trees. While Polya tree models have been applied to continuous diagnostic testing data, their properties for prevalence estimation in this context have not been rigorously examined. We extend this past work in three directions. First, we use simulations to learn about the performance of the model in practice. Second, we derive a method to calculate the Bayes factor for selecting between a parametric and a nonparametric model. Third, we investigate the dependence of a fixed partition Polya tree model on the particular partition selected by comparing its results with those from a random partition Polya tree model. Finally, we apply our methods to estimate the prevalence of TB clustering from NGD data.
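To illustrate why misclassification matters for prevalence estimation, the following minimal sketch simulates continuous test results from a bi-normal model and compares the naive proportion above a cutoff with a Rogan-Gladen style correction. This is not the Polya tree or Bayes factor method of the paper; it is a simpler, well-known frequentist correction used only for illustration. The group means, standard deviations, cutoff, true prevalence, and the convention that larger values indicate a positive case are all hypothetical assumptions, and the function names are invented for this example.

```python
import random
from statistics import NormalDist


def simulate_binormal(n, prevalence, neg, pos, seed=0):
    """Draw n continuous test results from a two-component normal mixture."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        # Each subject is a positive case with probability `prevalence`.
        dist = pos if rng.random() < prevalence else neg
        values.append(rng.gauss(dist.mean, dist.stdev))
    return values


def naive_prevalence(values, cutoff):
    """Proportion of results above the cutoff, ignoring misclassification."""
    return sum(v > cutoff for v in values) / len(values)


def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen correction of an apparent prevalence, clipped to [0, 1]."""
    estimate = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    return min(1.0, max(0.0, estimate))


# Hypothetical bi-normal model: the two groups overlap substantially.
neg = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution in negative cases
pos = NormalDist(mu=2.0, sigma=1.0)  # assumed distribution in positive cases
cutoff = 1.0                         # assumed dichotomization point
true_prevalence = 0.3                # assumed value used for the simulation

# Sensitivity and specificity implied by the model at this cutoff.
sensitivity = 1 - pos.cdf(cutoff)
specificity = neg.cdf(cutoff)

values = simulate_binormal(20000, true_prevalence, neg, pos, seed=1)
apparent = naive_prevalence(values, cutoff)
corrected = corrected_prevalence(apparent, sensitivity, specificity)

print(f"apparent prevalence:  {apparent:.3f}")
print(f"corrected prevalence: {corrected:.3f}")
```

In this setup the apparent prevalence overstates the true value because false positives from the larger negative group outnumber the missed positives, while the corrected estimate should land close to the simulated prevalence. The correction relies on knowing the sensitivity and specificity, which in practice would themselves be uncertain.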
Published November 2015, 17 pages