Group for Research in Decision Analysis

A Flexible and Powerful Bayesian Hierarchical Model for ChIP-Chip Data

Raphaël Gottardo

Chromatin-immunoprecipitation microarrays (ChIP-chip), which enable researchers to identify regions of a given genome that are bound by specific DNA-binding proteins, present new challenges for statistical analysis due to the large number of probes, the high noise-to-signal ratio, and the spatial dependence between probes. We propose a method called BAC (Bayesian Analysis of ChIP-chip) to detect transcription factor bound regions, which incorporates the dependence between probes while making very few assumptions about the bound regions (e.g. length). BAC is robust to probe outliers, with an exchangeable prior for the variances that allows different variances for the probes but still shrinks extreme empirical variances. Parameter estimation is carried out using Markov chain Monte Carlo and inference is based on the joint distribution of the parameters. Bound regions are detected using posterior probabilities computed from the joint posterior distribution of neighboring probes. We show that these posterior probabilities are well calibrated and can be used to obtain an estimate of the false discovery rate.

The method is illustrated using two publicly available ChIP-chip data sets containing 18 experimentally validated regions. We compare our method to three other baseline and commonly used techniques, namely Wilcoxon's rank sum test, TileMap, and HGMM. We found BAC and HGMM to perform best at detecting validated regions. However, HGMM appears to be much more sensitive to probe outliers than BAC.
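To make concrete how calibrated posterior probabilities can yield a false discovery rate estimate, the following minimal sketch uses a common generic approach: among the probes (or regions) called bound at a given threshold, the estimated FDR is the average of one minus the posterior probability. This is not necessarily the exact estimator used by BAC; the function names `estimated_fdr` and `threshold_for_fdr` and the example probabilities are hypothetical and chosen only for illustration.

```python
def estimated_fdr(posterior_probs, threshold):
    """Estimate the FDR among items called bound at `threshold`.

    Assumes each value is a calibrated posterior probability of being
    bound, so 1 - p is the probability that a call is a false discovery.
    """
    called = [p for p in posterior_probs if p >= threshold]
    if not called:
        return 0.0  # nothing called, so no false discoveries
    return sum(1.0 - p for p in called) / len(called)


def threshold_for_fdr(posterior_probs, target_fdr):
    """Return the lowest observed threshold whose estimated FDR stays
    at or below `target_fdr`, or None if no threshold qualifies."""
    best = None
    # Lowering the threshold only adds less certain calls, so the
    # estimated FDR cannot decrease; stop at the first violation.
    for t in sorted(set(posterior_probs), reverse=True):
        if estimated_fdr(posterior_probs, t) <= target_fdr:
            best = t
        else:
            break
    return best


# Made-up posterior probabilities, for illustration only.
example_probs = [0.99, 0.95, 0.90, 0.60, 0.20, 0.05]
cutoff = threshold_for_fdr(example_probs, target_fdr=0.10)
print(cutoff, estimated_fdr(example_probs, cutoff))
```

The estimate is only as trustworthy as the calibration of the posterior probabilities, which is why the calibration claim above matters for this use.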