Groupe d’études et de recherche en analyse des décisions

G-2017-40

Forestogram: A visualization framework for hierarchical biclustering


Many biological datasets, such as microarray, metabolomics, and proteomics data, place observations (or subjects) in rows and attributes (genes, metabolites, or proteins) in columns. Simultaneous grouping of rows and columns, i.e. biclustering, is often desired. Each bicluster consists of a group of observations that are highly correlated across a group of attributes. Despite considerable effort devoted to developing biclustering algorithms, a proper visualization tool seems to be lacking in the literature. Such a tool helps practitioners understand how biclusters evolve. Here we provide one, the forestogram. The forestogram iteratively combines rows or columns to construct a forest over a collection of dendrograms with a common root. We develop a simple strategy for extracting natural biclusters by cutting the forest according to an information criterion. The effectiveness of our technique is tested on simulated and real data.
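The idea of merging rows or columns iteratively can be illustrated with a minimal sketch. It is not the report's algorithm: the function names, the mean-profile distance, and the stopping rule (fixed numbers of row and column groups, standing in for the information criterion mentioned above) are all assumptions made for illustration.

```python
from itertools import combinations

def mean_profile(X, members, by_rows):
    """Average the rows (or columns) listed in `members` into one profile."""
    if by_rows:
        return [sum(X[i][j] for i in members) / len(members)
                for j in range(len(X[0]))]
    return [sum(X[i][j] for j in members) / len(members)
            for i in range(len(X))]

def closest_pair(X, clusters, by_rows):
    """Return (distance, a, b) for the two clusters with the nearest mean profiles."""
    best = None
    for a, b in combinations(range(len(clusters)), 2):
        pa = mean_profile(X, clusters[a], by_rows)
        pb = mean_profile(X, clusters[b], by_rows)
        # Mean squared difference between profiles (an assumed distance).
        d = sum((u - v) ** 2 for u, v in zip(pa, pb)) / len(pa)
        if best is None or d < best[0]:
            best = (d, a, b)
    return best

def greedy_bicluster(X, n_row_groups, n_col_groups):
    """Greedily merge row clusters or column clusters, cheapest merge first.

    Stops at fixed group counts; this is a hypothetical stand-in for
    cutting the forest with an information criterion.
    """
    if n_row_groups < 1 or n_col_groups < 1:
        raise ValueError("group counts must be at least 1")
    rows = [[i] for i in range(len(X))]
    cols = [[j] for j in range(len(X[0]))]
    history = []  # (axis, distance, merged members) for each merge step
    while len(rows) > n_row_groups or len(cols) > n_col_groups:
        candidates = []
        if len(rows) > n_row_groups:
            candidates.append(closest_pair(X, rows, True) + ("row",))
        if len(cols) > n_col_groups:
            candidates.append(closest_pair(X, cols, False) + ("col",))
        d, a, b, axis = min(candidates)
        groups = rows if axis == "row" else cols
        merged = groups[a] + groups[b]
        history.append((axis, d, sorted(merged)))
        del groups[b]       # b > a, so index a is unaffected
        groups[a] = merged
    return rows, cols, history

# Toy matrix with two obvious blocks (illustrative data, not from the report).
X = [[5, 5, 0, 0],
     [5, 5, 0, 0],
     [0, 0, 5, 5],
     [0, 0, 5, 5]]
row_groups, col_groups, merges = greedy_bicluster(X, 2, 2)
```

Each pair of a row group and a column group then defines one candidate bicluster, and `merges` records the order in which row and column merges were interleaved, which is the kind of history a forest-style plot would display.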

16 pages