Network-constrained bi-clustering of patients and multi-scale omics data

Abstract
Recent advances in omics profiling technologies yield ever larger amounts of molecular data. Yet elucidating the molecular basis of human diseases remains an unsolved challenge. The analysis of multi-scale omics data requires integrative bioinformatic tools capable of multi-modal computing and multi-scale modeling. Unsupervised learning approaches are frequently employed to identify biomolecules and pathways involved in specific diseases. However, classical clustering is poorly suited to analysing, e.g., gene expression data jointly with experimental conditions and molecular pathway information. Since we are interested in gene sets that behave consistently across different conditions, genes and samples have to be clustered simultaneously, using models that respect the heterogeneity of such multi-scale data. To this end, we aim to extend bi-clustering approaches with information encoded in biological networks.
Methods
BiCluE (Sun et al. 2013) was the first software package to tackle the weighted bi-cluster editing problem. It provides an exact algorithm based on fixed-parameter tractability (FPT). The bi-cluster editing problem is formulated on a bi-partite graph connecting features and samples. This graph is transformed into a set of disjoint bi-cliques while minimizing the editing costs (e.g., the number of edges to be added or removed). Even though BiCluE yields potent solutions in many scenarios, such as identifying novel genotype-phenotype associations in GWAS data, it does not consider intrinsic feature relationships, e.g., interactions between proteins or regulatory interactions between genes. Therefore, we propose an extension of the BiCluE algorithm that maps molecular interaction networks onto the bi-partite graph, imposing constraints that force bi-cliques to respect intrinsic feature relationships. Owing to a drastic reduction of the search space, this reduces the computational complexity from O(4^k) to O(2^k), where k is the cluster editing cost. Additionally, this model straightforwardly allows the incorporation of multi-scale data, depending on the integrated network.
Results and conclusions
We demonstrate the validity and efficiency of our extension to BiCluE on simulated data. In general, such network-constrained bi-clustering approaches not only allow for more stable feature selection but also lead to more coherent functional enrichment, improving interpretability with respect to systems biology and systems medicine while being straightforwardly applicable to multi-scale omics data.
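As an illustration of the bi-cluster editing objective described in the abstract, the following minimal Python sketch counts the edge insertions and deletions needed to turn a small feature-sample bi-partite graph into a given set of disjoint bi-cliques. It is not the BiCluE implementation and does not solve the optimization problem; it only evaluates the unweighted cost of one candidate solution. The function names, the toy data, and the reading of the network constraint as "the features of each bi-clique must be connected in the interaction network" are assumptions made for this example, since the abstract does not specify the constraint.

```python
from collections import deque


def editing_cost(edges, feature_cluster, sample_cluster):
    """Count edge insertions plus deletions needed so that the bi-partite
    graph matches the bi-cliques given by the two cluster assignments."""
    cost = 0
    for feature, f_label in feature_cluster.items():
        for sample, s_label in sample_cluster.items():
            present = (feature, sample) in edges
            wanted = f_label == s_label  # same bi-clique -> edge must exist
            if present != wanted:
                cost += 1  # one insertion or one deletion
    return cost


def is_connected(nodes, network):
    """Breadth-first check that `nodes` induce a connected subgraph of
    `network` (hypothetical form of the network constraint)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in network.get(current, ()):
            if neighbour in nodes and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


# Toy data (invented for illustration): genes g1..g3, samples s1..s3.
edges = {("g1", "s1"), ("g1", "s2"), ("g2", "s1"), ("g2", "s3"), ("g3", "s3")}
feature_cluster = {"g1": 0, "g2": 0, "g3": 1}
sample_cluster = {"s1": 0, "s2": 0, "s3": 1}

# Toy interaction network among features, as an adjacency list.
network = {"g1": ["g2"], "g2": ["g1"], "g3": []}

cost = editing_cost(edges, feature_cluster, sample_cluster)
print("editing cost:", cost)
print("cluster 0 respects network:", is_connected({"g1", "g2"}, network))
```

In this toy case one edge is missing inside a bi-clique (g2, s2) and one edge crosses bi-cliques (g2, s3), so the candidate solution has cost 2. An exact solver would search over candidate solutions for the minimum cost; under the assumed constraint, candidates whose feature sets are disconnected in the network would be discarded, which is one way the search space could shrink.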
Year
2018
IAC authors
Publication type
Other authors
Olga Lazareva, Simon J. Larsen, Paolo Tieri, Jan Baumbach, Tim Kacprowski