A Machine Learning Approach for Disease Genes Signatures

In the context of network medicine, disease genes, i.e. genes that have been experimentally associated with the onset or progression of a pathology, show a complex set of features that cannot easily be reduced to, or grasped by, a simple network approach (e.g., studying centrality measures or clustering characteristics of the gene network). Here, to overcome such limitations and to exploit a larger set of available informational attributes, we use a comprehensive machine learning (ML) approach to analyze a sizeable integrated set of biological, ontological and topological features (including interaction data and GO categories, among others) related to different collections of disease genes (including, but not limited to, sets related to several inflammatory and dysmetabolic diseases), in order to discover recurring patterns of attributes associated with families of disease genes. In this way, the chances of revealing complex, hidden topological, ontological and statistical properties of the genes under scrutiny are greater, and the derived "signature" can be used heuristically in a discovery process to find further, as yet unknown, disease genes. We present the hurdles encountered, the discriminating capabilities achieved and the main results obtained in sorting out and reconstructing the feature sets, in selecting the appropriate ML approach and in analyzing the datasets.
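The abstract does not name the specific ML technique, so the following is only a minimal sketch of the general idea under stated assumptions: topological features (normalized degree, local clustering coefficient) are concatenated with binary ontology-membership flags, a classifier is trained on genes labelled as disease or non-disease, and unlabelled genes are ranked by predicted probability. Logistic regression is a stand-in chosen for simplicity, not the authors' method. The gene names, edges, ontology labels (`inflammation`, `transport`, ...) and the names `EDGES`, `GO`, `LABELS` and `CANDIDATES` are all hypothetical toy data.

```python
import math
from collections import defaultdict

# Hypothetical toy data, invented for illustration only (not from the study).
EDGES = [("D1", "D2"), ("D1", "D3"), ("D2", "D3"), ("D3", "C1"), ("C1", "D1"),
         ("N1", "N2"), ("N2", "N3"), ("N3", "C2"), ("D2", "N1")]
GO = {  # hypothetical labels standing in for GO categories
    "D1": {"inflammation", "immune"}, "D2": {"inflammation"},
    "D3": {"inflammation", "lipid_metabolism"}, "N1": {"transport"},
    "N2": {"transport", "signaling"}, "N3": {"signaling"},
    "C1": {"inflammation"}, "C2": {"transport"},
}
LABELS = {"D1": 1, "D2": 1, "D3": 1, "N1": 0, "N2": 0, "N3": 0}  # 1 = known disease gene
CANDIDATES = ["C1", "C2"]  # unlabelled genes to score


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def features(gene, adj, go, vocab, max_deg):
    """Concatenate topological features with binary ontology-membership flags."""
    topo = [len(adj[gene]) / max_deg, clustering(adj, gene)]
    onto = [1.0 if term in go.get(gene, set()) else 0.0 for term in vocab]
    return topo + onto


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Predicted probability that feature vector x belongs to a disease gene."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # derivative of the loss w.r.t. the logit
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


adj = build_adjacency(EDGES)
vocab = sorted(set().union(*GO.values()))
max_deg = max(len(nbrs) for nbrs in adj.values())
train_genes = sorted(LABELS)
X = [features(g, adj, GO, vocab, max_deg) for g in train_genes]
y = [LABELS[g] for g in train_genes]
w, b = train_logreg(X, y)

# Rank unlabelled genes by predicted probability of being disease-associated.
ranking = sorted(((g, score(w, b, features(g, adj, GO, vocab, max_deg))) for g in CANDIDATES),
                 key=lambda t: t[1], reverse=True)
for gene, p in ranking:
    print(f"{gene}\t{p:.3f}")
```

In this toy setting the learned weights play the role of a crude "signature": positive weights mark attributes shared by the known disease genes. A real analysis would need far larger feature sets, proper validation and careful handling of the fact that "non-disease" genes are usually just unlabelled.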
Other Authors
Annalisa Longo, Venkata Pochiraju, Daniele Santoni, Davide Vergni, Paolo Tieri