Signalling pathway database usability: lessons learned

BACKGROUND: Issues and limitations in the accessibility, understandability and ease of use of signalling pathway databases may hamper or divert the research workflow, leading, in the worst case, to confusing reference frameworks and misinterpretation of experimental results. To retrieve signalling pathway data for a specific set of test genes, we queried six of the major curated signalling pathway databases (Reactome, Pathway Commons, KEGG, InnateDB, PID and WikiPathways) and analysed the results. FINDINGS: Although we expected differences, which are often desirable when the results of individual queries are integrated, we observed variations of exceptional magnitude, with results that differed disproportionately in both quality and quantity. Some of the more remarkable differences can be explained by the diverse conceptual designs and purposes of the databases, the types of data stored and the structure of the query, as well as by missing or erroneous descriptions of the search procedure. To go beyond a mere enumeration of these problems, we identified a number of operational features, in particular inner and cross coherence, which, once quantified, offer objective criteria for choosing the best source of information. CONCLUSIONS: In silico biology relies heavily on the information stored in databases. To ensure that computational biology mirrors biological reality and offers focused hypotheses for experimental validation, coherence of data codification is crucial, yet its importance is highly underestimated. We make practical recommendations for end-users, to help them cope with the current state of the databases, and for database maintainers, to help them contribute to the full enactment of the open data paradigm.
Tieri, P. and Nardini, C.
Royal Society of Chemistry
Molecular BioSystems (Online)