Department

Computer Science

Additional Department

Statistics and Analytical Sciences

Document Type

Article

Publication Date

12-17-2018

Abstract

Background: Predicting prognosis in patients from large-scale genomic data is a fundamentally challenging problem in genomic medicine. However, the prognosis still remains poor in many diseases. The poor prognosis may be caused by the high complexity of biological systems, where multiple biological components and their hierarchical relationships are involved. Moreover, it is challenging to develop robust computational solutions with high-dimension, low-sample-size data. Results: In this study, we propose a Pathway-Associated Sparse Deep Neural Network (PASNet) that not only predicts patients’ prognoses but also describes complex biological processes regarding biological pathways for prognosis. PASNet models a multilayered, hierarchical biological system of genes and pathways to predict clinical outcomes by leveraging deep learning. The sparse solution of PASNet provides the model interpretability that most conventional fully-connected neural networks lack. We applied PASNet to long-term survival prediction in Glioblastoma multiforme (GBM), which is a primary brain cancer that shows poor prognostic performance. The predictive performance of PASNet was evaluated with multiple cross-validation experiments. PASNet showed a higher Area Under the Curve (AUC) and F1-score than previous long-term survival prediction classifiers, and the significance of PASNet’s performance was assessed by the Wilcoxon signed-rank test. Furthermore, the biological pathways found by PASNet have been reported as significant pathways in GBM in previous biology and medicine research. Conclusions: PASNet can describe the different biological systems of clinical outcomes for prognostic prediction as well as predict prognosis more accurately than the current state-of-the-art methods. To the best of our knowledge, PASNet is the first pathway-based deep neural network that models hierarchical representations of genes and pathways and their nonlinear effects. Additionally, PASNet would be promising due to its flexible model representation and interpretability, embodying the strengths of deep learning. The open-source code of PASNet is available at https://github.com/DataX-JieHao/PASNet.
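The idea of a gene layer feeding a pathway layer through sparse connections can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it is a minimal pure-Python toy, assuming a fixed binary gene-to-pathway membership mask, a tanh pathway activation, and a single sigmoid output. The mask, weights, and function names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy membership: rows are pathways, columns are genes.
# 1 means the gene belongs to the pathway; these values are illustrative only.
PATHWAY_MASK = [
    [1, 1, 0, 0],  # pathway 0 contains genes 0 and 1
    [0, 0, 1, 1],  # pathway 1 contains genes 2 and 3
]

def pathway_layer(genes, weights, biases, mask):
    """Gene-to-pathway layer where masked-out connections contribute nothing."""
    activations = []
    for w_row, m_row, b in zip(weights, mask, biases):
        z = b + sum(w * m * g for w, m, g in zip(w_row, m_row, genes))
        activations.append(math.tanh(z))
    return activations

def predict(genes, weights, biases, mask, out_weights, out_bias):
    """Return (pathway activations, probability-like score in (0, 1))."""
    pathways = pathway_layer(genes, weights, biases, mask)
    z = out_bias + sum(w * p for w, p in zip(out_weights, pathways))
    return pathways, 1.0 / (1.0 + math.exp(-z))

# Arbitrary illustrative parameters, not trained values.
WEIGHTS = [[0.5, -0.3, 0.8, 0.1], [0.2, 0.7, -0.4, 0.6]]
BIASES = [0.0, 0.1]
OUT_WEIGHTS = [1.2, -0.9]
OUT_BIAS = 0.05

pathways, score = predict(
    [1.0, 2.0, 0.5, -1.0], WEIGHTS, BIASES, PATHWAY_MASK, OUT_WEIGHTS, OUT_BIAS
)
```

Because each pathway node only receives input from its member genes, a pathway's activation can be read as a summary of that gene set, which is the kind of interpretability a fully-connected layer does not offer directly.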

Journal Title

BMC Bioinformatics

Journal ISSN

1471-2105

Volume

19

Digital Object Identifier (DOI)

10.1186/s12859-018-2500-z

Comments

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
