Background
The promise of modern personalized medicine is to use molecular and clinical
information to better diagnose, manage, and treat disease on an
individual-patient basis. These functions are predominantly enabled by molecular
signatures, which are computational models for predicting phenotypes and
other responses of interest from high-throughput assay data. Data analysis
is a central component of molecular signature development and, if conducted
incorrectly, can jeopardize the entire process. While exploratory data analysis
may tolerate suboptimal protocols, clinical-grade molecular signatures are
subject to vastly stricter requirements. Closing the gap between the standards
for exploratory and clinically successful molecular signatures requires a
thorough understanding of possible biases in the data analysis phase and
strategies to avoid them.
Methodology and Principal Findings
Using a recently introduced data-analytic protocol as a case study, we
provide an in-depth examination of poorly studied biases of data-analytic
protocols related to signature multiplicity, biomarker redundancy, data
preprocessing, and validation of signature reproducibility.
The methodology and results presented in this work aim to expand the
understanding of the data-analytic biases that affect the development of
clinically robust molecular signatures.
Conclusions and Significance
Several recommendations follow from the current study. First, all molecular
signatures of a phenotype should be extracted to the extent possible, in
order to provide comprehensive and accurate grounds for understanding
disease pathogenesis. Second, redundant genes should generally be removed
from final signatures to facilitate reproducibility and decrease
manufacturing costs. Third, data preprocessing procedures should be designed
so as not to bias biomarker selection. Finally, molecular signatures
developed on one phenotype or patient population and applied to another
should be treated with great caution.
Publisher: Public Library of Science
Date Published: 1-June-2011
Author(s): Lytkin N., McVoy L., Weitkamp J., Aliferis C., Statnikov A.