The phylogenetic profile of a gene is a reflection of its evolutionary history
and can be defined as the differential presence or absence of a gene in a set of
reference genomes. It has been employed to facilitate the prediction of gene
functions. However, the hypothesis that the application of this concept can also
facilitate the discovery of bacterial virulence factors has not been fully
examined. In this paper, we test this hypothesis and report a computational
pipeline designed to identify previously unknown bacterial virulence genes using
group B streptococcus (GBS) as an example. Phylogenetic profiles of all GBS
genes across 467 bacterial reference genomes were determined by
candidate-against-all BLAST searches; these profiles were then used to identify
candidate virulence genes with machine learning models. Evaluation experiments with known
GBS virulence genes suggested good functional and model consistency in
cross-validation analyses (areas under the ROC curve, 0.80 and 0.98, respectively).
Inspection of the top-10 genes in each of the 15 virulence functional groups
revealed at least 15 (of 119) homologous genes implicated in virulence in other
human pathogens but previously unrecognized as potential virulence genes in GBS.
Among these highly ranked genes, many encode hypothetical proteins with possible
roles in GBS virulence. Thus, our approach has identified a set of genes that may
affect the virulence of GBS, which are
potential candidates for further
other bacterial pathogens.
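
As an illustration of the profile definition given above (presence or absence of a gene in each reference genome), the following minimal Python sketch turns a list of BLAST hits into binary phylogenetic profiles. It is not the pipeline used in the study: the function name `build_profiles`, the gene and genome labels, and the E-value cutoff of 1e-5 are all assumptions made for this example, and the abstract does not state the threshold that was actually applied.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit: (query gene, reference genome, E-value).
Hit = Tuple[str, str, float]


def build_profiles(
    hits: Iterable[Hit],
    genes: List[str],
    genomes: List[str],
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from the paper
) -> Dict[str, List[int]]:
    """Return one 0/1 vector per gene, ordered like `genomes`."""
    column = {genome: i for i, genome in enumerate(genomes)}
    # Start with "absent everywhere" for every gene.
    profiles = {gene: [0] * len(genomes) for gene in genes}
    for gene, genome, evalue in hits:
        # Mark presence only for known genes/genomes and significant hits.
        if gene in profiles and genome in column and evalue <= max_evalue:
            profiles[gene][column[genome]] = 1
    return profiles


# Hypothetical toy data: three genes, three reference genomes.
genes = ["geneA", "geneB", "geneC"]
genomes = ["genome1", "genome2", "genome3"]
hits = [
    ("geneA", "genome1", 1e-30),
    ("geneA", "genome3", 1e-8),
    ("geneB", "genome2", 0.5),    # too weak, treated as absent
    ("geneB", "genome3", 1e-12),
]

profiles = build_profiles(hits, genes, genomes)
for gene in genes:
    print(gene, profiles[gene])
```

In a setting like the one described, each vector would have 467 entries, one per reference genome, and the vectors would serve as feature rows for a classifier.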
Publisher: Public Library of Science
Date Published: 4-April-2011
Author(s): Lin F., Lan R., Sintchenko V., Gilbert G., Kong F., Coiera E.