Abstract
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 1.0 (RF-Phos 1.0), that predicts phosphorylation sites using only the primary amino acid sequence of a protein as input. RF-Phos 1.0 uses random forest classifiers to integrate various sequence and structural features, and can identify putative phosphorylation sites across many protein families. In side-by-side comparisons based on 10-fold cross-validation and an independent dataset, RF-Phos 1.0 compares favorably with existing phosphosite prediction methods such as PhosphoSVM, GPS2.1 and Musite.