2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE)

Abstract

The free energy (evaluation) models used in RNA secondary structure prediction are one of the main reasons the prediction is a challenging computational problem in bioinformatics, and they are the key factor determining the accuracy of prediction algorithms. We previously developed a method called GAknot that performs well in predicting RNA secondary structures with pseudoknots. In this paper, we propose a new free energy model. We first select a number of RNA sequences from a database of known RNA secondary structures as a training dataset for learning the new model. From the stems of each sequence in the training dataset, we then extract base-pair patterns in subsequences of pairs of k-mers and use these patterns to formulate penalty factors. We modify the energy model by adding these penalty factors. Combined with the modified energy model, the prediction performance of GAknot improves significantly. GAknot with the modified energy model is shown to outperform two state-of-the-art algorithms on a commonly used testing dataset. The penalty factors of the new energy model and the dataset can be downloaded at http://appsrv.cse.cuhk.edu.hk/~kktong/NewModel
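
The abstract does not give the exact pattern definition or penalty formula, so the following is only an illustrative sketch of the general idea: count pairs of k-mers that face each other in the stems of known structures, then turn the counts into penalties that are added to a base energy. The dot-bracket input format, the function names, the negative-log-frequency penalty, and the default penalty for unseen patterns are all assumptions made for illustration, not the paper's method. Pseudoknot brackets are ignored here.

```python
import math
from collections import Counter


def find_stems(structure):
    """Split a dot-bracket string into stems (maximal runs of stacked pairs).

    Only '(' and ')' are handled; pseudoknot brackets are out of scope
    for this sketch.
    """
    stack, pairs = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs[stack.pop()] = pos

    stems, seen = [], set()
    for start in sorted(pairs):
        if start in seen:
            continue
        i, j = start, pairs[start]
        stem = []
        # Extend while (i, j), (i+1, j-1), ... are all paired.
        while pairs.get(i) == j:
            stem.append((i, j))
            seen.add(i)
            i, j = i + 1, j - 1
        stems.append(stem)
    return stems


def kmer_pair_counts(seq, structure, k=2):
    """Count (5' k-mer, 3' k-mer) patterns over every window of k stacked pairs."""
    counts = Counter()
    for stem in find_stems(structure):
        for t in range(len(stem) - k + 1):
            i, j = stem[t]
            left = seq[i:i + k]           # 5' strand, read 5'->3'
            right = seq[j - k + 1:j + 1]  # 3' strand, read 5'->3'
            counts[(left, right)] += 1
    return counts


def penalty_factors(counts):
    """ASSUMPTION: penalty = -log(relative frequency); rare patterns cost more."""
    total = sum(counts.values())
    return {pattern: -math.log(c / total) for pattern, c in counts.items()}


def modified_energy(base_energy, patterns, penalties, unseen_penalty=5.0):
    """ASSUMPTION: penalties are simply summed onto the base free energy."""
    return base_energy + sum(penalties.get(p, unseen_penalty) for p in patterns)
```

In this sketch, patterns seen often in the training stems receive a small penalty and patterns never seen receive the hypothetical `unseen_penalty`, so candidate structures built from unusual stem patterns score worse than they would under the base energy alone.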
