Abstract
Sequence-structure alignment is the core component of algorithms that search genomes to identify non-coding RNAs. A model that accurately describes the secondary structure of a non-coding RNA family is therefore crucial to search accuracy in genome annotation. In this paper, we develop a novel machine learning approach that captures the key structural features of a non-coding RNA family and estimates the parameters of its secondary structure model. One advantage of this approach is that the estimated parameters encode structural features that are generally missing from the conventional Covariance Model (CM). Our experiments show that, compared with the conventional CM, structure models obtained with our approach describe the secondary structure of a non-coding RNA family more accurately and thus significantly improve the accuracy of genome annotation.