Abstract
Multiple sequence alignment (MSA) of homologous protein sequences helps biologists identify relationships between species and can help predict the structure and function of a protein. However, Wang and Jiang proved that optimally aligning multiple sequences is intractable [1]. Over the last two decades, researchers have therefore taken a variety of heuristic approaches to the problem, but without a consistent and reliable scoring method. In this paper, we develop a scoring metric, hierarchical expected matching probability (HEP), that measures the probability of residue mutations and the biological correctness of MSA results. Tests on both theoretical and manually selected sequences show that our quantitative metric is more reliable, consistent, and biologically meaningful than many commonly used scoring metrics.