2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Download PDF

Abstract

The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. First, we present a new algorithm for the LCS problem. Then, we introduce an LCS-motivated reference-based compression scheme using the components of the LCS, rather than the LCS itself. For the Homo sapiens genome (original size 3,080,436,051 bytes), our proposed scheme compressed the genome to 5,267,656 bytes). This can be compared with the previous best results of 19,666,791 bytes (Wang and Zhang, 2011) and 17,971,030 bytes (Pinho, Pratas, and Garcia, 2011). Thus, our compression ratio is about 3.73 to 3.41 times better than those from the state-of-the-art reference-based compression algorithms.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles