2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

Since Frederick Sanger and his colleagues introduced Sanger sequencing in 1977, the volume of sequence data has exploded. The cost of storing, processing, and analyzing this data is becoming excessively high. As a result, it is extremely important to develop efficient data compression and data reduction techniques. Standard data compression tools, however, are not well suited to biological data, which contains many repetitive regions and can exhibit high similarity among sequences. Specialized algorithms are therefore needed to compress biological data effectively. In this paper we propose novel algorithms for compressing FASTQ files. Extensive and rigorous experiments show that our proposed algorithm is indeed competitive and performs better than the best known algorithms for this problem.
