2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

Understanding the nature of many diseases, including cancer, requires locating somatically acquired rearrangements that correspond to large-scale chromosomal aberrations. Computational methods that detect inter-chromosomal rearrangements from next-generation sequencing data face a major challenge: accurately predicting the location of breakpoint sites that are typically spanned by only a small number of reads, while the entire sample contains hundreds of millions of reads. In this work, we propose a method called TDJD that identifies the location of inter-chromosomal breakpoints corresponding to large-scale structural variations, in particular translocations and insertions. To reduce the size of the search space, we split candidate reads that may span breakpoints into windows and represent each read as a sequence of binary fingerprints, one per window. We then search for the location of the breakpoint in the reference genome using the Jaccard distance, combining parallel computing with Jaccard-distance search to solve the exact nearest-neighbor problem. The dimensionality reduction takes advantage of SSE instructions and a multithreaded architecture to make the search efficient. We applied our algorithm to identify several reads with breakpoints, including those characterizing the PAX8-PPARγ rearrangement, a frequent alteration in follicular thyroid cancer. Our results show that our method identifies the breakpoints much faster than the previous method. We also compared our results with several recently published methods and found that our method is faster than all of them while maintaining high accuracy.
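The abstract does not say how fingerprints are built or how windows are sized, so the following is only a minimal Python sketch of the general idea under stated assumptions, not TDJD itself. It assumes each window's fingerprint is a bit vector with one bit per possible k-mer (4^k bits, with k = 4), computes the Jaccard distance from popcounts of the bitwise AND and OR, and finds the exact nearest neighbor by a brute-force scan over fingerprinted reference windows. The window size, step, k, the toy chromosomes `chrA`/`chrB`, and all function names are hypothetical, and the scan is single-threaded Python rather than the SSE/multithreaded implementation the abstract describes.

```python
import random
from typing import Dict, List, Tuple

# 2-bit nucleotide codes; k-mers containing other symbols (e.g. N) are skipped.
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_fingerprint(seq: str, k: int = 4) -> int:
    """Assumed fingerprint (not from the paper): bit c is set iff the k-mer
    whose 2-bit code is c occurs in seq. Stored as a Python int of 4**k bits."""
    fp = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(ch not in ENC for ch in kmer):
            continue
        code = 0
        for ch in kmer:
            code = code * 4 + ENC[ch]
        fp |= 1 << code
    return fp


def jaccard_distance(a: int, b: int) -> float:
    """1 - |A & B| / |A | B| computed with popcounts (0.0 if both are empty)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def windows(seq: str, w: int, step: int) -> List[Tuple[int, str]]:
    """Split seq into (offset, substring) windows of length w every `step` bases."""
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, step)]


def build_index(ref: Dict[str, str], w: int, step: int, k: int = 4):
    """Fingerprint every reference window as (chrom, offset, fingerprint)."""
    return [(chrom, off, kmer_fingerprint(win, k))
            for chrom, seq in ref.items()
            for off, win in windows(seq, w, step)]


def nearest(query_fp: int, index) -> Tuple[str, int, float]:
    """Exact nearest neighbour: brute-force scan over all reference windows."""
    best = min(index, key=lambda e: jaccard_distance(query_fp, e[2]))
    return best[0], best[1], jaccard_distance(query_fp, best[2])


# Toy reference with two hypothetical random chromosomes.
rng = random.Random(42)
ref = {c: "".join(rng.choice("ACGT") for _ in range(400)) for c in ("chrA", "chrB")}
W = 40
index = build_index(ref, w=W, step=W)

# Hypothetical chimeric read: 80 bases of chrA joined to 80 bases of chrB.
read = ref["chrA"][120:200] + ref["chrB"][40:120]
hits = [(off,) + nearest(kmer_fingerprint(win), index)
        for off, win in windows(read, W, W)]
for hit in hits:
    print(hit)  # (read offset, chromosome, reference offset, Jaccard distance)
```

Because every read window is compared with every reference window, the search is exact. A change in the best-matching chromosome between consecutive read windows (here between read offsets 40 and 80) marks a candidate inter-chromosomal breakpoint; on a real genome, this all-pairs comparison is where parallelism and hardware popcount instructions would matter.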