2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

In recent years, thanks to the efforts of individual scientists and research consortia, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data has accumulated. Several recent studies have demonstrated that integrative analysis of these data can yield a wealth of insights. However, owing to cost, time, or limited sample material, researchers cannot always obtain binding profiles for every protein in every sample of interest, and this considerably limits the power of integrative studies. In this paper, we propose a novel method called Low Rank Convex Co-Embedding (LRCCE) for imputing new ChIP-seq datasets. In LRCCE, a diverse collection of available ChIP-seq data is fused by mapping proteins, samples, and genomic positions simultaneously into a common Euclidean space, so that their underlying associations can be evaluated directly with simple calculations. Previous approaches mainly exploit local correlations between available datasets. In contrast, LRCCE formulates the representation learning of all involved entities as a single unified optimization problem, which lets it better estimate the overall structure of the data. Experimental evaluations on ENCODE data illustrate the usefulness of the proposed model.
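The abstract does not spell out LRCCE's convex objective. The sketch below therefore illustrates only the general co-embedding idea it describes, using a plainly swapped-in technique: a standard (non-convex) CP, or trilinear, factorization fitted by alternating ridge least squares. Observed ChIP-seq signals are arranged as a protein × sample × genomic-position tensor. Each entity receives a vector, and a missing entry is then imputed with one simple calculation, the sum over dimensions of the product of the three vectors. The function names (`fit_cp`, `score`, `update_mode`, `solve`), the rank, the ridge weight `lam`, and the toy data are all hypothetical and are not taken from the paper.

```python
import random


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def update_mode(obs, factors, mode, rank, lam):
    """Refit every embedding of one entity type (0=protein, 1=sample, 2=position)
    by ridge least squares on the observed entries, holding the other two fixed."""
    others = [m for m in range(3) if m != mode]
    n_rows = len(factors[mode])
    # Per-row normal equations (Z^T Z + lam I) x = Z^T v; the ridge keeps them solvable.
    A = [[[lam if r == c else 0.0 for c in range(rank)] for r in range(rank)]
         for _ in range(n_rows)]
    b = [[0.0] * rank for _ in range(n_rows)]
    for idx, v in obs.items():
        row = idx[mode]
        u = factors[others[0]][idx[others[0]]]
        w = factors[others[1]][idx[others[1]]]
        z = [u[r] * w[r] for r in range(rank)]  # elementwise product of the fixed embeddings
        for r in range(rank):
            b[row][r] += v * z[r]
            for c in range(rank):
                A[row][r][c] += z[r] * z[c]
    factors[mode] = [solve(A[i], b[i]) for i in range(n_rows)]


def fit_cp(obs, shape, rank=2, lam=1e-6, iters=200, seed=0):
    """Fit protein, sample and position embeddings to the observed entries only.
    obs maps (protein, sample, position) index triples to observed signal values."""
    rng = random.Random(seed)
    factors = [[[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(n)]
               for n in shape]
    for _ in range(iters):
        for mode in range(3):
            update_mode(obs, factors, mode, rank, lam)
    return factors


def score(factors, i, j, k):
    """Imputed signal: trilinear product of the three co-embedded vectors."""
    P, S, G = factors
    return sum(P[i][r] * S[j][r] * G[k][r] for r in range(len(P[i])))


if __name__ == "__main__":
    # Hypothetical toy data: 4 proteins x 3 samples x 5 genomic bins, exactly rank 1.
    a, s, g = [1.0, 2.0, 0.5, 1.5], [1.0, 0.8, 1.2], [0.3, 1.0, 2.0, 0.7, 1.1]
    truth = {(i, j, k): a[i] * s[j] * g[k]
             for i in range(4) for j in range(3) for k in range(5)}
    held_out = [(0, 2, 1), (3, 0, 4), (2, 1, 2)]  # pretend these experiments are missing
    obs = {key: v for key, v in truth.items() if key not in held_out}
    factors = fit_cp(obs, (4, 3, 5), rank=1)
    for key in held_out:
        print(key, round(score(factors, *key), 4), round(truth[key], 4))
```

The alternating scheme is used here only because each step reduces to a small linear solve that can be written without dependencies. Unlike LRCCE as described in the abstract, this objective is not convex, so on harder data a fit of this kind may depend on the initialization.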