2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract

Cluster analysis has become a popular method for analyzing gene expression data, where the resulting class labels can support accurate and rapid disease diagnosis. However, gene expression data have many attributes and few samples, which produces a large amount of redundant or noisy information and reduces the accuracy of clustering applied directly to the high-dimensional data. Principal Component Analysis (PCA) is a classical dimension reduction method that transforms high-dimensional data into a low-dimensional space. The shortcoming of PCA is that it is hard to interpret, because its loadings are not sparse. In this paper, a sparse PCA method based on Truncated Power, which minimizes the cardinality of the loadings while maximizing the percentage of variance explained by the principal components (PCs), is applied to feature extraction for gene expression data, and the sparse PCs are then fed into the K-means process for clustering. Finally, experimental results on three typical gene datasets verify that the sparse gene data can improve the efficiency and accuracy of clustering analysis.
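
The pipeline described in the abstract (sparse loadings from a truncated power iteration, then K-means on the sparse PC scores) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the initialization, the deflation scheme, or the cardinality, so the function names `truncated_power_pc`, `sparse_pca`, and `kmeans`, the cardinality `k`, the projection deflation step, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def truncated_power_pc(cov, k, n_iter=200, tol=1e-8):
    """Approximate the leading eigenvector of `cov` with at most k nonzeros."""
    p = cov.shape[0]
    # Assumed initialization: unit vector on the highest-variance feature.
    x = np.zeros(p)
    x[np.argmax(np.diag(cov))] = 1.0
    for _ in range(n_iter):
        y = cov @ x                          # power step
        keep = np.argsort(np.abs(y))[-k:]    # indices of the k largest |entries|
        x_new = np.zeros(p)
        x_new[keep] = y[keep]                # truncation: zero everything else
        norm = np.linalg.norm(x_new)
        if norm == 0.0:
            break
        x_new /= norm                        # renormalize to unit length
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x

def sparse_pca(X, n_components, k):
    """Return (scores, loadings); rows of X are samples, columns are genes."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    loadings = []
    for _ in range(n_components):
        v = truncated_power_pc(cov, k)
        loadings.append(v)
        # Assumed deflation: project the found direction out of the covariance.
        P = np.eye(cov.shape[0]) - np.outer(v, v)
        cov = P @ cov @ P
    V = np.column_stack(loadings)
    return Xc @ V, V

def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd iterations on the rows of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in for expression data (not one of the paper's datasets):
# 40 samples, 50 "genes", two groups that differ only in the first 5 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
X[:20, :5] += 6.0

k = 5
scores, loadings = sparse_pca(X, n_components=2, k=k)
labels = kmeans(scores, n_clusters=2)
print("nonzero loadings per PC:", np.count_nonzero(loadings, axis=0))
print("cluster labels:", labels)
```

Each loading vector here has at most `k` nonzero entries, so a PC can be read as a weighted combination of a handful of genes, which is the interpretability gain the abstract attributes to sparsity. The choice of `k` trades cardinality against explained variance and would have to be tuned per dataset.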