2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE)
Download PDF

Abstract

Gene selection is a crucial step when building a classifier from microarray or metagenomic data. As the number of observations is small, the gene selection tends to be unstable. It is common that two gene subsets, obtained from different datasets but dealing with the same classification problem, do not overlap significantly. Although it is a crucial problem, few works have been done on the selection stability. In this paper, we first present some stability quantification methods, then we study the variations of those measures with various parameters (dimensionality, sample size, feature distribution, selection threshold) on both artificial and real data, as well as the resulting classification performance. Feature selection was performed with t-test and classification with linear discriminant analysis. We point out a strong empiric correlation between the dimensionality/sample size ratio and selection instability.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles