Experimental analysis of feature selection stability for high-dimension and low-sample size gene expression classification task

Blaise Hanczar

doi:10.1109/BIBE.2012.6399649

2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE)

Year: 2012, Pages: 350-355

Authors

Blaise Hanczar, LIPADE, Université Paris Descartes, 45 rue des Saint-Pères, Paris, F-75006 France

Abstract

Gene selection is a crucial step when building a classifier from microarray or metagenomic data. Because the number of observations is small, gene selection tends to be unstable: two gene subsets obtained from different datasets that address the same classification problem often do not overlap significantly. Although this is a crucial problem, little work has been done on selection stability. In this paper, we first present some stability quantification methods, then study how those measures, and the resulting classification performance, vary with several parameters (dimensionality, sample size, feature distribution, selection threshold) on both artificial and real data. Feature selection was performed with a t-test and classification with linear discriminant analysis. We point out a strong empirical correlation between the dimensionality/sample size ratio and selection instability.
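
As a rough illustration of the kind of experiment the abstract describes, the sketch below scores selection stability as the mean pairwise overlap between gene subsets chosen on independently drawn datasets. It is not the paper's protocol. The abstract does not name its stability measures, so the Jaccard index is assumed here. The Welch form of the t-statistic, the Gaussian synthetic data, and every function name and parameter value are likewise illustrative assumptions. The classification step with linear discriminant analysis is left out.

```python
import random
import statistics
from itertools import combinations


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples (assumed variant of the t-test)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    if denom == 0:
        return 0.0
    return abs(statistics.mean(a) - statistics.mean(b)) / denom


def select_top_k(X, y, k):
    """Return the indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((welch_t(g0, g1), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def jaccard(a, b):
    """Jaccard index of two non-empty sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def make_data(rng, n_per_class, n_features, n_informative, shift=1.0):
    """Synthetic two-class Gaussian data; only the first n_informative features differ in mean."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [
                rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                for j in range(n_features)
            ]
            X.append(row)
            y.append(label)
    return X, y


def selection_stability(rng, n_per_class, n_features, n_informative, k, n_repeats=10):
    """Mean pairwise Jaccard index of the subsets selected on n_repeats independent datasets."""
    subsets = []
    for _ in range(n_repeats):
        X, y = make_data(rng, n_per_class, n_features, n_informative)
        subsets.append(select_top_k(X, y, k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Same dimensionality, two sample sizes: compare stability as the
    # dimensionality/sample size ratio changes.
    for n_per_class in (10, 50):
        s = selection_stability(rng, n_per_class, n_features=200, n_informative=10, k=10)
        print(f"n_per_class={n_per_class}: mean pairwise Jaccard = {s:.3f}")
```

Varying `n_features`, `n_per_class`, `shift`, and `k` in this sketch corresponds loosely to the dimensionality, sample size, feature distribution, and selection threshold parameters listed in the abstract.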


