Abstract

Current semi-supervised incremental learning approaches
select unlabeled examples with predicted high confidence for
model re-training. We show that for many applications this
data selection strategy is not correct. The confidence score
is primarily a measure of whether a particular example is
classified correctly, rather than a measure of that example's
contribution to training an improved model. The mismatch is
most pronounced when the information used by the confidence
annotator is correlated with that generated by the classifier.
To address this problem,
we propose a performance-driven principle for unlabeled data
selection in which only the unlabeled examples that help to
improve classification accuracy are selected for semi-supervised
learning. Encouraging results are presented for a
variety of public benchmark datasets.
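
The selection principle described above can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a small labeled validation set is available to measure accuracy, uses a nearest-centroid classifier as a stand-in model, and accepts an unlabeled example (with its predicted label) only if retraining with it raises validation accuracy. All function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Train a nearest-centroid classifier: one mean vector per class."""
    by_class = {}
    for p, y in zip(points, labels):
        by_class.setdefault(y, []).append(p)
    return {y: tuple(mean(dim) for dim in zip(*ps)) for y, ps in by_class.items()}

def predict(centroids, p):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])))

def accuracy(centroids, points, labels):
    """Fraction of points whose predicted label matches the given label."""
    return sum(predict(centroids, p) == y for p, y in zip(points, labels)) / len(points)

def performance_driven_select(train_x, train_y, unlabeled, val_x, val_y):
    """Greedily keep pseudo-labeled examples that raise validation accuracy.

    Assumption: accuracy on a held-out labeled set (val_x, val_y) is the
    performance signal; the abstract does not say how improvement is measured.
    """
    train_x, train_y = list(train_x), list(train_y)
    model = fit_centroids(train_x, train_y)
    best = accuracy(model, val_x, val_y)
    selected = []
    for p in unlabeled:
        pseudo = predict(model, p)  # label assigned by the current model
        candidate = fit_centroids(train_x + [p], train_y + [pseudo])
        score = accuracy(candidate, val_x, val_y)
        if score > best:  # keep the example only if it actually helps
            train_x.append(p)
            train_y.append(pseudo)
            model, best = candidate, score
            selected.append(p)
    return model, selected, best

# Toy data (hypothetical): two classes along the x-axis.
train_x = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
train_y = ["a", "a", "b"]
val_x = [(2.4, 0.0), (3.5, 0.0), (0.5, 0.0), (5.0, 0.0)]
val_y = ["a", "b", "a", "b"]
unlabeled = [(0.4, 0.0), (2.0, 0.0), (6.0, 0.0)]

model, selected, best = performance_driven_select(
    train_x, train_y, unlabeled, val_x, val_y)
print(selected, best)
```

In this toy data the points (0.4, 0.0) and (6.0, 0.0) lie far from the decision boundary, so a confidence-based rule would favor them, yet neither changes validation accuracy and both are rejected. The point (2.0, 0.0) sits closer to the boundary and is the only one kept, because adding it moves the class "a" centroid enough to correct a validation error.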

Additional Information

Citation:
Rong Zhang and Alexander I. Rudnicky,
"A New Data Selection Principle for Semi-Supervised Incremental Learning,"
18th International Conference on Pattern Recognition (ICPR'06), Volume 2,
pp. 780-783, 2006.