Abstract
Kernel functions can be viewed as non-linear transformations that increase the separability of the input data by mapping them to a new high-dimensional space. Incorporating a kernel function enables the K-Means algorithm to explore the inherent data patterns in the new space. However, recent applications of the kernel K-Means algorithm have been confined to small corpora because of its high computation and storage costs. To overcome these obstacles, we propose a new clustering scheme that changes the clustering order from the sequence of samples to the sequence of kernels and employs a disk-based strategy to control the data. Our experiments on handwritten digit recognition show that the new clustering scheme is very efficient for large corpora, saving more than 90% of the running time.
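For orientation, the following is a minimal sketch of the standard (in-memory) kernel K-Means baseline, not the proposed disk-based scheme. It assumes an RBF kernel, and the names `rbf_kernel`, `kernel_matrix`, and `kernel_kmeans` are hypothetical, introduced only for illustration. The squared feature-space distance from sample i to the centroid of cluster c is computed from kernel values alone: K[i][i] - (2/|c|) * sum_j K[i][j] + (1/|c|^2) * sum_{j,l} K[j][l], with j and l ranging over the members of c.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel (an assumed choice): exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(points, kernel):
    """Full n x n kernel matrix; this O(n^2) storage is the baseline's bottleneck."""
    n = len(points)
    return [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]


def kernel_kmeans(K, init_labels, n_clusters, max_iter=100):
    """Standard kernel K-Means on a precomputed kernel matrix K."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        # Indices of the samples currently assigned to each cluster.
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Centroid self-similarity term: (1/|c|^2) * sum_{j,l in c} K[j][l].
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else None
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in m) / len(m)
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: two well-separated groups with a deliberately mixed initial labeling.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (5.0, 5.0), (5.0, 5.1), (5.1, 5.0)]
K = kernel_matrix(points, rbf_kernel)
labels = kernel_kmeans(K, init_labels=[0, 1, 0, 1, 0, 1], n_clusters=2)
```

In this baseline every assignment step reads entries from the full kernel matrix, which has to be computed and held for all pairs of samples. That quadratic growth in computation and storage is the cost that confines the standard algorithm to small corpora.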