Abstract
This paper addresses the problem of video scene classification based on a small amount of natural language description created for the video stream. The approach combines a conventional tf·idf term-document matrix with scene class specific information derived from maximum a posteriori (MAP) estimates and the chi-square statistic. Latent semantic analysis (LSA) is then applied to find terms that co-occur across documents. The experiments use k-nearest neighbour (kNN) and support vector machine (SVM) classifiers to evaluate the effectiveness of the scene class information and the co-occurrence terms. The classifiers achieved 83.86% (kNN) and 98.11% (SVM) when the MAP estimates and the chi-square statistic were combined with the tf·idf term-document matrix, followed by LSA approximation.
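As an illustration of two of the ingredients named above, the following is a minimal sketch of tf·idf weighting and of the chi-square statistic for a term and scene class pair. It is not the implementation used in the experiments: the toy corpus, the function names `tfidf` and `chi_square`, the whitespace tokenisation, and the `tf * log(N / df)` weighting variant are all assumptions made for the example, and the MAP estimates, the LSA step, and the classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (scene class, short description) pairs.
docs = [
    ("street", "car drives down the street"),
    ("street", "people cross the street"),
    ("indoor", "man sits at a desk"),
    ("indoor", "woman reads at a desk"),
]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenised = [text.split() for _, text in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenised
    ]


def chi_square(docs, term, scene_class):
    """Chi-square statistic for a term/class pair from a 2x2 contingency table."""
    a = b = c = d = 0
    for label, text in docs:
        has_term = term in text.split()
        in_class = label == scene_class
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document not in class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # Guard against an empty row or column in the contingency table.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


weights = tfidf(docs)
street_score = chi_square(docs, "street", "street")
car_score = chi_square(docs, "car", "street")
```

In this toy corpus the term "street" occurs in every document of the class "street" and in no other document, so it receives a higher chi-square score than "car", which occurs in only one of them. How such class specific scores are merged into the tf·idf matrix is not specified here and is left out of the sketch.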