Abstract
We investigate the problem of automated video classification by analysing low-level audio-visual signal patterns over time in a holistic manner. Five popular TV broadcast genres are studied: sports, cartoon, news, commercial and music. A novel statistical approach is proposed, comprising two key ingredients designed for implicit characterisation of semantic content and modelling of class identities. First, a spatial-temporal audio-visual "concatenated" feature vector is composed, aiming to capture the crucial clip-level structural information inherent in a video genre. Second, the feature vector is processed using principal component analysis to reduce spatial-temporal redundancy while exploiting the correlations between feature elements. This gives rise to a compact representation for effective probabilistic modelling of each video genre. Extensive experiments are conducted to assess various aspects of the approach and their influence on overall system performance.
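
The two ingredients described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. It assumes each clip is given as a T x D array of per-frame audio-visual features, and it assumes a single diagonal-covariance Gaussian per genre as a stand-in for the probabilistic model, which the abstract does not specify. All function names (clip_vector, fit_pca, project, fit_gaussians, classify) are hypothetical.

```python
import numpy as np

def clip_vector(frame_features):
    """Concatenate per-frame features (T x D) into one clip-level vector of length T*D."""
    return np.asarray(frame_features, dtype=float).reshape(-1)

def fit_pca(X, n_components):
    """Return the training mean and the leading principal directions of X (N x M)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Map clip vectors onto the retained principal components."""
    return (X - mean) @ components.T

def fit_gaussians(Z, labels):
    """Assumption: one diagonal Gaussian per genre in the reduced space."""
    labels = np.asarray(labels)
    models = {}
    for c in np.unique(labels):
        Zc = Z[labels == c]
        # A small variance floor avoids division by zero.
        models[c] = (Zc.mean(axis=0), Zc.var(axis=0) + 1e-6)
    return models

def classify(z, models):
    """Return the genre label whose Gaussian gives the highest log-likelihood."""
    def loglik(mu, var):
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var)
    return max(models, key=lambda c: loglik(*models[c]))
```

In this sketch the concatenation step preserves the temporal ordering of frames within a clip, so the principal components can pick up correlations across both feature dimensions and time positions.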