Abstract
In web topic detection, detecting “hot” topics from the enormous volume of User-Generated Content (UGC) on the web poses two main difficulties that conventional approaches can barely handle: 1) poor feature representations derived from noisy images and short texts; and 2) uncertain roles of modalities, where visual content may be either highly or weakly relevant to textual cues because the data are loosely constrained. In this paper, following the detection-by-ranking approach, we address these problems by learning a robust shared representation from multiple noisy and complementary features, and by integrating both textual and visual graphs into a k-Nearest Neighbor Similarity Graph (k-N2SG). Non-negative Matrix Factorization using random walk (NMFR) is then introduced to generate topic candidates. The multiple graphs are fused efficiently by Latent Poisson Deconvolution (LPD), which performs a Poisson deconvolution with sparse basis similarities for each edge. Experiments on two public data sets show that the proposed approach achieves significantly higher accuracy than state-of-the-art methods.
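The abstract does not say how the per-modality graphs are built or combined. The sketch below shows one plausible way to construct a symmetric k-nearest-neighbor similarity graph for each modality and merge them. It relies on assumed choices: cosine similarity as the edge weight, symmetric k-NN sparsification, and a plain weighted average as the fusion step. The average is only a simple stand-in and is not the LPD fusion described above. The names `knn_graph` and `fuse_graphs` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_graph(features, k):
    """Hypothetical symmetric k-NN similarity graph for one modality.

    Edge (i, j) is kept if j is among the k most similar items to i, or vice versa;
    the edge weight is the cosine similarity. Self-loops are excluded.
    """
    n = len(features)
    sims = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other items by similarity to item i and keep the top k.
        others = sorted((j for j in range(n) if j != i), key=lambda j: sims[i][j], reverse=True)
        for j in others[:k]:
            W[i][j] = W[j][i] = sims[i][j]  # symmetrize the graph
    return W


def fuse_graphs(graphs, weights=None):
    """Assumed placeholder fusion: weighted average of per-modality adjacency matrices (not LPD)."""
    m = len(graphs)
    weights = weights or [1.0 / m] * m
    n = len(graphs[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, graphs)) for j in range(n)] for i in range(n)]
```

For example, `fuse_graphs([knn_graph(text_feats, 5), knn_graph(visual_feats, 5)])` would yield a single sparse graph whose edges come from either modality. A method such as LPD would replace the fixed averaging with a learned, per-edge combination.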