
2007 Seventh IEEE International Conference on Data Mining, pp. 625-630
Local Word Bag Model for Text Categorization

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2007.69

Abstract
Many text processing applications adopt the Bag of Words (BOW) model to represent documents: each document is represented as a vector of weighted terms or n-grams, and the cosine similarity between two vectors is used as the similarity measure. Despite its great success in information retrieval and text categorization, the conventional BOW model ignores detailed local text information, i.e. the co-occurrence pattern of words at the sentence or paragraph level. In this paper, we propose a novel approach that represents a document as a set of local tf-idf vectors, which we call local word bags (LWB). By encapsulating the local information distributed across a document into multiple LWBs, we can measure the similarity of two documents via the partial match of their corresponding local bags. To perform the matching efficiently, we introduce the Local Word Bag kernel (LWB kernel), a variant of the VG-Pyramid match kernel. The new kernel enables discriminative machine learning methods such as SVMs to compute the partial matching between two sets of LWBs in linear time, after a one-time hierarchical clustering procedure over all local bags at the initialization stage. Experiments on real-world datasets demonstrate the effectiveness of our new approach.
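
The idea of comparing documents through sets of local tf-idf vectors can be illustrated with a minimal Python sketch. This is not the LWB kernel described in the abstract: it assumes sentence-level bags, a smoothed idf computed over all local bags, and a naive quadratic-time best-match score in place of the linear-time pyramid matching. All function names (`split_local_bags`, `tfidf_bags`, `partial_match_similarity`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def split_local_bags(text):
    """Split a document into sentence-level token lists (one per local bag).

    Assumption: a "local bag" is a sentence; the paper may use other windows.
    """
    sentences = re.split(r"[.!?]+", text)
    bags = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    return [bag for bag in bags if bag]  # drop empty sentences


def tfidf_bags(docs):
    """Map each document to a list of L2-normalised local tf-idf vectors."""
    tokenised = [split_local_bags(d) for d in docs]
    all_bags = [bag for doc in tokenised for bag in doc]
    n_bags = len(all_bags)
    # Document frequency is counted over local bags, not whole documents.
    df = Counter(term for bag in all_bags for term in set(bag))
    # Smoothed idf (an assumption) keeps every weight strictly positive.
    idf = {t: math.log((1 + n_bags) / (1 + c)) + 1.0 for t, c in df.items()}
    result = []
    for doc in tokenised:
        vectors = []
        for bag in doc:
            vec = {t: tf * idf[t] for t, tf in Counter(bag).items()}
            norm = math.sqrt(sum(w * w for w in vec.values()))
            vectors.append({t: w / norm for t, w in vec.items()})
        result.append(vectors)
    return result


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (dicts)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def _best_match(src, dst):
    """Average, over bags in src, of the best cosine match found in dst."""
    return sum(max(cosine(x, y) for y in dst) for x in src) / len(src)


def partial_match_similarity(bags_a, bags_b):
    """Symmetric best-match score between two sets of local bags.

    A naive O(|A| * |B|) stand-in for partial matching; it is not
    guaranteed to be a positive semi-definite kernel.
    """
    if not bags_a or not bags_b:
        return 0.0
    return 0.5 * (_best_match(bags_a, bags_b) + _best_match(bags_b, bags_a))


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat. Stocks fell sharply today.",
        "Stocks fell sharply today. Dogs bark loudly.",
        "Quantum fields interact weakly.",
    ]
    bags = tfidf_bags(docs)
    print(partial_match_similarity(bags[0], bags[1]))
    print(partial_match_similarity(bags[0], bags[2]))
```

In this toy example the first two documents share one identical sentence, so they receive a positive score even though their remaining sentences are unrelated, while the third document shares no terms with the first and scores zero. A whole-document BOW vector would blur that distinction between a localized match and diffuse term overlap.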
Additional Information

Citation: Wen Pu, Ning Liu, Shuicheng Yan, Jun Yan, Kunqing Xie, Zheng Chen, "Local Word Bag Model for Text Categorization," 2007 Seventh IEEE International Conference on Data Mining (ICDM), pp. 625-630, 2007
