
2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), pp. 529-535
An EM Based Training Algorithm for Cross-Language Text Categorization


DOI: http://doi.ieeecomputersociety.org/10.1109/WI.2005.29

Abstract
Due to the globalization of the Web, many companies and institutions need to efficiently organize and search repositories containing multilingual documents. Managing these heterogeneous text collections significantly increases costs, because experts in different languages are required to organize them. Cross-Language Text Categorization can provide techniques to extend existing automatic classification systems from one language to new languages without requiring additional intervention by human experts. In this paper we propose a learning algorithm based on the EM scheme that can be used to train text classifiers in a multilingual environment. In particular, the proposed approach assumes that a predefined category set and a collection of labeled training data are available for a given language L₁. A classifier for a different language L₂ is trained by translating the available labeled training set from L₁ into L₂ and by using an additional set of unlabeled documents from L₂. This technique allows us to extract correct statistical properties of language L₂ that are not fully captured by automatically translated examples, because of the different characteristics of language L₁ and the approximations introduced by the translation process. Our experimental results show that the performance of the proposed method is very promising when applied to a test set of documents extracted from English and Italian newsgroups.
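The training scheme summarized above (initialize from translated L₁ examples, then iterate EM over unlabeled L₂ documents) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the base classifier, so a multinomial naive Bayes model with Laplace smoothing is assumed here. Documents are assumed to be already translated and tokenized. Whether the translated examples are kept after initialization is also not stated; the hypothetical flag `keep_translated` exposes both choices. The function names `train_nb`, `posterior`, `classify` and `em_train` are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step (assumed naive Bayes): estimate priors and word likelihoods.

    docs: list of token lists; weights: one dict {class: probability} per doc.
    Laplace smoothing (alpha) keeps every log-probability finite.
    """
    prior = {c: alpha for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        tf = Counter(doc)
        for c in classes:
            p = w.get(c, 0.0)
            prior[c] += p
            for term, n in tf.items():
                word_counts[c][term] += p * n
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((word_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def posterior(doc, log_prior, log_lik):
    """E-step for one document: P(class | doc) under the current model."""
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for t in doc:
            if t in log_lik[c]:  # terms outside the vocabulary are ignored
                s += log_lik[c][t]
        scores[c] = s
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def classify(doc, model):
    """Return the most probable class for a tokenized L2 document."""
    post = posterior(doc, *model)
    return max(post, key=post.get)


def em_train(translated, unlabeled, n_iter=10, keep_translated=False):
    """translated: (tokens, label) pairs, i.e. the L1 training set translated into L2.
    unlabeled: token lists of native L2 documents.
    """
    classes = sorted({y for _, y in translated})
    vocab = {t for d, _ in translated for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in translated]
    lab_w = [{y: 1.0} for _, y in translated]
    # Initial model: trained only on the translated examples.
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(n_iter):
        # E-step: soft-label the unlabeled L2 documents.
        unl_w = [posterior(d, *model) for d in unlabeled]
        # M-step: re-estimate from the soft-labelled L2 documents,
        # optionally together with the translated examples.
        if keep_translated:
            model = train_nb(lab_docs + unlabeled, lab_w + unl_w, classes, vocab)
        else:
            model = train_nb(unlabeled, unl_w, classes, vocab)
    return model
```

Dropping the translated examples after the first step lets later iterations rely on statistics of native L₂ text, which matches the motivation given in the abstract; setting `keep_translated=True` gives the more conventional semi-supervised EM variant instead.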

Citation: Leonardo Rigutini, Marco Maggini, Bing Liu, "An EM Based Training Algorithm for Cross-Language Text Categorization," 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), pp. 529-535, 2005.