2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
pp. 529-535
An EM Based Training Algorithm for Cross-Language Text Categorization
Leonardo Rigutini, Università di Siena
Marco Maggini, Università di Siena
Bing Liu, University of Illinois at Chicago
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2005.29
Abstract
Due to the globalization of the Web, many companies
and institutions need to efficiently organize and
search repositories containing multilingual documents. The
management of these heterogeneous text collections significantly
increases costs because experts in different
languages are required to organize them.
Cross-Language Text Categorization can provide techniques
to extend existing automatic classification systems
in one language to new languages without requiring additional
intervention of human experts. In this paper we propose
a learning algorithm based on the EM scheme which
can be used to train text classifiers in a multilingual environment.
In particular, the proposed approach
assumes that a predefined category set and a collection of labeled
training data are available for a given language L₁.
A classifier for a different language L₂ is trained by translating
the available labeled training set for L₁ to L₂ and
by using an additional set of unlabeled documents from
L₂. This technique lets us estimate statistical
properties of the language L₂ that are not fully
captured by automatically translated examples, both because
language L₁ has different characteristics and because
the translation process is only approximate. Our experimental
results show that the proposed
method is very promising when applied to a test document
set extracted from newsgroups in English and Italian.
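
The training scheme described above can be illustrated with a minimal sketch. The abstract does not name the underlying classifier or the exact EM update, so everything here is an assumption made for illustration: a multinomial naive Bayes model with Laplace smoothing, documents already tokenized and already machine-translated from L₁ to L₂, and translated documents kept in every M-step with their original hard labels. The function names (`train_nb`, `posterior`, `em_cross_language`) and the toy Italian data are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_nb(docs, weights, n_classes, vocab, alpha=1.0):
    """M-step: fit multinomial naive Bayes from per-document class weights."""
    class_mass = [alpha] * n_classes
    word_counts = [Counter() for _ in range(n_classes)]
    for doc, w in zip(docs, weights):
        for c in range(n_classes):
            class_mass[c] += w[c]
            for tok in doc:
                word_counts[c][tok] += w[c]
    total = sum(class_mass)
    log_prior = [math.log(m / total) for m in class_mass]
    log_lik = []
    for c in range(n_classes):
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik.append(
            {t: math.log((word_counts[c][t] + alpha) / denom) for t in vocab}
        )
    return log_prior, log_lik


def posterior(doc, model, n_classes):
    """E-step for one document: class posteriors; unknown tokens are skipped."""
    log_prior, log_lik = model
    scores = [
        log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
        for c in range(n_classes)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_cross_language(translated_docs, labels, unlabeled_docs, n_classes, n_iter=10):
    """Initialize on translated labeled data, then refine with unlabeled L2 data."""
    vocab = {t for d in translated_docs + unlabeled_docs for t in d}
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    model = train_nb(translated_docs, hard, n_classes, vocab)
    for _ in range(n_iter):
        # E-step: soft-label the native L2 documents with the current model.
        soft = [posterior(d, model, n_classes) for d in unlabeled_docs]
        # M-step (assumption): re-estimate from translated plus soft-labeled docs.
        model = train_nb(
            translated_docs + unlabeled_docs, hard + soft, n_classes, vocab
        )
    return model


# Toy data (hypothetical): class 0 = sport, class 1 = computing.
translated = [["calcio", "gol"], ["computer", "software"]]
translated_labels = [0, 1]
unlabeled = [
    ["calcio", "partita", "gol"],
    ["software", "programma", "computer"],
    ["partita", "gol"],
    ["programma", "software"],
]
model = em_cross_language(translated, translated_labels, unlabeled, n_classes=2)
sport_posterior = posterior(["partita"], model, 2)
computing_posterior = posterior(["programma"], model, 2)
```

In this toy run the words "partita" and "programma" never occur in the translated training set, so the initial classifier cannot separate them; they acquire class-specific probabilities only through the unlabeled L₂ documents, which is the effect the abstract attributes to the unlabeled data.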
Additional Information
Citation:
Leonardo Rigutini, Marco Maggini, Bing Liu,
"An EM Based Training Algorithm for Cross-Language Text Categorization,"
2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05),
pp. 529-535, 2005.