International Workshop on Challenges in Web Information Retrieval and Integration
pp. 195-204
News Item Extraction for Text Mining in Web Newspapers
Kjetil Norvag, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
Randi Oyri, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WIRI.2005.27
Abstract
Web newspapers provide a valuable resource for information.
In order to benefit more from the available information,
text mining techniques can be applied. However,
because each newspaper page often covers a lot of unrelated
topics, page-based data mining will not always give
useful results. In order to improve on complete-page mining,
we present an approach based on extracting the
individual news items from the web pages and mining
these separately. Automatic news item extraction is a
difficult problem, and in this paper we also provide strategies
for solving that task. We study the quality of the news item
extraction, and also provide results from clustering the extracted
news items.
Additional Information
Citation:
Kjetil Norvag, Randi Oyri,
"News Item Extraction for Text Mining in Web Newspapers,"
International Workshop on Challenges in Web Information Retrieval and Integration (WIRI),
pp. 195-204,
2005