IEEE International Conference on Services Computing, 2004. (SCC 2004). Proceedings. 2004

Abstract

We present a model for DOM-based web document segmentation that uses the semi-structured information in web pages. The model builds a DOM tree for a web page by parsing the HTML tags that organize the page's structure. We extend traditional plain-text segmentation algorithms so that they suit web text. The boundaries between nodes in the DOM tree are then used to further increase the precision of the segmentation results.
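
The idea of using HTML structure as a source of segment boundaries can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, which the abstract does not specify. As a simplification, it does not build a full DOM tree: it treats the opening and closing of block-level tags as node boundaries and splits the text there. The set `BLOCK_TAGS`, the class `BlockSegmenter`, and the function `segment_html` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: these block-level tags act as candidate segment boundaries.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "td", "section", "article"}


class BlockSegmenter(HTMLParser):
    """Collects text runs, splitting whenever a block-level tag opens or closes."""

    def __init__(self):
        super().__init__()
        self.segments = []  # finished text segments
        self._buffer = []   # text pieces of the segment being built

    def _flush(self):
        # Join the buffered pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> do not flush, so their text stays in the segment.
        self._buffer.append(data)


def segment_html(html):
    """Return the text segments delimited by block-level tag boundaries."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text after the last block tag
    return parser.segments


if __name__ == "__main__":
    page = "<div><h1>Title</h1><p>First <b>bold</b> para.</p><p>Second.</p></div>"
    print(segment_html(page))
```

A plain-text segmentation algorithm could then be run inside or across these segments, with the structural boundaries serving as additional evidence for where a topic break falls.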
