Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)
Download PDF

Abstract

One of the most frequent Web surfing tasks is to search for names of persons and organizations. Such names are often not distinctive, commonly occurring, and nonunique. Thus, a single name may be mapped to several entities. The paper describes a methodology to cluster the Web pages returned by the search engine so that pages belonging to different entities are clustered into different groups. The algorithm uses a combination of named entities, link-based and structure-based information as features to partition the document set into direct and indirect pages using a decision model. It then uses the distinct direct pages as seeds to cluster the document set into different clusters. The algorithm has been found to be effective for Web-based applications.
Like what you’re reading?
Already a member?
Get this article FREE with a new membership!

Related Articles