Self-Similarity Metric for Index Pruning in Conceptual Vector Space Models
2008 19th International Conference on ...
One of the critical issues in search engines is the size of search indexes: as the number of documents handled by an engine increases, the search must remain efficient despite the growth of the indexing structures. A widely accepted solution to this problem is the adoption of smaller, or pruned, indexes that increase retrieval speed while keeping search quality as high as possible. This paper extends the notion of a pruned index to semantic search systems based on conceptual vector space models and proposes a new self-similarity metric for index pruning. A conceptual vector space model represents documents as vectors in an n-dimensional space where each dimension corresponds to an ontology concept. The pruning algorithm proposed in this paper works on the basis of document self-similarity, preserving only the most significant components of a document's conceptual vector. Unlike many previously proposed algorithms, the self-similarity metric is based only on local information and does not require recomputing the whole pruned index when new documents are added, i.e., it can be used on-line, possibly combined with other off-line pruning policies. The proposed metric is tested against two benchmark sets, related respectively to Siderurgy (250 documents annotated with respect to the e-Class ontology) and Disability (2500 documents annotated with respect to the Passepartout ontology). Results show that the technique achieves a compression ratio of 50%, while ranking similarity with results from non-pruned indexes remains at 80%, thus preserving the quality of the provided results.
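The abstract does not give the exact definition of the self-similarity metric, so the following is only a minimal sketch of the general idea under stated assumptions: a document is a mapping from ontology concepts to weights, self-similarity is taken to be the cosine similarity between the pruned vector and the original vector, and components are kept in decreasing order of weight until a threshold is reached. The function name `prune_by_self_similarity`, the `threshold` parameter, and the sample weights are hypothetical and are not taken from the paper.

```python
import math


def prune_by_self_similarity(vector, threshold=0.8):
    """Keep the heaviest components of a concept vector.

    ASSUMPTION: "self-similarity" is modelled here as the cosine similarity
    between the pruned vector and the original one; the paper's actual
    metric may differ.

    vector: dict mapping concept name -> weight.
    Returns a new dict containing only the retained components.
    """
    norm_sq = sum(w * w for w in vector.values())
    if norm_sq == 0.0:
        return {}  # nothing to keep in an all-zero vector

    pruned = {}
    kept_sq = 0.0
    # Visit components from the largest to the smallest absolute weight.
    by_weight = sorted(vector.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for concept, weight in by_weight:
        pruned[concept] = weight
        kept_sq += weight * weight
        # For a vector obtained by zeroing components of the original,
        # cosine(pruned, original) simplifies to sqrt(kept_sq / norm_sq).
        if math.sqrt(kept_sq / norm_sq) >= threshold:
            break
    return pruned


# Hypothetical document vector over a handful of ontology concepts.
doc = {"steel": 0.8, "alloy": 0.5, "furnace": 0.3, "coke": 0.1, "slag": 0.1}
pruned = prune_by_self_similarity(doc, threshold=0.9)
print(sorted(pruned))
```

Because the decision uses only the document's own vector, a newly added document can be pruned in this way without touching the entries already in the index, which matches the on-line property claimed in the abstract.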
Index Terms:
index pruning, search index, conceptual vector space model
Citation:
Dario Bonino, Fulvio Corno, "Self-Similarity Metric for Index Pruning in Conceptual Vector Space Models," dexa, pp. 225-229, 2008 19th International Conference on Database and Expert Systems Application, 2008