Language Models and Smoothing Methods for Collections with Large Variation in Document Length
In this paper, we present a new language model based on an odds formula that explicitly incorporates document length as a parameter. We also introduce a new smoothing method, called exponential smoothing, which can be combined with most language models. We report experimental results for various language models and smoothing methods on a collection with large variation in document length, and show that our new methods compare favorably with the best approaches known so far.
Index Terms:
Information retrieval, Smoothing methods
Citation:
Najeeb Abdulmutalib, Norbert Fuhr, "Language Models and Smoothing Methods for Collections with Large Variation in Document Length," DEXA, pp. 9-14, 2008 19th International Conference on Database and Expert Systems Application, 2008