Second International Conference on Document Image Analysis for Libraries (DIAL'06)
pp. 232-242
Distance Measures for Layout-Based Document Image Retrieval
Joost Van Beusekom, Technical University of Kaiserslautern, Germany
Daniel Keysers, Technical University of Kaiserslautern, Germany
Faisal Shafait, Technical University of Kaiserslautern, Germany
Thomas M. Breuel, Technical University of Kaiserslautern, Germany
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DIAL.2006.16
Abstract
Most methods for document image retrieval rely solely
on text information to find similar documents. This paper
describes a way to use layout information for document image
retrieval instead. A new class of distance measures is
introduced for documents with Manhattan layouts, based
on a two-step procedure: First, the distances between the
blocks of two layouts are calculated. Then, the blocks of
one layout are assigned to the blocks of the other layout
in a matching step. Different block distances and matching
methods are compared and evaluated using the publicly
available MARG database. On this dataset, the layout type
can be determined successfully in 92.6% of the cases using
the best distance measure in a nearest neighbor classifier.
The experiments show that the best distance measure for
this task is the overlapping area combined with the Manhattan
distance of the corner points as block distance together
with the minimum weight edge cover matching.
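
The two-step procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes that a block is an axis-aligned rectangle `(x0, y0, x1, y1)`, that layouts are non-empty lists of blocks, and that the block distance is the Manhattan distance of the corner points alone; how the paper combines it with the overlapping area is not stated here, so that part is left out. The function names (`corner_manhattan`, `min_weight_edge_cover`, `layout_distance`, `classify`) are hypothetical, and the edge cover is found by brute force, which is only practical for very small layouts.

```python
from itertools import combinations, product

# Assumption: a block is an axis-aligned rectangle (x0, y0, x1, y1)
# and a layout is a non-empty list of such blocks.

def corner_manhattan(a, b):
    """Step 1 (block distance): sum of the Manhattan distances between the
    upper-left corners and between the lower-right corners of two blocks."""
    return sum(abs(p - q) for p, q in zip(a, b))

def min_weight_edge_cover(cost):
    """Step 2 (matching): weight of a minimum weight edge cover of the
    complete bipartite graph whose edge (i, j) has weight cost[i][j].
    Brute force over edge subsets; exponential, for illustration only."""
    n, m = len(cost), len(cost[0])
    edges = list(product(range(n), range(m)))
    rows, cols = set(range(n)), set(range(m))
    best = None
    # With non-negative weights, some optimal cover has between
    # max(n, m) and n + m - 1 edges, so only those sizes are tried.
    for k in range(max(n, m), n + m):
        for subset in combinations(edges, k):
            covers_rows = {i for i, _ in subset} == rows
            covers_cols = {j for _, j in subset} == cols
            if covers_rows and covers_cols:
                weight = sum(cost[i][j] for i, j in subset)
                if best is None or weight < best:
                    best = weight
    return best

def layout_distance(layout_a, layout_b):
    """Distance between two layouts: block distances, then matching."""
    cost = [[corner_manhattan(a, b) for b in layout_b] for a in layout_a]
    return min_weight_edge_cover(cost)

def classify(query, labeled_layouts):
    """Nearest neighbor classification: return the label of the reference
    layout closest to the query. labeled_layouts holds (layout, label) pairs."""
    return min(labeled_layouts, key=lambda item: layout_distance(query, item[0]))[1]

# Made-up example layouts, not taken from the MARG database.
two_column = [(0, 0, 45, 100), (55, 0, 100, 100)]
one_column = [(0, 0, 100, 100)]
query = [(2, 0, 46, 98), (56, 1, 100, 100)]
label = classify(query, [(two_column, "two-column"), (one_column, "one-column")])
```

An edge cover requires every block of both layouts to be assigned to at least one block of the other layout, so layouts with different numbers of blocks can still be compared. A practical implementation would replace the brute-force search with a polynomial-time algorithm.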
Additional Information
Citation:
Joost Van Beusekom, Daniel Keysers, Faisal Shafait, Thomas M. Breuel, "Distance Measures for Layout-Based Document Image Retrieval," Second International Conference on Document Image Analysis for Libraries (DIAL'06), pp. 232-242, 2006.