2015 13th International Conference on Document Analysis and Recognition (ICDAR)

Abstract

Training the models needed for Automatic Handwritten Text Recognition of historical documents generally requires a significant amount of human effort. This is mainly due to the great differences that often exist between collections and to the lack of linguistic resources from the period in which the documents were written, both of which create a need for manual data labelling. This paper presents a study on the reuse of models trained with data from a different collection, focusing on the contributions that the language model and the optical models each make to recognition performance. An empirical evaluation is performed using data from the Jeremy Bentham manuscripts, with the aim of recognising a manuscript by Jane Austen on a very different topic.