Abstract
Ontologies have become a central resource for defining biomedical concepts, but linking them to and from textual data remains an unresolved challenge. In this paper we approach the task of concept recognition in text by comparing four existing systems (cTAKES, NCBO Annotator, BeCAS and MetaMap) run with their default parameter settings. The systems are compared on benchmark data consisting of 2,163 scientific abstracts and 906 clinical trial reports, using an automatically constructed “silver” standard and a random semi-gold standard evaluation methodology. Furthermore, evaluation is conducted at the level of specific concept identifiers. Experimental results show: (i) generally higher levels of concept recognition on clinical trial reports than on scientific abstracts; (ii) the best-performing system on the silver standard was cTAKES on both the abstract and clinical trial corpora; however, NCBO Annotator performed better when only the selected broad semantic types were considered; (iii) BeCAS and MetaMap tended to produce coarser-grained annotations; (iv) the random semi-gold evaluation places an upper bound on system performance, showing broad agreement with the silver standard evaluation while highlighting areas where the silver standard methodology might be improved.