Estimating the Quality of Crowdsourced Translations Based on the Characteristics of Source and Target Words and Participants

Muhammad Rizal Khaefi; Rajius Idzalika; Imaduddin Amin; Zakiya Pramestri; Pamungkas Jutta; Yulistina Riyadi; George Hodge; Jong Gun Lee

doi:10.1109/ASONAM.2018.8508319

Abstract

Text-based media hold a wealth of insights that can be mined to understand perceptions and actions. Researchers and public officials can use these data to inform development policy and humanitarian action. An important step in analyzing text-based data sources, such as social media, is the creation of taxonomies, which are used to filter information relevant to topics of interest. We worked with thousands of online volunteers to translate 2,137 English keywords or phrases into formal or vernacular expressions in 29 different languages, with the aim of understanding human responses to natural disasters and of developing corpora for less widely studied languages (non-English and non-EU languages), on which research is still limited. In processing the data set, we faced the challenge of selecting a set of quality translations for each language. This paper aims to estimate the quality of crowdsourced translations produced by non-professional translators. It presents an extensive empirical study using 91 features drawn from corpora in 29 languages to describe (a) translators, (b) source expressions, and (c) translated expressions. Our results show that our approach, which explores two regression models and two supervised learning methods, produces better results than a baseline approach that uses a common metric, namely peer-review scores.
