Casting some light on experts’ experience with translation crowdsourcing
Vilelmini Sosoni, Ionian University, Corfu
This paper reports on an empirical study concerning professional translators’ attitudes towards and experience with translation crowdsourcing. In particular, it seeks to explore how professional translators perceive translation crowdsourcing and what concerns they raise, if any. It also aims at identifying any problems they may face as crowdworkers during the translation process. The investigation takes place in the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, where a crowd of professional translators is used for the translation into Greek (EL) of English (EN) MOOC (Massive Open Online Course) educational data on the CrowdFlower platform, with the end goal of using such translations to train and tune a machine translation (MT) system. The paper concludes by highlighting the unexpected benefits that crowdsourcing may bring to professional translators.
Translation crowdsourcing, expert crowdsourcing, professional translators, translation technology, MOOCs.
In 2007, Muhammad Yunus, the 2006 Nobel Peace Prize winner, predicted that a time would soon come when there would be only one language in the Information Technology (IT) world – your own (Yunus 2007: 194). Yet today, a decade later, we are still a long way away from this ideal. This can be explained if we consider the geo-political changes of the 20th century and the economic deregulation which have brought about a staggering globalisation and have significantly increased the flow of goods, people and information. In that context, the amount of digital content to be translated has been growing at an unprecedented rate. Moreover, as communication between people in all corners of the world has become practically instantaneous, people are seeking to access the same entertainment material and most of the textual material they encounter in their daily lives in their native languages, at the same time, a phenomenon described by Zuckerman as ‘The Polyglot Internet’ (2008: n.p.). Yet the number of professional translators is not sufficient to meet the demand of individuals and businesses (Kelly 2009a) and budgets to fund this demand for translation range from limited to non-existent (European Commission, 2012: 75), especially in the aftermath of the 2007 financial crisis (Sosoni and Rogers, 2013: 7-8). In the light of these shortfalls, two solutions have been steadily gaining ground: Machine Translation (MT) (Esselink 2003; Quah 2006; O’Hagan 2016) and translation crowdsourcing (Gambier 2012; Garcia 2015: 19).
Both are controversial in the world of professional translators and have been approached with extreme caution by Translation Studies (TS) scholars. This is not surprising given that TS only established itself as a discipline in the 1980s (Snell-Hornby 2012: 365), and also because the translation profession has been struggling with recognition, status and adequate remuneration for years (Dam and Zethsen 2008; European Commission, 2012). Yet, as O’Brien observes, the translation profession “has changed over time and has become almost symbiotic with the ‘machine’” (2012: 103), while at the same time crowdsourcing, which was once considered “a dilettante, anti-professional movement” (O’Hagan 2011: 11) located on the periphery of the translation profession, is now occupying a more central position (Flanagan, 2016: 164).
This paper reports on an empirical study concerning professional translators’ attitudes towards and experience with translation crowdsourcing. In particular, it seeks to investigate how professional translators, i.e. trained practitioners who are remunerated for their work (cf. Pérez-González and Susam-Saraeva 2012: 150; Flanagan 2016: 150), perceive translation crowdsourcing. It also aims at identifying any problems they may face as crowdworkers during the translation process. This investigation is carried out in the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, which aims at providing reliable machine translation for Massive Open Online Courses (MOOCs) and where, among others, a crowd of professional translators is used for the translation into Greek (EL) of English (EN) MOOC educational data on the CrowdFlower platform. The translations produced on the platform will be used to train and tune an MT system so that it handles successfully all genres of educational data (cf. 2.2.) The experience of the professional translators is recorded with the use of questionnaires and diaries which they were instructed to keep during the translation process. Their answers are used to shed some light on professional translators’ attitudes towards crowdsourcing and help raise awareness about its impact on translation practice, on translation training as well as on the translation industry.
Crowdsourcing as an approach to activate or use the knowledge and skills of a large group of people in order to solve problems has existed for a long time (cf. Ellis, 2014). However, as a concept, it is attributed to Howe (2006) who created this portmanteau term by joining the words crowd and outsourcing, and defined it in his seminal article in Wired Magazine as “the act of taking a job traditionally performed by a designated agent […] and outsourcing it to an undefined, generally large group of people in the form of an open call” (2006: np).
Crowdsourcing is a practice firmly grounded in the participatory nature of Web 2.0 that can be used by any business, organisation, institution or group to harness the wisdom of the crowd, usually a large group of amateurs, experts, volunteers, professionals, fans, citizens etc., to accomplish a given task (Brabham 2013). It has also been boosted by the shift from stand-alone PCs, located at fixed work stations, to the spread of distributed computing in the form of laptops, notebooks, tablets, phablets, smartphones and watches with Internet connectivity. This trend in distributed computing and, thus, the transition from fixed locations of access to increased wireless presence, coupled with the exponential growth of Internet capability, means that more people have online access and presence 24/7 and can carry out tasks from the comfort of their homes, but also while travelling, commuting, working or during their free time while outdoors. Crowdsourcing has been used widely and extensively and appears in many forms; for instance, it can take the form of an online, distributed problem-solving and production model (Brabham 2008: 76), it can take the form of crowdfunding or crowdvoting (Estelles Arolas and González-Ladrón-De-Guevara 2012), or it can be used to organise labour through the parcelling out of work to an (online) community, offering payment for anyone within the ‘crowd’ who completes the tasks set (Whitla, 2008: 16).
2.1 Translation crowdsourcing
Crowdsourcing has inevitably affected TS as well. As Cronin (2010: 4) observes:
The bidirectionality of Web 2.0 has begun to determine the nature of translation at the outset of the 21st century with the proliferation of crowd-sourced translation or open translation projects such as Project Lingua, Worldwide Lexicon, Wiki Project Echo, TED Open Translation Project and Cucumis.
Nowadays, crowdsourcing may refer to online collaborative translation or free translation crowdsourcing, which assumes the free nature of the contribution and is also known as volunteer translation, community translation, social translation or, in some cases, fan translation (notably fansubbing), where the locus of control is within the community itself and tends to respond to horizontal, rather than vertical hierarchies and control structures (Pym 2011; Gambier 2014). It can also refer to translation crowdsourcing where the locus of control rests with the initiating organisation, institution or company and often involves the compensation of the crowd — so-called paid crowdsourcing (Garcia 2015). In the case of paid crowdsourcing, initiatives often compensate participants depending on their qualifications or performance: from way below market rates for non-professional crowdworkers to variable higher rates for professional translators who collaborate to produce translations (Garcia 2015). More importantly, due to the increasing need for quick, cost-effective and multilingual translation, large Language Service Providers (LSPs), such as Lionbridge, increasingly use managed crowdsourcing workflows as an innovative labour model, which they call Business Process Crowdsourcing (BPC), and which they claim results in higher productivity, quality and cost savings (Lionbridge, 2013).
It can be easily understood that translation crowdsourcing is particularly controversial in the realm of TS and professional translators given that it relies on volunteer labour –both amateurs and professionals, both paid and non-paid– to support not-for-profit but also for-profit activities. Dodd, for instance, considers crowdsourcing “the exploitation of Internet-based social networks to aggregate mass quantities of unpaid labour” and worries this practice could lead to “a new apartheid economics of socialism for the workers, capitalism for the bosses” (Dodd 2011: n.p.). Yet others have been rather positive about the role of crowdsourcing in the industry and the changes this process brings to translation practice. Baer (2010: n.p.), for instance, argues that when crowdsourcing projects are effectively and appropriately designed, they can turn what has been considered a threat to the translation industry into a more acceptable and even positive model that seeds collaboration between amateur and paid professional translators, offers a training ground for new translation graduates and also expands the material that gets translated, broadening access to information, and familiarising more people with the complexity of the translation process. As McDonough Dolmaya (2011: 97) aptly observes:
While some initiatives do enhance the visibility of translation, showcase its value to society, and help minor languages become more visible online, others devalue the work involved in the translation process, which in turn lowers the occupational status of professional translators.
Unfortunately, to date not many empirical studies exist to highlight the possible concerns of translators in relation to the crowdsourcing model (cf. Kelly 2009a; Kelly 2009b; Kelly et al. 2011; McDonough Dolmaya 2012; Pérez-González and Susam-Saraeva 2012; Flanagan 2016). In addition, as Flanagan observes, “much of what has been written is published as industry reports, which professional translators commonly deem as questionable sources” (2016: 150).
In the light of the above, this paper seeks to identify professional translators’ experience with the practice of translation crowdsourcing through an analysis of questionnaires and diaries related to a set translation crowdsourcing task from EN into EL. The main difference between this study and other studies, such as the one carried out by Flanagan (2016), is that it actually investigates professional translators’ experience with crowdsourcing rather than their general views about its practice.
2.2 Crowdsourcing and TraMOOC
Translation crowdsourcing is used in this study to refer to expert translation crowdsourcing, i.e. crowdsourcing for the completion of translation tasks, set by an organisation, institution or company, by professional translators on an online crowdsourcing platform (for expert crowdsourcing cf. Retelny et al. 2014). In particular, in the framework of the TraMOOC project, crowdsourcing (both expert and amateur) is employed for collecting human translations of English MOOC content into nine European (German, Italian, Portuguese, Dutch, Bulgarian, Greek, Polish, Czech and Croatian) and two BRIC (Russian and Chinese) languages using the crowdsourcing platform CrowdFlower.
MOOCs have been growing rapidly in recent years in terms of the number of providers and universities involved as well as in terms of the number of students enrolled. This is not surprising given that they are considered a key toolkit for lifelong education, digital skills acquisition, and the continuing professional development (CPD) of workers. However, a major problem with MOOCs is that their content is typically provided in English, while the vast majority of students are non-native English speakers. TraMOOC aims at breaking down the language barrier of MOOC content. It focuses on developing MT solutions for automatically translating all text types of educational content (notes, assignments, video lecture subtitles, forum text etc.) from English into eleven target languages. A primary goal of the project is high quality MT output, which, in part, relies on the acquisition and the development of in-domain parallel training and testing data sources of substantial volume. Given that the majority of the target languages in this project have weak MT infrastructures, adaptation, bootstrapping and crowdsourcing are used to enhance them. Crowdsourcing, in particular, is frequently used for the creation of parallel translation data for the training of MT models (Zaidan and Callison Burch 2011) and for that reason it was also used in the TraMOOC project to create parallel translation data in the eleven language combinations of the project.
3.1 The CrowdFlower platform: selection and configuration
The platform selected for the crowdsourcing activities in the TraMOOC project was CrowdFlower. It was selected among many platforms, such as Amazon Mechanical Turk, Clickworker, Microworkers, etc., mainly because of its configurability, its robust infrastructure and its wide acceptance and popularity in the microtasking field. The platform was then configured on the basis of prior parallel translation tasks, as described below.
- Instructions: The task instructions informed the crowdworkers about the nature of the task, its particularities, the source of the data and the specific rules that had to be followed. In particular, there were two sets of instructions: language-independent (given that the task would be carried out in eleven language combinations) and language-specific.
The language-independent instructions included the following:
Your task is to translate a number of sentences into your language.
The translation is going to be used for the training of a machine translation system.
Please make sure that your translation:
-Is faithful to the original in both meaning and style, i.e., without taking into account potential translator’s preferences concerning style, lexical choice, word order, etc.
-Does not add or omit information from the original text.
-Does not contain any grammatical and/or spelling errors, additional spaces, trailing spaces, line breaks.
-In the case of foreign words used as loanwords in the target language, follow the rules of the target language regarding transliteration/keeping the original.
-In the case of numbers/digits, please use the target language rules.
-Do not translate mathematical symbols and formulas.
-Do not translate the "<URL>" tag (note: all URLs in the text have been automatically replaced with this tag).
-When creating your translation, please do not use any machine translation systems!
The language-specific instructions for Greek included the following:
-Translate the term and acronym MOOC (Massive Open Online Course) as Ανοιχτό Μαζικό Διαδικτυακό Μάθημα.
-Do not translate the words e-mail, blog, company names such as Google, Iversity etc, the names of movies and books.
-Use the angled quotation marks («») instead of the straight quotation marks (“ “). In case you have quotes within quotes please use angled quotation marks («») for the initial quote and straight quotation marks for the quote within the quote (e.g. «Θέλω να σας μιλήσω για τον όρο "εμβύθιση"».)
-Do not forget to place an accent mark on the Greek question words πού, πώς, BUT do not place an accent mark on the question words τι, ποιος/ποια/ποιο/ποιοι/ποιες/ποια.
-Use the existing/established translation for place names (e.g. London: Λονδίνο, Leipzig: Λειψία). In case there is no existing/established translation, please transliterate using the simplification method (e.g. Lombok: Λομπόκ).
-Transliterate anthroponyms and animal names using the simplification method (e.g. Shakespeare: Σέξπιρ, Kate: Κέιτ, Dolly: Ντόλι, etc.), unless there is an existing/established translation in Greek (e.g. Descartes: Καρτέσιος, Newton: Νεύτωνας, etc.) or the personal name is a Greek personal name (e.g. Socrates: Σωκράτης, Plato: Πλάτωνας, etc.)
-Use the existing/established translation for the names of organisations (e.g. IMF: ΔΝΤ, WHO: ΠΟΥ, ECB: EKT, Amnesty International: Διεθνής Αμνηστία, etc.). In case there is no existing/established translation, please leave it untranslated.
-In general, use the forms belonging to Demotiki (L-Variety) rather than Katharevousa (H-variety) (έφτασε instead of έφθασε, υπόψη instead of υπ’όψιν, εκλέχτηκα instead of εξελέγην, etc).
- User Interface (UI) Design: The UI (Figure 1) was designed based on the task’s requirements and usability principles. For instance, after experimenting with different ranges of segments per job page, it was decided to go with ten segments per page on the basis of previous studies (Zaidan and Callison Burch, 2011). Another crucial issue during the design of the UI was to prevent workers from using machine translation tools during the translation task; to achieve this, sentence selection was deactivated in the source sentence textbox. Moreover, input textboxes were designed in a way that prevented blank answers, and CrowdFlower’s validators were used to remove multiple and trailing whitespaces from crowdsourced translations.
Fig. 1. The User Interface for the crowdsourcing translation task
- Time Settings: Time constraints were set for the task completion. Based on extensive pilot trials, the minimum time a crowdworker had to spend completing a page of work was set to 2 minutes. This was done to ensure that crowdworkers spent at least a reasonable amount of time on a page before submitting it, thus avoiding random or garbage translations. The maximum time limit to submit a page was set by default to 30 minutes, as suggested by the platform and as indicated by the tests we ran. This upper time limit was meant to help crowdworkers focus on the work instead of spending time on other tasks, which could distract them and hence affect the quality of the translations. Yet after more than half of the crowdworkers sent messages to the Task Support asking for an extension of the upper time limit, this was increased to 60 minutes.
- Accuracy Settings: CrowdFlower allows the calibration of a minimum accuracy level on test questions, i.e. the minimum accuracy a crowdworker must achieve and maintain during a job on a set of predefined translations or gold standard data — called test questions in CrowdFlower terminology — and set as a quality control technique (cf. 3.4.) (Zaidan and Callison Burch 2011: 2). In the Quiz Mode, this is the minimum accuracy percentage a crowdworker must achieve in order to pass the Quiz Mode and enter the actual job; it is also the minimum a crowdworker must maintain during Work Mode, which was designed to contain hidden test questions among the actual translation task. If a crowdworker falls below this threshold at any time, s/he is banned from the job, and all of his/her answers are marked as ‘untrusted’ in the final results. On the basis of extensive pilot trials which included an analysis of the quality of the provided translations on the basis of the DQF-MQM typology1, the minimum accuracy was set to 60%.
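The settings listed above can be summarised in a short Python sketch. It is illustrative only and does not reproduce CrowdFlower's own configuration interface or validators: the class TaskSettings and the functions normalise_whitespace, is_valid_answer, page_time_ok and is_trusted are hypothetical names, and only the numeric values (ten segments per page, the 2-minute and 60-minute limits, the 60% threshold) are taken from the task description.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSettings:
    """Hypothetical container; the values come from the task description."""
    segments_per_page: int = 10
    min_seconds_per_page: int = 2 * 60
    max_seconds_per_page: int = 60 * 60  # raised from the initial 30 minutes
    min_accuracy: float = 0.60


def normalise_whitespace(translation: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing whitespace."""
    return re.sub(r"\s+", " ", translation).strip()


def is_valid_answer(translation: str) -> bool:
    """Reject answers that are blank once whitespace has been normalised."""
    return bool(normalise_whitespace(translation))


def page_time_ok(seconds_spent: float, settings: TaskSettings) -> bool:
    """Check that a page was submitted within the configured time window."""
    return settings.min_seconds_per_page <= seconds_spent <= settings.max_seconds_per_page


def is_trusted(correct: int, answered: int, settings: TaskSettings) -> bool:
    """True while accuracy on test questions is at or above the threshold.

    Assumption: a worker with no answered test questions is still trusted.
    """
    if answered == 0:
        return True
    return correct / answered >= settings.min_accuracy


settings = TaskSettings()
print(normalise_whitespace("  Ήταν   αυτοάνοση  νόσος. \n"))
print(page_time_ok(90, settings), is_trusted(6, 10, settings))
```

Under these assumptions, a page submitted after 90 seconds is rejected as too fast, while a worker with six correct answers out of ten test questions sits exactly on the 60% threshold and remains trusted.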
3.2 The Crowdworkers
The crowdworkers consisted of 126 Greek students2. Of those, 99 were final year undergraduate students on a BA in Trilingual Translation course (Greek-English-French or Greek-English-German) at the Department of Foreign Languages, Translation and Interpreting at the Ionian University (Greece). They had at least C2 level of reading and writing in Greek and C1 level of reading and writing in English and had completed 40 or more practical translation modules, half of which were in specialised translation (i.e. translation of technical, scientific, medical, legal, economic, administrative texts). The remaining 27 were postgraduate students attending a postgraduate course (MA) in the Science of Translation at the Department of Foreign Languages, Translation and Interpreting; they were all native Greek speakers (or bilingual in Greek and another European language) working with English (at least C1 level for reading and writing).
Both undergraduate and postgraduate students were given the option to carry out the crowdsourcing tasks without a monetary reward as part of the Translation Tools compulsory module they were enrolled on. The performance of these tasks was intended to familiarise them with crowdsourcing which was completely unknown to them before then. The aim of the translation was made clear to them: the translations would be used for the creation of parallel translation corpora for the training of an MT model for the translation of MOOC content, i.e. for openly available online courses. They were informed that their translations would be donated to the scientific community and that the translation of MOOC content would not be otherwise possible, given that MOOC providers would not be able to employ the number of professional translators required to translate all their content in many different languages.
3.3 The Data
The data used in the trial experiments consisted of 48 000 English segments, which roughly corresponded to 48 000 sentences or 780 000 words, to be translated into Greek. The data, provided in UTF-8 txt format, originated from Iversity and Coursera video lecture subtitles, Iversity course forum discussion text and the Qatar Educational Domain (QED) Corpus (Iversity; Coursera; QED). The data sources included content from various subjects, such as Business Analysis, Contemporary Architecture, Crystals & Symmetry, Dark Matter, Gamification Design, Public Speaking, Web Design, Social Innovation, Monte Carlo Methods in Finance and Critical Thinking. The challenges during the translation of formal text (video lecture subtitles, notes, assignments) involved the high occurrence of domain-specific terms, scientific formulas, as well as spontaneous speech characteristics in subtitles, like repetitions, elliptical and truncated sentences and interjections. When translating informal text (forum discussions), problems included the use of informal language (slang), Internet language properties (e.g. lexical variants like ‘supa’ for super, abbreviations and acronyms like ‘OMG,’ ‘BTW,’ ‘A/S/L’), misspellings in the text, as well as multilingual tokens, unorthodox syntax and awkward word selection due to non-native speaker (NNS) writing. The English data had to undergo a clean-up process, using custom Python scripts, that involved: (1) the removal of non-English and special characters (e.g. Chinese characters, mathematical or other symbols), non-content lines, and multiple or trailing whitespace characters, and (2) the correction of cases of erroneous segmentation (e.g. single segments split into multiple segments). Of course, not every problematic segment is automatically detectable and/or correctable and, as a result, there was some noise left in the data, i.e. some problematic/incomprehensible segments.
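The project's own clean-up scripts are not reproduced in this paper. The following minimal sketch merely illustrates step (1) under simplifying assumptions: "English" characters are approximated by printable ASCII, typographic quotation marks are mapped to their ASCII equivalents beforehand, and a "content line" is taken to be any line containing at least one letter. All function names are hypothetical, and the correction of erroneous segmentation in step (2) is not covered.

```python
import re

# Assumption: printable ASCII approximates the "English" character set.
NON_ASCII = re.compile(r"[^\x20-\x7E]")
MULTI_SPACE = re.compile(r"\s+")
# Keep typographic quotes by mapping them to ASCII before filtering.
QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def clean_segment(segment: str) -> str:
    """Drop non-ASCII characters, then normalise whitespace."""
    text = segment.translate(QUOTES)
    # Replace with a space so that neighbouring words do not fuse together.
    text = NON_ASCII.sub(" ", text)
    return MULTI_SPACE.sub(" ", text).strip()


def has_content(segment: str) -> bool:
    """Assumption: a content line contains at least one letter."""
    return any(ch.isalpha() for ch in segment)


def clean_corpus(lines):
    """Yield cleaned segments, skipping lines left without content."""
    for line in lines:
        cleaned = clean_segment(line)
        if has_content(cleaned):
            yield cleaned


raw = ["Hello   world 你好 ", "-----", "x ≤ y  holds\t", ""]
print(list(clean_corpus(raw)))
```

Note that a filter of this kind removes non-ASCII mathematical symbols such as ≤ but leaves ASCII ones (e.g. + or =) untouched, which is one reason why some noise can be expected to survive automatic cleaning.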
3.4 Test Questions (Gold standard)
As pointed out in 3.1, in order to ensure translation quality, gold standard data (called test questions in CrowdFlower terminology) were set as a quality control technique (Zaidan and Callison Burch 2011: 2). In particular, a set of 100 test sentences, i.e. English segments chosen from the same data sources as the rest of the data and already translated by experts into Greek, was used to validate the accuracy of the participants’ input. In other words, the basis for the crowdworkers’ evaluation, i.e. the tertium comparationis (TC) in Descriptive Translation Studies (DTS) terms (Toury 1980; Kruger and Wallmach 1997; Wehrmeyer 2014), consisted of the translations provided by translation experts following the instructions issued by the task authors (cf. 3.1). In order to ensure ‘fair play’ with the crowdworkers, an effort was made to provide not a TC which was an ‘idealised metatext’ (Toury 1980: 76), but one constructed from predetermined variables related to the research question (Kruger and Wallmach 1997), i.e. an extensive, if not exhaustive, list of alternative translations which followed closely the given instructions. Clearly, this was no easy task as, in natural language, especially in Language for General Purposes (LGP), many alternative renderings are possible and an exhaustive list of acceptable translations is not always feasible. A sample of these test questions, i.e. the English segments and their acceptable Greek renderings, follows in Table 1.
Table 1. A sample of test questions
|No.||English segment||Acceptable Greek translations|
|1||The stock price fell by 0.8%.||Η τιμή της μετοχής έπεσε κατά 0,8%. / Η αξία της μετοχής έπεσε κατά 0,8%. / Η τιμή της μετοχής μειώθηκε κατά 0,8%. / Η αξία της μετοχής μειώθηκε κατά 0,8%.|
|2||It was an autoimmune disease.||Ήταν αυτοάνοσο νόσημα. / Ήταν αυτοάνοση πάθηση. / Ήταν αυτοάνοση ασθένεια. / Ήταν αυτοάνοση νόσος.|
|3||I'll use the intermediate value theorem.||Θα χρησιμοποιήσω το θεώρημα μέσης τιμής. / Θα χρησιμοποιήσω το Θεώρημα Μέσης Τιμής. / Θα χρησιμοποιήσω το Θεώρημα Μέσης Τιμής Διαφορικού Λογισμού. / Θα χρησιμοποιήσω το θεώρημα μέσης τιμής διαφορικού λογισμού.|
|4||What do others think?||Τι λένε οι άλλοι; / Τι σκέφτονται οι άλλοι; / Τι νομίζουν οι άλλοι; / Οι άλλοι τι λένε; / Οι άλλοι τι σκέφτονται; / Οι άλλοι τι νομίζουν;|
As the table illustrates, the English segments, which are relatively simple both syntactically and terminologically, have multiple acceptable Greek renderings. These are mainly due to a) the free word order of the Modern Greek language, b) the non-standardisation of terminology (see Valeontis and Krimpas 2014) and c) the inherent richness of the Modern Greek language, which can be explained by the fact that its history dates back to the 13th century BC (Christidis 2001) and also by its current status as a minor language, i.e. as a language of limited diffusion or one of intermediate diffusion compared to a major language or language of unlimited diffusion (Parianou 2009), which makes it open to borrowings, calques and influence from other, more dominant languages.
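How a crowdworker's answer might be checked against such a list of acceptable renderings can be sketched as follows. The platform's actual matching logic is not documented here, so the sketch rests on an assumption: an answer passes only if, after Unicode (NFC) and whitespace normalisation, it is identical to one of the listed renderings. The names GOLD, normalise and matches_gold are hypothetical; the sample renderings are taken from Table 1.

```python
import unicodedata

# Two of the test questions from Table 1 with their acceptable renderings.
GOLD = {
    "The stock price fell by 0.8%.": [
        "Η τιμή της μετοχής έπεσε κατά 0,8%.",
        "Η αξία της μετοχής έπεσε κατά 0,8%.",
        "Η τιμή της μετοχής μειώθηκε κατά 0,8%.",
        "Η αξία της μετοχής μειώθηκε κατά 0,8%.",
    ],
    "What do others think?": [
        "Τι λένε οι άλλοι;",
        "Τι σκέφτονται οι άλλοι;",
        "Τι νομίζουν οι άλλοι;",
        "Οι άλλοι τι λένε;",
        "Οι άλλοι τι σκέφτονται;",
        "Οι άλλοι τι νομίζουν;",
    ],
}


def normalise(text: str) -> str:
    """Apply NFC normalisation and collapse whitespace.

    NFC also maps the Greek question mark (U+037E) to the semicolon
    (U+003B), so both ways of typing it compare as equal.
    """
    return " ".join(unicodedata.normalize("NFC", text).split())


def matches_gold(source: str, translation: str, gold=GOLD) -> bool:
    """Assumption: exact match against any listed rendering counts as a pass."""
    accepted = {normalise(t) for t in gold.get(source, ())}
    return normalise(translation) in accepted


print(matches_gold("What do others think?", "  Οι άλλοι τι   νομίζουν; "))
print(matches_gold("What do others think?", "Τι πιστεύουν οι άλλοι;"))
```

The second call uses a hypothetical rendering that is plausible Greek but absent from the list; under exact matching it fails, which illustrates why the list of alternatives has to be as extensive as possible.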
3.5 Questionnaires and Diaries
The crowdworkers were asked to fill in a questionnaire and submit diaries they were instructed to keep during their work on the CrowdFlower platform. The questionnaire consisted of two parts and 27 questions in total. The first part included 16 questions on demographics and the second 11 questions on their crowdsourcing experience; 25 were multiple choice questions and 2 open-ended questions requesting comments, while one was an optional question on the worker ID. Diaries, which are increasingly used as data collection instruments (Flaherty 2016) — also in TS as reflective journal logs (Hansen 2006; Shih 2011; Eraković 2013) —, constitute a research method used to collect qualitative data about user behaviors, activities, and experiences longitudinally, i.e. over a period of time. The context and time period during which data is collected differentiate diaries from other common user-research methods, such as questionnaires or usability tests (Flaherty 2016). In the case at hand, crowdworkers were asked to keep anonymously a diary during their translation work on CrowdFlower and provide comments in the form of free text entries about their interpretations, feelings, perceptions and overall translation experience. The goal of the diaries was mainly to make students aware of the different aspects of their translation work on CrowdFlower, but also to help them ponder on their experience and express their feelings and thoughts freely and over time without being restricted by the structured form of a questionnaire.
4. Findings and discussion
We collected 28 935 translated segments, i.e. 471 391 words, over a period of four weeks. On average, every crowdworker provided translations for 230 segments. However, only 61 of the 126 participants answered the questionnaire, while 78 provided a diary. This section reports on the most significant findings after analysing the crowdworkers' answers. The majority of respondents belonged to the 18-24 age group. 86% were female, while all were native Greek speakers. All students were days away from receiving their qualification as translators, and almost all of them were computer competent. One third of the students combined their studies with some form of employment, whether temporary, part-time or freelance work, and most of them belonged to the lowest income level (<$10 000 p.a.). All students were first-time users of a crowdsourcing platform as they had no prior experience with crowdsourcing. Interestingly, half of the students were not aware of translation crowdsourcing before embarking on this particular Translation Tools module.
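The averages and response rates implied by these figures can be reproduced with a few lines of Python. The sketch below uses only numbers reported in this paper (including the 48 000 source segments mentioned in 3.3); the function name summarise and the dictionary keys are hypothetical.

```python
def summarise(segments, words, workers, questionnaires, diaries, source_segments):
    """Derive simple descriptive statistics from the reported totals."""
    return {
        "segments_per_worker": round(segments / workers),
        "words_per_segment": round(words / segments, 1),
        "questionnaire_rate_pct": round(100 * questionnaires / workers, 1),
        "diary_rate_pct": round(100 * diaries / workers, 1),
        "share_of_source_translated_pct": round(100 * segments / source_segments, 1),
    }


stats = summarise(
    segments=28_935,
    words=471_391,
    workers=126,
    questionnaires=61,
    diaries=78,
    source_segments=48_000,
)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The response rates are worth keeping in mind when reading the figures that follow, since the questionnaire results reflect roughly half of the crowdworkers.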
Fig. 2. Number of workers who would be interested in completing translation tasks on a crowdsourcing platform in the future
Fig. 3. Reasons for participating in a crowdsourced translation task in the future
As shown in Figure 2, professional translators are not negatively predisposed towards pursuing a crowdsourcing translation task in the future; of the 61 respondents, only nine replied ‘Not at all,’ while seven replied ‘Yes, definitely’ and one specified that they would undertake translation tasks ‘if the money is worth it.’ The majority, i.e. 44, chose ‘Maybe.’ Figure 3 illustrates that translators would participate in a crowdsourced translation task in the future mainly in order to gain experience and receive a reward. This is in line with what research suggests (cf. Schultheiss et al., 2013; Gerber and Hui, 2013) about money and practice of skills being the most important factors that motivate people to work on crowdsourcing platforms. Interestingly, however, although altruism is also mentioned in the literature as a strong motivating factor for the crowd, in the present study it was mentioned by only two of the participants. Given that the particular experiment was carried out among students in Greece at a time of severe austerity and very high unemployment (23.4% in August 2016, the highest unemployment rate recorded in the European Union3), it is not hard to understand why employability and expertise constitute the students’ main preoccupation and top priority, and this may explain why altruism scored so low on the motivational scale.
Fig. 4. Level of satisfaction among workers
As far as the translators’ experience with crowdsourcing is concerned and as can be seen in Figure 4, this was satisfactory for 30 of them; 16 were neutral and 15 were either somewhat dissatisfied or very dissatisfied with their experience.
Fig. 5. Level of difficulty of the translation job
As illustrated in Figure 5, 17 translators found the translation job easy and two very easy, 20 found it somewhat difficult and only one very difficult, while 21 found that the job was neither easy nor difficult.
Fig. 6. Clarity of instructions
Fig. 7. Level of satisfaction with the Task Support
In addition, as can be seen in Figure 6, the majority, i.e. 59 translators, found the instructions either very clear or somewhat clear, three thought they were neither clear nor unclear, seven found them somewhat unclear and two very unclear. The clarity of the instructions is of paramount importance if we want the provided translations to meet the set expectations. Although the instructions, in this case, were overall considered to be clear, further refinement is required in order to ensure the maximum level of clarity and, by extension, the maximum level of the translators’ conformity with them. Figure 7 illustrates that the translators were, on the whole, satisfied with the Task Support provided: 42 were either very satisfied or somewhat satisfied and nine were neutral; only four found the support rather or very unsatisfactory, while six did not use it.
Fig. 8. Fairness of test questions
With respect to the test questions, and as emerges from Figure 8, six translators thought they were very fair and 27 somewhat fair, while 10 were neutral. Yet 15 thought the test questions were somewhat unfair and two very unfair. Given that, as pointed out in 3.1., test questions are used as a quality control technique and crowdworkers have to a) achieve the set pass mark in order to enter the Work Mode and b) maintain the set mark in order to keep on working on the job, it becomes evident that these questions need to be refined in order to avoid contentions by translators as well as unfair fails. Based on the translators’ comments in the diaries and the questionnaires, and on their contentions as recorded on the CrowdFlower platform, it emerges that, for reasons of fairness and transparency, the type of gold standard should be replaced with a different one, e.g. a set of multiple-choice questions.
Fig. 9. The causes of the main difficulties, according to the workers
Table 2. A sample of translator answers to the question “How was the translation on the crowdsourcing platform different from the translations you completed in the past?”
|How was the translation on the crowdsourcing platform different from the translations you completed in the past? (open question in Questionnaire)|
|1||In my opinion translation out of context and per word, following exactly the source text, is considered to be unfruitful, and a little bit narrow-minded.|
|2||There was absolutely no context, the sentences from time to time had mistakes in syntax and grammar.|
|3||Unlike other translation tasks that aim to the creation of a new cohesive text, translations on CrowdFlower required the faithful reproduction of words in the target language.|
|4||I wasn't given any context.|
|5||Absence of context, faithful in structure of sentences|
|6||You could not be sure of what was translated because there was not any context to help you guess the faithful translation.|
|7||The oral speech, the mistakes found in the original that are not supposed to be found in a written text.|
|8||Not an actual cohesive text to translate but abstract segments with no context. Faithful translation required.|
|9||I was not aware of the context of the texts.|
|10||It was really difficult trying to understand what the text was about without its context.|
|11||The main problem was to produce translation without context. Translating till now this problem was not present.|
|12||There was no context. There were segments of text and one could not always specify their subject.|
|13||There were only the segmented texts without the context. So, it was difficult to find the true meaning of some words|
|14||There was a time limit and there was no context|
|15||Very different in terms of context lack and highly specialised scientific terminology.|
|16||It was segmented and out of context, that made it somewhat difficult. In one page there could be as many as ten different-area segments, where you needed to research many of different terms, which made it inconvenient for the translator.|
|17||The most important difference - and difficulty - was the lack of context, and then the lack of freedom to produce more flexible translations.|
|18||It was different because we did not have context. The translation quality is being jeopardised.|
|19||The word to word translation is very difficult and most of the times the translated sentences lack in coherence and cohesion. Moreover, not knowing the context affects the outcome of the translation.|
|20||The translation task that was carried out on CrowdFlower was different from the ones I have completed in the past because there was no context and the original text had many mistakes. These are the reasons why the traditional translation method cannot be applied in that case.|
|21||The source text/ segments were full of mistakes and there was no context!|
|22||It was quite different because it required faithful translation and because the texts were without context.|
|23||The main difference was the fact that someone should translate without a context, and also the time limit.|
|24||There's always a given context in the translations I've completed in the past. Here, the context was missing. Without it a translation is bound to be mediocre.|
|25||For the first time I had to translate without knowing the context. It was as if 4 years of undergraduate studies in Translation were deconstructed.|
|26||There was no context so as to fully understand what each segment was talking about and also the time limit was quite stressful. Impossible to keep the quality.|
|27||Completely different, as faithful translation was required.|
|28||It was out of context.|
As far as the actual translation process is concerned, it emerges from Figure 9 and Table 2, which includes the first 28 of the 48 answers provided in the open question about the workers’ experience with translation on the crowdsourcing platform, that the main problems are the segmented text and the lack of context – which in translation is of paramount importance. As Killman (2015: 206) observes:
An appropriate translation must rely on context, where context influences the semantic properties of a piece of source text and the lexical properties of how it should be translated. That is, the meaning of a piece of language and the way meaning can be expressed may both vary depending on the context.
This lack of context, workers noted, also increased the inherent ambiguity of natural language and rendered their task especially challenging.
They also commented on the mistakes found in the original and the large number of incomplete segments (probably a result of data cleaning). They underlined the orality of the language, an uncommon characteristic of written texts which inevitably poses problems to translators; the highly specialised and varied terminology, which often included scientific formulas; and the awkward syntactic structures and unnatural lexical choices, most probably due to non-native speaker (NNS) writing. Finally, they stressed the difficulty caused by the need to produce faithful translations, given that they were used to translating with a communicative skopos (Vermeer, 1996) in mind, and demanded more detailed instructions.
Table 3. A sample of translator diary logs
|Logs in Diaries|
|1||The platform was easy to use and practical. I want to work more on it.|
|2||It's too difficult for a translator to translate segments like these as there were a lot of mistakes and the speakers were not native.|
|3||The test questions were unfair. There are many additional correct translations.|
|4||This programme gave me the opportunity to have a direct feedback concerning right or wrong (accepted or non-accepted) versions of the translated words, which I found extremely helpful and innovative.|
|5||I really enjoyed it. Even if it was a little bit abstract, I really gained experience by doing it.|
|6||I can concentrate more with the embedded timer.|
|7||The segmented text was impossible to translate. The instant grade was perfect!|
|9||It puts translation at risk. I can see MT and the ‘crowd’ replacing us very soon.|
|10||I will make the most of it now that I am unemployed.|
|11||Great for gaining experience. And maybe protecting small languages like Greek.|
The analysis of their diaries also reveals their frustration at the segmented text and the lack of context as well as the faithful translation required. Yet as can be seen in Table 3, translators also made positive comments about their work on CrowdFlower, emphasising the experience they gained and the immediate feedback they received (since the system of evaluating the test questions was automated).
The findings of the analysis of the questionnaires and the diaries cannot be generalised, as the study has certain limitations which are related to the questionnaire design and methodology, the respondent population, and various other factors. First of all, there is a certain degree of response bias due to the well-documented perception that technologies are intended to replace human translators entirely (Marshman 2014) and that crowdsourcing is also threatening the translation profession (Pérez-González and Susam-Saraeva 2012). In addition, the respondent population was limited to TS students and did not include experienced translators.
The questionnaire model does not permit clarification or complementing of the information volunteered, while the use of multiple-choice questions necessarily forces the respondents to choose among specific answers rather than provide their own. These limitations were partially compensated for by the inclusion of open-ended questions and the use of diaries that allowed respondents to explain their feelings and experiences freely.
Despite the fact that they cannot be generalised, the findings of the study indicate that professional translators approach crowdsourcing with caution and face problems mainly with the segmented nature of the text and the lack of context, both of which they fear affect translation quality. The rest of the problems they reported are inherent to the particular job and can be addressed in future cases (e.g. clean data, faithful translation). Moreover, translators do identify some benefits in translation crowdsourcing, especially in relation to their training and gaining of experience and the protection of lesser-used languages.
As Doherty (2016: 963) aptly observes:
As translation technologies intersect and sometimes subsume the translation process entirely, an important factor in moving toward their effective use and in preparing for future changes is a critical and informed approach in understanding what such tools can and cannot do and how users should use them to achieve the desired result.
The professional translators’ views — as recorded in this study — suggest that professional translators and translation crowdsourcing are not incompatible and that crowdsourcing platforms and translation therein pose challenges but also opportunities which cannot be ignored. They also reveal that crowdsourcing projects, if effectively and appropriately designed, can constitute a training tool for translation students and new translation graduates, and they can also be used to help minor languages become more visible online. Although we cannot predict with certainty the position of translation crowdsourcing within the translation profession, what becomes evident is that stakeholders involved in shaping the future of the profession — be it translator trainers, TS scholars or LSPs — should at the very least take heed.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement number 644333 (TraMOOC).
- Baer, Naomi (2010). “Crowdsourcing: Outrage or opportunity?”. Translorial: Journal of the Northern California Translators Association. http://translorial.com/2010/02/01/crowdsourcing-outrage-or-opportunity/ (consulted 14.09.2016)
- Brabham, Daren C. (2013). Crowdsourcing. Cambridge: MIT Press.
- — (2008). “Crowdsourcing as a Model for Problem Solving: An Introduction and Cases.” Convergence: The International Journal of Research into New Media Technologies 14(1), 75-90.
- Christidis, Anastassios Fivos (ed.) (2007). A History of Ancient Greek: From the Beginnings to Late Antiquity. Cambridge: Cambridge University Press.
- Cronin, Michael (2010). “The translation crowd.” Revista tradumàtica, 8. http://www.fti.uab.cat/tradumatica/revista/num8/articles/04/04.pdf (consulted 14.09.2016)
- Dam, Helle Vrønning and Korning Zethsen, Karen (2008). “Translator status: A study of Danish company translators.” The Translator 14(1), 71–96.
- Dodd, Sean Michael (2011). “Crowdsourcing: Social[ism] Media 2.0.” Translorial: Journal of the Northern California Translators Association. http://translorial.com/2011/01/01/crowdsourcing-socialism-media-2-0/ (consulted 14.09.2016)
- Doherty, Stephen (2016). “The Impact of Translation Technologies on the Process and Product of Translation.” International Journal of Communication 10(2016), 947–969.
- Esselink, Bert (2003). “Localisation and translation.” Harold Somers (ed.) (2003). Computers and Translation: A translator's guide, Amsterdam: John Benjamins, 67-86.
- Ellis, Sally (2014). “A History of Collaboration, a Future in Crowdsourcing: Positive Impacts of Cooperation on British Librarianship.” Libri 64 (1), 1–10.
- Eraković, Borislava (2013). “The Role of Translation Diaries in the Acquisition of Theoretical Translation Concepts at the Beginner Level.” Professional Communication and Translation Studies, 6 (1-2), 149-156. http://www.cls.upt.ro/files/maria-nagy/SITE/Publicatii/18%20Eratovici.pdf (consulted 18.12.2016)
- Estellés-Arolas, Enrique and González-Ladrón-de-Guevara, Fernando (2012). “Towards an Integrated Crowdsourcing Definition.” Journal of Information Science 38(2), 189–200.
- European Commission (2012). Studies on translation and multilingualism: Crowdsourcing Translation. http://www.termcoord.eu/wp-content/uploads/2013/08/Crowdsourcing-translation.pdf (consulted 18.12.2016)
- Flaherty, Kim (2016). “Diary Studies: Understanding Long-Term User Behavior and Experiences.” Nielsen Norman Group Articles, June 5, 2016, n.p. https://www.nngroup.com/articles/diary-studies/ (consulted 18.12.2016)
- Flanagan, Marian (2016). “Cause for concern? Attitudes towards translation crowdsourcing in professional translators’ blogs.” JoSTrans, The Journal of Specialised Translation 26, 149–173.
- Gambier, Yves (2014). “Changing landscape in translation.” International Journal of Society, Culture & Language, 22, 1–12.
- — (2012). “Denial of Translation and Desire to Translate.” Vertimo Studijos 5, 9–29.
- Garcia, Ignacio (2015). “Cloud marketplaces: Procurement of translators in the age of social media.” JoSTrans, The Journal of Specialised Translation 23, 18–38.
- Gerber, Elizabeth M. and Hui, Julie (2013). “Crowdfunding: Motivations and Deterrents for Participation.” ACM Transactions on Computer-Human Interaction (TOCHI), 20(6), 1–32.
- Hansen, Gyde (2006). “Retrospection methods in translator training and translation research.” JoSTrans, The Journal of Specialised Translation 5, 2–41.
- Howe, Jeff (2006). “Crowdsourcing: A Definition.” http://crowdsourcing.typepad.com/cs/2006/06/crowdsourcing_a.html (consulted 14.09.2016)
- Kelly, Nataly (2009a). “Myths about crowdsourced translation.” Multilingual 20(8), 62–63.
- — (2009b). “Freelance translators clash with LinkedIn over Crowdsourced translation.” Common Sense Advisory Blogs. http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=591&moduleId=391 (consulted 14.09.2016)
- Kelly, Nataly, Ray, Rebecca and DePalma, Donald A. (2011). “From Crawling to Sprinting: Community Translation Goes Mainstream.” Minako O’Hagan (ed.) (2011). Translation as a social activity. A special issue of Linguistica Antverpiensia 10, 75–94.
- Killman, Jeffrey (2015). “Context as Achilles’ heel of translation technologies. Major implications for end-users.” Translation and Interpreting Studies 10(2), 203–222.
- Kordoni, Valia, van den Bosch, Antal, Kermanidis, Katia Lida, Sosoni, Vilelmini, Cholakov, Kostadin, Hendrickx, Iris, Huck, Matthias and Way, Andy (2016). “Enhancing Access to Online Education: Quality Machine Translation of MOOC Content.” Proceedings of the 10th edition of the Language Resources and Evaluation Conference. May 2016, Portorož, Slovenia. http://www.lrec-conf.org/proceedings/lrec2016/index.html (consulted 21.11.2016)
- Kruger, Alet and Wallmach, Kim (1997). “Research methodology for the description of a source text and its translation(s) – a South African perspective.” South African Journal of African Languages 17(4), 119–126.
- Lionbridge (2013). The Complete Guide to Business Process Crowdsourcing: Changing the Way Work Gets Done. http://info.lionbridge.com/Business-Processing-Crowdsourcing-Guide.html?_ga=1.138089200.1359215512.1475143910 (consulted 14.09.2016)
- Marshman, Elizabeth (2014). “Taking Control: Language Professionals and Their Perception of Control when Using Language.” Meta, 59(2), 380-405.
- McDonough Dolmaya, Julie (2012). “Analysing the crowdsourcing model and its impact on public perceptions of translation.” The Translator 18(2), 167–191.
- — (2011). “The ethics of crowdsourcing”. Minako O’Hagan (ed.) (2011) Translation as a social activity. A special issue of Linguistica Antverpiensia 10, 97–110.
- O’Brien, Sharon (2012). “Translation and human-computer interaction.” Translation Spaces, 1 (1), 101–122.
- O’Hagan, Minako (2011). “Introduction: Community Translation: translation as a social activity and its possible consequences in the advent of Web 2.0 and beyond.” Minako O’Hagan (ed.) (2011) Translation as a social activity. A special issue of Linguistica Antverpiensia 10, 11–23.
- — (2016). “Massively Open Translation: Unpacking the Relationship Between Technology and Translation in the 21st Century.” International Journal of Communication 10, 929–946.
- Parianou, Anastasia (2009). Translating from Major into Minor Languages. Athens: Diavlos.
- Pérez-González, Luis and Susam-Saraeva, Şebnem (2012). “Non-professionals Translating and Interpreting.” The Translator 18(2), 149–165.
- Pym, Anthony (2011). “Translation Research Terms: a Tentative Glossary for Moments of Perplexity and Dispute.” Anthony Pym (ed.) (2011). Translation Research Projects 3, Tarragona: Intercultural Studies Group, 75-110.
- Quah, Chiew Kin (2006). Translation and Technology. Basingstoke: Palgrave Macmillan.
- Retelny, Daniela, Robaszkiewicz, Sébastien, To, Alexandra, Lasecki, Walter S., Patel, Jay, Rahmati, Negar, Doshi, Tulsee, Valentine, Melissa and Bernstein, Michael S. (2014). “Expert crowdsourcing with flash teams.” Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (UIST '14). ACM, NY, USA, 75–85.
- Schultheiss, Daniel, Blieske, Anja, Solf, Anja and Staeudtner, Saskia (2013). “How to encourage the crowd? A study about user typologies and motivations on crowdsourcing platforms.” Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing. IEEE Computer Society, 506–509.
- Shih, Claire Y. (2011). “Learning from Writing Reflective Learning Journals in a Theory-based Translation Module: Students' Perspectives.” The Interpreter and Translator Trainer 5(2), 309-324.
- Snell-Hornby, Mary (2012). “From the Fall of the Wall to Facebook. Translation Studies in Europe twenty years later.” Perspectives: Studies in Translatology 20(3), 365–373.
- Sosoni, Vilelmini and Rogers, Margaret (2013). “Translation in an Age of Austerity: From Riches to Pauper, or Not?”. Minor Translating Major (mTm) 5, 5–17.
- Toury, Gideon (1980). In Search of a Theory of Translation. Tel Aviv: Porter Institute.
- Valeontis, Kostas E. and Krimpas, Panagiotis G. (2014). Νομική γλώσσα, νομική ορολογία: θεωρία και πράξη [Legal language, legal terminology: theory and practice]. Athens: Nomiki Vivliothiki.
- Vermeer, Hans Josef (1996). A Skopos Theory of Translation. Heidelberg: TEXTconTEXT.
- Wehrmeyer, Jennifer (2014). “Introducing Grounded Theory into Translation Studies.” Southern African Linguistics and Applied Language Studies, 32(3), 373-387.
- Whitla, Paul (2009). “Crowdsourcing and Its Application in Marketing.” Contemporary Management Research 5(1), 15–28.
- Yunus, Muhammad and Weber, Karl (2007). Creating a world without poverty: social business and the future of capitalism. New York: BBS Public Affairs.
- Zaidan, Omar F. and Callison-Burch, Chris (2011). “Crowdsourcing translation: Professional quality from non-professionals.” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, 1220–1229.
- Zuckerman, Ethan (2008). The Polyglot Internet. http://www.ethanzuckerman.com/blog/the-polyglot-internet (consulted 09.09.2016)
- Coursera. https://www.coursera.org/about/programs (consulted 14.09.2016)
- CrowdFlower. https://www.crowdflower.com/ (consulted 14.09.2016)
- Iversity. https://iversity.org/ (consulted 14.09.2016)
- QED (Qatar Computing Research Institute Educational Domain) http://alt.qcri.org/resources/qedcorpus/ (consulted 14.09.2016)
- TAUS: QT21 Harmonized Error Typology. https://www.taus.net/evaluate/qt21-project#harmonized-error-typology (consulted 18.12.2016)
- TraMOOC (Translation for Massive Open Online Courses). http://www.tramooc.eu (consulted 14.09.2016)
Vilelmini Sosoni is Lecturer at the Department of Foreign Languages, Translation and Interpreting, Ionian University, Greece. She has extensive academic as well as industrial experience having worked as freelance translator, editor and subtitler, as well as in-house project manager. She is a founding member of the Research Group “Language and Politics,” and her research interests lie in the areas of the Translation of Institutional, Legal, Political and Economic Texts, Text Linguistics and Corpus Linguistics, Translation Technology and AVT. She has participated in H2020, DG Competition and EuropeAid projects, and has published articles in international journals and edited volumes.
[1] For a detailed description of the Harmonized DQF-MQM Error Typology see TAUS: QT21 Harmonized Error Typology.
[2] Final-year undergraduate students and postgraduate students are considered to be professional translators in the sense that they are trained practitioners who are remunerated for their work; their core practice is translation and they do not translate for fun.
[3] Among the Member States, the lowest unemployment rates in October 2016 were recorded in the Czech Republic (3.8%) and Germany (4.1%). The highest rates were observed in Greece (23.4% in August 2016). http://ec.europa.eu/eurostat/statistics-explained/index.php/Unemployment_statistics (consulted 14.11.2016)