
Challenges for the audiovisual industry in the digital age: the ever-changing needs of subtitle production1
Panayota Georgakopoulou, Deluxe Digital Studios


Every facet of life in the 21st century is defined by technological advances in digitisation and networked communication, which result in endless information exchange. The mind-boggling amount of content made available every day through a variety of media has already resulted in an increased need for making such information accessible both to speakers of different languages and to people with disabilities, who have the inalienable right to quality of life. Such demand is only going to increase exponentially in the years to come. This is certainly true of the audiovisual industry, as translated audiovisual content made available through the various distribution channels that exist today reaches a wider audience than any other type of translation. The audiovisual industry is thus experiencing an ever-increasing demand for audiovisual translation services, yet at the same time is forced to contend with the reduction of budgets as well as the contraction of timeframes in which these services need to be provided. As an industry which has strong links with and is heavily influenced by changes in technology, it is only natural for it to turn to language technology experts for solutions that meet the demand and deliver quality end products. All the above is bound to influence the very process of audiovisual translation, and in particular subtitling, which is the focus of the present article. It would thus be wise to look at the history of subtitle production, as well as the profile of a subtitler, as it has been defined and changed over the years, and try to define the ways in which it is expected to change further in the years to come. A version of this article was presented as a keynote speech at the META FORUM 2010 conference in November 2010 in Brussels.


Subtitling, captioning, audio description, automatic speech recognition, respeaking, machine translation.

1. Introduction

This article will focus on subtitling, the audiovisual translation (AVT) type that presents the largest area of growth today. We will discuss in brief the changes the practice of subtitling has undergone since its inception, which are intrinsically linked to developments in technology. We will then look at the challenges the industry is facing at present and the types of training necessary for subtitlers to be able to continue providing the services that the industry will demand from them in the years to come.

2. A historical overview

The beginnings of the AVT industry can be traced back to the birth of film, when subtitles first appeared as ‘intertitles’ in the silent film era. It was the sound film era though, or the ‘talkies,’ that really gave shape to the AVT industry in the ‘30s. The two main types of AVT that were developed were subtitling and revoicing (i.e. any type of replacement of the original voice track with a new one, which includes lip sync dubbing, voice over and free commentary). Today there are additional types of AVT, such as surtitling for the theatre and the opera, sign language interpreting on the screen, and audio description, as well as spoken subtitles for the visually impaired. All these types aim to offer accessibility to the media and various forms of entertainment to people with disabilities as well as to the general public.

Although subtitling and revoicing grew in tandem over the years and individual countries mainly adopted one or the other method as their preferred national method of audiovisual translation in the cinemas and on television, and later on in VHS, established practices started to change in the ‘90s with the appearance of digital video formats and the boost this gave to subtitle production. As a result, today we see traditionally dubbing countries, such as Spain, France, Germany and Italy, as well as countries with a tradition in voice over, such as Poland and other Central and East European countries, embracing subtitling as a major AVT mode (interlingual or intralingual) to be offered in their internal markets. According to Diaz Cintas (2005a: 19), out of the three main audiovisual translation methods employed in Europe, i.e. subtitling, dubbing and voice over, subtitling is the one that has not only grown the most but is expected to grow further in the future, making it the “supreme” audiovisual translation method, due to three main advantages: it is the quickest and most economical method, and it is also suitable to any type of programming. Subtitling is also the AVT type that has changed much more drastically in terms of its practice over the years than any type of revoicing, where the production process remains largely the same. Subtitling itself can be classified into several distinct categories, depending on the languages involved (i.e. ‘interlingual’ from one human language to another, and ‘intralingual’ that refers to subtitles in the same language), the method of broadcast of the subtitles (i.e. pre-recorded, semi-live or live), as well as the method of preparation of these subtitles.

In terms of training, for the largest part of the 20th century the profession of the subtitler was relatively undefined. In the beginning, the work was commonly split between translators who provided the text of the subtitles on the one hand and typists and technicians on the other. The latter, who did not necessarily understand the language of the film, provided the spotting, i.e. the in and out times of the subtitles, and they were the ones responsible for getting the subtitle text on the screen. In the mid-‘80s PCs and timecodes revolutionised the process of interlingual subtitling and it was possible for a single person to be in charge of spotting the subtitles, writing the text, as well as reviewing them on the screen, altering timings or text as s/he saw fit prior to transmission. Training for such work was mainly on the job training at the various subtitling studios and the people employed there mainly had a languages background.

At university level, AVT was not recognised as an official translation genre until very recently, with research and a comprehensive theoretical framework being set up towards the end of the 20th century. Universities picked up on the gap in the market in the late ‘90s and recognised the need for educating and training prospective subtitlers. We now have many universities offering modules and courses on subtitling and audiovisual translation in general, but also specialised courses for particular subtitling types, such as intralingual hard-of-hearing subtitling (SDH), as well as audio description (AD) or video game localisation. Interestingly enough, training in using Language Technology (LT) is generally not part of audiovisual university curricula to this date. The need for such training will be discussed in more detail later on.

3. The DVD boom and centralised subtitle production

The fast-paced technological developments in the AVT industry, mainly since the advent of digital television and the DVD in the late ‘90s, were bound to set a milestone in terms of subtitle production. The appearance of digital video formats in the market brought the possibility of centrally controlled services, such as DVD authoring, as well as all elements that go with it, like for instance the creation of multiple subtitling streams for both scripted (e.g. features, series) and unscripted material (e.g. commentaries and value added material) commonly found on DVDs. Since then, the subtitling industry has witnessed regional variation being slowly subsumed by global production (cf. Carroll 2004; Georgakopoulou 2006). This trend originally came about as a result of the requirements imposed on the market by the large Hollywood studios. As the quantity of DVD subtitling boomed in the beginning of the millennium, the ability to produce subtitles in 40 or more languages simultaneously, at fast turnarounds, became imperative for many companies. Piracy also influenced such a trend. Studios were, and still are, losing millions of dollars to piracy every year, and thus needed to find ways to control their assets better. This was achieved by storing them centrally rather than sending them all over the world. Day and date releases became more widespread for marketing reasons, but also as part of the move on the studio front to close the windows between theatrical and DVD releases in order to combat piracy. Also, central production of DVDs/BDs as opposed to local production was less expensive for studios, both in real terms, but also in terms of indirect costs, such as studio administration. Finally, there were copyright issues to deal with. Studios generally do not have the infrastructure to deal with copyright. They thus pass on this responsibility to their vendors, who are asked to pass copyright of the files they create back to the Studios. 
This made it easier for Studios to keep track of and archive their assets, such as subtitle files, and hold the copyright to re-use them as necessary for adaptation or reformats to other media, e.g. for broadcast, video on demand (VOD), airline releases, etc. This need for centralised production of subtitle files gave birth to a new working methodology, which, as we will discuss further down, lends itself as the ideal environment for the application of LT.

International subtitling companies were set up to respond to this need and created new working processes on the basis of centralised subtitle creation (cf. Georgakopoulou 2009). The subtitling process was split in two and a new type was born; some refer to it as ‘relay’ subtitling, a term originally used in interpreting. In this new way of subtitling, first a (typically) English subtitle template file is produced. This ‘template’ file (also known as ‘genesis’ file, ‘master’ file, ‘transfile,’ etc) provides the basic structure of a subtitle file, typically with fixed timings, which is then translated into all the languages required. This new method of subtitling caused a lot of brouhaha among traditional subtitlers and companies strongly embedded in local markets. They saw this change as a threat to their profession and immediately raised the issue of potential loss of quality, on the grounds that the subtitling styles followed in each country obey local norms which have been developed in response to local needs and should not be changed but respected. It was of course true that the changes in the working processes described above could conflict with existing subtitling norms in each country and soon enough international subtitling norms started making their appearance.2 Research conducted in respect of the reduction of speech from the audio to the subtitles in Greek subtitle files, shows that it is only a maximum of 75% of the original audio that makes it as text in the subtitles, with reduction of speech in the subtitles reaching even 50% of the original audio (Georgakopoulou 2010: 155-212). The situation is bound to be similar in other subtitling countries where subtitling is performed according to established, country-specific subtitling norms. 
Research also shows that the method of ‘relay’ subtitling, or the use of English template files, represents a convergence of subtitling trends across Europe, when such English template files are created efficiently and with translation by speakers of other languages in mind (ibid.: 213-293).
In terms of working practices, the use of subtitle template files provided the solution to easily centralise the management and quality check of subtitle files without necessarily having in-house staff fluent in all the languages in which subtitles are produced. It also opened up the pool of subtitler resources needed to produce this volume of work to include the entire body of traditional text translators worldwide, who could successfully translate such template files into their native languages with minimal training in subtitle production. Regional variation still remained possible to an extent in this new working methodology, assuming the creation of template files was meticulous and always with a clear definition of the end user in mind. The very structure of this working methodology, which facilitates the control and management of multiple subtitle files of the same video material, also paved the ground for the application of LT in subtitling, such as the use of translation memories and machine translation tools, which could very well cause a revolution in the process of subtitle production in the future.
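
The template workflow described above can be illustrated with a minimal sketch in Python. It is not taken from any actual subtitling tool: the `Cue` type, the timecode strings and the `apply_translation` function are hypothetical names introduced here, under the assumption that a template is simply a list of cues whose timings are fixed and whose text alone changes per language.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    """One subtitle in a template file (hypothetical structure)."""
    index: int
    start: str  # in-time as a timecode string, e.g. "00:00:01:00"
    end: str    # out-time as a timecode string
    text: str


def apply_translation(template, translations):
    """Return a target-language file that reuses the template timings.

    `translations` maps cue index -> translated text. Timings are copied
    unchanged; only the text of each cue is replaced.
    """
    missing = [cue.index for cue in template if cue.index not in translations]
    if missing:
        raise KeyError(f"no translation for cue(s): {missing}")
    return [replace(cue, text=translations[cue.index]) for cue in template]


# Illustrative data only: a two-cue English template and a French translation.
template = [
    Cue(1, "00:00:01:00", "00:00:03:00", "Where are you going?"),
    Cue(2, "00:00:03:12", "00:00:05:00", "Home."),
]
french = apply_translation(template, {1: "Où vas-tu ?", 2: "À la maison."})
```

Because the timings are never touched, every language file can be checked against the same reference, which is what makes centralised quality control feasible without in-house speakers of each language. A workflow that lets translators merge or re-time subtitles to suit local norms would need a richer structure than this sketch assumes.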

The other important change which was brought along by digital formats was the availability of subtitling to new audiences. For example, we now see an increase in the amount of subtitled output enjoyed by viewers in countries with a revoicing tradition. Media Consulting Group / Peacefulfish (2007: 4) report that there is a clear move towards subtitling as far as cinema distribution is concerned in countries where dubbing and voiceover are the predominant screen translation methods. At the same time, we witness an increase of intralingual subtitling in traditionally dubbing countries, such as France and Spain, as a result of mandates for the provision of hard of hearing subtitles for broadcast purposes. Undoubtedly, this trend will continue, and it will bring with it a further increase in the demand for subtitling services. Such growth in volume not only necessitates the training of prospective subtitlers, but also gives further reason for LT to be employed in order to cater for this demand.

4. Intralingual subtitling and automatic speech recognition technology

On the intralingual subtitling front, or ‘captioning’ as it is called in the States, our American colleagues have led the way. Captioning started in the States back in the ‘70s. There are many different types of intralingual subtitles or captions, the main difference being their method of transmission, i.e. whether they are pre-recorded, semi-live or live. There are also further distinctions among each type, so for pre-recorded captions, there is a choice between pop on placed captions (i.e. captions placed under the speaker), pop on centred captions and timed roll-up captions, all of which can be created with different styles. The people working as captioners come from a linguistic background mainly, but when it comes to live captioners, initially these were court stenographers trained in captioning and employed to stenocaption live programmes, such as news, sports, etc.

The developments in Automatic Speech Recognition (ASR) technology revolutionised captioning, and mainly live captioning, in the beginning of our century. Until then, the quality threshold of ASR technology was too low for it to be effectively used in live captioning and there were no ASR solutions blended in subtitling software. According to the National Captioning Institute,3 ASR was only used for about 15% of the live captioning production in the States in 2010, while stenocaptioning is still the predominant method. However, the use of ASR technology is growing rapidly. We now see ASR technology employed in the live subtitling market in Europe as well. The working methodology developed on the basis of ASR technology is known as ‘respeaking’ (also known in the States as ‘voice writing’), as it involves, apart from the essential vocabulary building in order to increase the efficiency of the tool, the training of the speech recognition engine by the user (‘respeaker’ or ‘voice writer’), who listens to the audio and subsequently re-speaks it into voice recognition software, while at the same time using specific words to enter punctuation marks and indicate subtitle features specific to a hard-of-hearing audience. In terms of training, respeaking training is only beginning to make its appearance at university level (cf. Romero-Fresco 2011: 40-43), and this is certainly a gap that will need to be filled soon, as the demand for respeakers is expected to rise drastically in the near future.

Attempts are also being made for live subtitle production to be made possible through ASR software alone, but accuracy levels are still very far from acceptable, let alone ideal.4 It is certainly not unrealistic though to expect that such accuracy and intelligibility levels will increase with further research and development in ASR technologies – for instance, with further improvement on the issue of automatic punctuation. It would also be wise to ponder on other ways that human involvement might become useful to achieving higher quality standards in such working methodologies. An idea, for instance, might be to examine alternative methodologies already in existence, such as the technique du perroquet, which is used in TF1 and France 2, as Pablo Romero-Fresco explains in his book on respeaking (2011: 3). In this case, a perroquet (parrot) does the actual respeaking, a souffleur (whisperer) proposes alternatives and a correcteur (corrector) actually types the text making the final decision on its content (see also Ninsight ProTitle Live). This is admittedly a very labour-intensive method and reports about the live subtitling situation in the UK for instance indicate that the number of people working on live subtitling has gone down proportionally to the output,5 so it might be an idea to evaluate such task segmentation, in order to see how each individual level can be automated. Perhaps it is easier to look at ways of increasing the efficiency of ASR technologies only to the extent necessary so that human involvement is limited to providing some sort of final editing to subtitles produced automatically from ASR systems integrated with professional subtitling software.6

Speech recognition is also used, to an extent, in offline captioning, as it is called in the States, or pre-recorded subtitling, as it is called in Europe. In the States and in the UK, the methodology is to use ASR technologies to create transcripts of the audio in a quick and cost-effective way, which can then be easily turned into caption files by experienced editors. The method of using transcripts as the first step in the creation of offline subtitle files came about relatively recently as a result of a steep drop in prices in the captioning market. It was also due to the fact that many companies provide cheap transcription solutions using operations abroad, in English-speaking countries such as India or other Asian countries. The results produced in such countries require heavy editing from native speakers, due to significant quality issues. To be specific, employees in such companies often have problems understanding regional accents, for instance, or references to culture-specific concepts (e.g. pop culture). This method has been anathematised by many subtitlers (cf. Nakata Steffensen 2007a and 2007b), as subtitlers have to keep up with current affairs and the evolution of the language in order to be able to do their job effectively, and has also been highlighted in the European Commission 2009 report on the size of the EU language industry:

The sector of subtitling is in clear need of regulations on a European level in order to counteract trends such as peer-to-peer subtitling and outsourcing to Asian countries, which result in decreasing quality trends (2009: v).

An alternative that has only just made its appearance is the use of ASR technology to create these initial transcripts in an affordable way. This is again done using the respeaking method, i.e. by using trained employees to dictate the audio to a speech recognition system that turns it into text, which then requires little editing for quality purposes. There is still some way to go before ASR is used to its full capacity in the offline captioning market, though it is my belief that this will happen much sooner than many realise. I do not see major hurdles in employing ASR technologies in combination with appropriate subtitling software, not only for the creation of transcripts, as has been the case so far, but also such technology could be used by respeakers to create offline captions and subtitle files directly. This would circumvent the transcript step, speed up the captioning process and create further time and cost benefits. In fact, such a workflow has already started making its appearance among companies that specialise in live subtitling and who are also utilising their respeakers for offline work to an extent so as to offer them the opportunity to rest from having to focus on stressful live subtitling/captioning work full-time.7 Furthermore, the use of voice recognition software alleviates RSI problems in the subtitling population.

In Europe, apart from similar developments in the use of ASR technologies in the live subtitling industry (mainly for intralingual subtitles, as Pablo Romero-Fresco reports (2011: 1), but also for interlingual, as in the cases of Red Bee Media Wales and VTM in Flanders), speech recognition has been used to a limited extent in an alternative hybrid workflow process, where the subtitler inputs the text written out in proper subtitle format and the speech recognition system built in the subtitling software provides timings to subtitles based on the onset of speech in the audio.8 Such technology has only really been applied so far to intralingual SDH subtitling, where synchronisation of subtitles is made to the onset of utterances, even if such utterances are hesitations, false starts, unfinished or heavily ungrammatical constructions or exclamations. In addition, intralingual subtitles tend to carry higher reading speeds than interlingual subtitles. This may sound surprising, as deaf viewers are generally considered to be slow readers, but it is a fact.9 This also means that a larger amount of text can be recorded in the subtitles, which makes the implementation of ASR solutions easier. The last point to be made here is that, currently, there are good ASR systems in use for very few languages only. The two most popular ASR engines in use today are Dragon and ViaVoice (although the latter has recently been discontinued) and they cater for a limited number of languages so far.

Some interesting research has been carried out on a European scale with the DTV4ALL project, where the preferences of deaf viewers across Europe have been recorded so as to reassess their needs and propose new intralingual subtitling standards in Europe. The findings of this research were presented at the Languages & the Media 2010 conference in Berlin and one of the suggestions was the provision of different types of intralingual subtitles in accordance with the audience’s needs (e.g. deaf people (born deaf or not), severely hard of hearing people, hearing people that like to use intralingual subtitles, deaf/hard of hearing children, etc).10 Hopefully one day this will come true, but for now such a suggestion would be viewed as largely utopian, as we are not even at the point yet where one stream of SDH subtitles is available to viewers for 100% of the content available. Innovative technological solutions for the creation of such additional subtitle streams through intelligent and largely automated re-use of existing subtitle files would make such ideas much more realistic (e.g. automated editing down of an original stream of SDH subtitling to cater for audiences that would benefit from lower reading speeds, such as younger audiences).

5. Legislation on accessibility in Europe

Some countries are more advanced in the intralingual subtitle area; the UK has for a long time led the way with legislation making SDH subtitles, audio description and signing a requirement for public service broadcasters with a certain percentage of audience. The Disability Discrimination Act promotes egalitarianism and safeguards people’s equal rights to employment, as well as access to goods, facilities and services, education and public transport, within reason. OFCOM sets the targets in the UK through the Code on Television Access Services as to the amount of TV subtitling, signing and audio description that broadcasters must provide. It also provides guidance on how access services should be presented and monitors the performance of broadcasters.

Accessibility is also a focal concern of the European Union, as evidenced by the New European Community Disability Strategy of 1996 (equal opportunities for disabled persons) and the new policy framework of 1999 which sets out to remove all barriers so as to achieve full participation for disabled people in all areas of life. Media literacy and “access for persons with a hearing or visual impairment” through the use of sign language, subtitling, audio description (AD) and easily understandable menu navigation is stipulated in Article 7 of the Audiovisual Media Services Directive, which was published in the Official Journal of the European Communities on 18th December 2007 as an amendment to the Television Without Frontiers Directive. It is estimated that about 10 percent of the world’s population, or approximately 650 million people, live with a disability of some sort, while if we take into account trends in population growth, medical advances and an increasingly ageing population, it is certain that this number will continue to grow. Therefore, facilitating effective communication for all is not a “fringe issue” but rather a significant challenge (ITU and Accessibility).

Some level of legislation relating to accessibility exists in European countries other than the UK as well. Several EU countries currently have legislation mandating the accessibility of TV programming for the deaf and the hard-of-hearing through subtitles and sign language, whereas other countries offer such services even without relevant legislation. There is currently no legislation on the provision of audio subtitling in Europe for the benefit of visually impaired viewers, though this service is offered in some countries, and although at the time this is being written the UK is still the only country that has legislated the provision of AD services, AD guidelines already exist in quite a few EU countries and the service is increasingly offered by various member states (cf. EBU).

6. Audio description

AD is a service that is relatively new to audiences. It is a technique born out of necessity, that has been developed to provide visual information to blind viewers and viewers with low vision and it can be used in TV broadcasts, DVDs, theatres, museums, etc. It has existed since the early ‘80s but has only really been developed to a significant level in the beginning of the 21st century. Today, audio described material on television, in the cinema and on DVDs is on the increase and several countries are benefiting from this service. A major concern in terms of AD is the costs that relate to this service, which are considerably higher than respective costs for subtitling for the hard-of-hearing. Although it is usually the same material that is being audio-described and subtitled, technologies have not been developed yet to make efficient use of available subtitle material so as to make the AD process quicker, easier and more cost effective. When thinking of AD one would probably place the service closer to any other type of revoicing, rather than subtitling, as it involves replacement of the audio track of a programme with a new recording that is mixed with the original dialogue and M&E tracks. Nevertheless, except for the audio recording side of the production, the working process involves both software as well as skills that bear more affinity to subtitling than any type of revoicing. The workstations used to produce AD scripts resemble subtitling workstations and are typically developed by companies that also produce subtitling software. The timing skills involved in creating AD scripts are the same timing skills that are employed when subtitling. In a way, a subtitle file is the exact opposite of an AD script file, in the sense that one provides written text for the timings the other doesn’t cover. 
It would be possible in theory then, for the majority of the programming that is both subtitled and audio-described, to first create the subtitle file and then use it to create a new file with timings that would represent the gaps between all the subtitles in the file. Such a pre-processed file could serve as the basis of an AD file, as all the times in which an AD script could be inserted would have already been spotted. Also, much in the same way that base ‘template’ files have been used in the multilingual subtitling process for DVDs, such template files are also prepared and used for AD script writing in cases of the same programming being audio-described in multiple languages,11 albeit this AD script writing method is not yet very widespread. Such a process streamlines AVT work and produces time and cost benefits both for subtitling companies and their clients. New workflows are created as a result, and relevant training is required. Finally, Andrew Salway and James Lakritz (2005) have also conducted interesting research on different ways in which language technologies can be used in relation to AD, including preliminary work on how the first draft of an AD script can be automatically generated from a film script.
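
The idea of deriving candidate AD slots from an existing subtitle file can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing AD workstation: the `Cue` type, the `find_gaps` function and the two-second `min_gap` threshold are hypothetical, and timings are assumed to be plain seconds from the start of the programme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """In and out times of one subtitle, in seconds (hypothetical structure)."""
    start: float
    end: float


def find_gaps(cues, programme_end, min_gap=2.0):
    """Return (start, end) intervals that no subtitle covers.

    Only intervals of at least `min_gap` seconds are kept, on the assumption
    that shorter pauses are too brief to hold a description.
    """
    gaps = []
    cursor = 0.0  # end of the latest subtitle seen so far
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.start - cursor >= min_gap:
            gaps.append((cursor, cue.start))
        cursor = max(cursor, cue.end)  # also handles overlapping cues
    if programme_end - cursor >= min_gap:
        gaps.append((cursor, programme_end))
    return gaps


# Illustrative data only: three subtitles in a 30-second clip.
subtitles = [Cue(5.0, 8.0), Cue(8.5, 12.0), Cue(20.0, 23.0)]
slots = find_gaps(subtitles, programme_end=30.0)
# slots == [(0.0, 5.0), (12.0, 20.0), (23.0, 30.0)]
```

Such a list would only be a starting point for the describer. A gap in the subtitles marks the absence of dialogue, not necessarily the absence of meaningful sound, so slots that coincide with music or significant sound effects would still have to be reviewed by hand.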

7. Subtitling technology and ICT knowledge

We have already talked about developments in interlingual subtitling, the creation of new workflows and working models. Subtitling technology has continued to develop over the years, and at a particularly fast pace since the ‘90s, providing efficient solutions to the management and QC of subtitle files, by automating many steps that were performed manually in the past. Text editing is assisted with features that are common in word processing tools, such as spell checkers, as well as automated checks for optimum subtitle length according to the selected reading speed and timing. Technology has also helped make possible the repurposing of existing subtitle files, so that time and cost savings are made when going from one medium to the other (theatre, to DVD/BD and now 3D as well, to broadcast, VOD, internet streaming etc).
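
An automated check of subtitle length against reading speed and timing, of the kind mentioned above, might look like the following Python sketch. The function names and the limits used here (15 characters per second and 37 characters per line) are assumptions chosen purely for illustration; actual limits vary by client, language and medium, and professional software implements such checks in its own way.

```python
def chars_per_second(text, start, end):
    """Reading speed of one subtitle: characters shown per second on screen."""
    duration = end - start
    if duration <= 0:
        raise ValueError("subtitle must have a positive duration")
    visible = text.replace("\n", "")  # line breaks are not read as characters
    return len(visible) / duration


def check_cue(text, start, end, max_cps=15.0, max_line_len=37):
    """Return a list of problems found in one subtitle (empty if none).

    `max_cps` and `max_line_len` are illustrative defaults, not a standard.
    """
    problems = []
    if any(len(line) > max_line_len for line in text.split("\n")):
        problems.append("line too long")
    if chars_per_second(text, start, end) > max_cps:
        problems.append("reading speed too high")
    return problems


# Illustrative data only: a comfortable subtitle and an overloaded one.
ok = check_cue("Hello there.", start=0.0, end=2.0)
bad = check_cue("A" * 40, start=0.0, end=1.0)
```

Here `ok` is an empty list, while `bad` reports both problems. Running a check like this over every subtitle in a file is what allows a single project manager to review many language versions without reading each of them.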
As subtitling and audiovisual translation in general are so closely related to technology, it is expected that anyone interested in working in the field needs to have very good Information and Communications Technology (ICT) knowledge and be willing to become familiar with technologies that are constantly being developed. It is increasingly true that in-house subtitlers and subtitling project managers need to have a deeper understanding of more technical matters, such as video standards and video encoding, as well as interactive elements that appear on DVDs/BDs, in order to be able to perform their jobs more effectively. It is notable that this involves training that is not currently provided at universities and that is relatively hard to provide on the job, as the professionals employed mainly have a humanities background which is often incompatible with technology. The need for further interdisciplinarity in studies is evident and the universities are called upon to respond to these new and ever-changing demands in the industry. When subtitling was initially introduced at university level in Europe over a decade ago, it was frequently taught without the use of subtitling software and students had to be given rough rules of thumb on how to estimate subtitle duration and reading speed. Back in those days, real-life practical training in audiovisual translation in most cases came from placement schemes and internships provided by industry representatives that took the initiative and saw the benefit of training their prospective members of staff. Nowadays universities typically make agreements with software providers and get professional suites at a reduced cost and are thus in a position to offer practical training to their students, who need to be conversant with the new tools and specs that emerge in the marketplace.

The need for technological competence on the part of translation professionals working in the multimedia industry is also highlighted by the EMT Expert Group (2009: 7),12 as part of its proposals to implement a European reference framework for a Master's in translation (European Master's in Translation – EMT) throughout the European Union. Universities have certainly come a long way in the past decade in terms of including subtitling in their courses and responding to the industry's needs for training, but there is still room for more to be done, and there is no reason to doubt that academia will once again adapt to the needs of an ever-changing industry.

8. Machine translation in interlingual subtitling

Finally, although the sheer volume of content to be made accessible, intralingually or interlingually, is increasing exponentially, the timeframes within which it has to be made accessible are decreasing, while the pressure to reduce production costs becomes greater. In terms of the latter, we have witnessed a decrease in prices of about 50% in the past decade, and a similar or higher drop in turnaround times. The situation in the intralingual subtitling industry is similar in those countries where it has flourished. For example, live captioning prices in the States have gone down by approximately 40% in the past 4 years alone and well over 50% if we look at live captioning prices over the past decade.13 Speech recognition technology has been the tool that captioning companies have used to remain competitive and is the future for them, as rates are continuing to drop, whereas the amount of programming and the number of channels and platforms are increasing. The same thing has happened, albeit with a delay, in the interlingual subtitling industry. Despite the important technological developments in management and manipulation of subtitle files, the use of highly developed software for reading speed calculation, shot frame detection and automation of many of the technical elements in the process, as well as the relatively easy repurposing of pre-existing subtitle files, the translation side is still done manually. This has put an immense strain both on companies and individual subtitlers, who are asked to cut down on costs in the process, without really being able to automate half of it. This has resulted in a drop in the quality of the subtitle files, as per the widely known project management triangle problem: time, cost, quality—pick any two.

As there is no indication that turnaround times or prices will go up, or that volume will go down, quite the contrary in fact, machine translation is asked to provide the solution to the challenge we are faced with today. In order to maintain the balance between all three sides of our project management triangle (time, cost, quality) and satisfy the demands of content providers and consumers, we will need first to provide an ever increasing volume of subtitles of all types to cater for all audience needs, so that we can all live in a society where information is accessible to all, irrespective of disabilities, nationalities and language barriers. We will also have to do this within reduced timeframes (with 100% live being the ultimate aim for certain types of content), and finally this will have to be done at affordable prices so that the information can indeed be made widely accessible.

Machine translation was feared and even mocked when it was introduced in the text translation industry. It was first applied in texts that were ideal for this type of processing, i.e. mainly technical manuals, where sentence construction normally follows simple syntactic patterns and vocabulary is limited, while terminology, which needs to be translated consistently and correctly, abounds. And now, a few years on, machine translation is an essential tool used among large language service providers, who have streamlined their working processes accordingly and demand such competencies and skills of their staff. Machine translation is also taught at universities as part of specialised translation syllabuses and general-purpose machine translation systems are used by millions of users every day, e.g. Google Translate.

In the past decade, we have seen the first few attempts to apply machine translation solutions to subtitling (e.g. MUSA, E-TITLE, etc), as well as postgraduate research on the subject (for example, see Sarrazin 2007). These first few attempts exhibited one or more of the following problems which would prevent the results of such research from being implemented in business practice:

  • Not adequately adaptable results, mainly due to the lack of corpora the systems should be built on, meaning that the ensuing translations would have to be post-edited, sometimes heavily, and a translator would not save enough time.
  • Further development of the systems depending on technological progress in speech recognition, i.e. speech-to-text conversion.
  • No easy integration of the technology in the existing workflows.

What is interesting though in these initial attempts is that all the right ideas are there. The MUSA project (cf. Piperidis et al. 2004), for example, was to create a system that would convert audio streams to transcribed text, then generate subtitles from this text, which would eventually be translated into other languages. This would be an ideal scenario for the subtitling industry, and even if this may still sound like science fiction to many today, it is my belief that it may become a reality soon. Human editors would of course have to be involved at the end of every step of the process both to safeguard the quality of the end product and to provide feedback towards the improvement of these tools.

There are already partial examples of such technology in commercial use today. Speech Conversion Technologies Inc (SCTI) is a US company which sells a product called TranslateTV. This tool claims to take US captions broadcast live on American TV and translate them live into Latin American Spanish for simultaneous broadcast, for the benefit of the increasing Hispanic population in the States. Unfortunately, the samples available on the website are disappointing in terms of the quality of the subtitled output in Spanish, and it has been easy for people to criticise such efforts as dangerous and “threatening” to the Spanish language and even the “viewer’s brain” (Diaz Cintas 2005a: 21). Such controversy could delay technological advances, when the focus should instead be on their innovative parts and on the further research and development that needs to take place so that the end result does justice to the language it serves and truly fulfils the needs of the intended audience. User acceptance is necessary if any such attempt is to be successful. One should keep in mind of course that, although the above controversy could delay the growth of the user base and therefore slow down the speed of incremental change, SCTI’s technology is already in use, and this technology will only get better with greater use and investment in refining it.

Anecdotal evidence suggests that users are currently experiencing the same fears translators experienced when machine translation for text translations was initially introduced.14 Subtitlers claim that it is impossible for machine translation to be applied to such a culturally bound product as video or to the translation of oral speech as opposed to written, heavily standardised technical text. They also secretly fear that machine translation will eventually replace them and reduce them to post-editors, and they will be required for lesser pay to correct the ‘stupid’ mistakes a machine will make, whilst not improving their speed and taking away all the enjoyment of actually translating high profile film productions. On the other hand, there is currently more content that is not made accessible than the content that is and catering for such potential volume is beyond our ability to provide cost-effective language solutions with human translators alone. However, any possible machine-aided solutions do not eliminate the need for skilled translators. On the contrary, the system demands the collaboration of human translators if the result is to be successful where accuracy is important. In fact, such attempts can also expand the translation industry, create new jobs and open up new areas for growth, not take away the jobs that are already available. If professional linguists have a say on how these changes reshape their working lives, they are more likely to embrace technology and rely on it, instead of fearing it and fighting it. They would, thus, no doubt contribute to its development with their expertise.

More recent work by Martin Volk (2008) at Stockholm University and the University of Zurich led to the production of a statistical machine translation system for the automated translation of subtitles, initially from Swedish into Danish, later into Norwegian as well, and also from English into Swedish. The system has already found commercial application, having been integrated in the working processes of a large Scandinavian subtitling company (Volk et al. 2010). As Martin Volk (2010b) explained in his presentation at the Languages and the Media 2010 conference (cf. Volk 2010a), initial results show time savings of approximately 25%. Statistical machine translation seems to have an ideal application in subtitling because subtitles are normally short textual units with a simple syntactic structure, are easily aligned with the use of timecodes and can be very repetitive. Further research by De Sousa et al. (2011) in the English-Portuguese language pair, measuring the post-editing effort required for texts that have been translated using statistical machine translation (SMT) systems and translation memories, shows even higher time savings, to the level of 40% on average, as compared to when subtitles are translated from scratch. However, widespread user acceptance of such machine translation solutions is far from achieved, while these initial developments have caused yet more commotion among professional subtitlers.

The lack of quality in the output of existing SMT systems to date has been ascribed to the volume and original quality of the parallel data that such systems use (Petukhova et al. unpublished). Large quantities of such data do exist of course, but they constitute the Intellectual Property Rights (IPR) of subtitling companies, who quite rightly want to protect them from exploitation by their competitors. Therefore a level of trust has to be developed between such companies, software providers and LT experts. Such parallel corpora also exist mainly because of the working processes developed largely in the past decade, which have to do with centralised subtitling services and the ‘template’ method of subtitle creation explained in detail earlier on.
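
To make the link between the ‘template’ method and parallel corpora concrete, here is a minimal sketch of pairing subtitles from two language files by their timecodes. It assumes that both files were produced from the same template and therefore share identical in-times; the function names, the `HH:MM:SS:FF` timecode format and the 25 fps frame rate are illustrative assumptions, not a description of any system cited above.

```python
def parse_timecode(tc, fps=25):
    """Convert an 'HH:MM:SS:FF' timecode into a frame count (fps is assumed)."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def align_by_timecode(source, target, fps=25):
    """Pair subtitles from two language files that share the same in-time.

    `source` and `target` are lists of (in_timecode, text) tuples.
    Subtitles with no counterpart in the other file are skipped.
    """
    target_by_frame = {parse_timecode(tc, fps): text for tc, text in target}
    pairs = []
    for tc, text in source:
        frame = parse_timecode(tc, fps)
        if frame in target_by_frame:
            pairs.append((text, target_by_frame[frame]))
    return pairs
```

Files that were timed independently would not match exactly and would need a tolerance window or a proper sentence aligner, which is one reason template-based archives are attractive as training data.
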

Further development in this area comes with SUMAT, an EU-funded project which promises to develop an online translation service for subtitling, which addresses 9 European languages in total and aims at semi-automating large-scale subtitle production in 14 language pairs to begin with (7 bi-directional language pairs). Among the companies that form the SUMAT consortium are four major subtitling companies, whose main input is to provide the large-volume, high-quality parallel subtitle corpora needed for the training and further development of an existing SMT system in said language pairs, as well as to evaluate its results. The SUMAT project kicked off on 1st April 2011 and is expected to be complete by 31st March 2014.

Machine translation can be viewed as a type of last frontier in terms of technological developments in the subtitling industry and my belief is that it will cause another revolution and reshape the profile of the job of the subtitler in the years to come, perhaps to an even larger extent than any change we have seen to date. However, for such machine translation systems to be successful, it is not only the allegiance of subtitling companies that is required. It is equally important to build trust between LT experts and individual users/subtitlers, so that further collaboration is encouraged with the view of providing appropriate feedback in order for technology to achieve user acceptance. Such a stumbling block could be eliminated by educating linguists to understand not only what machine translation can do, but also what it cannot do, i.e. infer, assume, read between the lines, etc, so as to be in a position to provide the best feedback possible to software developers. Through this training, these linguists will also be in a position to experience first-hand how their daily work could be facilitated by such technological advances, so that user acceptance is also achieved. Finally, such collaboration can be further encouraged through interdisciplinarity in studies at university level, so that prospective subtitlers can truly understand the technology that underlies their profession but also the inherent potential in other types of technologies, such as machine translation, and how these could be best applied in subtitling.

9. Taking stock

The AVT industry today is witnessing an unprecedented demand for subtitling. Content providers and broadcasters are trying to reach the widest audience possible, while the rapid growth of internet-based video has made it increasingly common that subtitles accompany streamed and downloaded content. There is also intensive lobbying by organisations representing people with disabilities and, as a result, increasing legislation mandating the subtitling for the deaf and the hard-of-hearing.15 Subtitles are becoming more common in public places where the sound is muted or there is a lot of ambient noise, such as in gyms, airports, malls and coffee shops, while captions are also used by advertisers in an effort to widen their demographic. And let’s not forget that subtitles, captions and AD files provide instant metadata for the video asset and add value to it by increasing its ‘searchability,’ which among other things aids its repurposing.16

From the viewers’ perspective, we are witnessing an increased demand for interlingual subtitling, even in countries that are not traditionally subtitling countries, as digital video formats have helped make their audiences familiar with this AVT method. Intralingual subtitles, on the other hand, are not only used by hard-of-hearing viewers: research by OFCOM (2006) shows that 6 million out of the 7.5 million users that use intralingual TV subtitles in the UK have no hearing impairment at all. Aside from the obvious audience conditioning to subtitles, which is especially true since the advent of DVD, subtitles are also used by hearing viewers as a means to keep up with fast-paced dialogue or to retrieve information when it is not easily decipherable from the audio, or due to a character’s accent, or simply because of a noisy background (e.g. children playing in the living room) or because they have to keep the sound of the TV down. As Duffy (2006) explains, viewers also use subtitles to access additional information, such as the name of a song that is being played in the background or its lyrics, while subtitles can add value to shows such as X-factor by making it possible for the viewers to have the lyrics in front of them and turn a show into a sing-along event. There are also numerous references in subtitling literature about the positive effect of subtitles on the improvement of people’s reading skills, while there is also research conducted regarding the use of subtitling as a means to teach language skills (cf. Danan 2004; Sokoli 2006; Kothari et al. 2007; McCall and Craig 2009; etc). On top of this, we are witnessing an increase in terms of the available content to which users want instant access—and such content is no longer just the product of the entertainment industry. The term ‘content’ may also refer to corporate videos used for intra-company communication within multilingual companies spanning across all continents. 
Or it can be user-generated content, the result of the growth of social media, such as YouTube, Facebook, etc, another sign of our times in which human relationships are changing from local to global. The spread of phenomena such as fansubbing,17 which has resulted in many subtitle files being available over the internet, and crowdsourcing18 also has an effect that we cannot ignore.

The combination of increased demand for subtitled content and reduced timeframes within which such content needs to be made accessible has created needs that subtitling companies are asked to respond to, and this will inevitably be done with help from relevant technologies. The new workflows that will result from the application of new technologies in subtitling will also reshape the profile of the job of a subtitler accordingly. It is thus advisable to look ahead to such changes and lay the ground for the training that is necessary for the successful implementation of such processes and workflows, and also to provide the skilled manpower necessary to deliver these workflows. Universities will once again have to play a major role and adapt their syllabuses accordingly. The first piece of language technology that is already used with successful results is ASR engines. The specialisation of a respeaker is already established as a job profile in subtitling companies, and demand is only likely to increase. Machine translation technologies are beginning to make their appearance as well, and much is expected from and relies on their development in the years to come. Globalisation and digitisation are reshaping the very way we work and live our lives, as is the case with other professions as well. The confluence of dropping prices, increased demand, a global marketplace and advances in technology is likely to push development along faster in the next decade than in the past one, while it is likely that the user base will be persuaded to adopt such technology for at least certain types of projects, where usability could become the measure of effectiveness of a translation type over translation equivalence (cf. Gambier 2005). 
Subtitling companies recognise the need to work closely with computational linguists and technology providers, but also with universities, so that the new generation of graduates is better educated on the technicalities of the business and better able to fit the job profile of the subtitler as this will be reshaped by technology in the coming years. If the individual subtitlers are involved in how this happens, they can also have a say in what works best for them while maintaining their socio-cultural role in this changing landscape. What is certain though is that the century that we live in is that of technology and the generation of our children will experience translation in a different way to us, perhaps not unlike what we have seen in sci-fi films of the past few decades.

  • Attenborough, Alison (2011). “RedBee Media’s Use of Voice Recognition Technologies in Subtitling.” Presentation at the Third International Symposium on Live Subtitling with Speech Recognition, Antwerp, Belgium, 21.10.2011.
  • Carroll, Mary (2004). “Subtitling: Changing Standards for New Media?” (consulted 28.06.2011).
  • Danan, Martine (2004). “Captioning and Subtitling: Undervalued Language Learning Strategies.” Meta, 49:1, April 2004, 67-77. (consulted 10.10.2011).
  • De Sousa, Sheila and Aziz, Wilker & Specia, Lucia (2011). “Assessing the Post-Editing Effort for Automatic and Semi-Automatic Translations of DVD Subtitles.” (consulted 12.10.2011)
  • Díaz Cintas, Jorge (2005a). “The Ever-Changing World of Subtitling: Some Major Developments.” John Sanderson (ed) (2005). Research on Translation for Subtitling in Spain and Italy. Universidad de Alicante, 17-26.
  • (2005b). “Back to the Future in Subtitling.” MuTra 2005 – Challenges of Multidimensional Translation: Conference Proceedings. Saarbrücken, 2-6.05.2005. (consulted 30.09.2011).
  • Duffy, Jonathan (2006). “The Joy of Subtitles”. BBC News Magazine, 31 March 2006. (consulted 28.09.2010).
  • EMT Expert Group (2009). “Competences for Professional Translators, Experts in Multilingual and Multimedia Communication.” Brussels, January 2009. (consulted 29.09.2010).
  • Gambier, Yves (2005). “Multimodality and Audiovisual Translation.” MuTra 2005 – Challenges of Multidimensional Translation: Conference Proceedings. Saarbrücken, 2-6 May 2005. (consulted 30.09.2011)
  • Georgakopoulou, Panayota (2006). “Subtitling and Globalisation.” The Journal of Specialised Translation. Issue 6, July 2006, 115-120.
  • (2009). “Subtitling for the DVD Industry.” Jorge Díaz Cintas and Gunilla Anderman (eds) (2009). Audiovisual Translation: Language Transfer on Screen. Palgrave Macmillan, 21-35.
  • (2010). Reduction Levels in Subtitling. DVD Subtitling: A Convergence of Trends. Saarbrücken: Lambert Academic Publishing.
  • Kapsaskis, Dionysios (2011). “Professional Identity and Training of Translators in the Context of Globalisation: The Example of Subtitling.” The Journal of Specialised Translation. Issue 16, July 2011, 162-184. (consulted 30.08.2011)
  • Kothari, Brij and Bandyopadhyay, Tathagata and Bhattacharjee, Debanjan (2007). “Same Language Subtitling on TV: Impact on Basic Reading Development among Children and Adults.” (consulted 18.10.2011)
  • Lakritz, James and Salway, Andrew (2005). “The Semi Automatic Generation of Audio Description from Screenplays.” School of Electronics and Physical Sciences, Department of Computing, University of Surrey, UK. (consulted 03.06.2011).
  • Media Consulting Group / Peacefulfish (2007). “Study on Dubbing and Subtitling Needs and Practices in the European Audiovisual Industry: Executive Summary.” 14 November 2007. Paris/London. (consulted 10.10.2011)
  • European Commission, Directorate-General for Translation (2009). “The Size of the Language Industry in the EU.” Studies on Translation and Multilingualism. January 2009. (consulted 10.10.2011).
  • McCall, Greg and Craig, Carmen (2009). “Same-Language-Subtitling (SLS): Using Subtitled Music Video for Reading Growth.” World Conference on Educational Multimedia, Hypermedia and Telecommunications (EDMEDIA) 2009. (consulted 10.10.2011).
  • Melero, Maite and Oliver, Antoni and Badia, Toni (2006) “Automatic Multilingual Subtitling in the eTITLE project”. Proceedings of ASLIB Translating and the Computer 28. London, November 2006. (consulted 03.06.2011).
  • Nakata Steffensen, Kenn (2007a). “Lost in Translation.” Stage Screen & Radio. March 2007, 18
  • (2007b). “Freelancers and the Crisis in British Subtitling.” ITI Bulletin. May-June 2007, 18-19
  • Ortega, Alfonso and Garcia, Jose Enrique and Miguel, Antonio and Lleida, Eduardo (2009). “Real-Time Live Broadcast News Subtitling System for Spanish.” Aragon Institute for Engineering Research, University of Zaragoza, Spain. (consulted 30.09.2011).
  • Petukhova, Volha and Agerri, Rodrigo and Fishel, Mark and Georgakopoulou, Panayota and Penkale, Sergio and Del Pozo, Arantza and Sepesy Maučec, Mirjam and Volk, Martin and Way, Andy (unpublished). Paper submitted for The 8th International Conference on Language Resources and Evaluation (LREC). Istanbul, Turkey, May 2012.
  • Piperidis, Stelios and Demiros, Iason and Prokopidis, Prokopis and Vanroose, Peter and Hoethker, Aja and Daelemans, Walter and Sklavounou, Elsa and Konstantinou, Manos and Karavidas, Yannis (2004). “Multimodal Multilingual Resources in the Subtitling Process.” 4th International Conference on Language Resources and Evaluation (LREC). Lisbon, Portugal, May 2004. (consulted 30.09.2010).
  • Romero-Fresco, Pablo (2011). Subtitling Through Speech Recognition: Respeaking. Manchester: St Jerome Publishing.
  • Sarrazin, Gregor (2007). Computer Assisted Subtitle Translation Using Translation Memory. Unpublished MSc Thesis. University of Reading.
  • Sokoli, Stavroula (2006). “Learning via Subtitling (LvS): A tool for the creation of foreign language learning activities based on film subtitling.” MuTra 2006 – Audiovisual Translation Scenarios: Conference Proceedings. Copenhagen, May 2006. (consulted 30.09.2011).
  • Stinson, Michael and Christy, Horn and Larson, Lydy and Levitt, Hary and Stuckless, Ross (1999). “Real-Time Speech-to-Text Services.” (consulted 30.09.2011).
  • Tescari, Alessandro (2011). “Pervoice Subtitling Workstation: The New Frontier of TV Live Subtitling.” Paper presented at the Third International Symposium on Live Subtitling with Speech Recognition, Antwerp, 21 October 2011.
  • Volk, Martin (2008). “The Automatic Translation of Film Subtitles. A Machine Translation Success Story.” Joakim Nivre, Mats Dahllöf and Beáta Megyesi (eds) (2008). Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein. Uppsala University, Sweden, 202–214.
  • (2010a). “Multilingual Subtitling – The Machine Translation Revolution.” 6 September 2010. (consulted 03.06.2011).
  • (2010b). “Multilingual Subtitling in the Age of Google Translate.” Paper presented at Languages and the Media 2010 (Berlin, 6-8 October 2010).
  • (2011). “Seven Steps for the Successful Employment of Machine Translation for Subtitles.” Paper presented at the Points of View in Language and Culture conference (Krakow, 14-15 October 2011).
  • Volk, Martin and Sennrich, Rico and Hardmeier, Christian and Tidström, Frida (2010). “Machine Translation of TV Subtitles for Large Scale Production.” Proceedings of the Second Joint EM+/CNGL Workshop on Bringing MT to the User: Research on Integrating MT in the Translation Industry. Denver, 4 November 2010, 53-62. (consulted 10.06.2011)
  • Wagner, Susanne (2005). “Intralingual Speech-To-Text-Conversion in Real Time: Challenges and Opportunities.” MuTra 2005 – Challenges of Multidimensional Translation: Conference Proceedings. Saarbrücken, 2-6 May 2005. (consulted 30.09.2011).

Dr Panayota Georgakopoulou holds a PhD in translation and subtitling from the University of Surrey and is currently Senior Director of Research and International Development at Deluxe Digital Studios. She has taught translation and subtitling at the University of Westminster and the University of Surrey, where she also served as external examiner for the postgraduate programs in translation and audiovisual media. She is a member of the European Association for Studies in Screen Translation and of the META Technology Council, and has published one book and several articles on subtitling and audio description. Prior to working for DDS, Yota was Managing Director of the European Captioning Institute.


Note 1:
I am indebted to the late Prof. Peter Newmark for his generosity at the time of choosing my research topic. The enthusiasm with which he greeted my early ideas of conducting research in subtitling, at a time when subtitling was still not a widely acknowledged field of translation studies, was all the approval I needed to pursue PhD research on the subject. Although the present article was not written with the late Prof. Peter Newmark in mind, I would like to think that it would have also met with his approval and would have provided enjoyable reading for him.

Note 2:
Extensive research as to the potential loss of quality as a result of the use of template files has not been published to date. However, for an interesting attempt at a first analysis of this issue see Kapsaskis (2011).

Note 3:
Personal communication with Gene Chao, President and CEO of the National Captioning Institute, October 2010.

Note 4:
According to Stinson et al (1999), in order for a message to be transferred effectively, accuracy must exceed 95% (i.e. no more than one word error in 20), as intelligibility drops rapidly below this threshold. Wagner (2005: 4-5) reports that ASR technologies can offer 90+x% accuracy levels in ideal scenarios (training of the engine, little to no background noise, no regional speaking characteristics); however, a 96+x% accuracy is still only achieved through the use of a respeaker, while the main restrictive factor in the use of ASR is the ability of the engines to recognize phrase and sentence boundaries and speaker change. The most successful application of ASR technologies in automating live subtitling without the use of a respeaker is in the case of live monolingual subtitles for the news (see, for example, Ortega et al. 2009).

Note 5:
Personal communication with Claude Le Guyader, Business Development Manager at ITFC, October 2011.

Note 6:
Such a solution is already made available in the ‘direct mode’ of the Pervoice subtitling workstation recently launched in Italy (Tescari 2011).

Note 7:
This is for instance the case in Red Bee Media (cf. Attenborough 2011; Romero-Fresco 2011: 23).

Note 8:
Although this is very useful technology, it is not problem-free in cases where there is speech at the same time as background music/sound effects. An example of such technology is Softel’s Swift Create TiGo. For more information see

Note 9:
This is a perception that does not hold. Through the use of closed captioning, especially near verbatim, the user’s reading speed increases to keep up with the caption stream. It may be skim reading or partial reading, but the results are similar to speed reading training available in the ‘60s and ‘70s in the USA (Personal communication with Jack Gates, formerly NCI President and CEO, November 2010).

Note 10:
Suggestion presented by Henrik Gottlieb, University of Copenhagen, Denmark, at the Panel “DTV4ALL: The Reception of SDH in Europe” at the Languages and the Media 2010 conference, Berlin, 6-8 October 2010.

Note 11:
Dubbed versions do require adjustments as off camera actors can speak for a longer or shorter time, yet the results are still overall satisfactory, according to Claude Le Guyader, Business Development Manager at ITFC (personal communication, October 2011).

Note 12:
Excerpt from the EMT Expert Group report on the competences of professional translators (2009: 7):
- Knowing how to use effectively and rapidly and to integrate a range of software to assist in correction, translation, terminology, layout, documentary research (for example text processing, spell and grammar check, the internet, translation memory, terminology database, voice recognition software)
- Knowing how to create and manage a database and files
- Knowing how to adapt to and familiarise oneself with new tools, particularly for the translation of multimedia and audiovisual material
- Knowing how to prepare and produce a translation in different formats and for different technical media
- Knowing the possibilities and limits of MT.

Note 13:
Personal communication with Juan-Mario Agudelo, National Director of Sales at the National Captioning Institute, October 2010.

Note 14:
Such were the reactions from individual subtitlers and translation associations to Martin Volk’s presentations both at the Languages & the Media 2010 conference in Berlin and the Points of View in Language and Culture conference in 2011 in Krakow, as well as some of the initial reactions to the SUMAT project.

Note 15:
See for instance the Twenty-First Century Communications and Video Accessibility Act of 2010 in the States.

Note 16:
Although it is extremely difficult to provide exact data as to the size of the audiovisual industry, the Media Consulting Group / Peacefulfish (2007) report that the market for subtitling and dubbing in the EU alone was estimated between €372 and €465 million in 2006. The European Commission report on “The Size of the Language Industry in the EU” predicts a conservative minimum of a 10% annual growth rate in the language industry sector, making the market for subtitling and dubbing in the EU an average of €506 million in 2008 (2009: 41), when the language industry EU-wide in total was €8.4 billion in 2008 and is expected to reach a staggering €16.5 to €20 billion by 2015 (2009: iii).
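The €506 million figure is not derived explicitly in this note. One reading that reproduces it, offered here as an assumption rather than as the reports' stated method, takes the midpoint of the 2006 range and applies two years of growth at 10%:

```latex
\frac{372 + 465}{2} = 418.5
\qquad\text{and}\qquad
418.5 \times 1.10^{2} = 418.5 \times 1.21 \approx 506 \quad (\text{€ million})
```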

Note 17:
With the spread of fansubbing, the flexibility inherent in this translation technique could well influence and question long term established subtitling norms (cf. Diaz Cintas 2005b). This could in turn lead the way to a general, in-depth discussion and rethinking of national and international guidelines, making it easier for certain subtitling conventions (e.g. verbatim rendering of audio in US captions) to make it into markets that were not previously open to them (e.g. the European live subtitling market). As Gambier points out (2005), developments such as these could also be “an argument for viewing the output of machine translation programs […] in a different light, in that they satisfy a certain number of users who are far from the illiterate but who do not need a polished, finely boned text.”

Note 18:
According to the European Commission (2009: vi), crowdsourcing “is likely to contribute to the survival and strengthening of regional and minority languages.”