Background

Subtitling is the preferred multimedia content translation method in most European countries and for most genres, ensuring that audiovisual content is widely accessible across languages. The increasing use of multilingual multimedia through the internet, the popularity of DVDs, and the current European policies promoting linguistic diversity and audiovisual accessibility have all raised the demand for subtitling in recent years.

There is a clear need to optimise the productivity of current subtitle translation workflow processes, reducing costs and turnaround times while enhancing the consistency of the translation results.

 

Goals

SUMAT aims to increase the efficiency of professional subtitle translation through the introduction of statistical machine translation technology.

We are developing an online subtitle translation service for 9 European languages combined into 14 language pairs.

 

The Language Pairs

[Figure: Final language pairs diagram]

 

Why Use MT Technology?

Machine translation uses software to translate text from one natural language to another.

Statistical Machine Translation (SMT) is a way of generating translations on the basis of statistical models derived from the analysis of bilingual and monolingual text corpora.
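As a rough illustration of how such statistical models combine, the sketch below scores candidate translations with a toy translation model (standing in for what is learned from bilingual corpora) and a toy language model (standing in for what is learned from monolingual corpora), then picks the highest-scoring candidate. All phrases, probabilities and function names are hypothetical and chosen for illustration only; they do not describe the SUMAT engines, which are trained on large subtitle corpora.

```python
import math

# Hypothetical toy probabilities; a real SMT system estimates these from corpora.
# Translation model: P(source phrase | target candidate), from bilingual text.
TRANSLATION_MODEL = {
    ("merci beaucoup", "thank you very much"): 0.6,
    ("merci beaucoup", "thanks a lot"): 0.3,
    ("merci beaucoup", "thank much"): 0.1,
}

# Language model: P(target candidate), from monolingual target-language text.
LANGUAGE_MODEL = {
    "thank you very much": 0.05,
    "thanks a lot": 0.04,
    "thank much": 0.0001,
}


def score(source, candidate):
    """Log-probability score combining the translation and language models."""
    return (math.log(TRANSLATION_MODEL[(source, candidate)])
            + math.log(LANGUAGE_MODEL[candidate]))


def best_translation(source):
    """Return the candidate translation with the highest combined score."""
    candidates = [t for (s, t) in TRANSLATION_MODEL if s == source]
    return max(candidates, key=lambda t: score(source, t))


print(best_translation("merci beaucoup"))
```

The language model penalises the disfluent candidate even though the translation model gives it some probability, which is why both kinds of corpora matter.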

SMT suits subtitles because:

  • Subtitles are short, grammatically sound, textual units, whose linguistic properties fit well with state-of-the-art SMT models.
  • The approach promotes the reusability of existing and new translations as training data.

 

The Rising Use Of Post-editing

The translation industry is embracing the post-editing of machine translation in domains where enough parallel bilingual corpora exist to customise machine translation engines.

For trained human translators, post-editing is therefore an increasingly useful method, and it has been shown to achieve higher productivity than human translation alone.

 

The SUMAT Approach

To build customised SMT engines for subtitles, trained on large professional-quality parallel and monolingual subtitle corpora.

To evaluate the merits of this approach by:

1. Having professional subtitle translators judge the quality of machine-translated subtitles through quality ranking scales.

2. Measuring the productivity gain achieved by post-editing machine-translated subtitles, compared to starting the translation process from scratch.
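One simple way to express the productivity comparison in point 2 is as the relative change in translation speed. The sketch below assumes speed is measured in subtitles per hour; the metric, the function names and the figures in the usage lines are hypothetical and are not taken from the SUMAT evaluation protocol.

```python
def subtitles_per_hour(subtitle_count, minutes_spent):
    """Translation speed for one task, in subtitles per hour."""
    return subtitle_count / (minutes_spent / 60.0)


def productivity_gain(scratch_rate, post_edit_rate):
    """Relative gain of post-editing over translating from scratch.

    A value of 0.25 means post-editing was 25% faster; a negative
    value means post-editing was slower.
    """
    return (post_edit_rate - scratch_rate) / scratch_rate


# Hypothetical figures for a single translator working on comparable files.
from_scratch = subtitles_per_hour(subtitle_count=100, minutes_spent=60)
post_editing = subtitles_per_hour(subtitle_count=100, minutes_spent=48)
print(f"Productivity gain: {productivity_gain(from_scratch, post_editing):.0%}")
```

A real study would also need to control for file difficulty and translator experience, which this sketch ignores.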

 

Project Milestones

Corpora

For each of the language pairs in the project, large amounts (ca. 1 million subtitles on average) of professional-quality parallel subtitle corpora have been collected and prepared for SMT training purposes.

Experiments

Various technical approaches have been explored with the aim of improving SMT performance:

  • Subtitle vs. sentence alignment
  • Factored and syntax-based models
  • Named Entity Recognition & Compound Splitting
  • Augmented phrase-tables
  • Mixed models for translation domain adaptation

Online Service

A prototype online service has been developed and is currently being refined. The final service will be based on the requirements and specifications provided by professional users in the consortium.

Evaluation

Evaluation by professional subtitle translators is under way. Two evaluation rounds are foreseen:

  • Round 1: Subtitle translators are scoring individual subtitles and categorising the errors found with the aim of analysing the quality of the SMT outputs. Their feedback is being used to refine the SMT engines.
  • Round 2: The productivity gain that can be achieved through the use of the SUMAT approach will be measured.

Results

Evaluation results and the Online Service will be finalised by Q1 2014.

SUMAT leaflet 2014