IWSLT 2017

Human Evaluation data

IWSLT 2017 human evaluation data is available here

Human evaluation focused on Multilingual translation and was specifically carried out on the four language directions for which the Zero-Shot translation task was proposed, namely Dutch-German (nl-de), German-Dutch (de-nl), Romanian-Italian (ro-it) and Italian-Romanian (it-ro).

The human evaluation (HE) dataset created for each language direction was a subset of the corresponding 2017 test set (tst2017). All four tst2017 sets (nl-de, de-nl, ro-it, it-ro) are composed of the same 10 TED Talks, and around the first half of each talk was included in the HE set. The resulting HE sets are identical and include 603 segments, corresponding to around 10,000 words for each source text.

Human evaluation included two different assessment methodologies, namely direct assessment (DA) of absolute translation quality and the traditional IWSLT evaluation based on post-editing (PE), where the MT outputs are post-edited (i.e. manually corrected) by professional translators and then evaluated according to TER-based metrics. DA and PE data collection followed different criteria:

Direct Assessment

Direct Assessment data collection was funded and carried out by Microsoft Cloud+AI, Redmond, WA, USA. 

Post-editing

The collection of post-edits was funded by the CRACKER project (EU’s Horizon 2020 research and innovation programme, grant agreement no. 645357).
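To illustrate the idea behind the TER-based evaluation of post-edits mentioned above, here is a minimal sketch. It is not the scoring procedure used in the campaign. It assumes whitespace tokenization and counts only word insertions, deletions and substitutions, whereas full TER also allows block shifts. The function names `word_edit_distance` and `simplified_ter` are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h from the hypothesis
                            curr[j - 1] + 1,      # insert r into the hypothesis
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edit, per post-edit word.

    Assumption: no shift operation, so this is a simplification of TER.
    """
    hyp = mt_output.split()
    ref = post_edit.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    # One missing word, six words in the post-edit: 1 / 6
    print(round(simplified_ter("the cat sat on mat", "the cat sat on the mat"), 3))
```

A lower score means the translator had to change less of the MT output. Because this sketch has no shift operation, a reordered phrase is charged as several word edits, so the score would typically be equal to or higher than what a full TER implementation reports.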

For further information see:

M. Cettolo, M. Federico, L. Bentivogli, J. Niehues, S. Stüker, K. Sudoh, K. Yoshino, C. Federmann. 

The IWSLT 2017 Evaluation Campaign.

In Proceedings of the International Workshop on Spoken Language Translation (IWSLT-2017), Tokyo, Japan.

An investigation of human evaluation based on Post-editing and its relation with Direct Assessment has been carried out using a subset of IWSLT 2017 data in:

Luisa Bentivogli, Mauro Cettolo, Marcello Federico, Christian Federmann. 

"Machine Translation Human Evaluation: an investigation of evaluation based on Post-Editing and its relation with Direct Assessment".

In Proceedings of the 15th International Workshop on Spoken Language Translation (IWSLT 2018), Bruges, Belgium, 2018.

The IWSLT 2017 special release of the data used in this paper can be found here