Human Evaluation Data
The complete release of the IWSLT 2011 human evaluation data is available here.
The human evaluation was carried out on all primary runs submitted by participants to the following tasks:
For all MT tasks, the individual systems were evaluated jointly with the system combination (SC) runs and with the additional online system runs prepared by the organizers.
For each task, systems were evaluated on an evaluation set of 400 sentences randomly sampled from the test set used for the automatic evaluation.
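As a minimal sketch of how such an evaluation set can be drawn (the function name, data format, and fixed seed are illustrative assumptions, not the campaign's actual tooling):

```python
import random

def build_eval_set(test_set, size=400, seed=42):
    """Randomly sample an evaluation subset from the full test set.

    `test_set` is assumed to be a list of source sentences (or sentence IDs);
    a fixed seed keeps the draw reproducible, so every system is judged on
    the same 400 sentences.
    """
    rng = random.Random(seed)
    return rng.sample(test_set, size)
```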
The IWSLT 2011 human evaluation focused on System Ranking, which aims to produce a complete ordering of the systems participating in a given task. The ranking evaluation was carried out with the following characteristics:
the paired comparison method was used: judges were shown two MT outputs of the same input sentence together with a reference translation, and had to decide which of the two translation hypotheses was better, taking into account both the content and the fluency of the translation; judges could also assign a tie when both translations were equally good or bad;
full coverage of paired comparisons between systems was achieved by adopting a round-robin tournament structure, the most complete way to determine a system ranking (see the sketch after this list);
the evaluation was carried out through crowdsourcing: all pairwise comparisons to be judged were posted to Amazon’s Mechanical Turk through the CrowdFlower interface.
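To make the round-robin structure concrete, here is a minimal Python sketch of how full pairwise coverage can be generated and a ranking derived from the collected judgments. The data layout and the scoring scheme (1 point per win, 0.5 per tie) are illustrative assumptions, not the campaign's actual ranking metric.

```python
from collections import defaultdict
from itertools import combinations

def round_robin_pairs(systems):
    """All unordered pairs of systems: the full round-robin schedule."""
    return list(combinations(systems, 2))

def rank_systems(systems, judgments):
    """Order systems by points accumulated from pairwise judgments.

    `judgments` is assumed to be a list of (sys_a, sys_b, verdict) tuples,
    one per judged sentence, where verdict is 'a', 'b', or 'tie'. The
    win/tie scoring below is an illustrative choice.
    """
    points = defaultdict(float)
    for sys_a, sys_b, verdict in judgments:
        if verdict == 'a':
            points[sys_a] += 1.0
        elif verdict == 'b':
            points[sys_b] += 1.0
        else:  # tie: both translations judged equally good or bad
            points[sys_a] += 0.5
            points[sys_b] += 0.5
    return sorted(systems, key=lambda s: points[s], reverse=True)

# Illustrative usage with made-up system names and verdicts:
systems = ["sysA", "sysB", "sysC"]
pairs = round_robin_pairs(systems)  # [('sysA', 'sysB'), ('sysA', 'sysC'), ('sysB', 'sysC')]
judgments = [("sysA", "sysB", "a"), ("sysA", "sysC", "tie"), ("sysB", "sysC", "b")]
print(rank_systems(systems, judgments))  # ['sysA', 'sysC', 'sysB']
```

Generating every unordered pair is exactly what guarantees that each system meets every other system, which is why the round-robin schedule yields a complete ordering rather than a partial one.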
For further information, see the following papers:
Marcello Federico, Luisa Bentivogli, Michael Paul, Sebastian Stüker. 2011. Overview of the IWSLT 2011 evaluation campaign. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), San Francisco, CA, 8-9 December 2011.
Marcello Federico, Sebastian Stüker, Luisa Bentivogli, Michael Paul, Mauro Cettolo, Teresa Herrmann, Jan Niehues, Giovanni Moretti. 2012. The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation. In Proceedings of LREC 2012, Istanbul, Turkey, 23-25 May 2012.