Human evaluation was carried out on the seven primary runs submitted by participants to the TED Talks MT English-French task.
Systems were evaluated on a subset of the 2012 progress test set (tst2012). The Human Evaluation set comprises roughly the first half of each of the 11 tst2012 TED talks, for a total of 580 segments and around 10,000 words.
Human evaluation was based on Post-Editing, i.e. the manual correction of the MT system output, which was carried out by professional translators.
The resulting evaluation data consist of seven new reference translations, one post-edit per evaluated system, for each of the 580 segments in the Human Evaluation set.
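As an illustration of how post-edits of this kind can be used, the sketch below computes a word-level edit rate between an MT output and its post-edited version. It is a simplified stand-in and not necessarily the measure used in the evaluation campaign: it relies on plain Levenshtein distance over whitespace-separated tokens and does not model the block shifts of TER-style measures. The function names and the example sentences are hypothetical and do not come from the data.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    # prev[j] holds the distance between the hypothesis prefix seen so far
    # and the first j reference tokens.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis token
                            curr[j - 1] + 1,      # insert a reference token
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output, post_edit):
    """Word edits needed to turn mt_output into post_edit, per post-edit word."""
    hyp = mt_output.split()
    ref = post_edit.split()
    if not ref:
        # Assumption: an empty post-edit scores 0 only if the output is empty too.
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical segment pair, invented for illustration only.
mt = "le chat est assis sur tapis"
pe = "le chat est assis sur le tapis"
rate = post_edit_rate(mt, pe)  # one insertion over seven post-edit words
```

A lower rate means the post-editor changed less of the system output. Averaging such rates over the 580 segments would give one rough per-system figure, under the simplifications stated above.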
For further information see:
M. Cettolo, J. Niehues, S. Stüker, L. Bentivogli, M. Federico.
Report on the 10th IWSLT Evaluation Campaign.
In Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT), Heidelberg, Germany, 2013.