Training and development sets for the MT track
For the language pairs of the IWSLT 2011 evaluation campaign, plain-text data for MT experiments can be downloaded from this table. Languages are identified by their TED codes, which mostly coincide with ISO 639-1 codes. All figures are in millions of untokenized words. Each (row, col) entry gives the size of the parallel training data available for that pair, counted on the row-language side. Each entry links to the tar archive for the corresponding language pair; click it to download. The archives contain the parallel and monolingual training sets as well as the development and evaluation sets.
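To check the sizes against a downloaded archive, the word counts can be approximated with a short Python sketch. It assumes that "untokenized words" means whitespace-separated strings in the raw text, which may not match the campaign's exact counting procedure. The function name `count_untokenized_words` is hypothetical, and no particular file layout inside the archive is assumed: every regular file is counted. Any XML markup in the development/evaluation files will be counted as words too.

```python
import io
import tarfile


def count_untokenized_words(archive_path: str) -> dict[str, float]:
    """Return millions of whitespace-separated words per file in a tar archive.

    Hypothetical helper. Assumption: "untokenized words" are approximated by
    splitting raw lines on whitespace; markup (e.g. XML tags) is not stripped.
    """
    counts: dict[str, float] = {}
    # Mode "r:*" lets tarfile auto-detect gzip/bzip2/xz compression.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-regular entries
            handle = archive.extractfile(member)
            if handle is None:
                continue
            words = 0
            # Decode as UTF-8, replacing undecodable bytes instead of failing.
            for line in io.TextIOWrapper(handle, encoding="utf-8", errors="replace"):
                words += len(line.split())
            counts[member.name] = words / 1_000_000
    return counts
```

For example, `count_untokenized_words("pair.tgz")` (a placeholder file name) returns a mapping from each file in the archive to its size in millions of words, which can then be compared with the corresponding table entry.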