Plain texts for MT
For some language pairs, plain texts for MT experiments can be downloaded from this link. For each pair, the .tgz archive includes a training set and, possibly, development and evaluation sets.
- in general, the data in an entry (L1,L2) differ from the data in the entry (L2,L1), because the sentence-rebuilding and text-cleaning operations are asymmetric
- sentences were not rebuilt for language pairs with Chinese or Japanese as the target language; in such cases, the original subtitle segmentation of the TED documents is kept
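
Since an archive may or may not contain development and evaluation sets, it can help to list its contents before running experiments. The following is a minimal sketch using Python's standard tarfile module. The function name list_archive_members is hypothetical, and the archive path is supplied by the user; the file names inside the real archives are not assumed here.

```python
import sys
import tarfile


def list_archive_members(archive_path):
    """Return the sorted names of the regular files in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Keep regular files only; directory entries and links are skipped.
        return sorted(m.name for m in archive.getmembers() if m.isfile())


if __name__ == "__main__":
    # Usage: python list_members.py <path-to-downloaded.tgz>
    # The path is whatever archive was downloaded for a given language pair.
    for name in list_archive_members(sys.argv[1]):
        print(name)
```

Because the (L1,L2) and (L2,L1) entries generally differ, each direction's archive should be inspected separately rather than assumed to mirror the other.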