The tools used to generate the WIT3 plain texts for MT are listed here:

  • HLT Web Manager (Manual): the web crawler used to download the web pages of TED Talks and store them in XML format

  • Processing scripts (README): Perl scripts developed to process TED Talks transcripts as downloaded by the HLT Web Manager