The block world corpus
======================

The corpus originates from the Department of Linguistics and Philology at Uppsala University, Sweden (www.lingfil.uu.se).

With the kind consent of Professor Jörg Tiedemann, I have been given permission to use the corpus, translate it into new languages and license it as I see fit.

The corpus illustrates some linguistic features that are a nuisance to machine translation. It consists of two parts:

1. Files for training/developing a translation model. These files are named blockworld.parallel + language suffix, e.g. blockworld.parallel.en
2. Files for building a statistical language model. These files are named blockworld.full + language suffix, e.g. blockworld.full.en

(The files are UTF-8 encoded, in line with standard practice for machine translation. National characters may be displayed incorrectly if you open the files on Windows with e.g. Notepad. I recommend Notepad++ for viewing and editing the files.)

The corpus was originally intended for experiments with statistical machine translation, but it can equally well be used with rule-based systems, e.g. shallow-transfer systems.

It would be useful to have the corpus translated into more languages. I would very much appreciate it if you translated the corpus into your language and sent the files to me.

Per Tunedal