Importance of Parallel Text in Machine Translation

Modern machine translation systems learn how to translate from existing translations. translate.tilde.com is one such system: a statistical machine translation system that computes translation probabilities from existing translations and then uses those probabilities to translate new text.
For a computer to compute these probabilities, we need so-called parallel text, i.e., a text in one language together with its translation in another language. The sentences of the two texts must be aligned, i.e., we must know which source sentence corresponds to which translated sentence. The more parallel text is available, the better the machine translation system you can train. Parallel text is therefore essential to the development of machine translation.
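To make the idea of computing probabilities from aligned sentences more concrete, here is a minimal sketch in Python. It uses IBM Model 1, a classic textbook word-translation model trained with expectation-maximization; this is only an illustration and not necessarily what translate.tilde.com does internally. The three-sentence English–Latvian corpus, the function names, and the iteration count are all made up for the example.

```python
from collections import defaultdict

# Toy sentence-aligned parallel text (invented for illustration):
# each entry is (English sentence, Latvian sentence), already tokenized.
PAIRS = [
    (["white", "house"], ["balta", "māja"]),
    (["white", "book"], ["balta", "grāmata"]),
    (["blue", "book"], ["zila", "grāmata"]),
]


def train_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with IBM Model 1 and EM."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: uniform)  # key: (target_word, source_word)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word

        # E-step: spread each target word over the source words of its
        # sentence pair, in proportion to the current probabilities.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-estimate the probabilities from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    # Keep only word pairs that actually co-occurred in some sentence pair.
    return dict(t)


def best_translation(t, src_word):
    """Return the target word with the highest probability for src_word."""
    candidates = {f: p for (f, e), p in t.items() if e == src_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_model1(PAIRS)
    for word in ["white", "blue", "house", "book"]:
        print(word, "->", best_translation(table, word))
```

Even on this tiny corpus, the sentence alignment is what lets the model tell the words apart: "book" appears with "grāmata" in two sentence pairs, so its probability mass shifts there and away from "balta" and "zila". With more parallel text, such estimates become more reliable, which is why the amount of data matters so much.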

As a result, various projects and activities around the world now aim to collect as much parallel text as possible, both to improve machine translation and to increase the productivity of human translators. Tilde is involved in a number of such activities, including research projects, projects developing new services, and simply good initiatives. This time I will describe one such initiative; next time I will cover the research and other projects we are involved in.

Tilde, together with companies such as Adobe, Oracle, Sun, Intel, and Microsoft, is one of the founders of the international TAUS Data Association (TAUS DA). The organization was created so that those who hold large parallel text resources can share them with one another. The TAUS DA database contains translations from different organizations, including companies, EU institutions, and individual translators. These translations are very useful both for increasing the productivity of translators and for improving machine translation systems. Tilde has also contributed a large part of the translations it has produced, and they are now included in the TAUS DA database.

So far, these are only the first steps toward practical use of the data (for details, see: http://www.tausdata.org/index.php/visitor-center/use-cases). Together with Adobe, Tilde also took part in an experiment organized by TAUS DA to find out whether a customized machine translation system that would assist in translating real software interfaces and documentation can be built from TAUS DA data in a very short time (24 hours). The answer is yes: within 24 hours you can create a machine translation system based on TAUS DA data that provides good English–Latvian translations of Adobe texts.