Machine Translation

Technology that uses a computer to automatically translate text from one language to another. The advent of neural machine translation (NMT) has dramatically improved quality. Because the same content takes a different number of characters in each language, translation also changes the length of a text.

Machine translation (MT) is the technology that converts text from one language to another without human intervention. Services such as Google Translate, DeepL, and Microsoft Translator have made it widely accessible, and it is used daily for translating web pages, drafting business documents, and enabling real-time conversation across languages.

The history of machine translation spans three generations. First-generation rule-based translation (1950s-1990s) relied on grammar rules and dictionaries. Second-generation statistical machine translation (2000-2015) learned translation patterns statistically from large parallel corpora. Third-generation neural machine translation (NMT, 2016 onward) uses deep learning to capture the meaning of entire sentences. NMT brought a dramatic leap in translation quality.

Machine translation and character count are closely linked. Expressing the same content in different languages produces significant variation in character count. At the level of a single word the gap can be extreme: the Japanese word "情報" (2 characters) becomes "information" (11 characters) in English. Averaged over running text, the rule of thumb is that translating from Japanese to English increases the character count by a factor of 1.5 to 2, while English to Japanese reduces it to 0.5 to 0.7 times the original. This expansion ratio directly affects the sizing of buttons and labels in UI localization.
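As a rough illustration of how the rule of thumb above could feed into UI sizing, the following Python sketch estimates a range for the translated length of a string. The `EXPANSION_PCT` table and the `estimate_length` function are hypothetical names introduced here, and the ratios are simply the ones quoted above rather than measured values. Short strings, single words in particular, can fall well outside this range.

```python
# Rule-of-thumb expansion ratios from the text, stored as integer percentages
# (low, high) so the arithmetic stays exact. These are assumptions, not measurements.
EXPANSION_PCT = {
    ("ja", "en"): (150, 200),
    ("en", "ja"): (50, 70),
}


def estimate_length(text: str, src: str, dst: str) -> tuple[int, int]:
    """Return a (low, high) estimate of the character count after translation."""
    low_pct, high_pct = EXPANSION_PCT[(src, dst)]
    n = len(text)  # len() counts Unicode code points, so "情報" is 2
    low = n * low_pct // 100            # round the lower bound down
    high = (n * high_pct + 99) // 100   # round the upper bound up
    return low, high


label = "設定を保存しました"  # a 9-character Japanese UI message
print(estimate_length(label, "ja", "en"))  # (13, 18)
```

Sizing a button or label for the upper bound, and then checking the actual translation, is usually safer than sizing for the source text.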

Post-translation character limits are a major practical challenge. Take a 280-character limit, such as the one Twitter applies to English posts: 280 Japanese characters carry the information equivalent of 400 to 500 English characters, so a Japanese text of that length will exceed the limit once it is translated into English. For meta descriptions, ad copy, and UI labels with character constraints, simple translation is not enough; the translation has to be paraphrased or summarized to fit within the limit.
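The limit problem can be sketched as two small checks: how long a source text may be if its translation should still fit, and how far a finished translation overshoots. This is a minimal sketch with hypothetical helper names (`max_source_chars`, `overflow`). It assumes the worst-case Japanese-to-English ratio of 2 from the rule of thumb and counts characters with Python's `len()`, which counts Unicode code points; a real platform may count characters differently.

```python
def max_source_chars(limit: int, worst_case_pct: int = 200) -> int:
    """Longest source text that should still fit `limit` after translation,
    assuming the worst-case expansion ratio (200% is the upper rule-of-thumb
    figure for Japanese to English)."""
    return limit * 100 // worst_case_pct


def overflow(translated: str, limit: int) -> int:
    """Number of characters by which `translated` exceeds `limit` (0 if it fits)."""
    return max(0, len(translated) - limit)


LIMIT = 280
print(max_source_chars(LIMIT))  # 140

draft = "x" * 450  # stand-in for a 450-character English translation
print(overflow(draft, LIMIT))   # 170
```

A non-zero overflow is the signal that the translation needs paraphrasing or summarizing rather than a literal rendering.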

BLEU is the most widely used metric for evaluating machine translation quality. It compares the machine output against one or more human reference translations using n-gram precision (the share of word sequences in the output that also appear in the reference), combined with a penalty for output that is too short. The score ranges from 0 to 1 and is usually reported on a scale of 0 to 100. Current NMT systems achieve BLEU scores of 40 to 50 for English-French, but scores for Japanese-English tend to be somewhat lower due to the greater structural differences between the two languages.
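To make the n-gram comparison concrete, here is a simplified sketch of BLEU in Python. It is deliberately reduced: a single reference, a single sentence, whitespace-split tokens, and no smoothing, so any sentence without a matching 4-gram scores 0. The function names are hypothetical, and reported scores normally come from corpus-level evaluation tools rather than from code like this.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-token sequence in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Single-reference, sentence-level BLEU on a 0 to 100 scale (no smoothing)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if matched == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: punish candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


reference = "the quick brown fox jumps over the lazy dog".split()
candidate = "the quick brown fox jumps over the dog".split()
print(round(bleu(candidate, reference), 1))
```

An output identical to the reference scores 100, and the candidate above scores lower because it drops a word, which both breaks some n-grams and triggers the brevity penalty.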

Post-editing - the process of having a human translator review and correct machine translation output - is becoming a standard workflow in the translation industry. By using machine translation for a first draft and having a human refine it, translation speed can be increased two to three times while maintaining quality. The effort required for post-editing depends on both the quality of the machine output and the character count of the source text.
