ISO-2022-JP

A Japanese encoding designed for email. Uses escape sequences to switch between character sets.

ISO-2022-JP is a character encoding designed for sending Japanese text in email. It uses escape sequences to dynamically switch between ASCII and JIS X 0208 character sets within the same byte stream, and is standardized in RFC 1468. It was the de facto standard for Japanese email during the early days of the internet in Japan.

In the 1990s, many email relay servers on the internet could only handle 7-bit ASCII. Shift_JIS and EUC-JP, being 8-bit encodings, risked data corruption when passing through such routes. ISO-2022-JP solved this problem as a "7-bit clean" encoding in which all bytes fall within the 7-bit range (0x00-0x7F).
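The 7-bit property is easy to check directly. The sketch below uses Python's standard codecs (`iso2022_jp`, `shift_jis` and `euc_jp` are Python's codec names). It encodes the same string three ways and tests whether every byte stays in 0x00-0x7F. The helper name `is_7bit_clean` is our own and is only for illustration.

```python
def is_7bit_clean(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit range 0x00-0x7F."""
    return all(b < 0x80 for b in data)


text = "こんにちは"
for enc in ("iso2022_jp", "shift_jis", "euc_jp"):
    print(enc, is_7bit_clean(text.encode(enc)))
# iso2022_jp True
# shift_jis False
# euc_jp False
```

Shift_JIS and EUC-JP both use bytes with the high bit set for Japanese characters, so a relay that strips the eighth bit would corrupt them. ISO-2022-JP output survives unchanged.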

The mechanism works by using the escape sequence ESC $ B (0x1B 0x24 0x42) to switch to JIS X 0208 Japanese mode, and ESC ( B (0x1B 0x28 0x42) to return to ASCII mode. In JIS X 0208 mode, each Japanese character is represented by 2 bytes, each in the range 0x21-0x7E. ASCII characters use the standard single byte. This mode-switching mechanism allows Japanese and English text to coexist within a single byte stream.
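The mode switching can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not a complete encoder. It handles only ASCII and JIS X 0208, with no JIS X 0201 and no handling of a literal ESC in the input. It also borrows the JIS X 0208 code table from Python's `euc_jp` codec: EUC-JP stores a JIS X 0208 character as its two JIS bytes with the high bit set, so clearing that bit recovers the 7-bit code. The function name `encode_iso2022jp` is hypothetical. In practice, Python's built-in `iso2022_jp` codec does this work.

```python
# Designation escape sequences used by ISO-2022-JP (RFC 1468).
ESC_JIS_X_0208 = b"\x1b$B"  # ESC $ B: switch to JIS X 0208
ESC_ASCII = b"\x1b(B"       # ESC ( B: switch back to ASCII


def encode_iso2022jp(text: str) -> bytes:
    """Hypothetical minimal encoder: ASCII + JIS X 0208 only."""
    out = bytearray()
    in_jis = False  # current mode of the output stream
    for ch in text:
        if ord(ch) < 0x80:
            # ASCII character: leave JIS X 0208 mode first if needed.
            if in_jis:
                out += ESC_ASCII
                in_jis = False
            out.append(ord(ch))
        else:
            # Assumption: look up the JIS code via EUC-JP, where a JIS X 0208
            # character is two bytes in 0xA1-0xFE. Unmappable characters
            # raise UnicodeEncodeError here.
            euc = ch.encode("euc_jp")
            if len(euc) != 2 or euc[0] < 0xA1:
                raise ValueError(f"{ch!r} is not a JIS X 0208 character")
            if not in_jis:
                out += ESC_JIS_X_0208
                in_jis = True
            # Clearing the high bit yields the 7-bit JIS bytes (0x21-0x7E).
            out += bytes(b & 0x7F for b in euc)
    if in_jis:
        out += ESC_ASCII  # end the stream in ASCII mode
    return bytes(out)


print(encode_iso2022jp("Hi こんにちは"))
# b'Hi \x1b$B$3$s$K$A$O\x1b(B'
```

Each hiragana character becomes two printable ASCII bytes (こ is `$3`, ん is `$s`, and so on). This is why the stream stays 7-bit clean. It is also why those bytes look like noise when the escape sequences are ignored.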

Modern mail servers widely support 8-bit transport (the SMTP 8BITMIME extension). MIME's Content-Transfer-Encoding mechanisms (Base64 and Quoted-Printable) also let 8-bit text pass safely over 7-bit routes. As a result, UTF-8 has become the norm. However, ISO-2022-JP is still used in some Japanese corporations and government agencies for compatibility with legacy mail systems. Mobile carrier email services in Japan also used ISO-2022-JP as their standard for many years.

A common issue is "mojibake" (garbled text). Unintelligible characters appear when ISO-2022-JP email is interpreted as UTF-8, or when escape sequences are lost or truncated mid-stream. The usual cause is a missing or incorrect charset declaration. A message encoded in ISO-2022-JP should carry the header Content-Type: text/plain; charset=ISO-2022-JP.
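Both failure modes can be reproduced with Python's standard codecs. The sketch below is illustrative only. It decodes the same bytes with the right charset, with a wrong charset label (UTF-8), and after the leading ESC $ B has been dropped.

```python
raw = "こんにちは".encode("iso2022_jp")

# Decoded with the declared charset: the original text comes back.
print(raw.decode("iso2022_jp"))  # こんにちは

# Mislabelled as UTF-8: every byte is valid ASCII, so decoding "succeeds",
# but the escape sequences and raw JIS byte pairs leak into the text.
print(repr(raw.decode("utf-8")))  # '\x1b$B$3$s$K$A$O\x1b(B'

# Leading ESC $ B lost: the decoder stays in ASCII mode and shows the
# JIS byte pairs as ordinary punctuation and letters.
print(raw[3:].decode("iso2022_jp"))  # $3$s$K$A$O
```

No error is raised in either broken case. Because every byte is 7-bit, a wrong label or a lost escape sequence silently yields garbage rather than a decoding failure.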

Compared to Shift_JIS and EUC-JP, ISO-2022-JP has overhead from escape sequences but offers the advantage of being 7-bit clean. Shift_JIS was primarily used in Windows environments, EUC-JP in UNIX environments, and ISO-2022-JP mainly for email. Today, all three are being replaced by UTF-8.

From a byte-counting perspective, escape sequences increase the size of ISO-2022-JP text. Each run of Japanese text costs 6 extra bytes: 3 for ESC $ B to enter JIS X 0208 mode and 3 for ESC ( B to return to ASCII. For example, the 5-character string "こんにちは" requires 16 bytes in ISO-2022-JP (10 bytes of character data plus the two mode switches). The same string takes 15 bytes in UTF-8 and 10 bytes in Shift_JIS. Understanding these byte count differences across encodings is important when considering email size limits.
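The counts above can be checked with Python's standard codecs. In the sketch below, `byte_counts` is a hypothetical helper and the dictionary keys are Python codec names. The second call alternates ASCII and Japanese characters, so ISO-2022-JP pays the 6-byte switching cost once per Japanese run.

```python
ENCODINGS = ("iso2022_jp", "utf-8", "shift_jis", "euc_jp")


def byte_counts(text: str) -> dict:
    """Encoded length of `text` in bytes for each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


print(byte_counts("こんにちは"))
# {'iso2022_jp': 16, 'utf-8': 15, 'shift_jis': 10, 'euc_jp': 10}

# Two separate Japanese runs mean two pairs of escape sequences.
print(byte_counts("AあBいC"))
# {'iso2022_jp': 19, 'utf-8': 9, 'shift_jis': 7, 'euc_jp': 7}
```

For mostly Japanese text the escape overhead is small and fixed. For text that switches often between ASCII and Japanese, ISO-2022-JP can end up larger than UTF-8.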
