JIS (Japanese Industrial Standards)

Japan's national standards for industrial products. In the character encoding domain, JIS X 0208 (basic Japanese character set) and JIS X 0213 (extended character set) form the foundation of Japanese text processing.

JIS (Japanese Industrial Standards) is the set of national standards established under Japan's Industrial Standardization Act. With the legal revision that took effect in 2019, the Japanese name changed from 日本工業規格 to 日本産業規格; the English name "Japanese Industrial Standards" and the abbreviation JIS remain unchanged. In the character encoding domain, the character sets and encoding schemes defined by JIS have formed the backbone of Japanese computing for decades.

Three JIS standards are central to Japanese character encoding. JIS X 0201 (1969) came first, defining katakana alongside an ASCII-based set of alphanumeric characters. JIS X 0208 (1978, originally JIS C 6226) added kanji and became the core of Japanese computing; its current edition defines 6,879 characters. JIS X 0213 (2000) extended JIS X 0208 with Level 3 and Level 4 kanji and, since its 2004 revision, covers 11,233 characters.

The kanji in JIS X 0208 are divided into "Level 1" (2,965 characters) and "Level 2" (3,390 characters). Level 1 contains frequently used kanji arranged in Japanese syllabary order. Level 2 is arranged by radical and stroke count, covering kanji used in personal names and specialized terminology. This classification is reflected in the byte ranges of Shift_JIS and EUC-JP, where Level 1 and Level 2 kanji occupy different byte value ranges.
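The split between the two levels can be observed directly from encoded bytes. The following is a minimal sketch, not a reference implementation: it assumes Python's built-in euc_jp codec and the JIS X 0208 row layout in which Level 1 kanji occupy rows 16 to 47 and Level 2 kanji occupy rows 48 to 84. The function name kanji_level is hypothetical.

```python
def kanji_level(ch):
    """Return 1 or 2 for a JIS X 0208 Level 1 / Level 2 kanji, else None.

    Assumption: in EUC-JP a JIS X 0208 character is two bytes, both
    0xA1 or higher, and the first byte minus 0xA0 is the JIS row number.
    """
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return None  # not representable in EUC-JP at all

    # Skip ASCII (1 byte), half-width katakana (0x8E prefix),
    # and JIS X 0212 characters (3 bytes with a 0x8F prefix).
    if len(encoded) != 2 or encoded[0] < 0xA1 or encoded[1] < 0xA1:
        return None

    row = encoded[0] - 0xA0
    if 16 <= row <= 47:
        return 1
    if 48 <= row <= 84:
        return 2
    return None  # rows 1 to 15 hold non-kanji (symbols, kana, Greek, ...)


for ch in "亜弌あA":
    print(ch, ch.encode("euc_jp").hex(), kanji_level(ch))
```

Here 亜 is the first Level 1 kanji and 弌 is the first Level 2 kanji, so their EUC-JP lead bytes (0xB0 and 0xD0) mark the start of each range.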

To use JIS character sets on computers, an encoding scheme is required. Three emerged for the same JIS X 0208 character set: ISO-2022-JP (7-bit, designed for email), Shift_JIS (developed by ASCII Corporation together with Microsoft and others, and the de facto Windows standard), and EUC-JP (widely adopted on UNIX systems). This "one character set, multiple encoding schemes" structure is a root cause of Japan's notorious mojibake problem: the same bytes decode to different text depending on which scheme the reader assumes.
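As an illustration of how one character set yields different byte sequences, the sketch below encodes the same string with Python's built-in iso2022_jp, shift_jis, and euc_jp codecs, then decodes EUC-JP bytes as if they were Shift_JIS to reproduce mojibake. The sample string and variable names are chosen only for this example.

```python
text = "日本語"

ENCODINGS = ("iso2022_jp", "shift_jis", "euc_jp")

# Same characters, three different byte sequences.
encoded = {name: text.encode(name) for name in ENCODINGS}
for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")

# Decoding with the matching scheme recovers the original text.
for name, data in encoded.items():
    assert data.decode(name) == text

# Decoding with the wrong scheme does not: this is mojibake.
# errors="replace" substitutes U+FFFD for bytes that are invalid
# in Shift_JIS instead of raising an exception.
garbled = encoded["euc_jp"].decode("shift_jis", errors="replace")
print(repr(garbled))
```

The ISO-2022-JP output begins with the escape sequence ESC $ B, which switches into JIS X 0208, and ends with ESC ( B, which switches back to ASCII; the other two schemes mark double-byte characters through the byte values themselves.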

Today, there is virtually no reason to choose a JIS-based encoding for new development. Unicode (UTF-8) has become the de facto standard, and all JIS X 0208 characters are included in Unicode. However, government systems, financial institution core systems, and EDI (Electronic Data Interchange) platforms still use Shift_JIS or ISO-2022-JP, making character encoding conversion knowledge essential in practice.
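A conversion step of this kind might look like the sketch below, which transcodes legacy bytes to UTF-8 using Python's built-in codecs. The function name to_utf8 and its default argument are hypothetical. One assumption worth flagging: data produced on Windows and labeled "Shift_JIS" is often really Microsoft's CP932 variant, which includes vendor extension characters that a strict shift_jis decoder may reject, so the source encoding is left as a parameter.

```python
def to_utf8(data, source_encoding="shift_jis"):
    """Transcode legacy-encoded bytes to UTF-8.

    Decoding is strict on purpose: undecodable bytes raise
    UnicodeDecodeError instead of being silently replaced, so a
    wrong guess about the source encoding is noticed early.
    """
    return data.decode(source_encoding).encode("utf-8")


legacy = "日本語".encode("shift_jis")
print(to_utf8(legacy).hex(" "))

# Mail-style ISO-2022-JP input works the same way.
mail_body = "日本語".encode("iso2022_jp")
print(to_utf8(mail_body, "iso2022_jp") == to_utf8(legacy))

# A circled digit is a CP932 vendor extension, not part of JIS X 0208.
vendor_bytes = "①".encode("cp932")
print(to_utf8(vendor_bytes, "cp932").decode("utf-8"))
```

Keeping the decoder strict trades convenience for safety: a batch job fails loudly on unexpected input rather than writing replacement characters into the converted data.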

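For character counting, a JIS X 0208 kanji or kana occupies 2 bytes in Shift_JIS, 2 bytes in EUC-JP, and 3 bytes in UTF-8. (Some JIS X 0208 non-kanji characters, such as Greek and Cyrillic letters, take only 2 bytes in UTF-8, and ISO-2022-JP adds escape sequences on top of its 2 bytes per character.) The same "one kanji character" therefore has different byte counts depending on the encoding, so byte-based length limits (such as database VARCHAR definitions) require knowing which encoding is assumed.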
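The byte counts can be checked with a short sketch like the one below, which also shows one possible way to cut a string to a byte limit without splitting a character. The helper names byte_length and truncate_to_bytes are hypothetical, and the truncation helper assumes a stateless encoding such as Shift_JIS, EUC-JP, or UTF-8; it is not suitable for ISO-2022-JP, where escape sequences make per-character sizes misleading.

```python
def byte_length(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def truncate_to_bytes(text, limit, encoding):
    """Keep as many whole characters as fit within `limit` bytes.

    Assumes a stateless encoding, so each character can be
    measured independently of its neighbours.
    """
    kept = []
    used = 0
    for ch in text:
        size = len(ch.encode(encoding))
        if used + size > limit:
            break
        kept.append(ch)
        used += size
    return "".join(kept)


for enc in ("shift_jis", "euc_jp", "utf-8"):
    print(enc, byte_length("漢", enc))

# A 6-byte column holds three kanji in Shift_JIS but only two in UTF-8.
print(truncate_to_bytes("日本語", 6, "shift_jis"))
print(truncate_to_bytes("日本語", 6, "utf-8"))
```

Measuring per character is slower than encoding once and slicing, but it avoids cutting a multi-byte sequence in half, which would leave an undecodable trailing byte.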
