Shift_JIS

A Japanese character encoding widely used in legacy systems. Being gradually replaced by UTF-8.

Shift_JIS is a character encoding for representing Japanese text. Jointly developed by Microsoft and ASCII Corporation in 1982, it became the standard encoding for the Japanese editions of MS-DOS and Windows. It is a variable-length encoding that combines JIS X 0201 (half-width alphanumerics and half-width katakana) with JIS X 0208 (kanji and full-width characters), and it is one of the most influential character encodings in Japanese IT history.

In Shift_JIS, half-width alphanumeric characters and half-width katakana use 1 byte, while full-width Japanese characters (hiragana, katakana, kanji) use 2 bytes. Since UTF-8 uses 3 bytes for these characters, Shift_JIS is more byte-efficient for Japanese-only text. However, Shift_JIS supports only about 7,000 characters (the JIS X 0208 repertoire, which includes the JIS Level 1 and 2 kanji), far fewer than Unicode's 140,000+ characters. Emoji and some kanji (JIS Levels 3 and 4) cannot be represented.
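As a rough illustration of the repertoire limit, the following Python sketch checks whether a string can be encoded with Python's built-in `shift_jis` codec. The helper name `can_encode` is hypothetical, and the result reflects Python's codec tables, which may differ in edge cases from other Shift_JIS implementations.

```python
def can_encode(text: str, encoding: str = "shift_jis") -> bool:
    """Return True if every character in text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no mapping in this encoding.
        return False


if __name__ == "__main__":
    for sample in ["日本語", "ABC", "😀"]:
        print(repr(sample), can_encode(sample))
```

A check like this can be run before exporting data to a Shift_JIS-only system, so that unrepresentable characters are caught early rather than silently replaced.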

Shift_JIS has a well-known technical issue called the "5C problem." Certain Japanese characters have 0x5C (the code for backslash) as their second byte (e.g., 表, 能, ソ), which collides with the escape character used in C and many other languages. Code that processes the text byte by byte without being aware of Shift_JIS can misread that byte as a backslash, corrupting the character or the text that follows it. This issue, also known as the "dame-moji" (problematic character) problem, required constant attention when programming with Shift_JIS.
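The 5C problem can be reproduced in a few lines of Python. This is a minimal sketch using Python's built-in `shift_jis` codec; the function names `has_5c_trail_byte` and `naive_split_on_backslash` are hypothetical and exist only for illustration.

```python
def has_5c_trail_byte(ch: str) -> bool:
    """Return True if ch is a 2-byte Shift_JIS character whose second byte is 0x5C."""
    try:
        raw = ch.encode("shift_jis")
    except UnicodeEncodeError:
        return False  # not representable in Shift_JIS at all
    return len(raw) == 2 and raw[1] == 0x5C


def naive_split_on_backslash(text: str) -> list[bytes]:
    """Split Shift_JIS bytes on 0x5C, ignoring character boundaries.

    This deliberately shows the bug: a byte-oriented split cuts
    through the middle of characters such as 表.
    """
    return text.encode("shift_jis").split(b"\\")


if __name__ == "__main__":
    for ch in "表能ソあ":
        print(ch, ch.encode("shift_jis").hex(), has_5c_trail_byte(ch))
    # The string contains no backslash, yet the split yields two pieces.
    print(naive_split_on_backslash("表示"))
```

Decoding the bytes to a string first and only then searching for backslashes avoids the problem, because the search then operates on characters rather than raw bytes.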

Today, migration to UTF-8 is progressing globally, with over 98% of the web using UTF-8. However, Shift_JIS remains necessary in certain Japanese business contexts: on Japanese-language Windows, Excel assumes Shift_JIS when opening CSV files that lack a byte order mark; banking and government legacy systems assume Shift_JIS; and some EDI (Electronic Data Interchange) standards specify Shift_JIS.
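For the CSV case, one possible approach is to encode the file explicitly when exporting. The sketch below assumes the consumer expects the Windows variant of Shift_JIS (Python's `cp932` codec); the function name `rows_to_excel_csv` is hypothetical, and the right encoding for a given system should be confirmed with its owner.

```python
import csv
import io


def rows_to_excel_csv(rows, encoding: str = "cp932") -> bytes:
    """Serialize rows to CSV text, then encode it for a Shift_JIS consumer.

    Raises UnicodeEncodeError if a cell contains a character
    that the target encoding cannot represent.
    """
    buf = io.StringIO(newline="")  # keep the csv module's \r\n line endings as-is
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)


if __name__ == "__main__":
    data = rows_to_excel_csv([["名前", "備考"], ["山田", "表示"]])
    print(data)
```

Letting the encode step raise on unrepresentable characters is a deliberate choice here: a failed export is usually easier to diagnose than a file with silently substituted characters.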

Character corruption can occur when converting between Shift_JIS and UTF-8, because different conversion tables map the same bytes to different Unicode code points. The byte sequence 0x81 0x60, for example, becomes the wave dash (〜, U+301C) under the JIS-based mapping but the fullwidth tilde (～, U+FF5E) under Microsoft's mapping; the fullwidth minus sign is affected in the same way. Windows CP932 (Windows-31J) is an extension of Shift_JIS that includes NEC special characters and IBM extended characters, so compatibility with pure Shift_JIS requires attention.
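The mapping difference can be observed directly with Python's two built-in codecs, `shift_jis` and `cp932`. The sketch below is illustrative only: the names `decode_both`, `cp932_only`, `encode_strict_sjis`, and the `CP932_TO_JIS` table are hypothetical, and the table covers just the two characters discussed here rather than every known difference.

```python
# Hypothetical normalization table: CP932-style code points -> JIS-style ones.
CP932_TO_JIS = str.maketrans({
    "\uff5e": "\u301c",  # fullwidth tilde -> wave dash
    "\uff0d": "\u2212",  # fullwidth hyphen-minus -> minus sign
})


def decode_both(raw: bytes) -> tuple[str, str]:
    """Decode the same bytes with both codecs to expose mapping differences."""
    return raw.decode("shift_jis"), raw.decode("cp932")


def cp932_only(text: str) -> bool:
    """Return True if text encodes under cp932 but not under Python's shift_jis."""
    try:
        text.encode("cp932")
    except UnicodeEncodeError:
        return False
    try:
        text.encode("shift_jis")
    except UnicodeEncodeError:
        return True
    return False


def encode_strict_sjis(text: str) -> bytes:
    """Normalize the two known variants, then encode with the strict codec."""
    return text.translate(CP932_TO_JIS).encode("shift_jis")


if __name__ == "__main__":
    for raw in (b"\x81\x60", b"\x81\x7c"):
        sjis, cp = decode_both(raw)
        print(raw.hex(), f"U+{ord(sjis):04X}", f"U+{ord(cp):04X}")
    print(cp932_only("①"))  # NEC special character
```

In practice, picking one codec for both decoding and encoding within a pipeline tends to be simpler than normalizing after the fact, since round trips through the same table preserve the original bytes.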

From a character counting perspective, the same text has different byte counts in Shift_JIS and UTF-8. One full-width Japanese character is 2 bytes in Shift_JIS and 3 bytes in UTF-8, so database column sizes and file size estimates must account for the encoding used. Character counting tools that display both the character count and the byte count in each encoding make these differences easy to see.
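The byte-count difference is easy to measure. The following is a minimal sketch of such a counter; the function name `byte_counts` is hypothetical, and "characters" here means Unicode code points as reported by Python's `len`, which is an assumption about what a given tool would count.

```python
def byte_counts(text: str, encodings=("shift_jis", "utf-8")) -> dict:
    """Return the code point count and the encoded byte length per encoding."""
    counts = {"characters": len(text)}
    for enc in encodings:
        counts[enc] = len(text.encode(enc))
    return counts


if __name__ == "__main__":
    # Three kanji plus three ASCII letters.
    print(byte_counts("日本語ABC"))
    # A half-width katakana character: 1 byte in Shift_JIS, 3 in UTF-8.
    print(byte_counts("ｱ"))
```

When sizing a database column that is defined in bytes, the larger of the two figures is the safer basis if the data may later be migrated to UTF-8.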
