CJK (Chinese-Japanese-Korean Unified Ideographs)

The collective term for Chinese, Japanese, and Korean text, whose shared Han characters are encoded in Unicode as CJK Unified Ideographs.

CJK stands for Chinese, Japanese, and Korean, referring to the shared ideographic character system. In Unicode, these are assigned code points as CJK Unified Ideographs through a process called Han Unification.

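Han Unification means that regional variants of the same abstract character, as used in Chinese, Japanese (kanji), and Korean (hanja), share a single code point. Simplified and traditional forms that differ in structure, such as 国 and 國, are usually encoded separately. For a unified code point, the glyph displayed depends on the font, so the HTML lang attribute is important for selecting a language-appropriate font.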
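To make the code point behavior concrete, here is a minimal Python sketch using only the standard unicodedata module. The helper name code_point_label and the sample characters are illustrative choices, not part of any standard.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Return the U+XXXX label for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X}"


# 直 is one code point whose preferred glyph differs by region; the string
# itself carries no language information, so the font decides the shape.
# 国 and 國 differ in structure and have separate code points.
for ch in ("直", "国", "國"):
    print(ch, code_point_label(ch), unicodedata.name(ch))
```

Because nothing in the string says which language it is, the language has to come from outside the text, for example from markup such as the lang attribute.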

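CJK characters are typically displayed at full width. Those in the Basic Multilingual Plane consume 2 bytes in UTF-16 and 3 bytes in UTF-8 per character, while those in the supplementary planes (CJK Extension B and later) consume 4 bytes in either encoding. This creates a significant gap between character count and byte count.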
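The gap between character count and byte count can be checked directly. The following is a small sketch in Python; the function name size_report and the sample strings are assumptions made for illustration. It uses the utf-16-le codec so that no byte order mark is added to the count.

```python
import unicodedata


def size_report(text: str) -> dict:
    """Count code points and encoded bytes for a string (illustrative helper)."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # utf-16-le avoids the 2-byte BOM that the plain "utf-16" codec prepends
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


bmp_text = "漢字"           # two ideographs in the Basic Multilingual Plane
ext_b_char = "\U00020000"   # first code point of CJK Extension B (plane 2)

print(size_report(bmp_text))
print(size_report(ext_b_char))

# East Asian Width property: "W" marks characters rendered at full (wide) width
print(unicodedata.east_asian_width("漢"))
```

Limits expressed in bytes (database columns, protocol fields) and limits expressed in characters therefore need to be validated separately.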

Chinese and Japanese text lacks spaces between words (modern Korean does use them), so search indexing typically relies on morphological analysis or N-grams, and line breaking cannot rely on spaces to find break opportunities.
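As a rough illustration of the N-gram approach, the sketch below splits text into overlapping character bigrams. The function name char_ngrams is hypothetical, and a real search engine would combine this with normalization and ranking, which are omitted here.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character n-grams (illustrative helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(text) < n:
        # Text shorter than n yields itself as the only token, or nothing if empty
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


tokens = char_ngrams("東京都庁")
print(tokens)
```

For this input the tokens include 京都 (Kyoto) even though the text refers to Tokyo (東京都), which shows the usual trade-off: N-grams need no dictionary but can produce false matches, whereas morphological analysis segments along actual word boundaries at the cost of requiring a dictionary or model.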