Variant Glyph
Kanji characters that share the same meaning and reading but differ in visual form. Includes relationships such as standard forms, popular forms, old-style characters, and simplified characters, as in "高" vs. "髙" or "辺" vs. "邊" vs. "邉."
A variant glyph (also called a variant character) is an alternative visual form of a character that carries the same meaning and pronunciation. Japanese kanji are especially rich in variants: the character "辺" (as in the common surname Watanabe) alone has over 20 variant forms including "邊" and "邉." Because variant glyphs appear frequently in personal and place names, they are an unavoidable challenge in Japanese text processing.
Variant glyphs arise from several historical factors. First, over centuries of use, different calligraphic styles produced multiple accepted forms of the same character. Second, Japan's postwar script reforms (the Toyo Kanji list of 1946 and the Joyo Kanji list of 1981) simplified many characters, creating old-form/new-form pairs (e.g., "國" simplified to "国," "學" to "学"). Third, China, Taiwan, Japan, and Korea each standardized shared characters independently, and in some cases simplified them, resulting in region-specific standard forms for the same underlying character.
Unicode handles variant glyphs through two approaches. One approach assigns a separate code point to each variant: "高" (U+9AD8) and "髙" (U+9AD9) are distinct characters. The other approach uses variation selectors: a base character is followed by a selector that specifies the desired glyph. Standardized variation sequences use VS1-VS16 (U+FE00-U+FE0F), while an IVS (Ideographic Variation Sequence) uses VS17-VS256 (U+E0100-U+E01EF). The Adobe-Japan1 collection uses IVS to distinguish kanji glyph variants within its set of over 23,000 glyphs.
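As a rough illustration of the two mechanisms, the following Python sketch checks that the two forms of "高" are separate code points and pairs each base character with the variation selector that follows it. The helper names is_variation_selector and split_variation_sequences are made up for this example, and the sketch assumes well-formed input in which a selector directly follows its base character.

```python
from typing import List, Optional, Tuple


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00-U+FE0F) and VS17-VS256 (U+E0100-U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def split_variation_sequences(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each base character with the selector that follows it, if any."""
    pairs: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        if is_variation_selector(ch) and pairs and pairs[-1][1] is None:
            # Attach the selector to the preceding base character.
            base, _ = pairs[-1]
            pairs[-1] = (base, ch)
        else:
            # A base character (or a stray selector with nothing to attach to).
            pairs.append((ch, None))
    return pairs


# Approach 1: separate code points for "高" and "髙".
assert "\u9AD8" != "\u9AD9"

# Approach 2: base character "葛" (U+845B) followed by VS17 (U+E0100).
sample = "\u845B\U000E0100" + "\u9AD8"
for base, selector in split_variation_sequences(sample):
    label = f"U+{ord(selector):04X}" if selector else "none"
    print(f"base U+{ord(base):04X}, selector {label}")
```

Whether a given base/selector pair is actually registered is a separate question; this sketch only recognizes the code point ranges and does not consult the Ideographic Variation Database.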
Variation selectors are a significant pitfall for character counting. An IVS (e.g., "葛" + U+E0100) looks like a single character on screen, but in Unicode it consists of two code points. In languages with UTF-16 strings, such as JavaScript, String.length counts code units rather than code points, so it returns 3 here: 1 for the base character plus 2 for the selector, which lies outside the Basic Multilingual Plane and is stored as a surrogate pair. Neither figure matches the visual character count. Counting by grapheme cluster correctly yields 1.
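The difference between these counts can be seen in a short Python sketch. The function names are hypothetical, and count_visible_characters is a deliberate simplification: it only skips variation selectors and is not a full grapheme cluster segmentation as defined in Unicode Standard Annex #29 (combining marks, emoji sequences, and so on are not handled).

```python
def count_code_points(text: str) -> int:
    """Python strings are sequences of code points, so len() is enough."""
    return len(text)


def count_utf16_units(text: str) -> int:
    """What a UTF-16 based length (e.g. JavaScript's String.length) reports."""
    return len(text.encode("utf-16-le")) // 2


def count_visible_characters(text: str) -> int:
    """Simplified count: ignore variation selectors (assumption: no other
    combining characters are present in the input)."""
    def is_selector(ch: str) -> bool:
        cp = ord(ch)
        return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

    return sum(1 for ch in text if not is_selector(ch))


# "葛" (U+845B) followed by VS17 (U+E0100): one character on screen.
ivs = "\u845B\U000E0100"
print(count_code_points(ivs))         # 2
print(count_utf16_units(ivs))         # 3
print(count_visible_characters(ivs))  # 1
```

For production use, a library that implements full grapheme cluster segmentation is likely a better choice than this selector-only filter.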
In practice, variant glyphs most commonly cause problems in name matching. "渡邊" and "渡辺" likely refer to the same person, but as strings they are completely different. Financial institutions and government systems need to normalize (unify) variant characters before performing matching. Unicode normalization (NFC/NFKC) replaces CJK compatibility ideographs with their unified counterparts, but variants such as "辺" and "邊" are separate unified ideographs and are left untouched, so custom variant character tables are required.
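A minimal sketch of such a normalization step in Python might look like the following. The table VARIANT_TO_STANDARD and the functions normalize_name and same_name are hypothetical and cover only the characters mentioned in this article; a real system would need a far larger, curated table and a policy decision about which form counts as the standard one.

```python
import unicodedata

# Hypothetical, deliberately tiny variant table: variant form -> chosen standard form.
VARIANT_TO_STANDARD = {
    "邊": "辺",
    "邉": "辺",
    "髙": "高",
}


def normalize_name(name: str) -> str:
    """Fold a name to a canonical form for matching (not for display)."""
    # Step 1: Unicode normalization handles CJK compatibility ideographs.
    # Note that NFKC also folds full-width letters and similar characters.
    text = unicodedata.normalize("NFKC", name)
    folded = []
    for ch in text:
        cp = ord(ch)
        # Step 2: drop variation selectors so IVS forms match their base character.
        if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
            continue
        # Step 3: apply the custom variant table.
        folded.append(VARIANT_TO_STANDARD.get(ch, ch))
    return "".join(folded)


def same_name(a: str, b: str) -> bool:
    return normalize_name(a) == normalize_name(b)


# NFKC alone does not unify these variants.
print(unicodedata.normalize("NFKC", "渡邊") == "渡辺")  # False
# With the custom table they match.
print(same_name("渡邊", "渡辺"))  # True
print(same_name("渡邉", "渡辺"))  # True
```

Because folding discards information, it is usually safer to keep the original string as entered and use the folded form only as a matching key.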
From a font perspective, displaying variant glyphs requires the font to contain the specified glyph. If a font does not include the glyph designated by an IVS, the base character's default glyph is displayed instead. Japan's "Character Information Infrastructure" (MJ Character Information List) catalogs approximately 60,000 kanji glyph forms, covering the variant characters used in family registers and resident records.