Surrogate Pair
A mechanism in UTF-16 to represent characters outside the BMP using two 16-bit code units.
A surrogate pair is a mechanism in UTF-16 encoding to represent characters outside the Basic Multilingual Plane (BMP: U+0000 to U+FFFF). It combines a high surrogate (U+D800-U+DBFF) and a low surrogate (U+DC00-U+DFFF) to form one character using two 16-bit code units.
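The split into high and low surrogates follows fixed arithmetic from the UTF-16 definition: subtract 0x10000 from the code point, then put the top 10 bits of the result in the high surrogate and the bottom 10 bits in the low surrogate. The following is a minimal sketch of that arithmetic; the function names toSurrogatePair and fromSurrogatePair are illustrative helpers, not built-in APIs.

```javascript
// Split a supplementary code point (U+10000 to U+10FFFF) into a UTF-16 surrogate pair.
// Hypothetical helper for illustration only.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point is not outside the BMP");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits -> U+D800..U+DBFF
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits -> U+DC00..U+DFFF
  return [high, low];
}

// Recombine a surrogate pair into the original code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const [high, low] = toSurrogatePair(0x1f600); // U+1F600 GRINNING FACE
console.log(high.toString(16), low.toString(16));       // d83d de00
console.log(fromSurrogatePair(high, low).toString(16)); // 1f600
```

In everyday code the built-ins String.fromCodePoint and String.prototype.codePointAt do this conversion for you; the sketch only makes the bit layout visible.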
Many emoji lie outside the BMP and therefore require surrogate pairs. In JavaScript, the length property of a string counts UTF-16 code units, so each surrogate pair counts as 2 and text containing emoji reports a higher character count than expected.
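To count each surrogate pair as one character, use [...str].length or Array.from(str).length. Both iterate the string by code point rather than by code unit. Note that an emoji built from several code points (for example, a sequence joined with zero-width joiners) still counts as more than one with these methods.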
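The difference between the two ways of counting can be seen in a short example. This is a sketch using one sample string; countCodePoints is a hypothetical helper name, shown only as an alternative that avoids building an intermediate array.

```javascript
// "a", U+1F600 (an emoji outside the BMP), "b"
const text = "a\u{1F600}b";

console.log(text.length);             // 4: UTF-16 code units
console.log([...text].length);        // 3: code points
console.log(Array.from(text).length); // 3: code points

// Indexing by code unit lands on the high surrogate alone,
// while codePointAt reads the whole pair starting at that index.
console.log(text.charCodeAt(1).toString(16));  // d83d
console.log(text.codePointAt(1).toString(16)); // 1f600

// Hypothetical helper: count code points without allocating an array.
function countCodePoints(str) {
  let count = 0;
  for (const _ of str) count++; // string iteration yields one code point per step
  return count;
}

console.log(countCodePoints(text)); // 3
```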
Surrogate pairs are specific to UTF-16 and do not exist in UTF-8 or UTF-32. UTF-8 encodes a character outside the BMP as four bytes, and UTF-32 encodes every character as a single 32-bit unit.
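To make the comparison concrete, the same emoji can be viewed in each encoding. This sketch assumes a runtime that provides the standard TextEncoder global (modern browsers and current Node.js versions); the UTF-32 row is simply the code point value, since JavaScript has no built-in UTF-32 encoder.

```javascript
const emoji = "\u{1F600}"; // U+1F600, outside the BMP

// UTF-8: TextEncoder always encodes to UTF-8 bytes.
const utf8Bytes = new TextEncoder().encode(emoji);

// UTF-16: read the string one code unit at a time.
const utf16Units = Array.from({ length: emoji.length }, (_, i) => emoji.charCodeAt(i));

// UTF-32: one unit per code point, equal to the code point value itself.
const utf32Units = Array.from(emoji, (ch) => ch.codePointAt(0));

const hex = (n) => n.toString(16).toUpperCase();

console.log(Array.from(utf8Bytes, hex).join(" ")); // F0 9F 98 80 (four bytes, no surrogates)
console.log(utf16Units.map(hex).join(" "));        // D83D DE00   (one surrogate pair)
console.log(utf32Units.map(hex).join(" "));        // 1F600       (one 32-bit unit)
```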