UTF-16

A Unicode encoding that uses 16-bit code units. Used internally by JavaScript, Java, and Windows.

UTF-16 is a Unicode encoding that represents characters using 16-bit (2-byte) code units. Characters in the Basic Multilingual Plane (BMP, U+0000–U+FFFF) fit in a single code unit (2 bytes), while characters outside the BMP (U+10000 and above) are encoded as surrogate pairs, i.e. two code units (4 bytes).

JavaScript strings are represented internally as UTF-16. This means String.length returns the number of UTF-16 code units, so an emoji or other non-BMP character counts as 2. JavaScript string handling books explain these nuances in detail.

UTF-16 comes in two byte orders: big-endian (UTF-16BE) and little-endian (UTF-16LE). A BOM (Byte Order Mark, U+FEFF) at the start of a file indicates which order is used: the bytes FE FF signal big-endian, FF FE little-endian.
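BOM detection reduces to checking the first two bytes. A minimal sketch (the function name `detectUtf16Bom` is ours, for illustration):

```javascript
// Detect UTF-16 byte order from a BOM at the start of a byte buffer.
// U+FEFF serializes as FE FF in big-endian and FF FE in little-endian.
function detectUtf16Bom(bytes) {
  if (bytes[0] === 0xFE && bytes[1] === 0xFF) return "UTF-16BE";
  if (bytes[0] === 0xFF && bytes[1] === 0xFE) return "UTF-16LE";
  return null; // no BOM — byte order must be known from context
}

// FF FE BOM followed by "A" (U+0041) in little-endian order
console.log(detectUtf16Bom(new Uint8Array([0xFF, 0xFE, 0x41, 0x00]))); // "UTF-16LE"
```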

While UTF-8 is the web standard, Windows internal APIs, .NET, and Java use UTF-16. Programming and character encoding books discuss when to use UTF-16 vs UTF-8.