UTF-8

A variable-length Unicode encoding. The dominant character encoding on the web, used by over 98% of websites.

UTF-8 is a variable-length character encoding for Unicode. It is backward compatible with ASCII: every ASCII character is encoded as the same single byte, so any valid ASCII text is also valid UTF-8. At the same time, it can represent every character in the Unicode standard.
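The ASCII compatibility can be checked directly. The following is a minimal Python sketch; the helper name is_same_in_ascii_and_utf8 is made up for illustration and is not part of any library.

```python
def is_same_in_ascii_and_utf8(text: str) -> bool:
    """Return True if text is pure ASCII and encodes to identical bytes
    under both ASCII and UTF-8 (hypothetical helper for illustration)."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains at least one non-ASCII character.
        return False


print(is_same_in_ascii_and_utf8("Hello, world!"))
print(is_same_in_ascii_and_utf8("caf\u00e9"))  # contains U+00E9, not ASCII
```

For pure ASCII input the two encodings produce the same bytes, which is why existing ASCII files can be read as UTF-8 without conversion.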

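UTF-8 uses between 1 and 4 bytes per character, depending on the code point: ASCII characters (U+0000 to U+007F) take 1 byte; accented Latin letters and scripts such as Greek, Cyrillic, Hebrew, and Arabic (U+0080 to U+07FF) take 2 bytes; the rest of the Basic Multilingual Plane, including most CJK characters (U+0800 to U+FFFF), takes 3 bytes; and characters beyond it, including most emoji (U+10000 to U+10FFFF), take 4 bytes.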
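As a rough illustration of these length classes, the Python sketch below encodes one sample character from each class and reports its byte count. The sample characters and the helper name utf8_length are chosen here for illustration only.

```python
# One sample character per length class (chosen for illustration).
samples = {
    "A": "ASCII letter",
    "\u00e9": "accented Latin letter",
    "\u6f22": "CJK ideograph",
    "\U0001F600": "emoji",
}


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))


for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```

Note that len() on a Python str counts code points, not bytes, so the byte count only becomes visible after encoding.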

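In HTML, <meta charset="UTF-8"> declares the character encoding of the document. It belongs inside <head>, within the first 1024 bytes of the file. If the declaration is missing and the server sends no charset in its Content-Type header, the browser has to guess the encoding, and a wrong guess shows up as mojibake (garbled text).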
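Mojibake can be reproduced by decoding UTF-8 bytes with the wrong encoding. The Python sketch below assumes the wrong guess is Windows-1252 (cp1252), a common legacy fallback; the function name misdecode is hypothetical.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong
    encoding, imitating a reader that guessed incorrectly."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "caf\u00e9"
garbled = misdecode(original)
print(garbled)  # the two UTF-8 bytes of U+00E9 appear as two separate characters

# As long as no bytes were lost, reversing the mistake recovers the text.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered == original)
```

The bytes themselves are never damaged in this scenario; only their interpretation is wrong, which is why declaring the encoding explicitly is enough to prevent the problem.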

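In MySQL and MariaDB, use the utf8mb4 character set rather than the legacy utf8 character set. The legacy utf8 (historically an alias for utf8mb3) stores at most 3 bytes per character, so it cannot hold 4-byte characters such as emoji. utf8mb4 is full UTF-8 and can store every Unicode character.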
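Whether a given string would fit in a 3-byte-per-character column can be checked before it reaches the database. This is a minimal Python sketch, assuming the only concern is characters outside the Basic Multilingual Plane; the function name needs_utf8mb4 is made up for illustration.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if text contains a character above U+FFFF, which
    takes 4 bytes in UTF-8 and does not fit a 3-byte character set."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("caf\u00e9 \u6f22"))    # all characters fit in 3 bytes or fewer
print(needs_utf8mb4("hello \U0001F600"))    # the emoji needs 4 bytes
```

A check like this is only a diagnostic aid; converting the affected tables and the connection character set to utf8mb4 is the actual fix.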