ICU (International Components for Unicode)

A Unicode internationalization library providing string collation, conversion, formatting, and multilingual processing.

ICU (International Components for Unicode) is an open-source internationalization (i18n) library, originally developed at IBM and now maintained under the Unicode Consortium. Available in C/C++ (ICU4C) and Java (ICU4J), it serves as the multilingual processing foundation for many platforms and applications.

ICU provides string collation (locale-aware sorting), date/number/currency formatting, text boundary detection (word, sentence, and line breaks), character encoding conversion between Unicode and legacy code pages, and script transliteration (for example, hiragana to katakana).
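As a rough illustration of locale-aware formatting, the sketch below uses the JavaScript Intl API, which is backed by ICU in Node.js, rather than ICU4C or ICU4J directly. The helper names formatNumber and formatDate and the sample values are hypothetical, chosen only for this example, and the output comments assume a runtime with full locale data.

```typescript
// Hypothetical helpers: format one value for several locales via Intl.
function formatNumber(locale: string, value: number): string {
  return new Intl.NumberFormat(locale, { maximumFractionDigits: 2 }).format(value);
}

function formatDate(locale: string, date: Date): string {
  // timeZone is fixed to UTC so the result does not depend on the host machine.
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(date);
}

const amount = 1234567.891;
const day = new Date(Date.UTC(2024, 0, 15));

const numberEn = formatNumber("en-US", amount); // "1,234,567.89"
const numberDe = formatNumber("de-DE", amount); // "1.234.567,89"
const dateEn = formatDate("en-US", day); // "January 15, 2024"
const dateDe = formatDate("de-DE", day); // "15. Januar 2024"

console.log(numberEn, numberDe, dateEn, dateDe);
```

The same input produces different grouping separators, decimal marks, and month names per locale, which is the kind of data ICU supplies.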

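Node.js has bundled the full ICU data set (full-icu) by default since v13; earlier releases shipped only English locale data (small-icu) by default. The Intl API in Node.js is implemented on top of ICU.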
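One way to check whether a runtime actually carries the locale data an application needs is to ask the Intl API which locales it supports. This is a minimal sketch: the list of requested locales is an arbitrary assumption for illustration, and the variable names are hypothetical.

```typescript
// Locales this hypothetical application wants to support.
const requested: string[] = ["en-US", "de-DE", "ja-JP", "ar-EG"];

// supportedLocalesOf returns the subset of locales the runtime has data for.
const supported: string[] = Intl.NumberFormat.supportedLocalesOf(requested);

// Anything left over is missing from the runtime's ICU data.
const missing: string[] = requested.filter(
  (locale) => supported.indexOf(locale) === -1
);

if (missing.length > 0) {
  console.warn("Locale data missing for: " + missing.join(", "));
} else {
  console.log("All requested locales are supported.");
}
```

On a build with only English data, the non-English entries would be expected to show up in missing; on a full-icu build the list should be empty.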

ICU's collation implements the Unicode Collation Algorithm (UCA) and tailors it with per-locale rules from CLDR data, so the same strings can sort differently depending on the locale. For example, German sorts "ä" alongside "a", while Swedish places "ä" after "z".
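Locale-specific ordering can be sketched with Intl.Collator, which relies on ICU collation in Node.js. The sortFor helper and the sample words are hypothetical, used only to show how the locale changes the result.

```typescript
// Hypothetical helper: return a sorted copy of items using the locale's collation rules.
function sortFor(
  locale: string,
  items: string[],
  options?: Intl.CollatorOptions
): string[] {
  const collator = new Intl.Collator(locale, options);
  // slice() copies the array so the input is left unchanged.
  return items.slice().sort(collator.compare);
}

const words = ["z", "ä", "a"];

const german = sortFor("de", words); // ["a", "ä", "z"]
const swedish = sortFor("sv", words); // ["a", "z", "ä"]

// The numeric option compares digit runs by value instead of character by character.
const numeric = sortFor("en", ["item10", "item2", "item1"], { numeric: true });
// ["item1", "item2", "item10"]

console.log(german, swedish, numeric);
```

A plain Array.prototype.sort() with no comparator orders strings by UTF-16 code unit, which ignores locale rules entirely; passing a collator's compare function is what brings the locale-specific ordering into play.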