Collation

Rules for comparing and sorting strings. Defines sort order that varies by language and culture.

Collation is the set of rules used to compare and sort strings. Because the same characters can sort differently across languages and cultures, it is a core concept in internationalization. For example, Swedish sorts "ä" after "z", while German sorts it alongside "a".

In databases, collation can typically be set at several levels, such as per database, table, or column. In MySQL, utf8mb4_unicode_ci compares strings case-insensitively (the _ci suffix), while utf8mb4_bin compares them by binary code point value, so "a" and "A" are distinct.
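The difference between case-insensitive and binary comparison can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for MySQL, its NOCASE and BINARY collations are analogues of (not equivalents to) utf8mb4_unicode_ci and utf8mb4_bin, and the table name `words` and helper `matches` are hypothetical.

```python
import sqlite3

# Assumption: SQLite is used as a stand-in; MySQL collation names differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("banana",), ("Apple",), ("apple",), ("Banana",)],
)

def matches(collation):
    """Return rows equal to 'apple' under the given SQLite collation."""
    # The collation name comes from a fixed set below, never from user input.
    sql = f"SELECT w FROM words WHERE w = 'apple' COLLATE {collation} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql)]

nocase_matches = matches("NOCASE")  # case-insensitive (ASCII letters only)
binary_matches = matches("BINARY")  # byte-for-byte comparison

print(nocase_matches)  # ['Apple', 'apple']
print(binary_matches)  # ['apple']
```

The same stored data yields different matches depending only on the collation named in the query, which is why the choice matters for lookups and unique constraints.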

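Sorting Japanese text involves complex rules, including hiragana-katakana equivalence, the handling of dakuten and handakuten, and ordering kanji by their readings. The Unicode Collation Algorithm (UTS #10) defines the general framework, and Unicode's CLDR (Common Locale Data Repository) supplies locale-specific tailorings for it. Ordering kanji by reading generally requires separate reading data, since a single kanji can have several readings.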
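The kana rules above can be approximated with a multi-level sort key. The following is a minimal sketch, not the Unicode Collation Algorithm itself: the function names `to_hiragana` and `kana_sort_key` are hypothetical, kanji are not handled, and the levels only mimic the idea that base letters matter first, dakuten/handakuten second, and the hiragana/katakana distinction last.

```python
import unicodedata

def to_hiragana(ch):
    """Fold a katakana letter (U+30A1..U+30F6) to its hiragana counterpart."""
    cp = ord(ch)
    return chr(cp - 0x60) if 0x30A1 <= cp <= 0x30F6 else ch

def kana_sort_key(text):
    """Build a three-level key: base letters, then marks, then script."""
    # NFD splits e.g. "ば" into "は" plus the combining dakuten U+3099.
    decomposed = "".join(to_hiragana(c) for c in unicodedata.normalize("NFD", text))
    # Level 1: base letters only, with combining marks removed.
    primary = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Level 2: the folded text including marks breaks ties between は and ば.
    # Level 3: the original text breaks ties between hiragana and katakana.
    return (primary, decomposed, text)

words = ["ばら", "ハト", "はと", "パン", "はな"]
naive = sorted(words)                        # plain code point order
tailored = sorted(words, key=kana_sort_key)  # kana-aware order

print(naive)     # ['はと', 'はな', 'ばら', 'ハト', 'パン']
print(tailored)  # ['はと', 'ハト', 'はな', 'ばら', 'パン']
```

In code point order all katakana words land after all hiragana words, whereas the layered key interleaves them by sound. For production use, a library that implements the Unicode Collation Algorithm with CLDR data is the safer choice.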

For character counting, collation does not change how many characters a string contains, but it does determine which characters and strings are considered equal. For example, whether "は" and "ば" are treated as identical depends on the collation: an accent-insensitive collation may treat them as equal, while a binary collation distinguishes them.
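The effect on equivalence can be illustrated by counting distinct strings under two comparison rules. This is a rough sketch: the helper `strip_marks` is hypothetical and only imitates a comparison that ignores dakuten and handakuten by removing combining marks after canonical decomposition.

```python
import unicodedata

def strip_marks(text):
    """Drop combining marks such as dakuten after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

entries = ["はは", "ばば", "ぱぱ", "はな"]

lengths = [len(e) for e in entries]                      # length is unaffected
exact_distinct = len(set(entries))                       # exact comparison
loose_distinct = len({strip_marks(e) for e in entries})  # marks ignored

print(lengths)         # [2, 2, 2, 2]
print(exact_distinct)  # 4
print(loose_distinct)  # 2
```

Each string is two characters long under either rule, but the number of distinct values drops from four to two once the marks are ignored.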