ASCII

A 7-bit character encoding standard representing 128 characters including English letters, digits, and basic symbols.

ASCII (American Standard Code for Information Interchange) is a character encoding standard established in 1963 by the American Standards Association (ASA, now ANSI). It uses 7 bits to represent 128 characters: uppercase and lowercase English letters (A-Z, a-z), digits (0-9), basic symbols (! @ # $ etc.), and 33 control characters (newline, tab, etc.). Created to standardize data communication between computers, it remains the starting point for all modern character encodings.

ASCII forms the foundation of modern character encoding. UTF-8 is fully backward-compatible with ASCII: every ASCII character is encoded as the same single byte in UTF-8, which keeps ASCII text highly byte-efficient. Thanks to this compatibility, a file written entirely in ASCII is already valid UTF-8 and can be read correctly without any conversion. Books on character encoding fundamentals cover ASCII's history and mechanics in detail.
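The backward compatibility can be checked directly; a minimal Python sketch (the sample string is an arbitrary illustration):

```python
data = b"Hello, ASCII!"

# A pure-ASCII byte sequence decodes identically as ASCII and as UTF-8,
# because UTF-8 maps code points 0-127 to the same single bytes.
assert data.decode("ascii") == data.decode("utf-8") == "Hello, ASCII!"

# Each ASCII character occupies exactly 1 byte in UTF-8.
assert len("Hello".encode("utf-8")) == 5
```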

In programming, ASCII code values are frequently used for character classification and conversion. For example, uppercase A is 65 and lowercase a is 97, with a constant difference of 32. This regularity enables fast case conversion using bitwise operations. Additionally, the digit 0 has a code value of 48, so subtracting 48 from the character '5' yields the numeric value 5. Such knowledge of ASCII code values is practically useful when implementing input validation and parsers.
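The tricks above can be sketched in Python; the helper names are illustrative, not a standard API:

```python
def to_upper(ch: str) -> str:
    """Uppercase a single ASCII letter via a bitwise operation."""
    # 'a' (97) and 'A' (65) differ by exactly 32 (0x20), so clearing
    # bit 5 converts a lowercase ASCII letter to uppercase.
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~0x20)
    return ch

def digit_value(ch: str) -> int:
    """Convert a single ASCII digit character to its numeric value."""
    # '0' has code value 48, so subtracting it yields the digit itself.
    return ord(ch) - ord("0")

print(to_upper("a"))      # A
print(digit_value("5"))   # 5
```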

ASCII's 128 characters cannot represent Japanese, Chinese, or other non-Latin scripts, which led to the development of regional encodings. Japan used Shift_JIS and EUC-JP, China used GB2312, and the proliferation of these incompatible encodings caused widespread character corruption issues. Unicode eventually emerged to handle all the world's characters in a unified system, but ASCII lives on as its foundation. Introductory computer science texts treat ASCII as essential knowledge.

In character counting, ASCII characters always maintain a simple 1 character = 1 byte relationship. In contrast, Japanese full-width characters consume 3 bytes in UTF-8, so the byte count differs significantly even for the same character count. When considering database VARCHAR limits or API payload sizes, the ratio of ASCII to non-ASCII characters directly impacts byte count estimates.
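The gap between character count and byte count is easy to demonstrate; a small Python sketch with arbitrary sample strings:

```python
ascii_text = "hello"          # 5 ASCII characters
japanese_text = "こんにちは"   # 5 Japanese characters

# ASCII: 1 character = 1 byte in UTF-8.
print(len(ascii_text), len(ascii_text.encode("utf-8")))        # 5 5

# Japanese full-width characters take 3 bytes each in UTF-8,
# so 5 characters become 15 bytes.
print(len(japanese_text), len(japanese_text.encode("utf-8")))  # 5 15
```

This is why a VARCHAR limit defined in bytes can overflow with far fewer non-ASCII characters than the nominal length suggests.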