BOM (Byte Order Mark)

A byte sequence at the start of a file that identifies its encoding: EF BB BF for UTF-8, and FF FE (little-endian) or FE FF (big-endian) for UTF-16.

A BOM (Byte Order Mark) is a special byte sequence placed at the very beginning of a text file to indicate the encoding type and byte order (endianness). It is the encoded form of Unicode character U+FEFF (ZERO WIDTH NO-BREAK SPACE), serving as a hint for applications to automatically detect the file's encoding. The BOM is not part of the file's content itself but acts as metadata.

The specific byte sequence of a BOM varies by encoding. UTF-8 uses EF BB BF (3 bytes), UTF-16 Big Endian uses FE FF (2 bytes), UTF-16 Little Endian uses FF FE (2 bytes), UTF-32 Big Endian uses 00 00 FE FF (4 bytes), and UTF-32 Little Endian uses FF FE 00 00 (4 bytes). For UTF-16 and UTF-32, where multi-byte values can be stored in either byte order, the BOM tells the reader which order was used. UTF-8, however, has no byte order concept, so its BOM serves purely as an encoding identifier.
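The signatures listed above can be checked with a simple prefix match. The following is a minimal sketch, not a full encoding detector: the function name `detect_bom` and its return shape are illustrative choices, while the `codecs.BOM_*` constants come from Python's standard library. Note that the UTF-32 Little Endian signature (FF FE 00 00) begins with the UTF-16 Little Endian signature (FF FE), so the longer patterns have to be tested first.

```python
import codecs

# Longer signatures first: the UTF-32 LE BOM (FF FE 00 00) starts with
# the UTF-16 LE BOM (FF FE), so checking UTF-16 first would misreport it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

One caveat: a UTF-16 Little Endian file whose first character after the BOM is U+0000 produces the same four leading bytes as a UTF-32 Little Endian BOM, so a prefix match alone cannot separate those two cases.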

In practice, the most frequent issues involve the UTF-8 BOM. The 3-byte sequence (EF BB BF) at the start of a file is treated as unwanted data by many programs and tools. For example, a BOM at the beginning of a shell script prevents the shebang line (#!/bin/bash) from being recognized, because the first two bytes of the file are no longer "#!", and the script may fail to execute. In PHP files, the BOM is sent as output before any HTML, triggering "headers already sent" errors. With CSV files, the presence or absence of a BOM can determine whether Excel displays characters correctly or shows garbled text. In web development, UTF-8 without BOM is the de facto standard, and the Unicode Standard neither requires nor recommends a BOM for UTF-8.
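For files such as shell scripts, where a leading BOM causes trouble, the three bytes can be removed at the byte level without decoding the rest of the file. The sketch below assumes the file is small enough to read into memory; the helper names `strip_utf8_bom` and `remove_utf8_bom_from_file` are hypothetical, chosen for this example.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_utf8_bom_from_file(path) -> bool:
    """Rewrite the file without its UTF-8 BOM. Returns True if a BOM was removed."""
    p = Path(path)
    data = p.read_bytes()
    stripped = strip_utf8_bom(data)
    if len(stripped) == len(data):
        return False  # no BOM found, leave the file untouched
    p.write_bytes(stripped)
    return True
```

Working on raw bytes keeps the rest of the file exactly as it was, including line endings, which matters for scripts that are sensitive to them.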

Windows Notepad historically added a BOM by default when saving as UTF-8, but starting with Windows 10 version 1903, UTF-8 without BOM became the default. This change was a welcome improvement for web developers and programmers. However, opening UTF-8 CSV files correctly in Excel may still require a BOM, so the appropriate choice depends on the use case. Modern editors like Visual Studio Code and Sublime Text allow you to check and change the encoding and BOM status from the status bar.
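When a CSV is destined for Excel, the BOM can be added deliberately at write time. The sketch below relies on Python's built-in `utf-8-sig` codec, which prepends EF BB BF when encoding; the function name `csv_bytes_for_excel` is a hypothetical helper for illustration, and whether a given Excel version needs the BOM is an assumption to verify in your own environment.

```python
import csv
import io


def csv_bytes_for_excel(rows) -> bytes:
    """Serialize rows as CSV and encode as UTF-8 with a leading BOM."""
    buf = io.StringIO(newline="")   # let the csv module control line endings
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends the BOM (EF BB BF) to the encoded output.
    return buf.getvalue().encode("utf-8-sig")
```

To write directly to disk instead, opening the file with `open(path, "w", encoding="utf-8-sig", newline="")` and passing it to `csv.writer` has the same effect.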

In character counting, the BOM is an easy-to-miss pitfall. Because the BOM is a zero-width invisible character, it does not appear on screen, but it does affect file size: a UTF-8 file with a BOM is 3 bytes larger than the same file without one. Additionally, if the BOM remains at the beginning of a string after reading the file, it counts as one extra character (U+FEFF), so character counts can be off by one and string comparisons can fail unexpectedly. When reading files programmatically, it is important to detect and strip the BOM appropriately.
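The off-by-one effect is easy to reproduce. In this short Python sketch, decoding with `utf-8` keeps the BOM as a leading U+FEFF character, while the standard `utf-8-sig` codec strips it; the helper name `strip_bom` is illustrative and covers text that has already been decoded.

```python
raw = b"\xef\xbb\xbfhello"             # UTF-8 BOM followed by "hello"

kept = raw.decode("utf-8")             # BOM survives as "\ufeff"
stripped = raw.decode("utf-8-sig")     # BOM removed during decoding

count_with_bom = len(kept)             # 6: the invisible U+FEFF is counted
count_without_bom = len(stripped)      # 5
matches_plain_text = kept == "hello"   # False: the comparison fails


def strip_bom(text: str) -> str:
    """Remove a single leading U+FEFF from already-decoded text."""
    return text[1:] if text.startswith("\ufeff") else text
```

Opening a file with `open(path, encoding="utf-8-sig")` applies the same stripping at read time and also works for files that have no BOM.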
