How Emoji Combinations Change Meaning - The Difference in Information Conveyed by One Character vs. Two or More
When you pick "👨👩👧👦" from your phone's emoji keyboard, you think you are typing a single emoji. In reality, this family emoji is a concatenation of seven code points: "👨 + ZWJ + 👩 + ZWJ + 👧 + ZWJ + 👦." One character on the surface, seven underneath. In the world of emoji, combinations can dramatically change both the meaning and the character count of a single symbol.
ZWJ Sequences - Invisible Glue That Merges Emoji
ZWJ (Zero Width Joiner) is a "zero-width joining character" assigned to Unicode code point U+200D. It displays nothing on screen but acts as glue that bonds the emoji on either side into one.
The mechanism is simple. Place emoji A + ZWJ + emoji B in sequence, and if the OS or app has a glyph (rendered image) for that combination, A and B are displayed as a single merged emoji. If no matching glyph exists, A and B simply appear side by side. In other words, a ZWJ sequence is a flexible system: "merge if possible, otherwise just show them separately."
| ZWJ Sequence | Components | Code Points | Display |
|---|---|---|---|
| 👨👩👧👦 | 👨 + ZWJ + 👩 + ZWJ + 👧 + ZWJ + 👦 | 7 | Family (father, mother, daughter, son) |
| 👩💻 | 👩 + ZWJ + 💻 | 3 | Woman technologist |
| 🏳️🌈 | 🏳️ + ZWJ + 🌈 | 4 | Rainbow flag |
| 👨🍳 | 👨 + ZWJ + 🍳 | 3 | Man cook |
| 🧑🚀 | 🧑 + ZWJ + 🚀 | 3 | Astronaut |
| ❤️🔥 | ❤️ + ZWJ + 🔥 | 4 | Heart on fire |
| 👩‍❤️‍👨 | 👩 + ZWJ + ❤️ + ZWJ + 👨 | 6 | Couple |
Family emoji are among the ZWJ sequences with the highest code point counts. The four-person family "👨‍👩‍👧‍👦" is 7 code points and 25 bytes when encoded in UTF-8, so a single emoji consumes the same storage as 25 ASCII letters.
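The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the helper name `join_with_zwj` is made up for illustration, and the emoji are written as escape sequences so the invisible joiner stays visible in the source.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER (U+200D)


def join_with_zwj(*parts: str) -> str:
    """Glue emoji together with ZWJ; whether they merge is up to the platform."""
    return ZWJ.join(parts)


man, woman, girl, boy = "\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"
family = join_with_zwj(man, woman, girl, boy)

code_points = len(family)                  # Python's len() counts code points
utf8_bytes = len(family.encode("utf-8"))   # 4 emoji x 4 bytes + 3 ZWJ x 3 bytes

print(code_points, utf8_bytes)  # 7 25
print([f"U+{ord(ch):04X}" for ch in family])
```

The same helper accepts any pair, such as a cat and a dragon. The resulting string is valid either way; only the rendering differs.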
What makes ZWJ sequences interesting is that, in theory, you can attempt to join any two emoji. Entering "🐱 + ZWJ + 🐉" (cat and dragon), an undefined combination, will not cause an error. Without a matching glyph, the cat and dragon simply appear side by side. The Unicode Consortium's list of ZWJ sequences recommended for general interchange runs to well over a thousand entries once skin tone and gender variants are counted, and vendors sometimes add their own support.
Flag Emoji - Two Regional Indicator Characters Paint One Flag
🇯🇵 (the flag of Japan) looks like a single emoji, but it is actually a combination of two characters: "Regional Indicator Symbol Letter J" (U+1F1EF) and "Regional Indicator Symbol Letter P" (U+1F1F5). The ISO 3166-1 alpha-2 country code "JP" is expressed using dedicated Unicode characters.
| Flag | Country Code | Regional Indicators | Code Points |
|---|---|---|---|
| 🇯🇵 | JP | 🇯 + 🇵 | U+1F1EF U+1F1F5 |
| 🇺🇸 | US | 🇺 + 🇸 | U+1F1FA U+1F1F8 |
| 🇬🇧 | GB | 🇬 + 🇧 | U+1F1EC U+1F1E7 |
| 🇫🇷 | FR | 🇫 + 🇷 | U+1F1EB U+1F1F7 |
| 🇧🇷 | BR | 🇧 + 🇷 | U+1F1E7 U+1F1F7 |
| 🇰🇷 | KR | 🇰 + 🇷 | U+1F1F0 U+1F1F7 |
There are 26 regional indicator symbols (A through Z), yielding 26 x 26 = 676 theoretical combinations. However, only the roughly 250 country and territory codes registered in ISO 3166-1 actually display as flags. Unregistered combinations (e.g., "🇽🇽") may render as "XX" text or a blank, depending on the platform.
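The mapping from a two-letter country code to a flag is a fixed offset from "Regional Indicator Symbol Letter A" (U+1F1E6). A small sketch follows; the function name `flag_from_country_code` is hypothetical, and it only builds the code point pair without checking whether the code is registered in ISO 3166-1.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_from_country_code(code: str) -> str:
    """Turn a two-letter code such as 'JP' into its regional indicator pair."""
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected two ASCII letters, e.g. 'JP'")
    code = code.upper()
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code
    )


jp = flag_from_country_code("JP")
print([f"U+{ord(ch):04X}" for ch in jp])  # ['U+1F1EF', 'U+1F1F5']
print(26 * 26)                            # 676 theoretical pairs
```

Passing an unregistered pair such as "XX" still returns a valid two-code-point string; whether it shows as a flag is decided by the platform's font.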
This design embeds a political consideration. The Unicode Consortium avoided the political judgment of "which regions count as countries" by not defining flag emoji directly, instead delegating to the existing international standard ISO 3166-1. If a new country gains independence and is registered in ISO 3166-1, the code point sequence for its flag already exists and no new characters have to be added to Unicode; the flag appears once platform vendors ship a glyph for it.
As discussed in URL character limits, country codes appear in many corners of the internet. The ccTLD (country code top-level domain) ".jp" is also based on the same ISO 3166-1 country code.
Skin Tone Modifiers - One Emoji in Five Colors
Skin tone modifiers (Emoji Modifiers), introduced in Unicode 8.0 in 2015, change the skin color of human emoji. Five modifier levels are provided, based on the Fitzpatrick scale used in dermatology.
| Modifier | Code Point | Fitzpatrick Type | Example (👋 + modifier) |
|---|---|---|---|
| 🏻 | U+1F3FB | Type I-II (light skin) | 👋🏻 |
| 🏼 | U+1F3FC | Type III (medium-light skin) | 👋🏼 |
| 🏽 | U+1F3FD | Type IV (medium skin) | 👋🏽 |
| 🏾 | U+1F3FE | Type V (medium-dark skin) | 👋🏾 |
| 🏿 | U+1F3FF | Type VI (dark skin) | 👋🏿 |
Adding a skin tone modifier turns a single emoji into 2 code points. "👋" (U+1F44B) is 1 code point, but "👋🏽" is "U+1F44B U+1F3FD" - 2 code points. In UTF-8, that is 4 bytes + 4 bytes = 8 bytes. Specifying a skin tone alone doubles the data size.
When ZWJ sequences and skin tone modifiers are combined, the code point count explodes. For example, a couple emoji with different skin tones, "👩🏻‍❤️‍👨🏿," consists of 👩 + 🏻 + ZWJ + ❤ + VS16 + ZWJ + 👨 + 🏿, which is 8 code points. It looks like a single emoji, yet it has nearly as many code points as the English phrase "Hi there!" has characters (9), and at 28 bytes in UTF-8 it takes about three times the storage.
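The counts for modified emoji can be reproduced as follows. This is an illustrative sketch: `describe` is a made-up helper that returns (code points, UTF-8 bytes), and the sequences are spelled out with escapes so every invisible component is explicit.

```python
ZWJ = "\u200d"    # ZERO WIDTH JOINER
VS16 = "\ufe0f"   # VARIATION SELECTOR-16 (requests emoji presentation)

WAVE = "\U0001F44B"         # waving hand
MEDIUM_TONE = "\U0001F3FD"  # Fitzpatrick Type IV modifier
LIGHT_TONE = "\U0001F3FB"   # Fitzpatrick Type I-II modifier
DARK_TONE = "\U0001F3FF"    # Fitzpatrick Type VI modifier


def describe(s: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


wave_medium = WAVE + MEDIUM_TONE
couple = (
    "\U0001F469" + LIGHT_TONE + ZWJ      # woman + light skin tone
    + "\u2764" + VS16 + ZWJ              # heavy black heart as emoji
    + "\U0001F468" + DARK_TONE           # man + dark skin tone
)

print(describe(WAVE))         # (1, 4)
print(describe(wave_medium))  # (2, 8)
print(describe(couple))       # (8, 28)
```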
Emoji Slang - A Hidden Language Born from Combinations
Emoji are used not only with their official meanings but also as slang within user communities. Individually harmless emoji can take on entirely different meanings when combined.
The most famous example is probably 🍑🍆. A peach and an eggplant are food emoji, but on social media they are widely recognized as sexual innuendo. In 2019, Facebook and Instagram reportedly updated their community standards to restrict sexually suggestive use of emoji such as these.
| Emoji Combination | Character Count | Official Meaning | Slang Meaning |
|---|---|---|---|
| 🍑🍆 | 2 chars | Peach and eggplant | Sexual innuendo |
| 🧢 | 1 char | Baseball cap | Lie (cap = to lie) |
| 💀 | 1 char | Skull | Dying of laughter |
| 🐐 | 1 char | Goat | GOAT (Greatest Of All Time) |
| 👁️👄👁️ | 3 chars | Eyes and mouth | Shocked / bewildered face |
| 🫠 | 1 char | Melting face | Embarrassed / flustered |
| 🤡 | 1 char | Clown | Someone who did something foolish |
"👁️👄👁️" is a form of "emoji art" that arranges three emoji to create a face: eye + mouth + eye, expressing an "indescribable expression." This three-character combination went viral on TikTok around 2020, used to convey shock, bewilderment, or a sense of "I just saw something."
Emoji slang varies by generation and region. In Japan, 🙏 is used to mean "please" or "thank you," while in Western countries it is sometimes interpreted as a "high five." As explained in emoji Unicode and character counting, the technical character count of an emoji and the "amount of meaning" a human perceives are on entirely different planes.
How Different Platforms Count Emoji Characters
Emoji character counting varies significantly across platforms. The same emoji posted on Twitter (X) and Instagram consumes a different number of characters.
| Platform | Emoji Counting Method | 👨👩👧👦 Count | 🇯🇵 Count |
|---|---|---|---|
| Twitter (X) | Character count after NFC normalization | 1 char (= 2 chars consumed) | 1 char (= 2 chars consumed) |
| Instagram | UTF-16 code unit count | 11 chars | 4 chars |
| LINE | Proprietary counting | 1 char | 1 char |
| SMS | UCS-2 / UTF-16 (16-bit units) | 11 chars | 4 chars |
| JavaScript | UTF-16 code units | .length = 11 | .length = 4 |
Twitter (X) is relatively generous, treating ZWJ sequence family emoji and flag emoji as a single emoji as they appear visually (though internally each counts as 2 characters). As detailed in Twitter's character limit, within the 280-character limit each emoji counts as 2 characters.
JavaScript's .length property, on the other hand, returns the number of UTF-16 code units, so emoji containing surrogate pairs return a value larger than the visual character count. The family emoji "👨‍👩‍👧‍👦" has a .length of 11. Array.from(str).length or [...str].length counts code points instead of code units, but these still split ZWJ sequences and return 7. To count by grapheme cluster, use the Intl.Segmenter API.
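The same three units can be compared side by side outside the browser. The sketch below uses Python only as a neutral calculator; `count_units` is a hypothetical helper, and the UTF-16 figure is derived by encoding to UTF-16 and dividing the byte length by two, which matches what JavaScript's .length reports.

```python
def count_units(s: str) -> dict[str, int]:
    """Count one string in three different units."""
    return {
        # Each UTF-16 code unit is 2 bytes; "-le" avoids a byte order mark.
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
    }


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
flag_jp = "\U0001F1EF\U0001F1F5"

print(count_units(family))
# {'utf16_code_units': 11, 'code_points': 7, 'utf8_bytes': 25}
print(count_units(flag_jp))
# {'utf16_code_units': 4, 'code_points': 2, 'utf8_bytes': 8}
```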
You may also find the summary of SNS character limits helpful. Knowing how each platform counts characters can save you from "character limit exceeded" headaches when posting emoji-heavy content.
Emoji Shiritori and Movie Title Guessing - Character Counts in Play
Emoji-based games also yield interesting discoveries from a character count perspective.
"Emoji shiritori" is a game where you play the Japanese word chain game using emoji names. 🍎 (ringo / apple) -> 🦍 (gorira / gorilla) -> 🍛 (ramen)... and so on. The rules are simple, but it is surprisingly hard if you do not know the official names. For example, the official name of 🫥 is "Dotted Line Face." Every one of the roughly 3,600 emoji has a localized name defined in Unicode's CLDR (Common Locale Data Repository).
"Guess the movie title from emoji" is another popular game. For instance, "🦁👑" is "The Lion King" (2 characters representing a 14-character title), "👻👻👻🔫" is "Ghostbusters" (4 characters for a 12-character title), and "🧙♂️💍🌋" is "The Lord of the Rings" (3 characters for a 21-character title). Emoji combinations might be considered a kind of "ultra-compressed language" that conveys meaning while drastically reducing character count.
Getting the "True Character Count" of Emoji in Code
For developers, counting emoji characters is a headache because different languages and runtimes return different values.
| Language / Environment | Method | Result for "👨👩👧👦" | Counting Unit |
|---|---|---|---|
| JavaScript | "👨👩👧👦".length | 11 | UTF-16 code units |
| JavaScript | [..."👨👩👧👦"].length | 7 | Code points |
| Python 3 | len("👨👩👧👦") | 7 | Code points |
| Swift | "👨👩👧👦".count | 1 | Grapheme clusters |
| Rust | "👨👩👧👦".len() | 25 | Bytes (UTF-8) |
| Go | len("👨👩👧👦") | 25 | Bytes (UTF-8) |
Only Swift returns "1" because Swift uses grapheme clusters as its unit of character measurement. This is the result closest to human intuition, though it comes at a higher internal processing cost. To get the same result in JavaScript, use Intl.Segmenter.
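To see roughly what a grapheme-aware count involves, here is a deliberately simplified Python sketch. It is not the Unicode segmentation algorithm (UAX #29): the function `count_visible_emoji` is hypothetical, it assumes that whatever follows a ZWJ belongs to the same emoji, and it ignores keycap and tag sequences. For real text, a library that implements the standard rules (for example the `\X` pattern in the third-party `regex` package) is the safer choice.

```python
ZWJ = 0x200D   # ZERO WIDTH JOINER
VS16 = 0xFE0F  # VARIATION SELECTOR-16 (emoji presentation)


def is_regional_indicator(cp: int) -> bool:
    return 0x1F1E6 <= cp <= 0x1F1FF


def is_skin_tone(cp: int) -> bool:
    return 0x1F3FB <= cp <= 0x1F3FF


def count_visible_emoji(text: str) -> int:
    """Approximate the number of user-perceived characters in text."""
    cps = [ord(ch) for ch in text]
    count = 0
    i = 0
    while i < len(cps):
        count += 1
        # Two regional indicators in a row form one flag.
        if (is_regional_indicator(cps[i]) and i + 1 < len(cps)
                and is_regional_indicator(cps[i + 1])):
            i += 2
            continue
        i += 1
        # Absorb everything that extends the current cluster.
        while i < len(cps):
            if cps[i] == VS16 or is_skin_tone(cps[i]):
                i += 1
            elif cps[i] == ZWJ:
                i += 2  # skip the joiner and the element it attaches
            else:
                break
    return count


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"
flag_jp = "\U0001F1EF\U0001F1F5"

print(len(family), count_visible_emoji(family))    # 7 1
print(len(flag_jp), count_visible_emoji(flag_jp))  # 2 1
```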
As explained in Unicode fundamentals, the definition of "character count" varies by context. Emoji combinations bring this issue into the sharpest focus. Just as full-width and half-width character counting differs, remember that emoji counting methods also vary by platform and language.
The Future of Emoji - Limitless Combination Possibilities
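As of Unicode 16.0 (2024), the total number of emoji is approximately 3,790. This count already includes flags, skin tone variants, and the defined ZWJ sequences, so much of the total comes from combinations rather than from new code points. Because ZWJ can in principle be placed between any two emoji, the number of sequences that could be defined in the future is effectively unlimited.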
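Emoji 15.1 (released in 2023, with broad platform support arriving in 2024) introduced so-called "directional modifiers," allowing the orientation of some human emoji to be flipped. These are not new modifier characters but ZWJ sequences: appending ZWJ and ➡️ (right arrow) to 🏃 (person running) produces 🏃‍➡️ (person running facing right), a sequence of 4 code points. This is yet another factor increasing code point counts.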
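As a rough illustration of the cost, the right-facing runner can be assembled by hand. The sketch assumes the encoding used in the Emoji 15.1 data (person + optional skin tone + ZWJ + right arrow + VS16); the helper `facing_right` is a made-up name, not part of any library.

```python
ZWJ = "\u200d"          # ZERO WIDTH JOINER
VS16 = "\ufe0f"         # VARIATION SELECTOR-16
RIGHT_ARROW = "\u27a1"  # BLACK RIGHTWARDS ARROW
RUNNER = "\U0001F3C3"   # person running
MEDIUM_TONE = "\U0001F3FD"


def facing_right(person: str, skin_tone: str = "") -> str:
    """Build a direction-specifying sequence (assumed Emoji 15.1 layout)."""
    return person + skin_tone + ZWJ + RIGHT_ARROW + VS16


runner_right = facing_right(RUNNER)
runner_right_toned = facing_right(RUNNER, MEDIUM_TONE)

print(len(RUNNER), len(RUNNER.encode("utf-8")))                          # 1 4
print(len(runner_right), len(runner_right.encode("utf-8")))              # 4 13
print(len(runner_right_toned), len(runner_right_toned.encode("utf-8")))  # 5 17
```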
Emoji combinations have dramatically expanded the expressive power of text communication. In everyday chat, you do not need to worry about how many code points make up a single emoji. But when posting to character-limited social media, designing database character fields, or processing strings in code, the gap between "visual character count" and "internal character count" can become an unexpected pitfall.
As also mentioned in the article on LINE message character counts, messages heavy on emoji carry more data than text-only messages. Next time you pick an emoji, it might be fun to imagine just how many code points are hiding behind that single character.