The World's Shortest and Longest Words - Extreme Character Counts Across Languages


The chemical name for the protein "titin" is 189,819 characters long. Reading it aloud reportedly takes about three and a half hours, and it doesn't appear in any English dictionary. At the other extreme, many languages have words that consist of a single character. Viewing the world's languages through the lens of character count reveals the remarkably diverse ways humans have compressed, and expanded, meaning. This article walks through the shortest and longest words across languages, with specific character counts for each.

The World's Shortest Words

"Words that convey meaning in a single character" exist in more languages than you might expect. English "I" and "a" are universally known examples, but Japanese takes it further. Characters like 目 (eye), 手 (hand), 歯 (tooth), 火 (fire), and 木 (tree) are complete words in a single kanji. Even in hiragana, え (picture), き (tree), and め (eye) function as single-character words.

Chinese goes even further: nearly every character carries a meaning of its own, and a great many stand alone as complete words. Characters like 人 (person), 大 (big), 水 (water), and 山 (mountain) each encapsulate an entire concept. This one-character-one-meaning design becomes especially interesting once you look at how characters map to bytes.

Language | Shortest Word Examples | Char Count | Meaning | Notes
English | I, a | 1 char | first person / one | Capital I is 1 byte
Japanese (kanji) | 目, 手, 火 | 1 char | eye, hand, fire | 3 bytes in UTF-8
Chinese | 人, 大, 水 | 1 char | person, big, water | Nearly all characters are 1-char words
Korean | 나 (na) | 1 char | I/me | 1 Hangul char = 3 bytes in UTF-8
Vietnamese | ở | 1 char | to live | 1 character with tone mark

The key observation here is that the size of "1 character" in storage varies enormously across languages. English "a" is 1 byte, but Japanese 目 is 3 bytes in UTF-8, so the same "1 character" is three times the data for a computer to process. The difference between fullwidth and halfwidth forms affects character counting in the same way.
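
The byte counts quoted above can be checked in a few lines of Python. This is a minimal sketch; the helper name `utf8_size` is an illustrative assumption rather than part of any particular tool.

```python
def utf8_size(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

# One-character words from the table above.
for word in ["a", "I", "目", "人", "나"]:
    chars, size = utf8_size(word)
    print(f"{word}: {chars} char, {size} byte(s) in UTF-8")
```

Every entry reports one character, but only the two English words fit in a single byte.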

Long Words in European Languages - German's Compound Word Culture

German is often called the "king of compound words." Nouns can be chained together without a fixed limit, so in principle a word can be arbitrarily long. Some compounds that have actually appeared in legal and administrative documents are long enough that even native German speakers struggle to read them in one go.

Word | Char Count | Meaning | Context
Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft | 80 chars | Association of subordinate officials of the head office management of the Danube steamboat electrical services | Guinness record (German)
Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz | 63 chars | Law on the delegation of monitoring beef labeling | Actual law name until 2013
Kraftfahrzeughaftpflichtversicherung | 36 chars | Motor vehicle liability insurance | Commonly used compound word
Rechtsschutzversicherungsgesellschaften | 39 chars | Legal protection insurance companies (plural) | Frequent in business documents

In 2013, the German state of Mecklenburg-Vorpommern abolished the 63-character law name "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz." It was a BSE (mad cow disease) regulation that became unnecessary due to EU regulatory changes. Its abolition made news as "the longest word in German has disappeared."

Finnish also produces long words. "Lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas" (61 characters) means "airplane jet turbine engine auxiliary mechanic non-commissioned officer student" and was actually used as a military term.
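
The character counts quoted for these compounds can be reproduced with a short script. The sketch below assumes that "character" means one Unicode code point after NFC normalization, so that a letter such as ä counts once whether it was typed precomposed or as a plus a combining diaeresis. The helper name `letter_count` is hypothetical.

```python
import unicodedata

def letter_count(word: str) -> int:
    """Count code points after NFC normalization (assumed definition of 'character')."""
    return len(unicodedata.normalize("NFC", word))

words = [
    "Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft",
    "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz",
    "Kraftfahrzeughaftpflichtversicherung",
    "Rechtsschutzversicherungsgesellschaften",
    "Lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas",
]

for w in words:
    print(letter_count(w), w)
```

Without the normalization step, the same word copied from two different sources could report two different lengths.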

The World's Longest Place Names - Character Count Rankings

The world of place names also has extreme character count examples. A hill in New Zealand has an 85-character Maori name. Bangkok's official name in Thai is even longer, certified by Guinness as the world's longest capital city name.

Place Name | Char Count | Location | Language
Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu | 85 chars | New Zealand | Maori
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch | 58 chars | Wales (UK) | Welsh
กรุงเทพมหานคร... (Bangkok official name) | 168 chars (Thai script) | Thailand | Thai
Chargoggagoggmanchauggagoggchaubunagungamaugg | 45 chars | Massachusetts (USA) | Algonquian origin

The New Zealand hill's name means "the place where Tamatea, the man with big knees, who slid, climbed, and swallowed mountains, played his flute to his loved one." Maori culture names places by describing events that occurred there, resulting in extraordinarily long place names.

Conversely, the world's shortest place names include "Å" (1 character) in Norway and "Ö" (1 character) in Sweden. Both are real settlements that appear on maps. Single-character names like these can cause problems in URL and database design, for example when a form enforces a minimum length or when a non-ASCII letter must be percent-encoded. It is the mirror image of the trouble developers have with maximum URL lengths.
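
To see what a one-character place name turns into inside a URL, the sketch below percent-encodes it with Python's standard `urllib.parse.quote`. The wrapper `url_encoded_length` is a hypothetical helper added for illustration.

```python
from urllib.parse import quote

def url_encoded_length(name: str) -> int:
    """Length of a name after UTF-8 percent-encoding (no characters left unescaped)."""
    return len(quote(name, safe=""))

for place in ["Å", "Ö"]:
    print(place, "->", quote(place, safe=""), url_encoded_length(place))
```

A single visible character becomes six characters in the URL, because each of its two UTF-8 bytes is written as a three-character `%XX` escape.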

Chemical Names - The Ultimate in Character Count

In chemistry, IUPAC nomenclature derives a compound's name from its molecular structure, so larger molecules get longer names. The chemical name for the protein "titin" reaches 189,819 characters and is often called the "world's longest word."

However, this chemical name doesn't appear in dictionaries, and whether to recognize it as a "word" is debatable. It's a name mechanically generated following IUPAC conventions that nobody uses in daily life. Reading it aloud reportedly takes about 3 hours and 30 minutes, and YouTube videos of people actually reading it exist.
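
As a rough sanity check on those two figures, the reading speed they imply can be worked out directly. The sketch below only divides the numbers quoted above; the constant names are illustrative.

```python
TITIN_NAME_LENGTH = 189_819            # characters, as cited above
READING_TIME_SECONDS = 3.5 * 60 * 60   # about 3 hours 30 minutes

chars_per_second = TITIN_NAME_LENGTH / READING_TIME_SECONDS
print(f"about {chars_per_second:.0f} characters per second")
```

That works out to roughly 15 characters every second, sustained for the entire reading.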

Substance Name | Char Count | Type | In Dictionary
Chemical name of titin | 189,819 chars | Protein | No
Methionylglutaminylarginyltyrosylglutamyl... (abbreviated) | 1,909 chars | Tryptophan synthetase | No
Pneumonoultramicroscopicsilicovolcanoconiosis | 45 chars | A type of silicosis | Yes (English dictionaries)
Supercalifragilisticexpialidocious | 34 chars | Movie coinage | Some dictionaries

The longest word in English dictionaries is "Pneumonoultramicroscopicsilicovolcanoconiosis" (45 characters), referring to a lung disease caused by inhaling very fine silica dust, such as volcanic ash. However, the word was deliberately coined in 1935 to be a long word and is not used in medical practice.

The Longest Words in Japanese - Kanji Compounds and Readings

What is the longest compound word in Japanese? In the kanji world, four-character idioms (yojijukugo) are common, but longer compounds exist. Buddhist terminology includes phrases like 南無妙法蓮華経 (7 characters), and legal terminology has long compounds like 不動産登記事項証明書 (10 characters, "real estate registration certificate").

Considering reading length, Japanese words show even more interesting characteristics. Examples of single kanji with long readings include 承 in 承る (uketamawaru, where the kanji alone covers the 5 morae "uketamawa"; "to humbly receive") and 志 (kokorozashi, 5 morae, "ambition"). Conversely, 一昨昨日 (sakiototoi, "three days ago") spends 4 kanji on just 6 morae, illustrating how character count and sound count don't align in Japanese.
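
The gap between written length and spoken length can be measured by counting morae, the timing units usually meant when Japanese "syllables" are counted, in each word's kana reading. The sketch below is a simplified approximation: it assumes the reading is supplied in kana and treats every kana except the small combining ones as one mora. The names `SMALL_KANA` and `count_morae` are hypothetical.

```python
import unicodedata

# Small kana that merge with the preceding kana into a single mora (e.g. きゃ).
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")

def count_morae(kana: str) -> int:
    """Approximate mora count for a word written in hiragana or katakana."""
    kana = unicodedata.normalize("NFC", kana)  # keep voiced kana as one code point
    return sum(1 for ch in kana if ch not in SMALL_KANA)

readings = {"承る": "うけたまわる", "志": "こころざし", "一昨昨日": "さきおととい"}
for word, kana in readings.items():
    print(f"{word}: {len(word)} written characters, {count_morae(kana)} morae")
```

The single kanji 志 comes out at five morae, while the four-kanji 一昨昨日 has only six.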

This "mismatch between character count and information volume" is also why Japanese users can convey more information than English users within X (Twitter) character limits. The density of meaning compressed into a single kanji is fundamentally different from alphabetic languages.

Identifier Length Limits in Programming Languages

Word-length limits are not unique to natural languages: programming languages have them too. Maximum lengths for variable and function names (identifiers) vary by language, creating practical constraints for developers.

Language | Max Identifier Length | Practical Recommendation | Notes
C (C99) | 63 chars (significant) | 20-30 chars | Longer names are not syntax errors; only the first 63 chars of an internal identifier are guaranteed significant (31 for external identifiers)
Java | 65,535 bytes (modified UTF-8) | 20-40 chars | Class file constraint
Python | No limit | 20-30 chars | PEP 8 recommends brevity
JavaScript | No limit | 15-30 chars | Shortened during minification
SQL (standard) | 128 chars | Under 30 chars | Varies by RDBMS
COBOL | 30 chars | 30 chars | Historical constraint

COBOL's 30-character limit dates from its 1959 design. Computers of that era had extremely limited memory, necessitating identifier length restrictions. Modern languages have virtually no limits, but the recommended length for variable and function names of 20-30 characters reflects the limits of human readability.
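
A "significant characters" limit such as C99's means that names differing only after the limit may be treated as the same identifier. The sketch below models that rule in Python for illustration only. The function `collide` and its default of 63 are assumptions taken from the table above, not the behavior of any specific compiler.

```python
C99_SIGNIFICANT = 63  # significant initial characters of an internal identifier

def collide(name_a: str, name_b: str, significant: int = C99_SIGNIFICANT) -> bool:
    """True if two distinct names share the same significant prefix, so an
    implementation honoring only the minimum could treat them as one identifier."""
    return name_a != name_b and name_a[:significant] == name_b[:significant]

base = "x" * 63
print(collide(base + "_first", base + "_second"))  # True: they differ only after char 63
print(collide("total_count", "total_cost"))        # False: they differ within the prefix
```

Names in the recommended 20-30 character range never come close to this limit.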

Information Density Per Character - The Vast Gap Between Languages

Building on the shortest and longest words we've examined, let's compare "information density per character" across languages. From an information theory perspective, the entropy (information content) conveyed by a single character varies dramatically by language.

Language | Writing System | Avg Info Per Char (bits) | Info in 100 Chars | Characteristics
English | Alphabet (26 chars) | ~4.7 bits | ~470 bits | Spaces consume character count
Japanese (mixed) | Kanji + Hiragana + Katakana | ~9.5 bits | ~950 bits | High kanji information density
Chinese | Characters (thousands) | ~11.2 bits | ~1,120 bits | Highest information density
Korean | Hangul (11,172 syllables) | ~8.3 bits | ~830 bits | Efficient syllabic writing
Arabic | Abjad (28 chars) | ~5.8 bits | ~580 bits | Compressed by vowel omission

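A single Chinese character carries roughly 2.4 times the information of a single English character (11.2 / 4.7 ≈ 2.4). This is because the Chinese character set contains thousands of varieties compared to English's 26 letters. More character types mean more meanings distinguishable per character, increasing information density. Note that the English figure of about 4.7 bits is log2(26), the ceiling reached only if all 26 letters were equally likely, so these values are best read as rough comparisons rather than exact measurements.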
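
The link between the number of character types and bits per character can be made concrete with a logarithm. The sketch below computes the theoretical ceiling for an alphabet of a given size. The function name is hypothetical, and the 11.2-bit figure is simply taken from the table above.

```python
import math

def max_bits_per_char(alphabet_size: int) -> float:
    """Upper bound on information per character: log2 of the number of distinct
    symbols, reached only when every symbol is equally likely."""
    return math.log2(alphabet_size)

print(round(max_bits_per_char(26), 1))  # 4.7, the English figure in the table
print(round(11.2 / 4.7, 1))             # 2.4, Chinese relative to English
```

Doubling the number of distinct characters adds only one bit, which is why thousands of characters are needed to reach double-digit bits per character.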

This density difference also affects data compression efficiency. English text has high redundancy and compresses well, while Chinese text is already information-dense and compresses less efficiently. As a rough guide, gzip shrinks English text to about 30-40% of its original size, while Chinese text shrinks only to about 50-60%; exact ratios depend on the text and its encoding.
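
Readers who want to measure this on their own text can use a sketch like the following. It relies on Python's `zlib` module, which implements the same DEFLATE algorithm that gzip uses. The helper name `compressed_fraction` is an assumption, and results on short samples will not match the percentages above.

```python
import zlib

def compressed_fraction(text: str) -> float:
    """Compressed size divided by original size, both measured in UTF-8 bytes."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

# Highly repetitive input compresses far better than ordinary prose would.
sample = "the quick brown fox " * 200
print(f"{compressed_fraction(sample):.1%} of original size")
```

For a meaningful comparison between languages, feed it several kilobytes of ordinary prose in each language rather than a repeated phrase.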

Character Limits and Language Fairness

Character limits on social media and forms are typically unified by "character count." However, since information per character varies by language, the same character limit creates significant differences in expressible information across languages.

When X (Twitter) expanded the English character limit to 280 in 2017 while keeping Japanese, Chinese, and Korean at 140, it was a decision accounting for this information density gap. English at 280 characters and Japanese at 140 characters convey roughly equivalent amounts of information.
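
One way to implement such a rule is to give each character a weight and cap the total at 280. The sketch below is a simplified illustration under stated assumptions: the code point ranges in `LIGHT_RANGES` are assumed for the example, and it ignores URLs and emoji, which a production counter would handle separately.

```python
# Assumed "light" ranges (weight 1): Latin and neighbouring scripts plus common
# punctuation. Everything else, including CJK characters, counts as weight 2.
LIGHT_RANGES = [(0x0000, 0x10FF), (0x2000, 0x200D), (0x2010, 0x201F), (0x2032, 0x2037)]
LIMIT = 280

def weighted_length(text: str) -> int:
    """Sum of per-character weights under the assumed ranges."""
    total = 0
    for ch in text:
        cp = ord(ch)
        light = any(lo <= cp <= hi for lo, hi in LIGHT_RANGES)
        total += 1 if light else 2
    return total

print(weighted_length("a" * 280) <= LIMIT)   # True: 280 Latin letters fit
print(weighted_length("目" * 140) <= LIMIT)  # True: 140 kanji fit
print(weighted_length("目" * 141) <= LIMIT)  # False: one kanji too many
```

A single weighted limit also handles mixed-language posts, which two separate per-language limits would not.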

This cross-language difference also matters when choosing database VARCHAR lengths. Information that needs 100 characters in English may fit in about 50 characters of Japanese. Note, however, that if the column length is measured in bytes rather than characters, those 50 Japanese characters occupy 150 bytes in UTF-8. Multilingual systems need either language-specific character limits or generous margins based on the language requiring the most characters.

What Extreme Character Counts Teach Us

Placing the world's shortest and longest words side by side reveals fundamental differences in language design. Chinese and Japanese kanji evolved toward "compressing meaning into single characters," while German and Finnish evolved toward "concatenating words to express new concepts."

This difference impacts character limits in the digital age. Social media character limits are defined by "character count," but information per character varies enormously by language. At the same 140 characters, Japanese can express roughly twice as much as English.

Understanding Unicode basics reveals that the definition of "1 character" itself is technically complex. Emoji composition, variation selectors, combining characters - cases where "visual 1 character" and "data 1 character" don't match are countless. Behind the seemingly simple task of character counting lies a deep world of language and technology.
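
The mismatch between "visual 1 character" and "data 1 character" is easy to reproduce. The sketch below uses only Python's standard library and writes the characters as escape sequences so the code points are unambiguous. The variable names are illustrative.

```python
import unicodedata

# A family emoji: man + ZERO WIDTH JOINER + woman + ZERO WIDTH JOINER + girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family))                          # 5 code points
print(len(family.encode("utf-8")))          # 18 bytes in UTF-8
print(len(family.encode("utf-16-le")) // 2) # 8 UTF-16 code units

# A combining character: 'e' followed by COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))       # 2 1
```

All of these render as a single visible character, yet depending on the unit chosen the count is 1, 2, 5, 8, or 18.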


Try It with a Character Count Tool

Measuring the words introduced in this article with an actual character counting tool yields fascinating discoveries. How many bytes is the 80-character German compound word in UTF-8? How many characters does the 85-character New Zealand place name expand to when URL-encoded? Experience firsthand how the definition of "1 character" changes with context.

The world's shortest word and the world's longest word. The character count world stretching between them is a mirror reflecting the diversity of language and human creativity. Next time you use a character counting tool, perhaps spare a thought for the history of language and technology behind each counted "character."
