The World's Shortest and Longest Words - Extreme Character Counts Across Languages

The chemical name for the protein "titin" is 189,819 characters long. Reading it aloud takes three and a half hours, and it doesn't appear in any English dictionary. Meanwhile, some languages have words that consist of just a single character. Viewing the world's languages through the lens of character count reveals the remarkably diverse ways humans have compressed - and expanded - meaning. This article introduces the shortest and longest words across languages with their specific character counts, diving deep into the surprising world of word length.

The World's Shortest Words

"Words that convey meaning in a single character" exist in more languages than you might expect. English "I" and "a" are universally known examples, but Japanese takes it further. Characters like 目 (eye), 手 (hand), 歯 (tooth), 火 (fire), and 木 (tree) are complete words in a single kanji. Even in hiragana, え (picture), き (tree), and め (eye) function as single-character words.

Chinese goes even further: most characters are morphemes that carry their own meaning, and many function as independent single-character words. Characters like 人 (person), 大 (big), 水 (water), and 山 (mountain) each encapsulate a concept in one character. This design is worth keeping in mind when considering the relationship between characters and bytes.

Language	Shortest Word Examples	Char Count	Meaning	Notes
English	I, a	1 char	first-person pronoun / indefinite article	1 byte each in UTF-8
Japanese (kanji)	目, 手, 火	1 char	eye, hand, fire	3 bytes in UTF-8
Chinese	人, 大, 水	1 char	person, big, water	Many characters stand alone as words; 3 bytes each in UTF-8
Korean	나 (na)	1 char	I/me	1 Hangul char = 3 bytes in UTF-8
Vietnamese	ở	1 char	to live	1 character with tone mark

The key observation here is that the data size of "1 character" varies across languages. English "a" is 1 byte, but Japanese 目 is 3 bytes in UTF-8, so the same "1 character" means 3 times the data for a computer to store and process. The same gap between what a reader sees and what the machine stores underlies the way fullwidth vs. halfwidth differences affect character counting.
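The gap between character count and byte count is easy to check directly. The following Python sketch compares the two for the words in the table above; the helper name `char_and_byte_count` is ours for illustration and does not come from any library.

```python
def char_and_byte_count(text: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for the text."""
    return len(text), len(text.encode("utf-8"))


# "\u1edf" is the precomposed Vietnamese letter ở, written as an escape so the
# example does not depend on how the source file happens to be normalized.
for word in ["a", "I", "目", "人", "나", "\u1edf"]:
    chars, utf8_bytes = char_and_byte_count(word)
    print(f"{word}: {chars} char, {utf8_bytes} byte(s) in UTF-8")
```

Every entry is one character, but only the two English words fit in a single byte.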

Long Words in European Languages - German's Compound Word Culture

German is often called the "king of compound words." Nouns can be concatenated to create new words, so word length is in principle unbounded. The ultra-long words that actually appear in legal and administrative documents are hard even for native German speakers to read in one go.

Word	Char Count	Meaning	Context
Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft	80 chars	Association of subordinate officials of the head office management of the Danube steamboat electrical services	Guinness record (German)
Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz	63 chars	Law on the delegation of monitoring beef labeling	Actual law name until 2013
Kraftfahrzeughaftpflichtversicherung	36 chars	Motor vehicle liability insurance	Commonly used compound word
Rechtsschutzversicherungsgesellschaften	39 chars	Legal protection insurance companies (plural)	Frequent in business documents

In 2013, the German state of Mecklenburg-Vorpommern repealed the law with the 63-character name "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz." It was a BSE (mad cow disease) regulation that became unnecessary after EU regulatory changes. Its repeal made news as "the longest word in German has disappeared."

Finnish also produces long words. "Lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas" (61 characters) means "airplane jet turbine engine auxiliary mechanic non-commissioned officer student" and was actually used as a military term.
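The character counts quoted for these compounds can be checked mechanically. Here is a small Python sketch that does so; the dictionary `CLAIMED_LENGTHS` and the helper `count_chars` are illustrative names of our own. Normalizing to NFC first is a precaution, so that letters such as ä and ü count as one character even if the text arrives in decomposed form.

```python
import unicodedata

# Words and the character counts stated in this section.
CLAIMED_LENGTHS = {
    "Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft": 80,
    "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz": 63,
    "Kraftfahrzeughaftpflichtversicherung": 36,
    "Rechtsschutzversicherungsgesellschaften": 39,
    "Lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas": 61,
}


def count_chars(word: str) -> int:
    """Count code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", word))


for word, claimed in CLAIMED_LENGTHS.items():
    status = "ok" if count_chars(word) == claimed else "mismatch"
    print(f"{count_chars(word):>3} chars (claimed {claimed}): {status}")
```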

The World's Longest Place Names - Character Count Rankings

The world of place names also has extreme character count examples. A hill in New Zealand has an 85-character Maori name. Bangkok's official name in Thai is even longer, certified by Guinness as the world's longest capital city name.

Place Name	Char Count	Location	Language
Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu	85 chars	New Zealand	Maori
Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch	58 chars	Wales (UK)	Welsh
กรุงเทพมหานคร... (Bangkok official name)	168 chars (Thai script)	Thailand	Thai
Chargoggagoggmanchauggagoggchaubunagungamaugg	45 chars	Massachusetts (USA)	Algonquian origin

The New Zealand hill's name means "the place where Tamatea, the man with big knees, who slid, climbed, and swallowed mountains, played his flute to his loved one." Maori culture names places by describing events that occurred there, resulting in extraordinarily long place names.

Conversely, the world's shortest place names are "Å" (1 character) in Norway and "Ö" (1 character) in Sweden. Both are real settlements that appear on maps. Single-character place names can cause problems in URL and database design, troubling developers in the opposite way from URL character limits.
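To see why a one-character place name can be awkward in a URL, it helps to percent-encode it. The sketch below uses `urllib.parse.quote` from the Python standard library; the wrapper name `url_encoded` is ours.

```python
from urllib.parse import quote


def url_encoded(name: str) -> str:
    """Percent-encode a place name as it would appear in a URL path segment."""
    return quote(name, safe="")


# Escapes are used so the letters are guaranteed to be precomposed:
# "\u00c5" is Å and "\u00d6" is Ö.
for place in ["\u00c5", "\u00d6"]:
    encoded = url_encoded(place)
    print(f"{place} -> {encoded} ({len(encoded)} characters in the URL)")
```

A name that is one character on the map becomes six characters in the URL, because each letter is two bytes in UTF-8 and each byte is written as a three-character escape.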

Chemical Names - The Ultimate in Character Count

In chemistry, the IUPAC naming convention names compounds based on their molecular structure, so larger molecules get longer names. The chemical name for the protein "titin" reaches 189,819 characters, known as the "world's longest word."

However, this chemical name doesn't appear in dictionaries, and whether to recognize it as a "word" is debatable. It's a name mechanically generated following IUPAC conventions that nobody uses in daily life. Reading it aloud reportedly takes about 3 hours and 30 minutes, and YouTube videos of people actually reading it exist.
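The "mechanical" nature of such names can be illustrated with a simplified sketch: each amino acid residue in the chain contributes its "-yl" form, and the final residue keeps its full name. The tables below cover only five amino acids and the function `peptide_name` is a hypothetical helper for illustration, not an implementation of the full IUPAC rules.

```python
# Partial, illustrative tables: three-letter code -> name forms.
FULL_NAMES = {
    "Met": "methionine",
    "Thr": "threonine",
    "Gln": "glutamine",
    "Arg": "arginine",
    "Ile": "isoleucine",
}
ACYL_FORMS = {
    "Met": "methionyl",
    "Thr": "threonyl",
    "Gln": "glutaminyl",
    "Arg": "arginyl",
    "Ile": "isoleucyl",
}


def peptide_name(residues: list[str]) -> str:
    """Concatenate '-yl' forms for all residues except the last (simplified)."""
    if not residues:
        raise ValueError("at least one residue is required")
    *head, last = residues
    return "".join(ACYL_FORMS[r] for r in head) + FULL_NAMES[last]


name = peptide_name(["Met", "Thr", "Thr", "Gln", "Arg"])
print(name, len(name))
```

Because every residue adds roughly eight to ten letters, a protein with tens of thousands of residues yields a name with hundreds of thousands of characters.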

Substance Name	Char Count	Type	In Dictionary
Chemical name of titin	189,819 chars	Protein	No
Methionylthreonylthreonylglutaminylarginyl... (abbreviated)	1,909 chars	Tryptophan synthetase	No
Pneumonoultramicroscopicsilicovolcanoconiosis	45 chars	A type of silicosis	Yes (English dictionaries)
Supercalifragilisticexpialidocious	34 chars	Movie coinage	Some dictionaries

The longest word in English dictionaries is "Pneumonoultramicroscopicsilicovolcanoconiosis" (45 characters), referring to a lung disease caused by inhaling fine volcanic ash particles. However, this word was itself coined in 1935 for the purpose of "creating a long word" and isn't used in medical practice.

The Longest Words in Japanese - Kanji Compounds and Readings

What is the longest compound word in Japanese? In the kanji world, four-character idioms (yojijukugo) are common, but longer compounds exist. Buddhist terminology includes phrases like 南無妙法蓮華経 (7 characters), and legal terminology has long compounds like 不動産登記事項証明書 (10 characters, "real estate registration certificate").

Considering reading length, Japanese words show even more interesting characteristics. Some single kanji carry long readings: 承 in 承る (uketamawaru, "to humbly receive") accounts for 5 of the word's 6 morae, and 志 (kokorozashi, "ambition") is read with 5 morae. Conversely, 一昨昨日 (sakiototoi, "three days ago") uses 4 kanji for just 6 morae, illustrating how character count and syllable count don't align in Japanese.
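The mismatch between written length and spoken length can be made concrete by counting the units of a reading written in hiragana. The sketch below is a rough approximation: the readings are supplied by hand (Python has no built-in kanji-to-reading conversion), it handles hiragana only, and `count_morae` is our own helper name. Counted this way, うけたまわる comes to six units, five of them carried by the kanji 承 and one by the trailing る.

```python
# Small kana combine with the preceding kana and do not add a unit of their own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎ")


def count_morae(hiragana: str) -> int:
    """Approximate mora count: every kana except small combining ones."""
    return sum(1 for ch in hiragana if ch not in SMALL_KANA)


EXAMPLES = [
    ("承る", "うけたまわる"),
    ("志", "こころざし"),
    ("一昨昨日", "さきおととい"),
]

for written, reading in EXAMPLES:
    print(f"{written}: {len(written)} written chars, {count_morae(reading)} morae")
```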

This "mismatch between character count and information volume" is also why Japanese users can convey more information than English users within X (Twitter) character limits. The density of meaning compressed into a single kanji is fundamentally different from alphabetic languages.

Identifier Length Limits in Programming Languages

Not just natural languages - programming languages also have "word length" limits. Maximum lengths for variable and function names (identifiers) vary by language, creating practical constraints for developers.

Language	Max Identifier Length	Practical Recommendation	Notes
C (C99)	63 chars (significant, internal identifiers)	20-30 chars	Beyond 63 chars doesn't cause syntax errors
Java	65,535 bytes (modified UTF-8)	20-40 chars	Class file constraint
Python	No limit	20-30 chars	PEP 8 recommends brevity
JavaScript	No limit	15-30 chars	Shortened during minification
SQL (standard)	128 chars	Under 30 chars	Varies by RDBMS
COBOL	30 chars	30 chars	Historical constraint

COBOL's 30-character limit dates from its 1959 design. Computers of that era had extremely limited memory, necessitating identifier length restrictions. Modern languages have virtually no limits, but the recommended length for variable and function names of 20-30 characters reflects the limits of human readability.
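The idea of "significant" characters in the C row deserves a concrete example: a longer identifier is still legal, but an implementation is only required to distinguish names by their first 63 characters. The sketch below models that rule in Python; `may_collide` is a hypothetical helper, and in practice many compilers treat every character as significant.

```python
C99_SIGNIFICANT_CHARS = 63  # guaranteed minimum for internal identifiers


def may_collide(a: str, b: str, significant: int = C99_SIGNIFICANT_CHARS) -> bool:
    """True if two different names share all of their significant characters."""
    return a != b and a[:significant] == b[:significant]


prefix = "x" * 63
print(may_collide(prefix + "_first", prefix + "_second"))  # differ only after char 63
print(may_collide("total_count", "total_price"))           # differ early on
```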

Information Density Per Character - The Vast Gap Between Languages

Building on the shortest and longest words we've examined, let's compare "information density per character" across languages. From an information theory perspective, the entropy (information content) conveyed by a single character varies dramatically by language.

Language	Writing System	Avg Info Per Char (bits)	Info in 100 Chars	Characteristics
English	Alphabet (26 chars)	~4.7 bits	~470 bits	Spaces consume character count
Japanese (mixed)	Kanji + Hiragana + Katakana	~9.5 bits	~950 bits	High kanji information density
Chinese	Characters (thousands)	~11.2 bits	~1,120 bits	Highest information density
Korean	Hangul (11,172 syllables)	~8.3 bits	~830 bits	Efficient syllabic writing
Arabic	Abjad (28 chars)	~5.8 bits	~580 bits	Compressed by vowel omission

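A single Chinese character carries roughly 2.4 times the information of a single English character (11.2 / 4.7 ≈ 2.4). The English figure corresponds to log2(26) ≈ 4.7 bits, the ceiling reached if all 26 letters were equally likely; running English text is more redundant and carries less per character in practice. The Chinese character set contains thousands of varieties compared to English's 26 letters, and more character types mean more meanings distinguishable per character, increasing information density.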
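The arithmetic behind these figures can be sketched in a few lines of Python. `max_bits_per_char` gives the upper bound for an alphabet of a given size, and `empirical_entropy` estimates bits per character from the character frequencies of a sample text; both function names are ours, and the 11.2 and 4.7 values are simply taken from the table above.

```python
import math
from collections import Counter


def max_bits_per_char(alphabet_size: int) -> float:
    """Upper bound on information per character: all symbols equally likely."""
    return math.log2(alphabet_size)


def empirical_entropy(text: str) -> float:
    """Shannon entropy in bits per character, from single-character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(round(max_bits_per_char(26), 2))   # bound for a 26-letter alphabet
print(round(11.2 / 4.7, 2))              # ratio of the table's Chinese and English figures
print(round(empirical_entropy("the quick brown fox jumps over the lazy dog"), 2))
```

An estimate from single-character frequencies ignores context between characters, so it overstates how much information real text carries.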

This density difference also affects data compression efficiency. English text has high redundancy and compresses well, while Chinese text is already information-dense and compresses less efficiently. When gzip-compressed, English text shrinks to about 30-40% of its original size, while Chinese text only shrinks to about 50-60%.
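Compression ratios like these depend heavily on the text sample, so they are best measured on your own data. A minimal way to do that with the standard `gzip` module is sketched below; `gzip_ratio` is an illustrative helper name, and the sample strings are placeholders rather than representative corpora.

```python
import gzip


def gzip_ratio(text: str) -> float:
    """Compressed size divided by original size, both measured in UTF-8 bytes."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(gzip.compress(raw)) / len(raw)


# Placeholder samples: substitute real paragraphs of comparable content.
samples = {
    "English": "The quick brown fox jumps over the lazy dog. " * 50,
    "Chinese": "山高水长，人大心宽。" * 50,
}
for language, text in samples.items():
    print(f"{language}: compressed to {gzip_ratio(text):.0%} of original size")
```

Very short inputs can come out larger than the original because of gzip's fixed header overhead, and highly repetitive samples like the placeholders above compress far better than natural prose.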

Character Limits and Language Fairness

Character limits on social media and forms are typically unified by "character count." However, since information per character varies by language, the same character limit creates significant differences in expressible information across languages.

When X (Twitter) expanded the English character limit to 280 in 2017 while keeping Japanese, Chinese, and Korean at 140, it was a decision accounting for this information density gap. English at 280 characters and Japanese at 140 characters convey roughly equivalent amounts of information.
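One way to implement such a policy is a single weighted limit rather than separate per-language limits: characters from dense scripts cost two units, others cost one, and the total may not exceed 280. The sketch below is a deliberately simplified model of that idea. The cutoff at U+10FF and the names `weighted_length` and `fits_limit` are assumptions for illustration; the platform's real counting rules are more detailed.

```python
LIGHT_RANGE_END = 0x10FF  # assumption: code points up to here cost 1 unit
MAX_WEIGHT = 280


def weighted_length(text: str) -> int:
    """Sum of per-character weights: 1 for 'light' code points, 2 otherwise."""
    return sum(1 if ord(ch) <= LIGHT_RANGE_END else 2 for ch in text)


def fits_limit(text: str, limit: int = MAX_WEIGHT) -> bool:
    """True if the weighted length stays within the limit."""
    return weighted_length(text) <= limit


print(weighted_length("a" * 280), fits_limit("a" * 280))
print(weighted_length("目" * 140), fits_limit("目" * 140))
print(fits_limit("目" * 141))
```

Under this model, 280 Latin letters and 140 kanji reach the limit at the same point, and mixed-script text is handled without needing to detect the language.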

This cross-language difference also matters when designing database VARCHAR lengths. Information that needs 100 characters in English can often be stored in about 50 characters in Japanese. Multilingual systems need either language-specific character limits or generous margins based on the language requiring the most characters.
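A complication worth checking is whether a column's length is measured in characters or in bytes, which depends on the database and its configuration. The following sketch validates a value against both kinds of limit; `fits_column` is a hypothetical helper, and UTF-8 storage is assumed.

```python
from typing import Optional


def fits_column(
    text: str,
    max_chars: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> bool:
    """Check a value against a character limit, a UTF-8 byte limit, or both."""
    if max_chars is not None and len(text) > max_chars:
        return False
    if max_bytes is not None and len(text.encode("utf-8")) > max_bytes:
        return False
    return True


english = "a" * 100   # 100 characters, 100 bytes
japanese = "目" * 50   # 50 characters, 150 bytes

print(fits_column(english, max_chars=100), fits_column(english, max_bytes=100))
print(fits_column(japanese, max_chars=100), fits_column(japanese, max_bytes=100))
```

The Japanese value uses half as many characters but more bytes, so a limit that is generous in characters can still be too small in bytes.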

What Extreme Character Counts Teach Us

Placing the world's shortest and longest words side by side reveals fundamental differences in language design. Chinese and Japanese kanji evolved toward "compressing meaning into single characters," while German and Finnish evolved toward "concatenating words to express new concepts."

This difference impacts character limits in the digital age. Social media character limits are unified by "character count," but information per character varies enormously by language. Japanese at 140 characters and English at 140 characters differ by 2-3x in expressible information.

Understanding Unicode basics reveals that the definition of "1 character" itself is technically complex. Emoji composition, variation selectors, combining characters - cases where "visual 1 character" and "data 1 character" don't match are countless. Behind the seemingly simple task of character counting lies a deep world of language and technology.
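Two of these cases can be demonstrated with the standard `unicodedata` module. The sketch below reports, for a given string, the number of code points, the number of UTF-8 bytes, and the number of code points after NFC normalization; `describe` is our own helper name. Counting what a reader perceives as one character (a grapheme cluster) needs a segmentation library and is outside this sketch.

```python
import unicodedata


def describe(text: str) -> dict[str, int]:
    """Report three different 'lengths' of the same string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "nfc_code_points": len(unicodedata.normalize("NFC", text)),
    }


decomposed_e = "e\u0301"  # 'e' followed by a combining acute accent, shown as é
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # three emoji joined by ZWJ

print(describe(decomposed_e))
print(describe(family))
```

The accented letter shrinks from two code points to one under NFC, while the family emoji stays at five code points even though it is displayed as a single symbol.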

Try It with a Character Count Tool

Measuring the words introduced in this article with an actual character counting tool yields fascinating discoveries. How many bytes is the 80-character German compound word in UTF-8? How many characters does the 85-character New Zealand place name expand to when URL-encoded? Experience firsthand how the definition of "1 character" changes with context.
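If no tool is at hand, the two questions above can also be answered with a few lines of Python. This is a minimal sketch using only the standard library; the constant and function names are ours.

```python
import unicodedata
from urllib.parse import quote

GERMAN_WORD = unicodedata.normalize(
    "NFC",
    "Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft",
)
HILL_NAME = (
    "Taumatawhakatangihangakoauauotamateaturipukakapiki"
    "maungahoronukupokaiwhenuakitanatahu"
)


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))


def url_encoded_length(text: str) -> int:
    """Number of characters after percent-encoding."""
    return len(quote(text, safe=""))


print(len(GERMAN_WORD), "chars ->", utf8_size(GERMAN_WORD), "bytes in UTF-8")
print(len(HILL_NAME), "chars ->", url_encoded_length(HILL_NAME), "chars URL-encoded")
```

The German word grows by one byte because its single ä takes two bytes in UTF-8, while the place name does not grow at all when URL-encoded because it consists entirely of unaccented ASCII letters.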

The world's shortest word and the world's longest word. The character count world stretching between them is a mirror reflecting the diversity of language and human creativity. Next time you use a character counting tool, perhaps spare a thought for the history of language and technology behind each counted "character."
