
Element word analysis

Words with chemical elements: what the dictionaries actually reveal

Every word in our dictionaries can be spelled with chemical element symbols. That is not a discovery — it is a design choice. Carbonat's word lists are pre-filtered curated sets: if a word survived into the dictionary, it already passed the element parser. So the question is not “which dictionary words work?” (all of them do), but rather: given that these 85,911 words already work, what is interesting about how they work?

To answer that, I measured element usage patterns, variation counts, and word length distributions across all seven supported languages. The results are more honest than exciting: most words have exactly one valid layout, a handful of single-letter elements do most of the heavy lifting, and 26 of the 118 elements never show up in the parser's preferred split. But every single element does appear somewhere across alternative variants, and the long tail is genuinely interesting once you know where to look.

This page walks through the data in full. I want to be transparent about what the numbers actually say — and equally transparent about where the data pipeline shapes the results before you ever see them. If you are here for the quick version, the summary box below has the headline numbers. If you want the deep read, keep scrolling.

Quick answer

Last updated March 31, 2026

All 85,911 dictionary words across seven languages are elementizable by construction. Of the 118 elements, 92 appear in the parser's preferred splits, while the remaining 26 only surface in alternative variants. Oxygen and Iodine each appear in roughly 41% of all words, while the median word has exactly one valid layout. Italian produces the most variation at 2.28 average splits per word; Welsh is the leanest at 1.58.

Dictionary words
85,911
Pre-filtered across all 7 languages
In preferred splits
92 of 118
26 elements only appear in alternative variants
Median variants
1
At least half of all words have exactly one valid layout
99th percentile
8 variants
Only the rare tail produces real branching
Indexed words
89,323

Elementizable words in the current stats dataset across all supported languages.

Largest dictionary
14,802

Italian has the most entries, followed by Dutch at 14,554.

Highest avg. variants
2.28

Italian words produce the most alternative splits on average.

Most variation-rich
66 splits

"inasiniscono" in Italian holds the single-word record for valid layouts.

Methodology and caveats

The word dictionaries shipped with Carbonat are not raw language corpora. They are curated, pre-filtered sets: every word in every dictionary has already been verified to have at least one valid element-symbol split. That means the 100% elementizable rate across all seven languages is a property of the data pipeline, not a surprising finding about natural language.

I want to be upfront about that because it changes the kinds of questions worth asking. “Can this word be spelled with elements?” is already answered at index time. The interesting questions are about distribution: which elements carry the most weight, how many alternative splits exist, how word length interacts with variation count, and where the languages diverge from each other.

All numbers on this page come from the same element parser and normalization rules used by the live app. Normalization strips punctuation, digits, and diacritics before matching, so the parser works on cleaned letter sequences. It tries 1-, 2-, and 3-character symbol prefixes recursively, and a word succeeds only if every remaining segment continues to map cleanly to real element symbols.
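The page does not publish the parser, so what follows is only a minimal Python sketch of the rules as described, not Carbonat's code. It assumes NFKD decomposition as the way diacritics are dropped and uses the 118 IUPAC symbols; `normalize`, `split_word`, and `SYMBOLS` are hypothetical names. Because shorter prefixes are tried first and splits are recorded in the order they are found, the first result is the preferred path.

```python
import unicodedata

# The 118 IUPAC element symbols in atomic-number order.
ELEMENTS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca
Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr
Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd
Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm
Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split()

# Lowercase lookup table, e.g. "co" -> "Co".
SYMBOLS = {symbol.lower(): symbol for symbol in ELEMENTS}


def normalize(word: str) -> str:
    """Lowercase, split accented letters into base + mark (NFKD),
    then keep only a-z, which drops marks, digits, spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")


def split_word(word: str) -> list[list[str]]:
    """Return every valid split; the first one is the preferred path."""
    text = normalize(word)
    found: list[list[str]] = []

    def walk(pos: int, path: list[str]) -> None:
        if pos == len(text):
            found.append(list(path))
            return
        # Shorter prefixes first. No current symbol has three letters, so
        # size 3 never matches with this table; it mirrors the stated rule.
        for size in (1, 2, 3):
            chunk = text[pos:pos + size]
            if len(chunk) == size and chunk in SYMBOLS:
                path.append(SYMBOLS[chunk])
                walk(pos + size, path)
                path.pop()

    walk(0, [])
    return found
```

For “carbon” this returns C + Ar + B + O + N first and the alternative Ca + Rb + O + N second. Because it lists every split, this version is only practical for ordinary words; counting variants is cheaper with memoization.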

The seven languages covered are English, Welsh, German, Spanish, French, Italian, and Dutch. Dictionary sizes range from 7,510 (French) to 14,802 (Italian), so per-language percentages are more comparable than raw counts. When I say “an element appears in X% of words,” that percentage is relative to the individual language dictionary, not the pooled total.

One more caveat: the “optimal split” data reflects the parser's preferred path, not all possible paths. When the parser encounters ambiguity, say “co” at the start of a word, it tries the shorter symbol first (C before Co) and falls back to the longer one only if the shorter path dead-ends. Non-optimal paths are still valid and counted in the variation totals, but the element-usage percentages come from the primary split only. This distinction matters when interpreting which elements “never appear”: an element might exist in an alternative split but still be absent from the optimal one.

Which elements carry the most weight

The global top five are all single-letter symbols: O (Oxygen, 41%), I (Iodine, 41%), N (Nitrogen, 40.1%), S (Sulfur, 35.6%), and C (Carbon, 30.5%). This is not a coincidence: single-letter elements can fill any single-character gap in a word, which makes them the universal connectors of the system. A two-letter symbol like Er needs the next letter to cooperate; O just needs to exist.

The first two-letter symbol to break into the global rankings is Er (Erbium) at 17.4%, which makes sense: “er” is one of the most common letter pairs across European languages — verb endings, agent suffixes, comparatives. After that come Re (Rhenium, 11.9%) and Te (Tellurium, 10.6%) — again reflecting common bigrams rather than any chemical significance. These elements dominate the word data not because they are chemically important, but because their symbols happen to look like normal spelling.

There is an asymmetry worth noting. The 14 single-letter element symbols (B, C, F, H, I, K, N, O, P, S, U, V, W, Y) collectively account for a massive share of all element appearances in the data. But they represent only 12% of the periodic table by count (14 of 118). Meanwhile, the 104 two-letter elements share the remaining slice. The element distribution in words is not a reflection of the periodic table; it is a reflection of orthography.

Single-letter elements (global)

These are the universal gap-fillers. Any position in a word where a single character matches one of these symbols becomes a valid segmentation point.

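| Symbol | Element | Words | % of all |
|---|---|---|---|
| O | Oxygen | 35,238 | 41% |
| I | Iodine | 35,203 | 41% |
| N | Nitrogen | 34,479 | 40.1% |
| S | Sulfur | 30,595 | 35.6% |
| C | Carbon | 26,195 | 30.5% |
| H | Hydrogen | 16,488 | 19.2% |
| P | Phosphorus | 14,764 | 17.2% |
| U | Uranium | 13,607 | 15.8% |
| B | Boron | 12,666 | 14.7% |
| F | Fluorine | 11,198 | 13% |
| K | Potassium | 7,928 | 9.2% |
| Y | Yttrium | 7,210 | 8.4% |
| V | Vanadium | 6,825 | 7.9% |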

Top two-letter elements (global)

Two-letter symbols succeed when their bigram aligns with natural spelling patterns. The top performers all mirror common letter pairs in European languages.

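| Symbol | Element | Words | % of all |
|---|---|---|---|
| Er | Erbium | 14,928 | 17.4% |
| Re | Rhenium | 10,212 | 11.9% |
| Te | Tellurium | 9,138 | 10.6% |
| Ar | Argon | 8,395 | 9.8% |
| La | Lanthanum | 7,712 | 9% |
| Ra | Radium | 7,492 | 8.7% |
| Ti | Titanium | 6,699 | 7.8% |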

Why single-letter dominance matters

The dominance of the 14 single-letter element symbols is the single most important structural fact about element-word creation. Together they act as a flexible scaffold: wherever one of these letters appears in a word, the parser has an instant valid segmentation point. That scaffold helps explain why so many words survive the pre-filter in the first place, although the 100% elementizable rate itself comes from the filtering, not from the scaffold. Once a word list is reduced to entries that pass the parser, the survivors lean heavily on these characters.

The practical consequence is that variation (multiple valid layouts for the same word) comes almost entirely from positions where the parser can choose between a single-letter symbol and a two-letter symbol that starts with the same character. When you see “co” in a word, the parser can read it as C + O or Co (Cobalt), assuming both lead to valid continuations. That fork is the engine behind higher variation counts. Cobalt participates in thousands of alternative splits across the dictionaries, but it never appears in the parser's preferred path: C is tested before Co, and wherever Co fits, C + O covers the same two letters, so the C-first branch always succeeds too.

Consider the word “carbon” as a concrete example. The parser sees C-A-R-B-O-N. At position 0 it matches C (Carbon). At position 1 it matches Ar (Argon). Then B (Boron), O (Oxygen), N (Nitrogen). Five elements, all single-letter or common-bigram. No exotic chemistry needed — just familiar spelling patterns doing the work.

The 26 elements the parser never picks first

Every single one of the 118 elements appears in at least some valid word split across the seven dictionaries. But 26 of them never show up in the parser's preferred path — the first variant it returns. They only appear in alternative splits. That makes them invisible in the default view unless a user explicitly selects a different variant.

Why does this happen? The parser tries 1-character symbol prefixes before 2- and 3-character ones. When a word contains “co,” the parser tests C (Carbon) before Co (Cobalt). If C leads to a valid continuation, the parser takes that path and Cobalt never gets chosen as the first result. Cobalt still participates in an alternative layout — it is a valid split — but it is not the default.

The most striking of the 26 are elements like In (Indium), which appears in 6,961 words across alternative variants, Co (Cobalt) in 5,095, and Si (Silicon) in 4,014. These elements are not rare or exotic; they are consistently outcompeted by the single-letter element that shares their first character.
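The data is consistent with a simple rule that the page does not state outright, so treat this as a reading of the numbers rather than the author's explanation. If both letters of a two-letter symbol are themselves single-letter symbols (Co = C + O), a shortest-first parser can never pick it first: wherever it fits, the two single letters cover the same span and are tried before it. This minimal sketch counts such symbols among the 118 IUPAC symbols; `singles` and `shadowed` are hypothetical names.

```python
# The 118 IUPAC element symbols in atomic-number order.
ELEMENTS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca
Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr
Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd
Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm
Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split()

singles = {s for s in ELEMENTS if len(s) == 1}

# Two-letter symbols whose letters are both single-letter symbols:
# "Co" can always be replaced by "C" + "O" over the same two letters,
# and a shortest-first parser reaches that replacement first.
shadowed = sorted(
    s for s in ELEMENTS
    if len(s) == 2 and s[0] in singles and s[1].upper() in singles
)

print(len(shadowed))  # 26
print(" ".join(shadowed))
```

The count is exactly 26, and every element in the table below qualifies. Since each of these symbols provably cannot win a shortest-first race, and the page reports exactly 26 alternative-only elements, the two sets should be identical if the page's counts are right.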

Elements only in alternative variants

The table lists 15 of the 26 elements that appear in valid word splits but never in the parser's first choice. The count shows how many words include them in at least one alternative layout.

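| Symbol | Element | Words (alternative variants only) |
|---|---|---|
| In | Indium | 6,961 |
| Co | Cobalt | 5,095 |
| Sc | Scandium | 4,412 |
| Si | Silicon | 4,014 |
| Ni | Nickel | 3,982 |
| No | Nobelium | 3,246 |
| Os | Osmium | 3,019 |
| Ho | Holmium | 2,442 |
| Po | Polonium | 2,145 |
| Bi | Bismuth | 1,935 |
| Cu | Copper | 1,040 |
| Pu | Plutonium | 762 |
| Sb | Antimony | 498 |
| Nh | Nihonium | 421 |
| Nb | Niobium | 388 |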

Why the parser's preference matters

The parser tries shorter symbol prefixes first (1-char, then 2-char, then 3-char). When C and Co both lead to valid paths, C wins because it is tested first. This is not random — it is a deterministic preference for shorter symbols. That is why single-letter elements dominate the top-element tables and two-letter symbols like Co, Si, In, Ni, and Sc get pushed into alternative-only territory.

For users, this means the default tile layout on the home page will always use the parser's preferred path. To see layouts that include Cobalt, Silicon, or Indium, a user would need to switch to an alternative variant. The element exists in the word — it just is not the first thing the parser shows.

In a parser that tried longer symbols first, Co, In, Si, and the rest of the 26 would claim many positions that C + O, I + N, and S + I hold today, so the top-element rankings would shift significantly. The absence is algorithmic, not linguistic.

How the seven languages compare

Each language dictionary was built independently, so the sizes and compositions differ. Italian has the largest set at 14,802 words, while French has the smallest at 7,510. These numbers are not representative of total vocabulary size in each language — they reflect curation depth and availability of elementizable entries in the source word lists I used.

The more telling metric is average variants per word. Italian leads at 2.28, meaning the typical Italian word in our dictionary has a little over two valid element layouts on average. Welsh is at the other end with 1.58. Only two languages, Italian and German, have a median variant count above one (both at 2), meaning at least half of their words have two or more valid splits. In every other language, the median is exactly one.

This difference tracks with orthographic structure. Italian and German both favour longer words with dense vowel-consonant alternation, which creates more fork points for the parser. Welsh, despite having the third-largest dictionary, likely produces fewer variants per word because its common digraphs leave the parser little choice: ch can only be read as C + H (there is no Ch symbol), rh only as Rh (R is not a symbol), and ll has no reading of its own (neither L nor Ll is a symbol), so these clusters rarely create ambiguous positions.

Per-language summary across all seven dictionaries.
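| Language | Words | Avg. variants | Median variants | Top element | Top element % |
|---|---|---|---|---|---|
| English | 13,337 | 1.69 | 1 | O | 38.7% |
| Welsh | 13,990 | 1.58 | 1 | N | 42.2% |
| German | 12,921 | 2.03 | 2 | N | 54% |
| Spanish | 8,797 | 1.79 | 1 | O | 48.6% |
| French | 7,510 | 1.74 | 1 | I | 44.5% |
| Italian | 14,802 | 2.28 | 2 | O | 67.8% |
| Dutch | 14,554 | 1.65 | 1 | N | 41.5% |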

Italian and German lead on branching

Italian's average of 2.28 variants per word and German's 2.03 both exceed the global average of 1.83. Italian also holds the highest 99th percentile at 12 variants, and its overall maximum of 66 (from “inasiniscono”) is the highest in any language. Vowel-heavy Italian spelling creates more fork points for the parser at every turn.
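The global average of 1.83 appears to be a word-weighted mean rather than the plain mean of the seven language averages. A quick check, using only the rounded figures from the per-language summary table (so the result is approximate); `LANGUAGES` is a hypothetical name:

```python
# (word count, average variants per word), copied from the summary table.
LANGUAGES = {
    "English": (13_337, 1.69),
    "Welsh": (13_990, 1.58),
    "German": (12_921, 2.03),
    "Spanish": (8_797, 1.79),
    "French": (7_510, 1.74),
    "Italian": (14_802, 2.28),
    "Dutch": (14_554, 1.65),
}

total_words = sum(words for words, _ in LANGUAGES.values())

# Each language's average counts in proportion to its dictionary size.
weighted = sum(words * avg for words, avg in LANGUAGES.values()) / total_words

# Every language counts equally, regardless of size.
unweighted = sum(avg for _, avg in LANGUAGES.values()) / len(LANGUAGES)

print(total_words)           # 85911
print(round(weighted, 2))    # 1.83
print(round(unweighted, 2))  # 1.82
```

The weighted figure matches the page; the plain mean is slightly lower because Italian's large, high-variation dictionary carries extra weight once size is taken into account.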

Welsh favours different elements

Nitrogen (N) leads Welsh at 42.2%, as it also does in German and Dutch, but the rest of the Welsh list is distinctive: Iodine (I) and Carbon (C) follow, and, uniquely among the seven languages, Tungsten (W) and Gold (Au) reach its top ten. The prominence of W and Au reflects Welsh orthography: “w” functions as a vowel in Welsh, and “au” is one of the language's most common diphthongs. This makes Welsh the most orthographically distinct language in the dataset.

Oxygen dominates Romance languages

In Italian, Oxygen appears in 67.8% of words, the single highest per-language element percentage in the dataset. Spanish follows at 48.6%, well behind but still led by Oxygen. This tracks with how Italian and Spanish end most words in vowels, giving O (and I) constant opportunities to participate in valid splits. French is the exception: Iodine narrowly edges out Oxygen there, 44.5% to 44%.

Top elements by language

The global top five (O, I, N, S, C) dominate nearly everywhere, but their relative ordering shifts in ways that reflect each language's orthographic habits. Welsh drops Sulfur from its top five in favour of Yttrium and brings Tungsten and Gold into its top ten. Italian and Spanish elevate Oxygen above all others, while German, Welsh, and Dutch put Nitrogen first. German and Dutch lean more heavily on Erbium (Er) and Rhenium (Re) thanks to their “-er” and “-re-” word patterns; in Dutch, Erbium even displaces Carbon from the top five.

The cards below show the top five elements for each language with their appearance rate. Compare them side-by-side to see how spelling conventions shape which elements the parser reaches for.

English

13,337 words

  • O (Oxygen): 38.7%
  • I (Iodine): 35.6%
  • N (Nitrogen): 35.4%
  • S (Sulfur): 35.1%
  • C (Carbon): 28.5%

Welsh

13,990 words

  • N (Nitrogen): 42.2%
  • I (Iodine): 37.8%
  • C (Carbon): 29.2%
  • O (Oxygen): 28.2%
  • Y (Yttrium): 26.7%

German

12,921 words

  • N (Nitrogen): 54%
  • S (Sulfur): 46.7%
  • I (Iodine): 41.4%
  • H (Hydrogen): 39.9%
  • C (Carbon): 33.4%

Spanish

8,797 words

  • O (Oxygen): 48.6%
  • I (Iodine): 37.9%
  • C (Carbon): 35.7%
  • N (Nitrogen): 30.3%
  • S (Sulfur): 28%

French

7,510 words

  • I (Iodine): 44.5%
  • O (Oxygen): 44%
  • N (Nitrogen): 39.1%
  • S (Sulfur): 32.9%
  • C (Carbon): 31.4%

Italian

14,802 words

  • O (Oxygen): 67.8%
  • I (Iodine): 55.9%
  • S (Sulfur): 40.8%
  • C (Carbon): 39.4%
  • N (Nitrogen): 35.4%

Dutch

14,554 words

  • N (Nitrogen): 41.5%
  • S (Sulfur): 38.7%
  • O (Oxygen): 34%
  • I (Iodine): 33.5%
  • Er (Erbium): 25.7%

Words that are element names

A small but satisfying corner of the data: some dictionary words are themselves the names of chemical elements. When “carbon” appears in the English dictionary, the parser does not treat it specially — it just splits it into C + Ar + B + O + N like any other word. The element Carbon does not “recognise” its own name; it is merely one of five symbols that happen to tile the word.

Across all seven languages, I found 15 unique element names present in at least one dictionary. English has the most with 12 matches, while Spanish has none. The pattern is predictable: English, and languages such as Dutch that borrow English scientific terminology directly, contain more element names, while languages with their own chemical naming traditions include fewer.

There is a small irony here. “Iron” appears as a word in the English dictionary — split as Ir + O + N — using Iridium, not Iron's own symbol (Fe). “Silicon” is in the word list too, split as S + I + Li + C + O + N — six elements, none of which is Silicon (Si). The parser splits the word successfully, but Si itself only appears in alternative variants because S + I is tested before Si. The words survive; their namesake symbols take the back seat.

Element names found in dictionaries


Words that cross language boundaries

These words appear in three or more language dictionaries simultaneously. They tend to be international loanwords or short functional words that survive across orthographic traditions.

  • casino5 languages

    German, Spanish, French, Italian, Dutch

  • in5 languages

    German, English, French, Italian, Dutch

  • oscar4 languages

    Welsh, English, French, Dutch

  • bar3 languages

    Spanish, French, Italian

  • car3 languages

    Welsh, English, French

  • crash3 languages

    German, French, Dutch

  • crisis3 languages

    English, Spanish, Dutch

  • eric3 languages

    Welsh, English, French

  • fiasco3 languages

    Spanish, French, Dutch

  • flora3 languages

    English, Spanish, Dutch

  • francisco3 languages

    Welsh, English, Spanish

  • laura3 languages

    Welsh, English, Dutch

  • no3 languages

    English, Spanish, Italian

  • o3 languages

    Welsh, Spanish, Italian

  • un3 languages

    Spanish, French, Italian

  • was3 languages

    German, English, Dutch
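Finding words like these is a simple inversion from "words per language" to "languages per word". A minimal sketch, assuming each dictionary is available as a set of normalized words; `shared_words` and the toy dictionaries are hypothetical, and the toy sets hold only a few entries from the list above rather than the real word lists.

```python
from collections import defaultdict


def shared_words(dictionaries: dict[str, set[str]], min_languages: int = 3) -> dict[str, list[str]]:
    """Map each word found in at least `min_languages` dictionaries to its sorted
    languages, ordered by language count (descending), then alphabetically."""
    languages_by_word: dict[str, set[str]] = defaultdict(set)
    for language, words in dictionaries.items():
        for word in words:
            languages_by_word[word].add(language)

    shared = [
        (word, sorted(langs))
        for word, langs in languages_by_word.items()
        if len(langs) >= min_languages
    ]
    shared.sort(key=lambda item: (-len(item[1]), item[0]))
    return dict(shared)


# Toy subsets for illustration only; the real dictionaries are far larger.
toy = {
    "German": {"casino", "crash", "was"},
    "Spanish": {"casino", "bar"},
    "French": {"casino", "bar", "crash"},
    "Italian": {"casino", "bar"},
    "Dutch": {"casino", "crash", "was"},
    "English": {"was"},
}

for word, langs in shared_words(toy).items():
    print(f"{word}: {len(langs)} languages ({', '.join(langs)})")
```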

Variation distribution: the median is 1

The global median variant count is 1. That means at least half of all dictionary words have exactly one valid element-symbol layout, with no alternatives and no choices to make. At the 75th percentile it rises to only 2, and the 95th percentile reaches 4. The combinatorial explosion that people imagine, with dozens of alternative tile layouts for every word, is not how the system actually behaves for the vast majority of inputs.

The 99th percentile is 8, and the absolute maximum in the dataset is 66 (the Italian word “inasiniscono”). So yes, extreme branching exists, but it is genuinely extreme. The distribution is not a bell curve; it is a cliff with a very long, very thin tail. Most of the population clusters at the low end, and the outliers pull the mean up to 1.83, well above the median of 1; the standard deviation of 1.54, large relative to that mean, reflects the same skew.
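The gap between median and mean is easy to reproduce. A minimal sketch on a made-up 100-word sample, not Carbonat's data, using Python's `statistics` module; the page does not say which percentile method it uses, so the nearest-rank helper `nearest_rank` is an assumption.

```python
import math
import statistics


def nearest_rank(values: list[int], percent: int) -> int:
    """Nearest-rank percentile: the smallest value with at least
    `percent`% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# A made-up, right-skewed sample of variant counts (NOT the real dataset).
sample = [1] * 60 + [2] * 20 + [3] * 10 + [4] * 6 + [8] * 3 + [66]

print(statistics.median(sample))                        # 1.0
print(statistics.mean(sample))                          # 2.44
print([nearest_rank(sample, p) for p in (75, 95, 99)])  # [2, 4, 8]
```

A single 66 among a hundred words moves the mean noticeably but leaves the median untouched, which is the shape the paragraph describes.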

This has a direct UX implication. For most users typing a single word into the home page, the app will return exactly one layout. That layout is deterministic and reliable, which is good for product confidence. But if you want multiple layout options to compare — different visual rhythms, different element colors — you need to target the long tail deliberately.

Variant count by percentile

The curve is nearly flat until the 75th percentile, then climbs steeply. Italian and German separate from the pack at higher percentiles because their spelling creates more parser fork points.

How to find high-variation words

If you want words with many layout alternatives, look for these patterns:

  • Words containing “co,” “no,” “si,” or “os” — letter pairs where a single-letter element competes with a two-letter symbol for the same starting position.
  • Repeated vowels, especially “o” and “i,” which create multiple independent fork points that multiply combinatorially.
  • Medium-length words (10–16 characters) with dense ambiguity, rather than very long words where the chain must survive end-to-end.
  • Italian and German words in particular, which structurally produce more branching than other languages.
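The first two patterns in this list can be checked mechanically by scanning for positions where more than one symbol matches. A minimal sketch, assuming lowercase, already-normalized input and the 118 IUPAC symbols; `fork_points` is a hypothetical helper, and because it ignores whether a fork is actually reachable from the start of the word, it is only a rough signal.

```python
# The 118 IUPAC element symbols in atomic-number order.
ELEMENTS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca
Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr
Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd
Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm
Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split()

SYMBOLS = {symbol.lower(): symbol for symbol in ELEMENTS}


def fork_points(word: str) -> list[tuple[int, list[str]]]:
    """Return (position, matching symbols) wherever more than one symbol
    starts at that position, e.g. C vs Co. Reachability is ignored."""
    text = word.lower()
    forks = []
    for pos in range(len(text)):
        matches = [
            SYMBOLS[text[pos:pos + size]]
            for size in (1, 2, 3)
            if pos + size <= len(text) and text[pos:pos + size] in SYMBOLS
        ]
        if len(matches) > 1:
            forks.append((pos, matches))
    return forks


print(fork_points("carbon"))             # [(0, ['C', 'Ca'])]
print(len(fork_points("inasiniscono")))  # 8
```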

The variation-rich examples below show the kinds of words that sit at the extreme tail of each language's distribution.

Variation-rich words by language

The most variation-rich word in each language, with the number of valid splits. These sit at the extreme tail of the distribution and are useful for demonstrating the system's full branching capability.

English
pneumonoconiosis

48 valid splits

Welsh
cosinau

13 valid splits

German
hinausgeschossen

27 valid splits

Spanish
neumoconiosis

24 valid splits

French
copossession

24 valid splits

Italian
inasiniscono

66 valid splits

Dutch
prinsbisschop

24 valid splits

How long can elementizable words get?

The global median word length is 8 characters, with the 95th percentile at 13 and the 99th at 16. The absolute longest entry is 28 characters — the Dutch word “buitengewonelastenregelingen” — which says something about both Dutch compound-word rules and the tolerance of the element parser for long chains.

German words also trend long, with a median of 9 and a 99th percentile of 19. English words are shorter on average (median 7) but include the famous “supercalifragilistic” at 20 characters. Italian clusters tightly between 7 and 10 characters for the middle 50%, reflecting the regularity of Italian word formation even at longer lengths.

Interestingly, longer words do not automatically produce more variation. A 20-character word can have fewer valid splits than a 10-character word if its letter sequence locks the parser into a single path at each step. The relationship between length and variation is not linear — it depends on the density of ambiguous positions within the word, not the word's total length.

Word length by percentile

German consistently produces the longest successful words, while Welsh and English cluster tighter around shorter lengths. The separation between languages widens dramatically above the 75th percentile.

Length versus variation

Longer words can theoretically support more fork points, but they also introduce more positions where the parser must find a valid continuation. A 20-character word is not twice as likely to have many splits as a 10-character word. In practice, very long words often have only one or two valid layouts because the chain must survive end-to-end without a single dead end.

The words with the most variation tend to be medium-length (10–16 characters) with dense clusters of ambiguous letter pairs. Pure length is less important than structural ambiguity. “inasiniscono” has 66 splits in 12 characters, while “buitengewonelastenregelingen” is 28 characters long but likely has far fewer valid layouts.
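When only the number of layouts matters, listing every split is wasteful. A minimal sketch, assuming lowercase, already-normalized input and the shortest-first rule from the methodology: memoize how many ways the rest of the word can be finished from each position, which takes time roughly linear in word length. `count_splits` is a hypothetical name. Under these assumptions it reproduces several counts quoted on this page: 66 for “inasiniscono”, 13 for “cosinau”, and 24 for “prinsbisschop”.

```python
from functools import lru_cache

# The 118 IUPAC element symbols in atomic-number order.
ELEMENTS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca
Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr
Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd
Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm
Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split()

SYMBOLS = {symbol.lower() for symbol in ELEMENTS}


def count_splits(word: str) -> int:
    """Count valid element splits without enumerating them."""
    text = word.lower()

    @lru_cache(maxsize=None)
    def ways(pos: int) -> int:
        # Number of valid splits of text[pos:]; an empty rest counts once.
        if pos == len(text):
            return 1
        return sum(
            ways(pos + size)
            for size in (1, 2, 3)
            if pos + size <= len(text) and text[pos:pos + size] in SYMBOLS
        )

    return ways(0)


for word in ("inasiniscono", "buitengewonelastenregelingen"):
    print(word, len(word), count_splits(word))
```

Printing length next to count makes the point of this section concrete: the split count depends on how many positions offer a choice, not on how many characters the word has.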

Longest successful words by language

The top three longest elementizable words in each language. These represent the extreme upper tail of what the parser can sustain — every character in these words maps to a valid element chain from start to finish.

Are any elementizable dictionary words palindromes?

Yes, but they are rare. Across all seven language dictionaries I found 129 unique palindromic words (after normalization, at least three characters long). These words read the same forwards and backwards and still split into valid element symbols — a double constraint that filters out almost everything.
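The double constraint can be expressed as three checks: normalized length of at least three, mirror symmetry, and a successful element split. A minimal sketch under the same assumptions as the rest of this page (NFKD to drop diacritics, 118 IUPAC symbols, 1- to 3-character prefixes); `normalize`, `is_elementizable`, and `is_element_palindrome` are hypothetical names.

```python
import unicodedata
from functools import lru_cache

# The 118 IUPAC element symbols in atomic-number order.
ELEMENTS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca
Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr
Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd
Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm
Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split()

SYMBOLS = {symbol.lower() for symbol in ELEMENTS}


def normalize(word: str) -> str:
    """Lowercase, strip diacritics via NFKD, keep only a-z."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")


def is_elementizable(text: str) -> bool:
    """True if the whole string splits into element symbols."""
    @lru_cache(maxsize=None)
    def ok(pos: int) -> bool:
        if pos == len(text):
            return True
        return any(
            pos + size <= len(text) and text[pos:pos + size] in SYMBOLS and ok(pos + size)
            for size in (1, 2, 3)
        )

    return ok(0)


def is_element_palindrome(word: str, min_length: int = 3) -> bool:
    text = normalize(word)
    return len(text) >= min_length and text == text[::-1] and is_elementizable(text)


print(is_element_palindrome("reconocer"))  # True (Re + C + O + N + O + C + Er)
print(is_element_palindrome("level"))      # False: no symbol starts with "l" alone
```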

The longest is reconocer (9 characters, Spanish). The average palindromic word is just 4.0 characters long — much shorter than the general dictionary median of 8 characters. That compression makes sense: the longer a word gets, the harder it is to maintain mirror symmetry while also landing on valid element boundaries at every position.

Most palindromic words are short and structurally simple — three to five letter loops that happen to tile cleanly. But even the short ones are interesting because they demonstrate the intersection of two unrelated constraints: orthographic symmetry and chemical symbol decomposability. Neither constraint knows about the other, so the words that survive both feel like small accidents of language.

Palindromic words by language

How many unique palindromic dictionary words each language contributes. Languages with more flexible vowel-consonant patterns tend to produce more palindromes.

| Language | Palindromes |
|---|---|
| English | 48 |
| Dutch | 46 |
| Welsh | 25 |
| Spanish | 24 |
| Italian | 23 |
| German | 21 |
| French | 11 |

Notable examples

The longest palindromic dictionary words. Each one reads the same forwards and backwards after normalization, and every character maps to a valid element chain.

Pros, cons, and constraints

These points are grounded in the current element-matching rules and API limits, not generic chemistry-word advice.

Pros

  • Single words are the least brittle input type because only one word must succeed, and the parser already handles all 85,911 dictionary words (a set that was, admittedly, pre-filtered to words that split).
  • Variant-rich words (up to 66 layouts for a single Italian word) create strong visual experimentation in the designer, letting users compare tile arrangements before committing.
  • Seven-language coverage means the same tool works for English, Welsh, German, Spanish, French, Italian, and Dutch inputs without switching modes.
  • The element-distribution data is grounded in real parser output, not heuristics, so users can trust that what the stats page shows matches what the home page returns.

Cons

  • Most words produce only one valid layout (median = 1), so the "wow" cases with many tile alternatives are rarer than casual users expect.
  • 26 of 118 elements never appear in the parser's preferred split, which limits the visual diversity of default word results — though they do appear in alternative variants.
  • Longer words can look promising but sometimes have fewer variants than shorter ones, because every additional character adds another position where the chain could fail.
  • Dictionary sizes vary significantly between languages (from 7,510 to 14,802), so cross-language comparisons need care.

Hard constraints

  • All dictionaries are pre-filtered: the 100% elementizable rate is a design property, not a finding about natural language. Words that cannot be split into elements were excluded at build time.
  • Normalization removes punctuation, spaces, digits, and diacritics before matching, so the parser treats only the cleaned letter sequence as the real input.
  • Search and autocomplete operate on inputs up to 64 characters, which puts a practical ceiling on exploratory word lookup.
  • The matching engine tries 1-, 2-, and 3-character symbol prefixes recursively, so a word only survives if every remaining segment maps cleanly to real element symbols.

What to do with this

If you want the fastest discovery loop, start on the home page with a short word and watch the element layout update as you refine the input. If you already know the word you want, jump straight into the designer for download and print-ready output. Words with high variation counts are especially worth exploring in the designer, because each split produces a visually distinct tile arrangement.

For name-specific exploration, the names page is the better landing page — it has its own curated data and benchmarks. For raw symbol context, the periodic table closes the loop between language and chemistry. And if you want to push beyond single words, the sentence article and poem article cover what changes when the parser evaluates multiple words in sequence.

The broader word-level statistics and percentile distributions also live on the language info page, which shows the same data in a more tabular, reference-oriented format. If you want the numbers without the editorial commentary, that is the page to bookmark.
