The word dictionaries shipped with Carbonat are not raw language corpora. They are curated, pre-filtered sets: every word in every dictionary has already been verified to have at least one valid element-symbol split. That means the 100% elementizable rate across all seven languages is a property of the data pipeline, not a surprising finding about natural language.
I want to be upfront about that because it changes the kinds of questions worth asking. “Can this word be spelled with elements?” is already answered at index time. The interesting questions are about distribution: which elements carry the most weight, how many alternative splits exist, how word length interacts with variation count, and where the languages diverge from each other.
All numbers on this page come from the same element parser and normalization rules used by the live app. Normalization strips punctuation, digits, and diacritics before matching, so the parser works on cleaned letter sequences. It tries 1-, 2-, and 3-character symbol prefixes recursively, and a word succeeds only if the remainder after each matched prefix can itself be split entirely into real element symbols.
The seven languages covered are English, Welsh, German, Spanish, French, Italian, and Dutch. Dictionary sizes range from 7,510 (French) to 14,802 (Italian), so per-language percentages are more comparable than raw counts. When I say “an element appears in X% of words,” that percentage is relative to the individual language dictionary, not the pooled total.
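A quick illustration of why I normalize by dictionary size. The two sizes are the ones quoted above; the count of 3,000 words is a made-up number used only to show the arithmetic, and `usage_percent` is a hypothetical helper.

```python
# Dictionary sizes quoted in the text (smallest and largest).
DICTIONARY_SIZES = {"French": 7510, "Italian": 14802}


def usage_percent(words_with_element, language):
    """Share of one language's dictionary containing the element, in percent."""
    return 100 * words_with_element / DICTIONARY_SIZES[language]


# Hypothetical: the same raw count in both languages.
hypothetical_count = 3000
for language in DICTIONARY_SIZES:
    print(f"{language}: {usage_percent(hypothetical_count, language):.1f}%")
# French: 39.9%
# Italian: 20.3%
```

The same raw count is roughly twice as large a share of the French dictionary as of the Italian one, which is the distortion that per-language percentages avoid.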
One more caveat: the “optimal split” data reflects the parser's preferred path, not all possible paths. When the parser encounters ambiguity, say “co” at the start of a word, which can be read as Co or as C followed by O, it favours paths that complete successfully. Non-optimal paths are still valid and counted in the variation totals, but the element-usage percentages come from the primary split only. This distinction matters when interpreting which elements “never appear”: an element might exist in an alternative split but still be absent from the optimal one.
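The gap between variation totals and primary-split usage can be sketched with a toy example. Everything here is illustrative: the symbol table is a six-element subset, `all_splits` is a hypothetical helper, and "fewest symbols" stands in for the parser's real preference rule, which I have not spelled out on this page.

```python
# Toy subset of the symbol table, just enough for the word "coin".
SYMBOLS = {s.lower(): s for s in ("C", "O", "Co", "I", "N", "In")}


def all_splits(letters):
    """Enumerate every way to split `letters` into element symbols."""
    if not letters:
        return [[]]  # one way to split the empty string: no symbols
    splits = []
    for size in (1, 2, 3):
        if size > len(letters):
            break
        symbol = SYMBOLS.get(letters[:size])
        if symbol is not None:
            for rest in all_splits(letters[size:]):
                splits.append([symbol] + rest)
    return splits


variations = all_splits("coin")
# ASSUMPTION: treat the split with the fewest symbols as the primary one.
primary = min(variations, key=len)
in_any_split = {symbol for split in variations for symbol in split}

print(len(variations))          # 4
print(primary)                  # ['Co', 'In']
print("O" in primary)           # False
print("O" in in_any_split)      # True
```

Here “coin” contributes four variations to the totals, but only Co and In would count toward element usage; oxygen appears in two alternative splits and still registers as absent.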