Friday, April 22, 2005

The scale of the accidental gap issue

To get the number of possible words in English, we'd have to count all the possible combinations of consonants and vowels that could be created from the set of English sounds. This is kinda hard to do, but we can get a pretty good approximation.

The simplest word is made up of one syllable. Fortunately, English, like all languages, has rules about what can and cannot be a syllable. In English, a syllable consists of at least a vowel (V), which may be preceded or followed by one or more consonants. Consonants at the start of the syllable are called onsets (O) and the ones at the end are called codas (C).

If we ignore all the onsets and codas with more than one consonant for simplicity, one syllable template for English looks like (O)V(C). The parentheses around the O and C indicate that the consonants are optional. This template gives us words like


  • eye (V)
  • me (OV)
  • on (VC)
  • cat (OVC)


    There are about 23 different consonants that can be an onset to a syllable, and about 21 that can be a coda. There is also the possibility that the syllable has no onset or no coda. That gives a total of 24 possible onsets and 22 possible codas (counting "none" as an option).

    Vowels are a little simpler. There are about 7 so-called long vowels that can be in a syllable with or without a coda. English has another 7 or so so-called short vowels that have to be in a syllable with a coda. For simplicity I'm going to ignore the short vowels.

    So the number of possible single syllable words is more than 3,381.

    (O) V (C)
    23 x 7 x 21 = 3,381

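    I say "more" because we're ignoring the words that you can make with a short vowel in a closed syllable and any word with more than one consonant in the onset or coda. And since the product uses 23 and 21 rather than 24 and 22, it also leaves out syllables with no onset or no coda.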
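Here's a quick Python sketch of that arithmetic, in case you want to play with the numbers. The inventory sizes are the rough figures from this post; the function name and the `allow_empty` flag are just my own labels for illustration.

```python
def count_syllables(onsets, vowels, codas, allow_empty=False):
    """Count syllables that fit the simplified (O)V(C) template.

    With allow_empty=True, "no onset" and "no coda" each count as one
    extra option (the 24 and 22 totals mentioned above).
    """
    extra = 1 if allow_empty else 0
    return (onsets + extra) * vowels * (codas + extra)


# Approximate inventory sizes from the post (long vowels only).
ONSETS, LONG_VOWELS, CODAS = 23, 7, 21

strict = count_syllables(ONSETS, LONG_VOWELS, CODAS)
with_empty = count_syllables(ONSETS, LONG_VOWELS, CODAS, allow_empty=True)

print(strict)      # 3381
print(with_empty)  # 3696
```

Either way the total only goes up once short vowels and consonant clusters are added, so 3,381 works as a floor.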

    If we further assume that two syllable words can be formed by putting any two single syllables together, the number of those would be 3,381 x 3,381. That's over 11 million! Can that be right?
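To double-check the two-syllable figure, here is a minimal sketch. It assumes, as above, that any single syllable can combine with any other; the variable names are mine.

```python
# Single-syllable count from the (O)V(C) estimate: 23 onsets x 7 vowels x 21 codas.
single_syllable = 23 * 7 * 21

# Any syllable followed by any syllable.
two_syllable = single_syllable ** 2

# One- and two-syllable words together.
one_or_two = single_syllable + two_syllable

print(f"{two_syllable:,}")  # 11,431,161
print(f"{one_or_two:,}")    # 11,434,542
```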

    One estimate I found for the number of words in English is roughly 1 million. And that's including multi-syllabic scientific words. Oxford Dictionaries estimates the number to be only around 1/4 of a million.

    Could we really be using around 1/10th of the possible words? If so, there is an incredible amount of word space we can use and there should be no need for homonymy.
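As a rough check on that fraction, this sketch divides each word-count estimate by the number of possible one- and two-syllable words. It assumes the simplified counts from this post, and the `coverage` helper is a name I made up for the example.

```python
# Possible one- and two-syllable words under the simplified (O)V(C) template.
POSSIBLE = 3381 + 3381 ** 2


def coverage(actual_words):
    """Fraction of the possible word space taken up by actual words."""
    return actual_words / POSSIBLE


for label, count in [("~1 million estimate", 1_000_000),
                     ("Oxford, ~1/4 million", 250_000)]:
    print(f"{label}: {coverage(count):.1%}")
# ~1 million estimate: 8.7%
# Oxford, ~1/4 million: 2.2%
```

So "around 1/10th" holds for the larger estimate, and the Oxford figure would put it closer to 1/50th.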

  • 5 comments:

    boredoom said...

    Oi suggest quee start fwilling en the gaps now!

    Ed Keer said...

    Last week, William Safire was advocating borrowing from other languages to fill what he coined vocabugaps--words we need but don't have. Is he nuts? Look at all the gaps lying around.

    Anonymous said...

    Wouldn't you have to eliminate all the combinations that wind up breaking a grammar rule?

    Like "calenderly" is totally an accidental gap, but in the context of the rules of the English language, it would make no sense. I don't know how many combinations that would eliminate, but basically all the prefixes/suffixes we use like pre, in, dis, mis, un might be removed as choices to create new words. Unless...you were planning on making a new word using the existing rules.

    I think it would be more confusing if we had new words like "unnew" or "pregap" that have definitions not related to any grammar rules. Sort of like how folks are always confusing inflammable. What the heck happened there?

    W.Safire is suggesting nothing that isn't already happening, right? I mean we borrow words from languages all the time. Fajita and assassin come to mind, but I'm sure there are a bunch more that I'm spacing out on. To be honest, I don't think Safire has ever said anything remotely controversial when it comes to linguistics. He's like the John Madden of Linguistic Theory, "You know, to win this 13-7 game, the Giants are going to need a touchdown," is the same as saying "The English language is going to need more words in order to increase its vocabulary."

    Duuuhhh....really Senor Safire? Duuuuuhhhh.

    Ed Keer said...

    The figure I came up with totally takes into account the grammatical rules for syllables in English. Still, it ignores sequences of more than one consonant. So 'calenderly' is not in there. It also only counts single syllable and two syllable words. So there's a lot more possible words.

    The suffix issue is interesting. I'm assuming simple words with no suffixes. There is some overlap between words that are complex and ones that are simple. Can't think of any off the top of my head though.

    Ed Keer said...

    For example: inept is not in + ept.
