Consonants


Humans produce speech sounds by pushing air from the lungs through the oral cavity.  If there is an obstruction, complete or partial, in the mouth, the sound is called a consonant.  We classify consonants by 1) where the obstruction occurs, 2) the degree of closure produced by the obstruction, 3) whether the nasal passages are open or closed, 4) the contrast between voicing and nonvoicing, and 5) other factors.  What follows is a detailed description of the consonant system of Esperanto.

Stops
 
Consider the /b/ in 'buŝo' = mouth, which is identical to the /b/ in English 'boat'.  To produce this sound, we momentarily stop the flow of air through the oral cavity by pressing together the upper and lower lips, and then we release the blockage.  The resulting turbulence is what the human ear interprets as /b/.  Because the blockage in the oral cavity is complete, we call this sound a stop.  Because the blockage is formed by the two lips, it is called bilabial, and because the vocal cords vibrate during its production, it is said to be voiced.  /b/ is thus a voiced bilabial stop.  The /p/ of 'poŝo' = pocket is produced in the same way except that the vocal cords do not vibrate.  /p/ is an unvoiced bilabial stop.  Consonants generally come in pairs;  one member of the pair is voiced and the other is unvoiced.  (The vocal cords are in the larynx, or Adam's apple, and you can feel the vibration by placing your finger on your larynx.)

The /t/ of 'tuŝi' = touch is an unvoiced apico-dental stop in the sense that the apex (tip) of the tongue forms an obstruction against the back surface of the upper teeth.  The /d/ of 'dento' = tooth is the voiced member of the pair, i.e., a voiced apico-dental stop.  Note that /t/ and /d/ in English are slightly different in that the tip of the tongue touches not the back surface of the upper teeth, but rather the alveolar ridge, the slight protuberance behind the upper teeth.  In English they are apico-alveolar.  In Spanish they are apico-dental.  The difference is not great, and there is no authoritative rule in Esperanto that says that they are apico-dental rather than apico-alveolar.  I have a preference for apico-dental, but either pronunciation is acceptable.

The /k/ of 'kisi' = to kiss is an unvoiced dorso-velar stop in the sense that the dorsum, or back part of the tongue, closes the oral cavity by touching the velum.  The velum is the back part of the palate, and the palate is the whole upper part (roof) of the mouth.  The /g/ of 'legi' = to read is a voiced dorso-velar stop.

We now have six stops.  We often shorten the terminology and say that /b/ and /p/ are labial, that /t/ and /d/ are dental, and that /k/ and /g/ are velar.  Another term for stop is plosive.  The obstruction is released suddenly, and the term 'plosive' suggests an explosion of air released under pressure.


                     labial      dental       velar

        voiced         b           d            g
      unvoiced         p           t            k
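
The table above can also be written as a small lookup structure.  The following Python sketch is only an illustration of the classification;  the names STOPS, describe and voicing_partner are invented for this example and do not come from any standard library.

```python
# Each stop is described by (voicing, place of articulation),
# using the shortened terms from the table above.
STOPS = {
    "b": ("voiced", "labial"),
    "p": ("unvoiced", "labial"),
    "d": ("voiced", "dental"),
    "t": ("unvoiced", "dental"),
    "g": ("voiced", "velar"),
    "k": ("unvoiced", "velar"),
}


def describe(letter):
    """Return a description such as 'voiced labial stop'."""
    voicing, place = STOPS[letter]
    return f"{voicing} {place} stop"


def voicing_partner(letter):
    """Return the stop with the same place but the opposite voicing."""
    voicing, place = STOPS[letter]
    for other, (other_voicing, other_place) in STOPS.items():
        if other_place == place and other_voicing != voicing:
            return other
    raise KeyError(letter)


print(describe("b"))         # voiced labial stop
print(voicing_partner("t"))  # d
```

Looking up the partner by place of articulation rather than storing the pairs directly mirrors the statement that the two members of a pair differ only in voicing.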


Fricatives

If, instead of closing the oral cavity completely, we leave a small aperture through which air continues to flow, we produce another class of consonants called fricatives.  For example, to produce the /z/ of 'uzi' = to use, the tip of the tongue almost touches (but not quite) the alveolar ridge.  This sound lasts as long as you can continue to push air out of your lungs, and you can very easily feel the vibration of the vocal cords with your finger.  /z/ is a voiced apico-alveolar fricative.  The /s/ of 'sep' = seven is the unvoiced member of the pair.

It is possible to produce bilabial fricatives, but in the languages of the world, labio-dental fricatives are much more common.  To produce the /f/ of 'forta' = strong, the lower lip almost touches the upper teeth, creating a partial obstruction.  /f/ is unvoiced.  The /v/ of 'voĉa' = voiced is the voiced member of the pair.  The /v/ of English is labio-dental, whereas the corresponding sound in Spanish is bilabial.

If you press the tongue against a certain part of the upper surface of the mouth to produce a stop and then lower it slightly, the tongue is in precisely the right position to produce the corresponding fricative.  Thus if you place your tongue against the velum to produce /k/ and then lower it slightly, you will produce the unvoiced velar fricative /ĥ/ of 'eĥo' = echo.  It is possible to produce the voiced counterpart, but this sound does not exist in Esperanto.  The unvoiced velar fricative does not exist in American English.  It is the 'ch' in Scottish 'loch' or in modern German 'hoch'.  It has never been a common sound in Esperanto, and in fact many words which originally had this sound, e.g., 'ĥemio' = chemistry, now have variants with 'k' which have for all practical purposes supplanted the original form.  Today people generally use 'kemio'.  If you have difficulty with this sound, the consequences are not great.

If we remember that the stops are bilabial and the fricatives are labio-dental, and if we somewhat loosely use the term 'dental' for apico-dental or apico-alveolar, we can arrange the consonants as follows.


                           labial      dental       velar

         voiced stop         b           d            g
       unvoiced stop         p           t            k

    voiced fricative         v           z
  unvoiced fricative         f           s            ĥ


The /ŝ/ of 'ŝipo' = ship is a prepalatal fricative in the sense that the front part of the tongue approaches the front part of the palate immediately behind the alveolar ridge.  It is unvoiced and identical to English 'sh' in 'ship'.  The /ĵ/ of 'aĵo' = thing is the voiced counterpart and is identical to English 'z' in 'azure' or 's' in 'measure'.


Nasals


In order to pronounce a nasal consonant, you close the mouth as for a stop and allow air to flow through the nasal passages.  The mouth functions as a resonance chamber, and its shape determines the resulting sound.  The /m/ of 'fama' = famous is bilabial, and the /n/ of 'nazo' = nose is dental.  Many languages have a velar nasal phoneme.  For example, in English, 'sin' and 'sing' are a minimal pair distinguished only by the difference between dental /n/ and velar /ng/.  (The proper symbol for this phoneme in the International Phonetic Alphabet is /ŋ/.)  Esperanto does not have the phoneme /ng/.  In many languages, however, a nasal consonant before another consonant is produced in the position of the second consonant (this phenomenon is called assimilation), and when an Esperanto nasal spelled with 'n' occurs before /k/ or /g/, it is generally pronounced like the [ng] in English 'sing'.
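
The assimilation of 'n' before /k/ or /g/ can be sketched as a simple rewriting rule.  The Python function below is a rough illustration only:  the name assimilate_nasals is invented here, the function handles lower-case spelling only, and it treats the usual pronunciation as if it were obligatory, which the description above does not claim.  The word 'banko' = bank is an added example;  'lango' and 'nazo' come from the text.

```python
def assimilate_nasals(word):
    """Return a rough phonetic spelling in which 'n' directly
    before 'k' or 'g' is written with the velar nasal symbol."""
    out = []
    for i, ch in enumerate(word):
        # Look one letter ahead; the last letter has no successor.
        next_ch = word[i + 1] if i + 1 < len(word) else ""
        if ch == "n" and next_ch in ("k", "g"):
            out.append("ŋ")
        else:
            out.append(ch)
    return "".join(out)


print(assimilate_nasals("lango"))  # laŋgo
print(assimilate_nasals("banko"))  # baŋko
print(assimilate_nasals("nazo"))   # nazo
```

The output is a pronunciation aid, not a spelling:  Esperanto orthography continues to write 'n' in these words.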


/l/ and /r/ and /h/


The /l/ in 'lango' = tongue is a lateral consonant in the sense that the tongue blocks the oral cavity in the middle but allows air to flow out on one or both sides.  To produce the /r/ in 'rido' = laugh, the tip of the tongue touches the front part of the palate and immediately falls.  Such a sound is often called a flap.  It is identical to the Spanish /r/ in 'pera' = pear.  The /h/ of 'hela' = bright is a glottal fricative which is the result of air flowing through the constricted space between the vocal cords.  That space between the vocal cords is called the glottis.


Affricates


We now need to analyze the /c/ in 'certa' = certain.  If the tongue closes the oral cavity behind the teeth so as to produce /t/ and then rapidly releases the blockage, we hear only the stop /t/, although the tongue is for a very short time in the position of /s/.  If the blockage is released more slowly, we hear the /s/.  /c/ is a combination of /t/ followed by /s/.  Such combinations of a stop followed by a fricative in the same position are called affricates.  /c/ is an unvoiced dental affricate, and is quite common in Esperanto.  Much less common is the combination of /d/ followed by /z/.  We could say that it is the voiced counterpart of /c/, i.e., a voiced dental affricate.  To say that 'dz' represents a single phoneme has the disadvantage that it violates the principle that each phoneme is represented by a unique letter in the spelling system of the language.  To say that 'dz' is a sequence of the two phonemes /d/ + /z/ has the disadvantage that we treat two very similar sounds differently (the 'c' in 'eco' = property and the 'dz' in 'edzo' = husband) for purely orthographic reasons and that there is an asymmetry in the table of phonemes.  There is no perfect solution, but this is a purely theoretical matter and has no practical consequences.

/ĉ/ and /ĝ/ are prepalatal affricates.  /ĉ/ is a combination of an unvoiced prepalatal stop and the unvoiced prepalatal fricative /ŝ/.  Some languages, for example, Hungarian, have an unvoiced (pre)palatal stop.  Hungarian also has the unvoiced (pre)palatal affricate.  Esperanto, like many other languages (English, Spanish), has only the affricate but not the stop.  If we represent the stop by [t'], then /ĉ/ is [t'] + /ŝ/.  /ĝ/ is the voiced counterpart of /ĉ/ and is the combination of a voiced prepalatal stop and the voiced prepalatal fricative /ĵ/.  Hungarian has in the voiced case both the stop and the affricate.  Esperanto has only the affricate.
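
The analysis of the four affricates as stop + fricative can be summarized in a small table.  The Python sketch below is only an illustration.  The symbol "t'" is the ad hoc notation used above for the unvoiced prepalatal stop;  "d'" for its voiced counterpart is an assumption made here by analogy, and the names AFFRICATES, VOICED_PARTS and is_voiced are invented for this example.

```python
# Affricate -> (stop part, fricative part).
# "t'" follows the notation in the text; "d'" is assumed by analogy.
AFFRICATES = {
    "c": ("t", "s"),
    "dz": ("d", "z"),
    "ĉ": ("t'", "ŝ"),
    "ĝ": ("d'", "ĵ"),
}

# The voiced sounds among the component parts listed above.
VOICED_PARTS = {"d", "z", "d'", "ĵ"}


def is_voiced(affricate):
    """An affricate is voiced when both of its parts are voiced."""
    stop, fricative = AFFRICATES[affricate]
    # The two parts of an affricate always agree in voicing.
    assert (stop in VOICED_PARTS) == (fricative in VOICED_PARTS)
    return stop in VOICED_PARTS


for letter, (stop, fricative) in AFFRICATES.items():
    kind = "voiced" if is_voiced(letter) else "unvoiced"
    print(f"{letter} = {stop} + {fricative} ({kind})")
```

Whether 'dz' belongs in such a table as a single unit is exactly the open question discussed above;  it is included here only for symmetry.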


Conclusion


Thus Esperanto has the following 22 consonant phonemes (or 21 if we omit /dz/):

                         labial      dental     prepalatal     velar     glottal

         voiced stop         b           d                       g
       unvoiced stop         p           t                       k

    voiced fricative         v           z           ĵ
  unvoiced fricative         f           s           ŝ           ĥ          h

    voiced affricate                    dz           ĝ
  unvoiced affricate                     c           ĉ

      nasal (voiced)         m           n

    lateral (voiced)                     l
       flap (voiced)                                 r
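
The count of 22 (or 21) can be checked by listing the rows of the table.  This Python sketch is only a convenience;  the names INVENTORY and count_phonemes are invented for the example, and the places of articulation are left out because only the count matters here.

```python
# One entry per row of the table above.
INVENTORY = {
    "voiced stop": ["b", "d", "g"],
    "unvoiced stop": ["p", "t", "k"],
    "voiced fricative": ["v", "z", "ĵ"],
    "unvoiced fricative": ["f", "s", "ŝ", "ĥ", "h"],
    "voiced affricate": ["dz", "ĝ"],
    "unvoiced affricate": ["c", "ĉ"],
    "nasal": ["m", "n"],
    "lateral": ["l"],
    "flap": ["r"],
}


def count_phonemes(inventory, exclude=()):
    """Count all listed phonemes, skipping any named in exclude."""
    return sum(
        1
        for row in inventory.values()
        for phoneme in row
        if phoneme not in exclude
    )


print(count_phonemes(INVENTORY))                  # 22
print(count_phonemes(INVENTORY, exclude={"dz"}))  # 21
```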

Language is a very complex phenomenon which is far from being fully understood.  Linguists create models to explain their empirical observations.  The form of a model is a matter of judgment and may depend on the purpose at hand.  For the above diagram, there is no general agreement as to whether 'dz' should or should not be a phoneme.  When analyzing English, linguists generally prefer to consider [ts] as a sequence of the two phonemes /t/ + /s/, whereas in Esperanto the same sound is always regarded as a single phoneme.

With only a few modifications the above diagram becomes a model for the system of English consonants.  Remove the dental affricates /c/ and /dz/ and the velar fricative /ĥ/ and add the velar nasal /ng/.  All that remains is to add two interdental fricatives:  the 'th' in 'then', which is voiced, and the 'th' in 'thin', which is unvoiced.  They are called interdental because the tip of the tongue is between the upper and lower teeth.
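
The modifications just described amount to a few set operations.  In the Python sketch below, the Esperanto letters stand for the corresponding sounds, and the labels "ng", "th (voiced)" and "th (unvoiced)" are informal names chosen for this example.  The result reproduces only the adjusted diagram described in the paragraph above and is not offered as a complete analysis of English.

```python
# The 22 Esperanto consonant phonemes from the diagram (including dz).
ESPERANTO = {
    "b", "d", "g", "p", "t", "k",
    "v", "z", "ĵ", "f", "s", "ŝ", "ĥ", "h",
    "dz", "ĝ", "c", "ĉ",
    "m", "n", "l", "r",
}

# Sounds removed from the diagram: the dental affricates and the velar fricative.
REMOVED = {"c", "dz", "ĥ"}

# Sounds added: the velar nasal and the two interdental fricatives.
ADDED = {"ng", "th (voiced)", "th (unvoiced)"}

english_model = (ESPERANTO - REMOVED) | ADDED

print(len(ESPERANTO))      # 22
print(len(english_model))  # 22
print(sorted(english_model))
```

Three sounds are removed and three are added, so the adjusted diagram has the same number of entries as the Esperanto one.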


Updated 8/27/2004