Frequently Asked Questions in Acoustics of Speech and Hearing - Part 2

How does the Laryngograph work?

The Laryngograph is a device that monitors vocal fold activity in the larynx without interfering with the processes of articulation. It does this by measuring the electrical impedance through the neck at the level of the larynx. To measure impedance (resistance to current flow) the Laryngograph has two guard-ring electrodes that are placed on the skin on either side of the larynx. A small high-frequency voltage is applied to the centre of one electrode and the circuit is completed using the centre of the other electrode. The use of a high-frequency current and the presence of earthed guard rings ensure that the current flows through the neck rather than across the skin.

The current flow has a (relatively!) large DC (direct current) component, which is not very informative, so the Laryngograph isolates the AC (alternating current) component at frequencies between about 20 Hz and 2000 Hz. These changes in current must be due to changes in the impedance of the neck at these frequencies. The only mechanism which causes such relatively high-frequency changes in impedance is the vibration of the vocal folds in the larynx during phonation. The AC component is then amplified to a useful size, so that it can be recorded on a tape recorder or displayed on an oscilloscope.
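The idea of discarding the DC component and keeping only the 20 Hz to 2000 Hz band can be illustrated digitally. The following is only a sketch under stated assumptions: it chains a first-order high-pass and a first-order low-pass filter, which is not necessarily how the instrument itself does it (the filter design of the Laryngograph is not described here). The function name band_pass and its parameters are made up for the illustration.

```python
import math

def band_pass(samples, sample_rate, low_hz=20.0, high_hz=2000.0):
    """Keep roughly the low_hz..high_hz band of a sampled signal.

    Assumption: simple first-order RC-style filters, chosen for brevity.
    """
    dt = 1.0 / sample_rate
    rc_high = 1.0 / (2.0 * math.pi * low_hz)    # time constant of the high-pass stage
    rc_low = 1.0 / (2.0 * math.pi * high_hz)    # time constant of the low-pass stage
    a = rc_high / (rc_high + dt)                # high-pass coefficient
    b = dt / (rc_low + dt)                      # low-pass coefficient
    out = []
    hp_prev = 0.0
    lp_prev = 0.0
    x_prev = samples[0] if samples else 0.0     # start at the first sample so a pure DC level gives zero output
    for x in samples:
        hp = a * (hp_prev + x - x_prev)         # high-pass: removes the DC level and slow drift
        lp = lp_prev + b * (hp - lp_prev)       # low-pass: removes components well above high_hz
        out.append(lp)
        hp_prev, lp_prev, x_prev = hp, lp, x
    return out
```

For example, a signal consisting of a large constant level plus a small 200 Hz oscillation comes out as the oscillation alone, centred on zero.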

Studies with the Laryngograph and a simultaneous fibrescope imaging system have shown the relationship between the phases of the Laryngograph waveform (Lx) and the phases of vocal fold vibration. With the vocal folds apart, current flow is at a minimum. As the vocal folds snap together within a normal cycle, the current flow rises rapidly, indicating that it is the degree of vocal fold contact that most affects the measured impedance. During the vocal fold closed phase, the current flow rises to a maximum, and as the vocal folds peel apart, the current slowly falls again.

What are the characteristics of modal voicing?

Modal or typical voicing corresponds to a regular, non-breathy voice quality used in everyday communication situations. It is characterised by sharp vocal closures producing effective excitation of the supra-laryngeal resonances. It usually has regular closures giving a reliable indication of pitch. It usually has complete closures, along the full length of the folds, preventing air-escape and turbulence during the closed phase, and wide opening preventing turbulence in the open phase. The closed phase duration is often quite long, typically 60-70% of the total cycle. This long closed phase allows the vocal tract resonances to ring for a long time without damping.

What are the characteristics of breathy voicing?

Breathy voicing, associated with confidential or intimate communication situations, is characterised by air escape during voicing, usually caused by incomplete closure of the vocal folds. The vocal folds are connected to a single point at the front of the larynx, on the thyroid cartilage. However, at the back, they are connected to separate arytenoid cartilages. The arytenoids can rotate and swivel to draw the vocal folds across the airway and to tension them. If the arytenoids fail to close off the airway properly, then a gap can occur at the back through which air can escape even in the closed phase of the vocal fold cycle. The presence of the gap creates a narrowing through which the air-flow can become turbulent, generating a noise signal, which in combination with a weaker closing pulse is indicative of breathy voice. A very short closed phase is also associated with breathy voice, perhaps 30-40% of the total voicing cycle duration.

What are the characteristics of creaky voice?

Creaky voice, which is often found in phrase-final positions at the low end of a speaker's pitch range, is characterised by irregularity of cycle-to-cycle duration. Typically creaky voice occurs when the vocal folds are tightly approximated but weakly tensed. The tight approximation leads to very low air-flow, and very long closed phase durations, perhaps 80% or more. The slackness of the folds seems to disturb the normal mode of vibration of the folds, causing irregularities in cycle duration. One common style of creaky voice has alternating strong and weak cycles with long and short durations. This is called diplophonia, and perceptually appears as a kind of double pitch.

What are the characteristics of falsetto voice?

Falsetto voice is characterised by a very high fundamental frequency and rather weak energy. The vocal folds are at a very high tension, and rather rigid, so that only the edges of the folds can actually vibrate. This leads to a much reduced mass of fold involved in vibrating and very short vibration cycles. The high tension also reduces the effectiveness of closure, and the closed phase and open phase are approximately equal.
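The closed phase percentages quoted in the last four answers (modal 60-70%, breathy 30-40%, creaky 80% or more, falsetto about 50%) are proportions of one voicing cycle. As a rough sketch of how such a proportion might be estimated from a single cycle of the Lx waveform, the code below counts the samples lying above a threshold level. The threshold criterion (half of the peak-to-peak amplitude) and the function name closed_quotient are assumptions made for the illustration; nothing here says how the closed phase was actually measured.

```python
def closed_quotient(cycle, threshold=0.5):
    """Estimate the closed phase as a fraction of one Lx cycle.

    cycle: Lx samples covering exactly one voicing cycle (higher = more contact).
    threshold: assumed criterion, as a fraction of the peak-to-peak amplitude.
    """
    lowest, highest = min(cycle), max(cycle)
    if highest == lowest:
        raise ValueError("cycle is flat, so no closed phase can be estimated")
    level = lowest + threshold * (highest - lowest)
    closed_samples = sum(1 for value in cycle if value > level)
    return closed_samples / len(cycle)

# Synthetic cycle: 65 samples of high contact followed by 35 samples of low contact.
example_cycle = [1.0] * 65 + [0.0] * 35
print(closed_quotient(example_cycle))
```

The result depends on the threshold chosen, so values obtained with different criteria should not be compared directly.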

What are Lx, Tx, Fx, Dx, Cx, etc?

These terms have originated at University College London to describe the various graphs related to the use of the Laryngograph for the analysis of voice.

  • Lx is the name given to the current-flow waveform generated by the Laryngograph. An Lx waveform has a vertical axis representing current flow through the larynx, which is related to vocal fold contact area.
  • Tx is the name given to the sequence of pitch period durations that can be generated from the Lx waveform. Tx data is used as the basis for the calculation of instantaneous fundamental frequency (= the fundamental frequency associated with a single voicing cycle), which is used to generate fundamental frequency distributions.
  • Fx is the name given to the graph of fundamental frequency against time. You will see this described as F0 elsewhere in the literature. We prefer the term Fx because it reminds us that this is the "frequency of excitation" to the vocal tract, and is not to be confused with F1, F2, etc., which are the resonant frequencies of the vocal tract.
  • Dx is the name given to distributions of fundamental frequency, that is, histograms of fundamental frequency usage. These histograms tell us how much time a speaker spends at each fundamental frequency. From these we can estimate his modal frequency (= most commonly used frequency) and his fundamental frequency range (typically the range in which a speaker spends 90% of his time voicing). Sometimes we distinguish first-order histograms (Dx1), which include all voicing cycles, from second-order histograms (Dx2), which only include voicing cycles occurring in regular speech. The difference between Dx1 and Dx2 allows us to quantify the amount of irregularity in the speech.
  • Cx is the name given to a kind of scatterplot graph in which adjacent pairs of Tx values are plotted against one another on a frequency scale. This graph shows us the degree of irregularity in the voicing. In regular voicing, the scattering of points on the Cx graph is along the diagonal, whereas for irregular voicing, many points occur off the diagonal.
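The relationships between Tx, Fx, Dx and Cx described in the list above can be sketched in a few lines. This is an illustration only: the function names, the 10 Hz bin width and the 10% tolerance used to decide whether a Cx point lies "off the diagonal" are all assumptions, not the definitions used in the Laryngograph software.

```python
import math

def tx_to_fx(tx_seconds):
    """Convert pitch period durations (Tx, in seconds) to per-cycle Fx values (Hz)."""
    return [1.0 / t for t in tx_seconds]

def dx_histogram(tx_seconds, bin_hz=10.0):
    """Dx-style histogram: time spent (seconds) in each Fx bin, keyed by the bin's lower edge."""
    hist = {}
    for t in tx_seconds:
        lower_edge = math.floor((1.0 / t) / bin_hz) * bin_hz
        hist[lower_edge] = hist.get(lower_edge, 0.0) + t
    return hist

def cx_pairs(tx_seconds):
    """Cx-style scatter data: each cycle's Fx paired with the Fx of the following cycle."""
    fx = tx_to_fx(tx_seconds)
    return list(zip(fx[:-1], fx[1:]))

def off_diagonal_fraction(tx_seconds, tolerance=0.1):
    """Fraction of Cx points whose two Fx values differ by more than `tolerance` (relative).

    Assumed irregularity measure, for illustration only.
    """
    pairs = cx_pairs(tx_seconds)
    if not pairs:
        return 0.0
    off = sum(1 for a, b in pairs if abs(a - b) > tolerance * max(a, b))
    return off / len(pairs)

regular = [0.008] * 10             # steady 125 Hz voicing
alternating = [0.008, 0.012] * 5   # alternating short and long cycles
print(off_diagonal_fraction(regular), off_diagonal_fraction(alternating))
```

With the regular sequence every Cx point falls on the diagonal, while the alternating sequence puts every point off it.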

What is the best way to measure the "average" fundamental frequency?

There are basically three ways to obtain an average from a probability distribution: use the mean, the median or the mode. Distributions of fundamental frequency have some odd characteristics which affect the decision about which of these is most useful. Among these are:

  • Perception of fundamental frequency is known to be related more closely to the logarithm of Fx than to Fx measured linearly in Hertz. So should we plot Fx or log(Fx) on our histogram?
  • Usage of fundamental frequency could be measured in terms of the number of vocal fold cycles used by a speaker at each frequency, or by the total amount of time spent by the speaker at each frequency.
  • Not all speech is voiced, and there are regions where the voicing is starting up or stopping which may not be typical of normal vibration.
  • Instruments for measuring fundamental frequency are prone to measurement error: pitch halving and pitch doubling being common. Even the laryngograph has poor performance on some speakers.
  • Some speakers use a great deal of creakiness in their phonation, and this can give odd fundamental frequency values.
  • Fundamental frequency distributions can often be far from normally distributed (Gaussian shaped), with many outlier values, and sometimes more than one peak.

Together, these considerations suggest that the mode is the most useful measure. It is unaffected by the log/linear consideration or the shape of the distribution. Its weakness is for distributions with more than one peak. These should be documented specially. Both the mean and the median can be strongly affected by the odd shape of distributions.
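As a small illustration of why the mode is attractive, the sketch below compares the mean with the mode of a histogram for a set of per-cycle Fx values containing a few pitch-doubling and pitch-halving errors. The data values, the 5 Hz bin width and the function name modal_fx are invented for the example.

```python
import statistics

def modal_fx(fx_values, bin_hz=5.0):
    """Return the centre of the most heavily populated histogram bin (assumed bin width)."""
    counts = {}
    for f in fx_values:
        index = int(f // bin_hz)
        counts[index] = counts.get(index, 0) + 1
    best = max(counts, key=counts.get)
    return (best + 0.5) * bin_hz

# Invented data: voicing around 120 Hz, plus two doubled values and one halved value.
fx = [118, 120, 121, 119, 122, 120, 240, 242, 60]
print("mode:", modal_fx(fx))
print("mean:", statistics.mean(fx))
```

Here the mode stays with the bulk of the values near 120 Hz, while the mean is pulled upwards by the two doubled values.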

What is the best way to measure the "range" of fundamental frequency?

There are basically three ways to measure the breadth of a distribution: the standard deviation, the interquartile range, and the total range. Distributions of fundamental frequency have some odd characteristics, some of which are listed in the answer to the last question. The fact that the distribution often has a large number of outliers means that the standard deviation is not satisfactory: it would give values which are much broader than the truth. Similarly, the total range is set by only two values from a distribution containing possibly thousands of values: the very highest and the very lowest. The total range is therefore also unsatisfactory, and measures based on percentiles, like the interquartile range, seem to be our best bet.

It is worth asking ourselves what we require of a measure of range. We want a measure that is reliable, in the sense that if we repeat the measure on a different recording of the same speaker we would hope to get a similar answer. On the other hand, we want a measure that is sensitive to differences in fundamental frequency use: between one speaker and another, between one style of text and another, before and after therapy, etc. Thus we have to come to some compromise. At UCL we have settled on the 90% range as our preferred measure. This is the range of fundamental frequency that the speaker stays within 90% of the time (of his voiced speech). Not only is this measure fairly reliable, it is also easy to understand. The 90% range discards the bottom 5% and the top 5% of the distribution, making it less sensitive to outliers. On the other hand, the measure does not deal adequately with very irregular voicing. It may be better to use the second-order Dx in these circumstances.
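A 90% range of this kind can be sketched as follows. The code weights each voicing cycle by its duration, so that the result reflects time spent rather than number of cycles, and then reads off the Fx values below which 5% and 95% of the voiced time lies. The function name range_90 and the details of the percentile calculation are assumptions for the illustration, not a description of the UCL software.

```python
def range_90(tx_seconds):
    """Return (low, high) Fx in Hz enclosing the central 90% of voiced time.

    tx_seconds: pitch period durations in seconds, one per voicing cycle.
    """
    cycles = sorted((1.0 / t, t) for t in tx_seconds)   # (Fx, duration), ascending Fx
    total_time = sum(t for _, t in cycles)

    def fx_at(fraction):
        # Walk up the sorted cycles until the requested share of time is reached.
        target = fraction * total_time
        running = 0.0
        for fx, t in cycles:
            running += t
            if running >= target:
                return fx
        return cycles[-1][0]

    return fx_at(0.05), fx_at(0.95)

# Invented data: one cycle at each of 100..199 Hz, plus outliers at 50 Hz and 400 Hz.
tx = [1.0 / f for f in range(100, 200)] + [1.0 / 50, 1.0 / 400]
low, high = range_90(tx)
print(round(low), round(high))
```

In this invented example the total range would be 50 Hz to 400 Hz, set entirely by the two outlying cycles, whereas the 90% range stays within the 100 Hz to 199 Hz region where nearly all the voicing occurs.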

What are the sources of variability in vowel production?

The acoustic realisations of phonological vowels vary for a large number of reasons. We shouldn't expect the signal we pick up from the microphone to be identical for every vowel in every environment on every occasion by every speaker.

One way to classify types of variability is to consider the speech chain story of vowel production:

  1. Phonological. The phonological specification of the vowel can change from speaker to speaker: particularly if they have different accents. Speakers of a General American accent have a different set of open back vowels to RP speakers. The distribution of segments across lexical items can vary too, for example for Southern and Northern pronunciations of "bath".
  2. Phonetic. The realisation of the phonological vowel by the articulators will depend on the phonetic context: what adjacent articulations need to be made; on the position of the segment in the prosodic structure: what duration and pitch are required; and on the particular gestural preferences of the speaker.
  3. Acoustic-Phonetic. The filtering of the sound source generated by the larynx depends on the configuration and size of the vocal tract. Adults and children have different size tracts as well as different size larynxes. Larger cavities lead to lower formant frequencies. Larger larynxes lead to lower fundamental frequencies.
  4. Acoustics. Between the sound being generated and the sound being picked up by the microphone, the sound can be altered by background noise and by other "channel" effects such as reverberation, or telephone line distortion. Also note that speakers can change their style of speaking in noisy surroundings.

Finally, remember that the articulators are not precise mechanical devices and you should expect random variation from repetition to repetition.

What is "Locus Frequency"?

Formant transitions from vowels into obstruents, or from obstruents into vowels, vary in shape depending on the formant frequencies characteristic of the vowel and on the place and manner of the consonant. Human listeners appear to be able to use these formant transitions to identify the place and manner of the consonant, even when other aspects of the spectrographic pattern of the consonant are missing or ambiguous. But since the transitions are different in shape before different vowels (the formant trajectory must have one end rooted at the vowel formant frequencies), what is it about the transition that informs the listener about the consonant? The hypothesis is that what matters is the frequency that each formant transition is heading towards as an obstruction is made, or the frequency from which the transition comes as the obstruction is released. This frequency seems to be characteristic of the consonantal place and manner, and appears to be roughly the same in different vowel contexts. Thus each formant for each consonant has a 'target' frequency which the listener can use to help identification of the consonant and which is the 'locus' of all formant transitions.

In voiceless sounds, what happens to vocal fold vibration?

In voiceless sounds there is no vocal fold vibration. Unlike in voicing, where the arytenoids bring the folds together to begin a cycle of vibration, the folds are left open during voiceless sounds so that the air from the lungs can pass through freely. Because there is no subglottal pressure increase caused by bringing the folds together, the airstream passing through the open folds in voiceless sounds is not as fast-flowing as it is for voiced sounds, and thus no Bernoulli effect takes place. Because there is no vocal fold vibration during voiceless sounds, we often say that voiceless sounds give us no sensation of pitch.

Can a speech vibration be random?

Certainly. When structures above the larynx are involved in setting the airstream into vibration, these vibrations are characterised as aperiodic, or random. The two main types of random vibration are turbulence and transience. We associate turbulence with the class of sounds known as the fricatives. When the airstream from the lungs moves rapidly through a narrow constriction in the vocal tract appropriate for a fricative, it becomes turbulent. Turbulence is a series of random vibrations. These random vibrations can additionally strike an object in their path (such as the teeth) and will set whatever air is in front of them into vibration. Transience is a form of random vibration normally associated with the class of sounds called stops or plosives. Because stops are made with a complete closure somewhere in the vocal tract, the release of this closure is associated with an audible burst of sound because pressure has built up behind the closure. This audible burst is sharp and short (hence the term ‘transient’) and will vibrate whatever body of air is in front of it. Both turbulence and transience are characterised as random because they do not have a clear, repetitive structure characteristic of the periodic vibration at the vocal folds.

What are the applications of the laryngograph in the clinic?

The laryngograph is a non-invasive device used for examining vocal fold vibration. It consists of two electrodes that are placed on either side of the thyroid cartilage, over each of the vocal folds. A current is passed from one electrode to the other and the amount of impedance is measured. If the vocal folds are closed, the current flows freely as it has an easy route through soft tissue. If the vocal folds are open, the current flow is much more impeded, as it must find an alternative path round the arytenoids in order to pass from one electrode to the other. The amount of current flow can then be plotted on a graph. This enables some of the characteristics of the client’s vocal fold vibration to be observed, such as the ratio of opening to closing duration of the folds, how rapidly each of these occurs, and how regular the vibrations are. The laryngograph technique is designed as a supplement to other, more invasive techniques. It provides a quick and easy way of assessing the patterns of vibration at the vocal folds. Of course, for more detailed analysis of the vocal folds, other techniques would have to be used.

What is the relationship between F1 & F2 frequencies and the articulation of vowels?

We observe by experiment that the major part of perceived vowel quality can be explained with a system comprising only two resonances. Furthermore it turns out that it is only the frequencies of these resonances that change much with different articulations (their bandwidths are fairly constant). Thus we should be able to find some regularities between the frequencies of F1 and F2 and the articulation of vowels.

Bearing in mind always that such regularities are only approximations to the truth, we can see that when a neutral vowel is articulated, the formants are fairly equally spaced (F1=500 Hz, F2=1500 Hz, F3=2500 Hz, etc.). If we shift the tongue forward, we find that F2 rises in frequency. Since we know that the resonant frequencies of cavities get higher as the cavities get smaller, it makes sense to ask which cavity is getting smaller when we move forward from a neutral vowel. The answer must be the cavity at the front of the mouth. Similarly, if we move from a neutral vowel to an open vowel, we find that F1 rises in frequency. F1 must be associated with a cavity that gets smaller as we make a more open articulation. In this case we observe that the pharyngeal cavity behind the tongue hump gets smaller with more open articulations (because with an open articulation the tongue is depressed in the mouth).

Thus a 'rule of thumb' is that F1 is associated with the vocal tract cavity behind the tongue hump, while F2 is associated with the cavity in front of the tongue hump.
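The equally spaced neutral-vowel formants quoted above (500, 1500, 2500 Hz) are what the standard textbook model of a uniform tube, closed at the glottis and open at the lips, predicts. That model is not part of the answer above and is brought in here only as an illustration; the tube length of 17.5 cm, the speed of sound of 350 m/s and the function name tube_resonances are assumed values chosen for the sketch. In this model the resonances fall at odd multiples of c/4L.

```python
def tube_resonances(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end and open at the other.

    Uses F_n = (2n - 1) * c / (4 * L); the default length and speed are assumed values.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]

print([round(f) for f in tube_resonances()])              # adult-sized tube
print([round(f) for f in tube_resonances(length_m=0.12)])  # a shorter tube
```

Shortening the tube raises every resonance, in line with the general principle that smaller cavities have higher resonant frequencies.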

Copyright © 2023 Mark Huckvale

Last modified: 16:10 06-Jun-2010