College of Santa Fe Auditory Theory

Lecture 006 - Hearing I


2.0 Introduction to Hearing

  1. The anatomy of the hearing system
  2. Outer ear function
  3. Middle ear function
    1. Ossicles
  4. Inner ear function
    1. Cochlea
    2. Auditory Nerve
    3. The Organ of Corti
    4. Basilar Membrane
  5. The Place Theory
  6. Sharpening Pitch Perception
  7. Critical Bands
  8. Brain Bullets

Psychoacoustics is the study of how humans perceive sound. We need to understand

2.1 The anatomy of the hearing system

Sound pressure waves are funneled through the acoustic meatus of the external ear (E) to the tympanic membrane (in green), and mechanically transduced through the middle ear (M) into a fluid-filled chamber called the vestibule. The inner ear (I) organs branch off from this chamber.


The ear consists of three parts


2.1.1 Outer ear function

The outer ear consists of an external flap of tissue known as the pinna, with many grooves, ridges and depressions. These serve to create a pattern of reflections and delays which are of great assistance in localizing sounds. These patterns are unique to every person, and if a test subject is asked to listen through a model of another person's ear, his ability to localize sound will drop markedly. Each brain learns the signals for each ear by experience.

The depression at the entrance to the auditory canal is called the concha, which assists us in identifying the location of sounds that are in front of or behind us and, to some extent, above and below us. The auditory canal itself is between 25 and 35 mm long between the concha and the tympanic membrane. This means that the auditory canal acts as a mixed boundary air column (closed at one end and open at the other) with a resonant frequency around 4 kHz. This is a very critical factor in explaining the hot spots and weaknesses of human hearing.
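As a rough check on the resonance figure, the canal can be idealized as a uniform tube closed at one end, whose lowest resonance is the quarter-wavelength frequency f = c / (4L). The sketch below assumes a speed of sound of 343 m/s (air at about 20 degrees C) and a rigid, straight tube. Both are simplifications not stated in the lecture, and the function name is only illustrative.

```python
def quarter_wave_resonance_hz(length_m, speed_of_sound_m_s=343.0):
    """Lowest resonance of an idealized tube closed at one end, open at the other.

    The default speed of sound (343 m/s) is an assumed value for room-temperature air.
    """
    return speed_of_sound_m_s / (4.0 * length_m)


if __name__ == "__main__":
    # Canal lengths spanning the 25 to 35 mm range quoted in the text.
    for length_mm in (25, 30, 35):
        f = quarter_wave_resonance_hz(length_mm / 1000.0)
        print(f"{length_mm} mm canal: about {f:.0f} Hz")
```

For the quoted range of lengths this idealization gives roughly 2.5 to 3.4 kHz, in the same region as the figure given above. The real canal is neither uniform nor rigidly terminated, so the formula is only a guide.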

The tympanic membrane itself is comprised of three layers:

2.1.2 Middle ear function

The function of the middle ear is two fold:

The tympanic membrane or "eardrum" receives vibrations traveling up the auditory canal and transfers them through the tiny ossicles to the oval window, the port into the inner ear. The eardrum is some 13 times larger than the oval window, giving an amplification of about 13 compared to the oval window alone.



The effective pressure acting on the oval window is arranged by mechanical means to be greater than that acting on the tympanic membrane. This is to overcome the higher resistance to movement of the cochlear fluid compared to that of air. The ossicles act as a mechanical impedance converter by using

The three tiniest bones in the body form the coupling between the vibration of the eardrum and the forces exerted on the oval window of the inner ear. With a long enough lever, you can lift a big rock with a small applied force on the other end of the lever. The amplification of force can be changed by shifting the pivot point.

The ossicles can be thought of as a compound lever which achieves a multiplication of force. This lever action is thought to achieve an amplification by a factor of about three under optimum conditions, but can be adjusted by muscle action to actually attenuate the sound signal for protection against loud sounds. These muscles are called the tensor tympani and the stapedius muscle. They contract automatically in response to sounds with levels greater than approximately 75 dB, and they have the effect of increasing the impedance of the middle ear by stiffening the ossicular chain. This reduces the efficiency with which vibrations are transmitted from the tympanic membrane to the inner ear. Approximately 12 to 14 dB of attenuation is provided by these muscles, but only for frequencies below 1 kHz. This effect is known as the acoustic reflex and it takes between 60 ms and 120 ms to activate.


The vibration of the eardrum is transmitted to the oval window of the inner ear by means of the ossicles, which achieve an amplification by lever action. The lever is adjustable under muscle action and may actually attenuate loud sounds for protection of the ear.
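The two figures quoted in this section (an area ratio of about 13 and a lever factor of about three) can be combined into a single pressure gain and expressed in decibels. The sketch below assumes the two effects simply multiply and that the quoted factors apply to pressure. That is an idealization made for illustration, and the function name is hypothetical.

```python
import math


def middle_ear_pressure_gain_db(area_ratio=13.0, lever_ratio=3.0):
    """Combined pressure gain in dB, assuming the two factors multiply.

    area_ratio:  eardrum area / oval window area (about 13 in the text)
    lever_ratio: force multiplication of the ossicular lever (about 3 in the text)
    """
    return 20.0 * math.log10(area_ratio * lever_ratio)


if __name__ == "__main__":
    print(f"Area ratio alone: {middle_ear_pressure_gain_db(13.0, 1.0):.1f} dB")
    print(f"Area and lever:   {middle_ear_pressure_gain_db():.1f} dB")
```

Under these assumptions the area ratio alone contributes a little over 22 dB and the combination a little under 32 dB, which gives a sense of scale for the 12 to 14 dB of attenuation attributed to the acoustic reflex.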

CAUTION: The acoustic reflex, although designed for our protection, can today be our own worst enemy. In-ear headphones and modern amplification systems have no trouble generating sound pressure levels in excess of 75 dB. What happens then is that the acoustic reflex kicks in and the music seems too quiet to the listener, who bumps it up. At this point the signal which was 75 dB has been bumped up to 90 dB and is starting to do significant hearing damage. You should always listen to music at the quietest level that allows you to hear the parts.

2.1.3 Inner ear function

The cochlea

The inner ear consists of the snail-like structure known as the cochlea. The function of the cochlea is to convert mechanical vibrations into nerve firings to be processed eventually by the brain. Mechanical vibrations reach the cochlea at the oval window via the stapes footplate of the middle ear. The cochlea consists of a tube coiled into a spiral with approximately 2.75 turns. The end with the oval and round windows is the 'base' and the other end is the 'apex'. The tube is divided into three sections by Reissner's membrane and the basilar membrane. The outer channels, the scala vestibuli and scala tympani, are filled with an incompressible fluid known as perilymph, and the inner channel is the scala media. The scala vestibuli terminates at the oval window and the scala tympani at the round window. There is a small hole at the apex known as the helicotrema through which the perilymph fluid can flow. Input acoustic vibrations result in a piston-like movement of the stapes footplate at the oval window which moves the perilymph fluid within the cochlea. The membrane covering the round window moves to compensate for oval window movement since the perilymph fluid is essentially incompressible. Inward movements of the stapes footplate at the oval window cause the round window to move outwards, and outward movements of the stapes footplate cause the round window to move inwards. These movements cause travelling waves to be set up in the scala vestibuli which displace both Reissner's membrane and the basilar membrane.

Soon after the discovery of the hearing organ by Alfonso Corti in 1851, other anatomists, among them Reissner, Deiters, Lowenberg, Hensen and Hasse, contributed to the clarification of the cochlear structure. The first relevant contribution to the understanding of the processes underlying sound transduction by the ear is due to Hermann von Helmholtz. The eminent German scientist pointed out the function of the ossicle chain in the middle ear. In an exemplary description, von Helmholtz explained that the mechanical coupling provided by the middle ear optimizes the transfer of energy from the air that sets the tympanic membrane into motion to the fluid filling the cochlea.


The inner ear structure called the cochlea is a snail-shell like structure divided into three fluid-filled parts. Two are canals for the transmission of pressure and in the third is the sensitive organ of Corti, which detects pressure impulses and responds with electrical impulses which travel along the auditory nerve to the brain.

The cochlea has three fluid filled sections. The perilymph fluid in the canals differs from the endolymph fluid in the cochlear duct. The organ of Corti is the sensor of pressure variations.

The pressure changes in the cochlea caused by sound entering the ear travel down the tympanic and vestibular canals, which are filled with a fluid called perilymph. This perilymph is almost identical to spinal fluid and differs significantly from the endolymph which fills the cochlear duct and surrounds the sensitive organ of Corti. The fluids differ in terms of their electrolytes, and if the membranes are ruptured so that there is mixing of the fluids, the hearing is impaired.

Organ of Corti

The organ of Corti is the sensitive element in the inner ear and can be thought of as the body's microphone. It is situated on the basilar membrane in one of the three compartments of the cochlea. It contains four rows of hair cells which protrude from its surface. Above them is the tectorial membrane, which can move in response to pressure variations in the fluid-filled tympanic and vestibular canals. There are some 16,000 to 20,000 of the hair cells distributed along the basilar membrane, which follows the spiral of the cochlea.

The place along the basilar membrane where maximum excitation of the hair cells occurs determines the perception of pitch according to the place theory. The perception of loudness is also connected with this organ.

The hair cells of the organ of Corti are arranged in four rows along the length of the basilar membrane. Individual hair cells have multiple strands called stereocilia. There may be 16,000 - 20,000 such cells. The place theory of pitch perception suggests that pitch is determined by the place along this collection at which excitation occurs. The pitch resolution of the ear suggests a collection of hair cells like this associated with each distinguishable pitch.

This is another conception of the arrangement of the outer three rows of hair cells, consistent with the above picture, but showing that a cluster of the cilia is associated with a single hair cell. It is drawn roughly from the work of McGutin. I think that the best that we know about the cilia arrangements comes from electron micrographs like those of Hudspeth.

The sensitive hair cells of the organ of Corti may have about 100 tiny stereocilia which in the resting state are leaning on each other in a conical bundle. In response to the pressure variations in the cochlea produced by sound, the stereocilia may dance about wildly and send electrical impulses to the brain.

Between 1985 and 1986 William Brownell, Bechara Kachar and their collaborators used video microscopy to demonstrate low-frequency electromotile responses in isolated outer hair cells. In 1987 Jonathan Ashmore used the patch-clamp technique to elicit electromotile responses from outer hair cells (movie below) at kHz frequencies and measured the cell's fractional length change as a function of transmembrane potential.

Basilar Membrane


Von Helmholtz provided the first mechanical model of the cochlea. Ignoring the hydrodynamical effects of the fluid, he represented the basilar membrane as an elastic strip radially clamped across the cochlear duct with different and graded tension coefficients in the radial directions. Assuming a negligible tension in the longitudinal direction, this basilar membrane model is equivalent to a set of harmonic oscillators tuned to different frequencies. Imagining that each portion of the basilar membrane senses a force proportional to the stapedial input, the basilar membrane response to a tone of given frequency (say 2 kHz) should be similar to that shown in the movie above, although with a much smaller amplitude. Accordingly, the cochlea was thought of as a sort of spectrum analyzer providing a frequency-position map of the Fourier components of sound.

The idea that the frequency analysis of sound is provided by such a simple mechanism survived until 1927. Realizing the importance of hydrodynamic interactions, Georg von Békésy built a brass and glass frame with two uniformly tapered chambers, filled with water and separated by a strip having graded elastic properties. Cochlear travelling waves were discovered! Von Békésy's experiments on freshly dissected human temporal bones confirmed the observations performed on his physical model.

The movie above illustrates, with exaggerated amplitude, the basilar membrane response to a tone of about 2 kHz. At variance with the resonator-bank model imagined by von Helmholtz, the inertial effects of the co-moving fluid impart a phase delay of a few cycles to the basilar membrane oscillation.

Von Békésy's findings stimulated the production of numerous cochlear models that reproduced the observed wave shapes, but were in contrast with psychophysical data on the frequency selectivity of the cochlea. Beginning in the '70s with Rhode's results and more definitively in the '80s, with the improved measurements by Sellick, Patuzzi and Johnstone, it was realized that the vibration of the basilar membrane was non-linear and more sharply tuned, at low sound pressure levels, than previously thought. In 1983, this prompted Davis to state that it was necessary to accept a revolutionary new hypothesis concerning the action of the cochlea, namely, that an active process within the organ of Corti increases the vibration of the basilar membrane.

Note the greater sensitivity at low volumes and the reduced sensitivity at high volumes. The important lesson here is that our ears are FAR more sensitive to changes at low volumes than at high volumes.


Auditory Nerve

Taking electrical impulses from the cochlea and the semicircular canals, the auditory nerve makes connections with both auditory areas of the brain.


This schematic view of some of the auditory areas of the brain shows that information from both ears goes to both sides of the brain. In fact, binaural information is present in all of the major relay stations illustrated here. That is, when the auditory nerve from one ear takes information to the brain, that information is sent directly to the processing areas on both sides of the brain.

The Place Theory

High frequency sounds selectively vibrate the basilar membrane of the inner ear near the entrance port (the oval window). Lower frequencies travel further along the membrane before causing appreciable excitation of the membrane. The basic pitch determining mechanism is based on the location along the membrane where the hair cells are stimulated.

The place theory unrolls the cochlea and represents the distribution of sensitive hair cells on the organ of Corti. Pressure waves are sent through the fluid of the inner ear by force from the stirrup



The extremely small size of the cochlea and the extremely high resolution of human pitch perception cast doubt on the sufficiency of the place theory to completely account for the human ear's pitch resolution. Some typical data:

This would require a separate detectable pitch for every 0.002 cm, which is physically unreasonable for a simple peaking action on the membrane.

The normal human ear can detect the difference between 440 Hz and 441 Hz. It is hard to believe it could attain such resolution from selective peaking of the membrane vibrations. Some pitch sharpening mechanism must be operating.
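To give a sense of how small the 440 Hz to 441 Hz step is in musical terms, the interval can be converted to a percentage and to cents (hundredths of an equal-tempered semitone). The conversion below is standard arithmetic. The function names are illustrative and not taken from the lecture.

```python
import math


def percent_difference(f_low_hz, f_high_hz):
    """Frequency difference as a percentage of the lower frequency."""
    return 100.0 * (f_high_hz - f_low_hz) / f_low_hz


def interval_cents(f_low_hz, f_high_hz):
    """Interval size in cents; 1200 cents make one octave."""
    return 1200.0 * math.log2(f_high_hz / f_low_hz)


if __name__ == "__main__":
    print(f"{percent_difference(440.0, 441.0):.3f} % of the lower frequency")
    print(f"{interval_cents(440.0, 441.0):.2f} cents")
```

The step works out to roughly a quarter of one percent, or about four cents, far smaller than a semitone (100 cents).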

Sharpening of Pitch Perception

The high pitch resolution of the ear suggests that only about a dozen hair cells, or about three tiers from the four banks of cells are associated with each distinguishable pitch. It is hard to conceive of a mechanical resonance of the basilar membrane that sharp. So we look for enhancements of the basic place theory of pitch perception.

There must be some mechanism which sharpens the response curve of the organ of Corti, as suggested schematically in the diagram. Several such mechanisms have been suggested.

Mechanisms for Sharpening

Since it seems unlikely that the basic place theory for pitch perception can explain the extraordinary pitch resolution of the human ear, some sharpening mechanism must be operating. Several of the proposed mechanisms have the nature of lateral inhibition on the basilar membrane. One way to sharpen the pitch perception would be to bring the peak of the excitation pattern on the basilar membrane into greater relief by inhibiting the firing of those hair cells which are adjacent to the peak. Since nerve cells obey an "all-or-none" law, discharging when receiving the appropriate stimulus and then drawing energy from the metabolism to recharge before firing again, one form the lateral inhibition could take is the inhibition of the recharging process, since the cells at the peak of the response will be drawing energy from the surrounding fluid most rapidly. Inhibition of the lateral hair cells could also occur at the ganglia, with some kind of inhibitory gating which lets through only those pulses from the cells which are firing most rapidly. It is known that there are feedback signals from the brain to the hair cells, so the inhibition could occur by that means.
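The general idea of lateral inhibition can be illustrated with a toy calculation: each position along a broad excitation pattern has a fraction of its neighbours' activity subtracted from it, which leaves the peak in place but narrows the pattern around it. This is only a numerical cartoon. The excitation values, the inhibition strength of 0.4 and the nearest-neighbour rule are assumptions chosen for illustration, not a physiological model of any of the mechanisms described above.

```python
def lateral_inhibition(excitation, strength=0.4):
    """Subtract a fraction of each neighbour's activity; clip negatives to zero."""
    n = len(excitation)
    sharpened = []
    for i, value in enumerate(excitation):
        left = excitation[i - 1] if i > 0 else 0.0
        right = excitation[i + 1] if i < n - 1 else 0.0
        sharpened.append(max(0.0, value - strength * (left + right)))
    return sharpened


def half_height_width(pattern):
    """Number of positions whose response is at least half of the peak response."""
    peak = max(pattern)
    return sum(1 for value in pattern if value >= peak / 2.0)


if __name__ == "__main__":
    # Hypothetical broad excitation pattern along the membrane (arbitrary units).
    broad = [0.0, 0.2, 0.5, 0.8, 1.0, 0.8, 0.5, 0.2, 0.0]
    sharp = lateral_inhibition(broad)
    print("width before:", half_height_width(broad))
    print("width after: ", half_height_width(sharp))
```

In this example the peak stays at the same position while the number of positions responding at half-peak level or above falls, which is the sense in which inhibition "sharpens" the response.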

2.2 Critical bands

Section 2.1 describes how the inner ear carries out a frequency analysis of sound due to the mechanical properties of the basilar membrane and how this provides the basis behind the 'place' theory of hearing. The next important aspect of the place theory to consider is how well the hearing system can discriminate between individual frequency components of an input sound. This will provide the basis for understanding the resolution of the hearing system and it will underpin discussions relating to the psychoacoustics of how we hear music, speech and other sounds.

Each component of an input sound will give rise to a displacement of the basilar membrane at a particular place. The displacement due to each individual component is spread to some extent on either side of the peak. Whether or not two components that are of similar amplitude and close together in frequency can be discriminated depends on the extent to which the basilar membrane displacements due to each of the two components are clearly separated or not.

NOTE: Use SignalSuite to demonstrate

Suppose two pure tones, or sine waves, with amplitudes A1 and A2 and frequencies F1 and F2 respectively are sounded together. If F1 is fixed and F2 is changed slowly from being equal to, or in unison with, F1 either upwards (or downwards) in frequency, the following is generally heard (see Figure 2.6). When F1 is equal to F2, a single note is heard. As soon as F2 is moved higher (or lower) than F1, a sound with clearly undulating amplitude variations known as beats is heard. The frequency of the beats is equal to (F2 - F1), or (F1 - F2) if F1 is greater than F2, and the amplitude varies between (A1 + A2) and (A1 - A2), or (A1 + A2) and (A2 - A1) if A2 is greater than A1. Note that when the amplitudes are equal (A1 = A2), the amplitude of the beats varies between (2 x A1) and 0.
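The beat frequency and the limits of the amplitude variation described above can be checked numerically. The sketch below treats the two tones as rotating phasors, so that the slowly varying envelope of their sum is sqrt(A1^2 + A2^2 + 2 A1 A2 cos(2 pi (F2 - F1) t)). The function names and the example values (440 Hz and 444 Hz, amplitudes 1.0 and 0.5) are illustrative assumptions.

```python
import math


def beat_frequency_hz(f1_hz, f2_hz):
    """Beat frequency of two simultaneous pure tones."""
    return abs(f2_hz - f1_hz)


def envelope(a1, a2, f1_hz, f2_hz, t_s):
    """Instantaneous envelope of a1*sin(2*pi*f1*t) + a2*sin(2*pi*f2*t)."""
    phase_difference = 2.0 * math.pi * (f2_hz - f1_hz) * t_s
    return math.sqrt(a1 * a1 + a2 * a2 + 2.0 * a1 * a2 * math.cos(phase_difference))


def envelope_limits(a1, a2):
    """Smallest and largest values the envelope reaches during a beat cycle."""
    return abs(a1 - a2), a1 + a2


if __name__ == "__main__":
    a1, a2, f1, f2 = 1.0, 0.5, 440.0, 444.0
    print("beat frequency:", beat_frequency_hz(f1, f2), "Hz")
    print("envelope limits:", envelope_limits(a1, a2))
    # Half a beat period after the tones are in phase, the envelope is at its minimum.
    half_period = 0.5 / beat_frequency_hz(f1, f2)
    print("envelope at t = 0:          ", envelope(a1, a2, f1, f2, 0.0))
    print("envelope at half a period:  ", envelope(a1, a2, f1, f2, half_period))
```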

For the majority of listeners beats are usually heard when the frequency difference between the tones is less than about 12.5 Hz, and the sensation of beats generally gives way to one of a 'fused' tone which sounds 'rough' when the frequency difference is increased above 15 Hz. As the frequency difference is increased further there is a point where the fused tone gives way to two separate tones but still with the sensation of roughness, and a further increase in frequency difference is needed for the rough sensation to become smooth. The smooth separate sensation persists while the two tones remain within the frequency range of the listener's hearing.

Fig 2.6 An illustration of the perceptual changes which occur when a pure tone fixed at frequency F1 is heard combined with a pure tone at frequency F2

The changes from fused to separate and from beats to rough to smooth are shown in red in the figure above to indicate that there is no exact frequency difference at which these changes in perception occur for every listener. However, the approximate frequencies and the order in which they occur are common to all listeners, and in common with most psychoacoustic effects, average values are quoted which are based on measurements made for a large number of listeners. The point where the two tones are heard as being separate as opposed to fused when the frequency difference is increased can be thought of as the point where two peak displacements on the basilar membrane begin to emerge from a single maximum displacement on the membrane. However, at this point the underlying motion of the membrane which gives rise to the two peaks causes them to interfere with each other, giving the rough sensation, and it is only when the rough sensation becomes smooth that the separation of the places on the membrane is sufficient to fully resolve the two tones. The frequency difference between the pure tones at the point where a listener's perception changes from rough and separate to smooth and separate is known as the 'critical bandwidth', and it is therefore marked CB in the figure.

A more formal definition is given by Scharf (1970): 'The critical bandwidth is that bandwidth at which subjective responses rather abruptly change.'

In order to make use of the notion of critical bandwidth practically, an equation relating the effective critical bandwidth to the filter centre frequency has been proposed by Moore and Glasberg (1983). They define a filter with an ideal rectangular frequency response curve which passes the same power as the filter in question, and provide an equation for the 'equivalent rectangular bandwidth', or ERB, of the critical bandwidth as follows:

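$$\mathrm{ERB} = \left\{ \left[ 6.23 \times 10^{-6} \times f_c^{2} \right] + \left[ 93.39 \times 10^{-3} \times f_c \right] + 28.52 \right\}\ \mathrm{Hz} \qquad (2.6)$$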



This relationship is plotted in Figure 2.7, and lines representing where the bandwidth is equivalent to 1, 2, 4 and 7 semitones (a semitone, whole tone, major third and perfect fifth respectively) are also plotted for comparison purposes. A third-octave filter is often used in the studio as an approximation to the critical bandwidth; this is shown in the figure as the 4 semitone line, since there are 12 semitones per octave. A keyboard is shown on the filter centre frequency axis to show the equivalent fundamental frequency values of notes (middle C is marked with a dot).

Fig 2.7 The variation of equivalent rectangular bandwidth (ERB) with filter center frequency, and lines indicating where the bandwidth would be equivalent to 1, 2, 4 and 7 semitones. Middle C is marked with a dot on the keyboard.

Example 2.2 Calculate the critical bandwidth at 200 Hz and 2000 Hz to three significant figures.

Using Equation (2.6) and substituting 200 Hz and 2000 Hz for fc gives the critical bandwidth (ERB) as:

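$$\left[ 6.23 \times 10^{-6} \times 200^{2} \right] + \left[ 93.39 \times 10^{-3} \times 200 \right] + 28.52 = 47.4\ \mathrm{Hz}$$

$$\left[ 6.23 \times 10^{-6} \times 2000^{2} \right] + \left[ 93.39 \times 10^{-3} \times 2000 \right] + 28.52 = 240\ \mathrm{Hz}$$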
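The same substitution can be carried out in a few lines of code, which is convenient when the ERB is needed at many centre frequencies. The sketch below evaluates Equation (2.6) directly. The function names and the rounding helper are illustrative and not part of the original text.

```python
import math


def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz from Equation (2.6), fc in Hz."""
    return 6.23e-6 * fc_hz ** 2 + 93.39e-3 * fc_hz + 28.52


def round_significant(value, figures=3):
    """Round a non-zero value to the given number of significant figures."""
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)


if __name__ == "__main__":
    for fc in (200.0, 2000.0):
        print(f"fc = {fc:.0f} Hz: ERB = {round_significant(erb_hz(fc))} Hz")
```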

The change in critical bandwidth with frequency can be demonstrated if the fixed frequency F1 in Figure 2.6 is altered to a new value and the new position of CB is found. In practice, critical bandwidth is usually measured by an effect known as 'masking' (see Chapter 5) in which the 'rather abrupt change' is more clearly perceived by listeners. The response characteristic of an individual filter is illustrated in the bottom curve in Figure 2.8, the vertical axis of which is marked 'filter response' (notice that increasing frequency is plotted from right to left in this figure in keeping with Figure 2.5 relating to basilar membrane displacement). The other curves

Fig 2.8 Derivation of response of an auditory filter with center frequency fc Hz based on an idealized envelope of basilar membrane movement to pure tones with frequencies local to the center frequency of the filter


The critical bandwidth varies from a little less than 100 Hz at low frequency to between two and three musical semitones (12 to 19%) at high frequency (from Rossing, 1982).
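The percentages quoted for two and three semitones follow from the equal-tempered semitone ratio of 2^(1/12). A minimal check, assuming equal temperament (the function name is illustrative):

```python
def semitones_to_percent(semitones):
    """Width of an equal-tempered interval as a percentage of its lower frequency."""
    return 100.0 * (2.0 ** (semitones / 12.0) - 1.0)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} semitones: {semitones_to_percent(n):.1f} %")
```

Two and three semitones come out close to 12% and 19%, and four semitones (a third of an octave) to about 26%.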

The auditory system performs a Fourier analysis of complex sounds into their component frequencies. The cochlea acts as if it were made up of overlapping filters having bandwidths equal to the critical bandwidth. The critical bandwidth varies from slightly less than 100 Hz at low frequency to about 1/3 of an octave at high frequency, as shown in Fig. 1. The audible range of frequencies comprises about 24 critical bands. It should be emphasized that there are not 24 independent filters, however. The ear's critical bands are continuous, in that a tone of any audible frequency will find a critical band centered on it.

Considerable understanding of the way in which the cochlea performs its frequency analysis resulted from the experiments of von Békésy, who observed the patterns of actual basilar membranes when sound waves of different frequencies were applied. High frequencies created peaks toward the near (oval window) end of the basilar membrane, while low frequencies caused peaks toward the far (apex) end. Békésy tuning curves for the basilar membrane led to the place theory of hearing.

Békésy tuning curves, measured in cadavers at very high sound intensities, were too broad to account for the observed frequency resolution of the auditory system. Experiments by Johnstone and Boyle (1967) and by Rhode and Robles (1974), using the Mössbauer effect to measure basilar membrane motion in animals at much lower amplitude, led to sharper tuning curves. Mechanical measurements on the basilar membrane of the cat, using laser interferometry, yielded tuning curves about as sharp as electrophysiological tuning curves measured in the 8th nerve (Khanna and Leonard, 1982). Still, the greater frequency resolution observed in other types of experiments suggests the existence of a "second filter", which might very well be associated with the hair cells of the basilar membrane.

Critical Bands by Masking (Audio Demo)

Critical Bands by Loudness Comparison (Audio Demo)

Fig 2.10 Idealized bank of bandpass filters model of the frequency analysis capability of the basilar membrane.

The action of the basilar membrane can be thought of as being equivalent to a large number of overlapping band-pass filters, or a 'bank' of band-pass filters, each responding to a particular band of frequencies. Each filter has an asymmetric shape to its response, with a steeper roll-off on the high-frequency side than on the low-frequency side, and the bandwidth of a particular filter is given by the critical bandwidth for any particular centre frequency. It is not possible to be particularly exact with regard to the extent to which the filters overlap. A common practical compromise, for example in studio third octave graphic equaliser filter banks, is to overlap adjacent filters at the -3 dB points on their response curves.
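The third octave compromise mentioned above can be sketched numerically. The example assumes centre frequencies spaced by a factor of 2^(1/3) and band edges (taken here to be the -3 dB points) placed a sixth of an octave either side of each centre, so that the upper edge of one band coincides with the lower edge of the next. The starting frequency of 1000 Hz and the function names are illustrative choices, not values from the lecture.

```python
def third_octave_centres(first_centre_hz, count):
    """Centre frequencies spaced one third of an octave apart."""
    return [first_centre_hz * 2.0 ** (k / 3.0) for k in range(count)]


def third_octave_edges(centre_hz):
    """Lower and upper band edges, one sixth of an octave either side of the centre."""
    return centre_hz * 2.0 ** (-1.0 / 6.0), centre_hz * 2.0 ** (1.0 / 6.0)


if __name__ == "__main__":
    for centre in third_octave_centres(1000.0, 4):
        lower, upper = third_octave_edges(centre)
        print(f"centre {centre:7.1f} Hz: {lower:7.1f} Hz to {upper:7.1f} Hz")
```

Each band in this scheme is about 23% of its centre frequency wide, so the absolute bandwidth grows in proportion to centre frequency.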

In terms of the perception of two pure tones illustrated in Figure 2.6, the 'critical bandwidth' can be thought of as the bandwidth of the band-pass filter in the bank of filters whose centre frequency is exactly halfway between the frequencies of the two tones. This ignores the asymmetry of the basilar membrane response and the consequent asymmetry in the individual filter response curve, but it provides a good working approximation for calculations. Such a filter (and others close to it in centre frequency) would capture both tones while they are perceived as 'beats', 'rough fused' or 'rough separate', and at the point where rough changes to smooth, the two tones are too far apart to be both captured by this or any other filter. At this point there is no single filter which captures both tones, but there are filters which capture each of the tones individually; they are therefore resolved, and the two tones are perceived as being 'separate and smooth'.
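This working approximation lends itself to a simple calculation: take the filter centred halfway between the two tones, estimate its bandwidth, and compare it with the frequency difference. The sketch below uses the Moore and Glasberg ERB expression quoted earlier in these notes as the bandwidth estimate. That substitution, the hard yes/no threshold and the function names are assumptions made for illustration, since the text stresses that the perceptual transitions are not sharp for every listener.

```python
def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Moore and Glasberg, 1983), fc in Hz."""
    return 6.23e-6 * fc_hz ** 2 + 93.39e-3 * fc_hz + 28.52


def tones_resolved(f1_hz, f2_hz):
    """True if the tones are further apart than the bandwidth of the filter
    centred halfway between them (the 'separate and smooth' region)."""
    centre_hz = (f1_hz + f2_hz) / 2.0
    return abs(f2_hz - f1_hz) > erb_hz(centre_hz)


if __name__ == "__main__":
    for f1, f2 in ((440.0, 441.0), (440.0, 470.0), (200.0, 300.0)):
        centre = (f1 + f2) / 2.0
        print(f"{f1:.0f} Hz and {f2:.0f} Hz: bandwidth at {centre:.1f} Hz is "
              f"{erb_hz(centre):.1f} Hz, resolved = {tones_resolved(f1, f2)}")
```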

A musical sound can be described by the frequency components which make it up, and an understanding of the application of the critical band mechanism in human hearing in terms of the analysis of the components of musical sounds gives the basis for the study of psychoacoustics. The resolution with which the hearing system can analyse the individual components or sine waves in a sound is important for understanding psychoacoustic discussions relating to, for example, how we perceive:

Reading Assignment

Before next class please read Sections

pages 79 to 91 of Acoustics and Psychoacoustics.

We may have a brief quiz on these sections at the beginning of the next class.

You Need to Know