College of Santa Fe Auditory Theory

Lecture 010 Hearing V


  1. Hearing pitch
  2. Place theory of pitch perception
  3. Problems with the place theory
  4. Temporal theory of pitch perception
  5. Problems with the temporal theory
  6. Contemporary theory of pitch perception
  7. Secondary aspects of pitch perception
  8. The Doppler effect
  9. Brain Bullets


3.2 Hearing Pitch

Perception of pitch is basic to the hearing of tonal music. Familiarity with current theories of pitch perception, as well as with other aspects of psychoacoustics, enables a well-founded understanding of musically important matters such as tuning, intonation, perfect pitch, vibrato, electronic synthesis of new sounds, and pitch paradoxes.

Pitch relates to the perceived position of a sound on a scale from low to high, and its formal definition by the American National Standards Institute (1960) is couched in these terms: 'pitch is that attribute of auditory sensation in terms of which sounds are ordered on a scale extending from low to high'. The measurement of pitch is therefore 'subjective' because it requires a human listener (the 'subject') to make a perceptual judgement. This is in contrast to the measurement in the laboratory of, for example, the fundamental frequency (f0) of a note, which is an objective measurement.

In general, sounds which have a periodic acoustic pressure variation with time are perceived as having a pitch associated with them, and sounds whose acoustic pressure waveform is non-periodic are perceived as having no pitch. The relationship between waveforms and spectra of pitched and non-pitched sounds is summarised in Table 3.2, and examples of each have been discussed in relation to Figures 3.2 and 3.6.

                               Pitched                  Non-pitched
Waveform (time domain)         Periodic                 Non-periodic
                               (regular repetitions)    (no regular repetitions)
Spectrum (frequency domain)    Line                     Continuous
                               (harmonic components)    (no harmonic components)

Table 3.2 The nature of the waveforms and spectra for pitched and non-pitched sounds

The terms 'time domain' and 'frequency domain' are widely used when considering time (waveform) and frequency (spectral) representations of signals. The pitch of a note varies as its f0 is changed: the greater the f0, the higher the pitch, and vice versa. Although pitch is a subjective measurement on a scale from low to high and f0 is an objective measurement in Hz, a measurement of pitch can be given in Hz. This is achieved by asking a listener to compare the sound of interest with a sine wave of variable frequency by switching between the two. The listener adjusts the frequency of the sine wave until the pitches of the two sounds are perceived as being equal, at which point the pitch of the sound of interest is equal to the frequency of the sine wave in Hz. Two basic theories of pitch perception have been proposed to explain how the human hearing system is able to locate and track changes in the f0 of an input sound: the 'place' theory and the 'temporal' theory. These are described below along with their limitations in terms of explaining observed pitch perception effects.

3.2.1 Place theory of pitch perception

The place theory of pitch perception relates directly to the frequency analysis carried out by the basilar membrane, in which different frequency components of the input sound stimulate different positions, or places, on the membrane. Neural firing of the hair cells occurs at each of these places, indicating to higher centres of neural processing and the brain which frequency components are present in the input sound. For sounds in which all the harmonics are present, the following are possibilities for finding the value of f0 based on a place analysis of the components of the input sound, allowing for the possibility of some 'higher processing' of the component frequencies at higher centres of neural processing and/or the brain.

Method 1: locate the f0 component itself.
Method 2: find the minimum frequency difference between adjacent harmonics. The frequency difference between the (n + 1)th and the (n)th harmonic, which are adjacent by definition if all harmonics are present, is:

((n + 1) f0) - (n f0) = (n f0) + (1 f0) - (n f0) = f0

where n = 1, 2, 3, 4, ... .

Method 3: find the highest common factor (the highest number that will divide into all the frequencies present giving an integer result) of the components present. Table 3.3 illustrates this for a sound consisting of the first ten harmonics whose f0 is 100 Hz, by dividing each frequency by integers, in this case up to 10, and looking for the largest number in the results which exists for every frequency. The frequencies of the harmonics are given in the left-hand column (the result of a place analysis), and each of the other columns shows the result of dividing the frequency of each component by integers (m = 2 to 10). The highest common factor is the highest value appearing in all rows of the table, including the frequencies of the components themselves (division by 1, i.e. m = 1); here it is 100 Hz, which would be perceived as the pitch. In addition, it is of interest to notice that every value which appears in the row relating to f0, in this case 100 Hz, will appear in each of the other rows if the table were extended far enough to the right. This is the case because, by definition, 100 divides into each harmonic frequency to give an integer result (n), and all values appearing in the 100 Hz row are found by integer (m) division of 100 Hz; therefore all values in the 100 Hz row can be gained by division of harmonic frequencies by (m x n), which must itself be an integer. These are f0 values (50 Hz, 33.33 Hz, 25 Hz, 20 Hz, etc.) whose harmonic series also contain all the given components, and they are known as 'sub-harmonics'. This is why it is the highest common factor which is used.

Table 3.3 Processing method to find the highest common factor of the frequencies of the first ten harmonics of a sound whose f0 = 100 Hz (calculations to 4 significant figures).
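The division procedure described for Table 3.3 can be sketched in a few lines of Python. This is only an illustrative reading of Method 3, not a model of the auditory system: the function names `division_table` and `highest_common_factor` are made up for this sketch, and exact fractions are used instead of values rounded to 4 significant figures so that equal entries compare equal.

```python
from fractions import Fraction


def division_table(components_hz, max_divisor=10):
    """One row per component: the component divided by m = 1..max_divisor."""
    return {
        f: [Fraction(f, m) for m in range(1, max_divisor + 1)]
        for f in components_hz
    }


def highest_common_factor(components_hz, max_divisor=10):
    """Largest value that appears in every row of the division table."""
    table = division_table(components_hz, max_divisor)
    common = set.intersection(*(set(row) for row in table.values()))
    return max(common) if common else None


# First ten harmonics of a sound whose f0 is 100 Hz (integer Hz values).
harmonics = [100 * n for n in range(1, 11)]

print(float(highest_common_factor(harmonics)))        # all harmonics present
print(float(highest_common_factor(harmonics[1:])))    # f0 component removed
print(float(highest_common_factor(harmonics[2::2])))  # odd harmonics 3 to 9 only
```

In each of the three cases the only value shared by every row is 100 Hz, which is the behaviour the text attributes to Method 3.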

One of the earliest versions of the place theory suggests that the pitch of a sound corresponds to the place stimulated by the lowest frequency component in the sound, which is f0 (Method 1 above). The assumption underlying this is that f0 is always present in sounds, and the theory was encapsulated by Ohm in his second or 'acoustical' law (his first law being basic to electrical work: voltage = current x resistance): 'a pitch corresponding to a certain frequency can only be heard if the acoustic wave contains power at that frequency'.

This theory came under close scrutiny when it became possible to carry out experiments in which sounds could be synthesised with known spectra. Schouten (1940) demonstrated that the pitch of a pulse wave remained the same when the fundamental component was removed, thus demonstrating: (i) that f0 did not have to be present for pitch perception, and (ii) that the lowest component present is not the basis for pitch perception, because the pitch does not jump up by one octave (the second harmonic being the lowest component after f0 has been removed). This experiment has become known as 'the phenomenon of the missing fundamental', and suggests that Method 1 cannot account for human pitch perception.

Method 2 seems to provide an attractive possibility since the place theory gives the positions of the harmonics, whether or not f0 is present, and it should provide a basis for pitch perception providing some adjacent harmonics are present. For most musical sounds, adjacent harmonics are indeed present. However, researchers are always looking for ways of testing psychoacoustic theories, in this case pitch perception, by creating sounds for which the perceived pitch cannot be explained by current theories. Such sounds are often generated electronically to provide accurate control over their frequency components and temporal development.

Figure 3.7 shows an idealised spectrum of a sound which contains just odd harmonics (1 f0, 3 f0, 5 f0, ...) and shows that measurement of the frequency distance between adjacent harmonics would give f0, 2 f0, 2 f0, 2 f0, etc. The minimum spacing between the harmonics is f0, which gives a possible basis for pitch perception. However, if the f0 component were removed (imagine removing the dashed f0 component in Figure 3.7), the perceived pitch would not change. Now, however, the spacings between adjacent harmonics are 3 f0, 2 f0, 2 f0, 2 f0, etc. and the minimum spacing is 2 f0, but the pitch does not jump up by an octave.
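The spacing argument can be checked numerically. The sketch below assumes, as the quoted spacings (f0, 2 f0, 2 f0, ... and 3 f0, 2 f0, 2 f0, ...) imply, that the first spacing is measured from 0 Hz up to the lowest component; the helper name `spacings_from_zero` and the choice of f0 = 100 Hz are illustrative only.

```python
def spacings_from_zero(components_hz):
    """Frequency gaps between neighbouring components, starting from 0 Hz."""
    ordered = [0] + sorted(components_hz)
    return [upper - lower for lower, upper in zip(ordered, ordered[1:])]


f0 = 100  # Hz, chosen only for illustration
odd_harmonics = [n * f0 for n in (1, 3, 5, 7, 9)]

with_f0 = spacings_from_zero(odd_harmonics)
without_f0 = spacings_from_zero(odd_harmonics[1:])

print(with_f0, min(with_f0))        # minimum spacing with f0 present
print(without_f0, min(without_f0))  # minimum spacing with f0 removed
```

With f0 present the minimum spacing equals f0, but once f0 is removed the minimum spacing doubles, so Method 2 would predict an octave jump that listeners do not hear.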

Fig 3.7 Idealized spectrum for a sound with odd harmonics only to show the spacing between adjacent harmonics when the fundamental frequency (shown dashed) is present or absent.

The third method will give an appropriate f0 for: (i) sounds with missing f0 components (see Table 3.3 and ignore the f0 row), (ii) sounds with odd harmonic components only (see Table 3.3 and ignore the rows for the even harmonics), and (iii) sounds with odd harmonic components only with a missing f0 component (see Table 3.3 and ignore the rows for f0 and the even harmonics). In each case, the highest common factor of the components is f0. This method also provides a basis for understanding how a pitch is perceived for non-harmonic sounds, such as bells or chime bars, whose components are not exact harmonics (integer multiples) of the resulting f0.

As an example of such a non-harmonic sound, Schouten in one of his later experiments produced sounds whose component frequencies were 1040 Hz, 1240 Hz, and 1440 Hz and found that the perceived pitch was approximately 207 Hz. The f0 for these components, based on the minimum spacing between the components (Method 2), is 200 Hz. Table 3.4 shows the result of applying Method 3 (searching for the highest common factor of these three components) up to an integer divisor of ten. Schouten's proposal can be interpreted in terms of this table by looking for the closest set of values in the table which would be consistent with the three components being true harmonics and taking their average to give an estimate of f0 . In this case, taking 1040 Hz as the fifth 'harmonic', 1240 Hz as the sixth 'harmonic' and 1440 Hz as the seventh 'harmonic' gives 208 Hz, 207 Hz and 206 Hz respectively. The average of these values is 207 Hz, and Schouten referred to the pitch perceived in such a situation as the 'residue pitch' or 'pitch of the residue'. It is also sometimes referred to as 'virtual pitch'.
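The residue pitch estimate for these three components can be reproduced with a short calculation. This is a hedged sketch of one way to read the procedure: it tries each candidate number for the lowest 'harmonic', treats the components as consecutive harmonics, and keeps the candidate whose implied f0 values agree most closely. The function name `residue_pitch` and the use of the maximum-minus-minimum spread as the closeness measure are assumptions made for illustration.

```python
def residue_pitch(components_hz, max_harmonic=10):
    """Return (lowest harmonic number, per-component f0 estimates, mean f0)."""
    ordered = sorted(components_hz)
    best = None
    # Keep the highest assumed harmonic number within max_harmonic.
    for lowest in range(1, max_harmonic - len(ordered) + 2):
        estimates = [f / (lowest + i) for i, f in enumerate(ordered)]
        spread = max(estimates) - min(estimates)
        if best is None or spread < best[0]:
            best = (spread, lowest, estimates)
    _, lowest, estimates = best
    return lowest, estimates, sum(estimates) / len(estimates)


lowest, estimates, pitch_hz = residue_pitch([1040, 1240, 1440])
print(lowest)                               # harmonic number assigned to 1040 Hz
print([round(e, 1) for e in estimates])     # implied f0 for each component
print(round(pitch_hz))                      # averaged estimate in Hz
```

The search settles on the fifth, sixth and seventh 'harmonics', and the average of the three implied f0 values rounds to the 207 Hz quoted in the text.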

Virtual Pitch (Audio Demo)

By way of a coda to this discussion, it is interesting to note that these components 1040 Hz, 1240 Hz and 1440 Hz do, in fact, have a true f0 of 40 Hz of which they are the 26th, 31st and 36th harmonics, which would appear if the table were continued well over to the right. However, the auditory system appears to find an f0 for which the components present are adjacent harmonics.

Fig 3.8 Just Noticeable Difference for pitch perception and the equivalent rectangular bandwidth

Just Noticeable Difference (Audio Demos)

3.2.2 Problems with the place theory

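The place theory provides a basis for understanding how f0 could be found from a frequency analysis of components. However, there are a number of problems with the place theory because it does not explain:

  1. the fine degree of accuracy observed in human pitch perception;
  2. pitch perception for sounds whose frequency components are not resolved by the place mechanism;
  3. the pitch perceived for some sounds which have continuous (non-harmonic) spectra;
  4. pitch perception for sounds with components only below 50 Hz.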

Each will be considered in turn. Psychoacoustically, the ability to discriminate between sounds that are nearly the same except for a change in one aspect (f0, intensity, duration, etc.) is measured as a 'difference limen' (DL), or 'just noticeable difference' (JND). JND is preferred in this book. The JND for human pitch perception is shown graphically in Figure 3.8 along with the critical bandwidth curve. This JND graph is based on an experiment by Zwicker et al. (1957) in which sinusoidal stimuli were used (fixed waveshape) and the sound intensity level and sound duration remained constant. It turns out that the JND is approximately one thirtieth of the critical bandwidth across the hearing range. Musically, this is equivalent to approximately one twelfth of a semitone. Thus the JND in pitch is much smaller than the resolution of the analysis filters (critical bandwidth).

The place mechanism will resolve a given harmonic of an input sound provided that the critical bandwidth of the filter concerned is sufficiently narrow to exclude adjacent harmonics. It turns out that, no matter what the f0 of the sound is, only the first 5 to 7 harmonics are resolved by the place analysis mechanism. This can be illustrated for an example as follows with reference to Table 3.5.

Consider a sound consisting of all harmonics (f0, 2 f0, 3 f0, 4 f0, 5 f0, etc.) whose f0 is 110 Hz. The frequencies of the first ten harmonics are given in the left-hand column of the table. The next column shows the critical bandwidth of a filter centred on each of these harmonics by calculation using Equation 2.8. The critical bandwidth increases with filter centre frequency (e.g. see Figure 3.8), and the frequency analysis action of the basilar membrane is equivalent to a bank of filters. Harmonics will cease to be resolved by the place mechanism when the critical bandwidth exceeds the frequency spacing between adjacent harmonics, which is f0 when all adjacent harmonics are present. In the table, it can be seen that the critical bandwidth is greater than f0 for the filter centred at 880 Hz (the eighth harmonic), but this filter will resolve the eighth harmonic since it is centred over it and its bandwidth extends half above and half below 880 Hz. In order to establish when harmonics are not resolved, consider the filters centred midway between adjacent harmonic positions (their centre frequencies and critical bandwidths are shown in the table). The filter centred between the eighth and ninth harmonics has a critical bandwidth of 121 Hz, which exceeds f0 (110 Hz) in this example, and therefore the eighth and ninth harmonics will not be resolved by this filter. The filter between the seventh and eighth harmonics has a critical bandwidth of 110 Hz, and the seventh and eighth harmonics would not be resolved either, especially as the frequency range 'seen' by a band-pass filter extends beyond its nominal bandwidth (see Chapter 1).
Due to the continuous nature of the travelling wave along the basilar membrane, no harmonics will in this example be resolved above the seventh (possibly above the sixth) since there will be areas on the membrane responding to at least adjacent pairs of harmonics everywhere above the place where the seventh (or possibly the sixth) harmonic stimulates it.
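The bandwidth figures quoted in this example can be reproduced with a short calculation. The critical bandwidth equation itself is not reproduced in these notes, so the sketch below assumes the commonly used Moore and Glasberg polynomial for the equivalent rectangular bandwidth (ERB); that assumption is at least consistent with the text, since it gives the quoted 110 Hz and 121 Hz for the filters centred midway between harmonics. The names `erb_hz` and `highest_resolved_harmonic` are illustrative.

```python
def erb_hz(centre_hz):
    """Assumed ERB polynomial (Moore and Glasberg form), centre frequency in Hz."""
    return 6.23e-6 * centre_hz ** 2 + 93.39e-3 * centre_hz + 28.52


def highest_resolved_harmonic(f0_hz, max_harmonic=30):
    """Last harmonic n before the filter midway between n and n + 1 is wider than f0."""
    for n in range(1, max_harmonic):
        midway_hz = (n + 0.5) * f0_hz
        if erb_hz(midway_hz) > f0_hz:
            return n - 1
    return max_harmonic


f0 = 110.0
for n in range(6, 9):
    midway = (n + 0.5) * f0
    print(n, n + 1, midway, round(erb_hz(midway), 1))

print(highest_resolved_harmonic(f0))
```

Under a strict "bandwidth exceeds f0" test the seventh harmonic is the last one resolved for f0 = 110 Hz. The midway filter between the seventh and eighth harmonics falls only just short of 110 Hz, which is why the text allows that the limit may be the sixth.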

Observation of the relationship between the critical bandwidth and centre frequency plotted in Figure 3.8 allows the general conclusion that no harmonic above about the fifth to seventh is resolved for any f0 to be approximately validated as follows. The centre frequency for which the critical band exceeds the f0 of the sound of interest is found from the graph, and no harmonic above this centre frequency will be resolved. To find the centre frequency, plot a horizontal line for the f0 of interest on the y axis, and find the frequency on the x axis where the line intersects the critical band curve. Only harmonics below this frequency will be resolved and those above will not. It is worth trying this exercise for a few f0 values to reinforce the general conclusion about resolution of harmonics, since this is vital to understanding of other aspects of psychoacoustics as well as pitch perception.

There are sounds which have non-harmonic spectra for which a pitch is perceived; these are exceptions to the second part of the general statement given earlier that 'sounds whose acoustic pressure waveform is non-periodic are perceived as having no pitch'. For example, listen to the 'ss' in sea and the 'sh' in shell (produce these yourself or ask someone else to) and decide which one has the higher pitch. Most listeners will readily respond that 'ss' has a higher pitch than 'sh'. The spectrum of both sounds is continuous and an example for each is shown in Figure 3.9. Notice that the energy is biased more towards lower frequencies for the 'sh', with a peak around 2.5 kHz, compared to the 'ss', where the energy has a peak at about 5 kHz. This 'centre of gravity' of the spectral energy of a sound is thought to convey a sense of higher or lower pitch for such noise-based sounds, but the pitch sensation is far weaker than that perceived for periodic sounds. This pitch phenomenon is, however, important in music when considering the perception of the non-periodic sounds produced, for example, by some groups of instruments in the percussion section, although the majority of instruments on which notes are played in musical performance produce periodic acoustic pressure variations. Pitch sensations associated with non-periodic acoustic waveforms cannot be described by the place theory, since the spectra of such sounds are continuous and there are no discrete frequency components on which to base the place analysis.

The final identified problem is that the pitch perceived for sounds with components only below 50 Hz cannot be explained by the place theory, because the pattern of vibration on the basilar membrane does not appear to change in that region.

Sounds of this type are rather unusual, but not impossible to create by electronic synthesis. Since the typical lowest audible frequency for humans is 20 Hz, a sound with an f0 of 20 Hz would have harmonics at 40 Hz, 60 Hz etc., and only the first two components fall within the region in which no change is observed in basilar membrane response. Harmonics falling above 50 Hz will be analysed by the place mechanism in the usual manner. Sinusoids in the 20 Hz to 50 Hz range are perceived as having different pitches and the place mechanism cannot explain this. These are some of the key problems which the place mechanism cannot explain, and attention will now be drawn to the temporal theory of pitch perception, which was developed to explain some of these problems with the place theory.

3.2.3 Temporal theory of pitch perception

The temporal theory of pitch perception is based on the fact that the waveform of a sound with a strong musical pitch repeats or is periodic (see Table 3.2). An example is shown in Figure 3.1 for A4 played on four instruments. The f0 for a periodic sound can be found from a measurement of the period of a cycle of the waveform using Equation 3.1.

The temporal theory of pitch perception relies on the timing of neural firings generated in the organ of Corti (see Figure 2.3) which occur in response to vibrations of the basilar membrane. The place theory is based on the fact that the basilar membrane is stimulated at different places along its length according to the frequency components in the input sound. The key to the temporal theory is the detailed nature of the actual waveform exciting the different places along the length of the basilar membrane. This can be modelled using a bank of electronic band-pass filters whose centre frequencies and bandwidths vary according to the critical bandwidth of the human hearing system as illustrated, for example in Figure 3.8.

Fig 3.10 Output from a transputer-based model of human hearing to illustrate the nature of basilar membrane vibration at different places along its length for C4 played on a violin.

Figure 3.10 shows the output waveforms from such a bank of electronic filters, implemented using transputers by Howard et al. (1995), with critical bandwidths based on the ERB equation (Equation 2.6) for C4 played on a violin. The nominal f0 for C4 is 261.6 Hz (see Figure 3.21). The output waveform from the filter with a centre frequency just above 200 Hz, the lowest centre frequency represented in the figure, is a sine wave at f0. This is because the f0 component is resolved by the analysing filter, and an individual harmonic of a complex periodic waveform is a sine wave (see Chapter 1). The place theory suggests (see calculation associated with Table 3.5) that the first five to seven harmonics will be resolved by the basilar membrane. It can be seen in the example note shown in Figure 3.10 that the second (around 520 Hz), third (around 780 Hz), fourth (around 1040 Hz) and fifth (around 1300 Hz) harmonics are resolved and their waveforms are sinusoidal. Some amplitude variation is apparent on these sine waves, particularly on the fourth and fifth, indicating the dynamic nature of the acoustic pressure output from a musical instrument. The sixth harmonic (around 1560 Hz) has greater amplitude variation, but the individual cycles are clear.

Output waveforms for filter centre frequencies above the sixth harmonic in this example are not sinusoidal because these harmonics are not resolved individually, demonstrating that the frequency range of a band-pass filter extends beyond its nominal bandwidth (see Chapter 1). At least two harmonics are combined in the outputs from filters which are not sinusoidal in Figure 3.10. When two components close in frequency are combined, they produce a 'beat' waveform whose amplitude rises and falls regularly if the components are harmonics of some fundamental. The frequency of the beat is equal to the difference between the frequencies of the two components. Therefore if the components are adjacent harmonics, the beat frequency is equal to their f0 and the period of the beat waveform is (1/f0). This can be observed in the figure by comparing the beat period for filter outputs above 1.5 kHz with the period of the output sine wave at f0. Thus the period of output waveforms for filters with centre frequencies higher than the sixth harmonic will be (1/f0) for an input consisting of adjacent harmonics.
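The claim that two unresolved adjacent harmonics give a waveform repeating every (1/f0) can be checked numerically. The sketch below sums two equal-amplitude sine waves at the seventh and eighth harmonics of the nominal C4 f0 and compares the waveform with a copy shifted by one beat period; equal amplitudes, the 48 kHz sampling grid and the name `two_tone` are assumptions made only for this illustration.

```python
import math


def two_tone(t, f1_hz, f2_hz):
    """Sum of two equal-amplitude sine waves evaluated at time t (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)


f0 = 261.6                 # nominal f0 of C4 in Hz
f1, f2 = 7 * f0, 8 * f0    # adjacent harmonics above the resolved region

beat_hz = f2 - f1          # beat frequency is the frequency difference
beat_period_s = 1.0 / beat_hz

# Compare the waveform with itself one beat period later over 10 ms.
times = [i / 48000 for i in range(480)]
max_mismatch = max(
    abs(two_tone(t, f1, f2) - two_tone(t + beat_period_s, f1, f2)) for t in times
)

print(beat_hz, beat_period_s, max_mismatch)
```

The beat frequency comes out at f0 and the shifted waveform matches the original to within floating-point error, so the combined output has period (1/f0).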

The periods of all the output waveforms which stimulate the neural firing in the organ of Corti form the basis of the temporal theory of pitch perception. There are nerve fibres available to fire at all places along the basilar membrane, and they do so in such a manner that a given nerve fibre may only fire at one phase or instant in each cycle of the stimulating waveform, a process known as 'phase-locking'. Although the nerve firing is phase locked to one instant in each cycle of the stimulating waveform, it has been observed that no single nerve fibre is able to fire continuously at frequencies above approximately 300 Hz. It turns out that the nerve does not necessarily fire in every cycle and that the cycle in which it fires tends to be random, which according to Pickles (1982) may be 'perhaps as little as once every hundred cycles on average'. However, due to phase locking, the time between firings for any particular nerve will always be an integer (1, 2, 3, 4, ...) multiple of periods of the stimulating waveform, and there are a number of nerves involved at each place. A 'volley firing' principle has also been suggested by Wever (1949), in which groups of nerves work together, each firing in different cycles, to enable frequencies higher than 300 Hz to be coded. A full discussion of this area is beyond the scope of this book, and the interested reader is encouraged to consult, for example, Pickles (1982); Moore (1982, 1986); Roederer (1975). What follows relies on the principle of phase locking.

The minimum time between firings (1 period of the stimulating waveform) at different places along the basilar membrane can be inferred from Figure 3.10 for the violin playing C4, since it will be equivalent to the period of the output waveform from the analysis filter. For places which respond to frequencies below about the sixth harmonic, the minimum time between firings is at the period of the harmonic itself, and for places above, the minimum time between firings is the period of the input waveform itself (i.e. 1/ f0).

The possible instants of nerve firing are illustrated in Figure 3.11. This figure enables the benefit to be illustrated that results from the fact that nerves fire phase locked to the stimulating waveform but not necessarily during every cycle. The figure shows an idealised unrolled basilar membrane with the places corresponding to where maximum stimulation would occur for input components at multiples of f0 up to the sixteenth harmonic, for any f0 of input sound. The assumption on which the figure is based is that harmonics up to and including the seventh are analysed separately. The main part of the figure shows the possible instants where nerves could fire based on phase locking and the fact that nerves may not fire every cycle, and the lengths of the vertical lines illustrate the proportion of firings which might occur at that position, on the basis that more firings are likely with lower times between them. These approximate to the idea of a histogram of firings being built up, sometimes referred to as an 'inter-spike interval' histogram, where a 'spike' is a single nerve firing.

Fig 3.11 The possible instants for nerve firing across the places on the basilar membrane for the first 16 harmonics of an input sound

Thus at the place on the basilar membrane stimulated by the f0 component, possible times between nerve firings are: (1/f0), (2/f0) and (3/f0) in this figure as shown, with fewer firings at the higher intervals. For the place stimulated by the second harmonic, possible firing times are: [1/(2 f0)], [2/(2 f0)] or (1/f0), [3/(2 f0)], [4/(2 f0)] or (2/f0) and so on. This is the case for each place stimulated by a harmonic of f0 up to the seventh. For places corresponding to higher frequencies than (7 f0), the stimulating waveform is beat-like and its fundamental period is (1/f0), and therefore the possible firing times are: (1/f0), (2/f0) and (3/f0) in this figure as shown. Visually it can be seen in the figure that if the entries in all these inter-spike interval histograms were added together vertically (i.e. for each firing time interval), then the maximum entry would occur for the period of f0. This is reinforced when it is remembered that all places higher than those shown in the figure would exhibit outputs similar to those shown above the eighth harmonic. Notice how all the places where harmonics are resolved have an entry in their histograms at the fundamental period, as a direct result of the fact that nerves may not fire in every cycle. This is the basis on which the temporal theory of pitch perception is thought to function.
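The vertical summation of inter-spike interval histograms can be illustrated with a toy calculation. This is a sketch under loudly stated assumptions, not a physiological model: intervals are expressed in units of the fundamental period (1/f0), harmonics up to the seventh are treated as resolved (as the figure assumes), and the k-th multiple of the local period is given a weight of 1/k simply to mimic "fewer firings at the higher intervals". The weighting rule and the function name `summed_interval_histogram` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def summed_interval_histogram(num_places=16, highest_resolved=7, max_interval=3):
    """Sum per-place interval histograms; keys are intervals in units of 1/f0."""
    histogram = defaultdict(Fraction)
    for n in range(1, num_places + 1):
        # Resolved places fire locked to the harmonic period 1/(n f0);
        # unresolved places fire locked to the beat period 1/f0.
        local_period = Fraction(1, n) if n <= highest_resolved else Fraction(1)
        k = 1
        while k * local_period <= max_interval:
            # Assumed weighting: longer gaps between firings are less likely.
            histogram[k * local_period] += Fraction(1, k)
            k += 1
    return dict(histogram)


histogram = summed_interval_histogram()
most_common_interval = max(histogram, key=histogram.get)

print(most_common_interval)              # interval with the largest summed entry
print(float(histogram[Fraction(1)]))     # summed entry at 1/f0
print(float(histogram[Fraction(2)]))     # summed entry at 2/f0
```

Every place contributes an entry at the fundamental period, so under these assumptions the summed histogram peaks at (1/f0), with a smaller peak at (2/f0).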

3.2.4 Problems with the temporal theory

The temporal theory gives a basis for understanding how the fundamental period could be found from an analysis of the nerve firing times from all places across the basilar membrane. However, not all observed pitch perception abilities can be explained by the temporal theory alone, the most important being the pitch perceived for sounds whose f0 is greater than 5 kHz. This cannot be explained by the temporal theory because phase locking breaks down above 5 kHz. Any ability to perceive the pitches of sounds with f0 greater than 5 kHz is therefore thought to be due to the place theory alone. Given that the upper frequency limit of human hearing is at best 20 kHz for youngsters, with a more practical upper limit being 16 kHz for those over 20 years of age, a sound with an f0 greater than 5 kHz is only going to provide the hearing system with two harmonics (f0 and 2 f0) for analysis. In practice it has been established that human pitch perception for sounds whose f0 is greater than 5 kHz is rather poor, with many musicians finding it difficult to judge musical intervals accurately in this frequency range. Moore (1982) notes that this ties in well with the f0 for the upper note of the piccolo being approximately 4.5 kHz. On large organs, some stops can have pipes whose f0 exceeds 8 kHz, but these are provided to be used in conjunction with other stops (see Section 5.4).

3.2.5 Contemporary theory of pitch perception

Psychoacoustic research has tended historically to consider human pitch perception with reference to the place or the temporal theory and it is clear that neither theory alone can account for all observed pitch perception abilities. In reality, place analysis occurs giving rise to nerve firings from each place on the basilar membrane that is stimulated. Thus nerve centres and the parts of the brain concerned with auditory processing are provided not only with an indication of the place where basilar membrane stimulation occurs (frequency analysis) but also with information about the nature of that stimulation (temporal analysis). Therefore neither theory is likely to explain human pitch perception completely, since the output from either the place or temporal analysis makes use of the other in communicating itself on the auditory nerve.

Figure 3.12 shows a model for pitch perception of complex tones based on that of Moore (1982), which encapsulates the benefits described for both theories. The acoustic pressure wave is modified by the frequency response of the outer and middle ears (see Chapter 2), and analysed by the place mechanism, which is equivalent to a filter bank analysis. Neural firings occur, stimulated by the detailed vibration of the membrane at places equivalent to frequency components of the input sound, based on phase locking but not always once per cycle; the latter is illustrated on the right-hand side of the figure. The fact that firing is occurring from particular places provides the basis for the place theory of pitch perception. The intervals between neural firings (spikes) are analysed and the results are combined to allow common intervals to be found, which will tend to be at the fundamental period and its multiples, but predominantly at (1/f0). This is the basis of the temporal theory of pitch perception. The pitch of the sound is based on the results.

Fig 3.12 A model for human pitch perception by Moore (1982)

3.2.6 Secondary aspects of pitch perception

Fig 3.13 The pitch shifts perceived when the intensity of a sine wave with a constant fundamental frequency is varied

Pitch intensity (Audio Demo)

The perceived pitch of a sound is primarily affected by changes in f0, which is why the pitch of a note is usually directly related to its f0, for example, by stating that A4 has an f0 of 440 Hz as a standard pitch reference. The estimation of f0 forms the basis of both the place and temporal theories of pitch perception. A change in pitch of a particular musical interval manifests itself if the f0 values of the notes concerned are in the appropriate frequency ratio to give the primary acoustic (objective) basis for the perceived (subjective) pitch of the notes and hence the musical interval. Changes in pitch are also, however, perceived by modifying the intensity or duration of a sound while keeping f0 constant. These are by far secondary pitch change effects compared to the result of varying f0, and they are often very subtle.

These secondary pitch effects are summarised as follows. If the intensity of a sine wave is varied between 40 dBSPL and 90 dBSPL while keeping its f0 constant, a change in pitch is perceived for all f0 values other than those around 1-2 kHz. For f0 values greater than 2 kHz the pitch becomes sharper as the intensity is raised, and for f0 values below 1 kHz the pitch becomes flatter as the intensity is raised. This effect is illustrated in Figure 3.13, and the JND for pitch is shown with reference to the pitch at 60 dBSPL to enable the frequencies and intensities of sine waves for which the effect might be perceived to be inferred. This effect is for sine waves, which are rarely encountered in music, although electronic synthesisers have made them widely available. With complex tones the effect is less well defined; Rossing (1989) suggests around 17 cents (0.17 of a semitone) for an intensity change between 65 dBSPL and 95 dBSPL. Rossing gives two suggestions as to where this effect could have musical consequences: (i) he cites Parkin (1974) to note that this pitch shift phenomenon is apparent when listening in a highly reverberant building to the release of a final loud organ chord, which appears to sharpen as the sound level diminishes, and (ii) he suggests that the pitch shift observed for sounds with varying rates of waveform amplitude change, while f0 is kept constant, should be 'taken into account when dealing with percussion instruments'.

The effect that the duration of a sound has on the perception of the pitch of a note is not a simple one, but it is summarised graphically in Figure 3.14 in terms of the minimum number of cycles required at a given f0 for a definite distinct pitch to be perceived. Shorter sounds may be perceived as being pitched rather than non-pitched, but the accuracy with which listeners can make such a judgement worsens as the duration of the sound drops below that shown in the figure.

Fig 3.14 The effect of duration on pitch in terms of the number of cycles needed for a distinct pitch to be perceived for a given fundamental frequency

By way of a coda to this section on the perception of pitch, a phenomenon known as 'repetition pitch' is briefly introduced, particularly now that electronic synthesis and studio techniques make it relatively straightforward to reproduce. Repetition pitch is perceived (by most but not all listeners) if a non-periodic noise-based signal, for example the sound of a waterfall, the consonants in see, shoe, fee, or a noise generator, is added to a delayed version of itself and played to listeners. When the delay is altered a change in pitch is perceived. The pitch is equivalent to a sound whose f0 is equal to (1/delay), and the effect works for delays between approximately 1 ms and 10 ms depending on the listener, giving an equivalent f0 range for the effect of 100 Hz to 1000 Hz. With modern electronic equipment it is quite possible to play tunes using this effect!
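The relationship between delay and repetition pitch can be explored with a small simulation. The sketch below adds Gaussian noise to a copy of itself delayed by 5 ms and then looks for the strongest autocorrelation peak among lags of 1 ms to 10 ms. Autocorrelation is used here only as a convenient way to recover the delay and is not claimed to be how the ear does it. The sample rate, noise length, random seed and function names are all arbitrary choices for illustration.

```python
import random


def repetition_pitch_hz(delay_s):
    """Equivalent f0 for a given delay, as stated in the text: 1 / delay."""
    return 1.0 / delay_s


def strongest_lag(signal, min_lag, max_lag):
    """Lag (in samples) with the largest unnormalised autocorrelation."""
    def autocorrelation(lag):
        return sum(a * b for a, b in zip(signal, signal[lag:]))
    return max(range(min_lag, max_lag + 1), key=autocorrelation)


sample_rate = 8000       # Hz (assumed for this sketch)
delay_samples = 40       # 5 ms at 8000 Hz

rng = random.Random(0)   # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

# Add the noise to a delayed version of itself.
combined = [
    x + (noise[i - delay_samples] if i >= delay_samples else 0.0)
    for i, x in enumerate(noise)
]

# Search lags corresponding to delays of 1 ms to 10 ms.
lag = strongest_lag(combined, sample_rate // 1000, sample_rate // 100)
estimated_pitch_hz = sample_rate / lag

print(lag, estimated_pitch_hz, repetition_pitch_hz(delay_samples / sample_rate))
```

The recovered lag matches the 5 ms delay, which corresponds to an equivalent f0 of 200 Hz; changing `delay_samples` moves the peak, and with it the pitch, accordingly.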

Repetition Pitch (Audio Demo)


The Doppler effect

It is important to note that although pitch and frequency are related, they are not necessarily the same. When a sound source is moving, the waves are compressed in front of it and the apparent pitch rises. This is called the Doppler effect.
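The size of the shift can be estimated with the standard textbook formula for a source moving directly towards or away from a stationary listener. The formula is not given in these notes, so it is brought in here as an assumption, as are the 343 m/s speed of sound (air at about 20 °C) and the function name `observed_frequency_hz`.

```python
def observed_frequency_hz(source_hz, source_speed_ms, speed_of_sound_ms=343.0):
    """Frequency heard by a stationary listener.

    A positive source_speed_ms means the source approaches the listener and
    a negative value means it recedes (standard moving-source Doppler formula).
    """
    if abs(source_speed_ms) >= speed_of_sound_ms:
        raise ValueError("formula only applies below the speed of sound")
    return source_hz * speed_of_sound_ms / (speed_of_sound_ms - source_speed_ms)


approaching = observed_frequency_hz(440.0, 34.3)    # source at 10% of sound speed
receding = observed_frequency_hz(440.0, -34.3)

print(round(approaching, 1), round(receding, 1))
```

An A4 source approaching at a tenth of the speed of sound is heard well above 440 Hz, and the same source moving away is heard below it, although the frequency emitted by the source never changes.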

Java Applet demonstrating the Doppler effect

The Doppler Effect (Video Demo)


Reading Assignment

Before next lecture please read Sections

pages 136 to 144 of Acoustics and Psychoacoustics. We may have a brief quiz on these sections at the beginning of the next class.

You Need to Know