Sound Theory KMAE Ref Material

SOUND THEORY

Sound Basics

1. The Elements Of Communication

Communication: transfer of information from a source or stimulus through a medium to a reception point. The
medium through which the information travels can be air, water, space or solid objects. Information that is carried
through all natural media takes the form of waves - repeating patterns that oscillate back and forth, e.g. light,
sound, electricity, radio and TV waves.

Stimulus: A medium must be stimulated in order for waves of information to be generated in it. A stimulus
produces energy, which radiates outwards from the source in all directions. The sun and an electric light bulb
produce light energy. A speaker, a vibrating guitar string or tuning fork and the voice are sound sources, which
produce sound energy waves.

Medium: A medium is something intermediate or in the middle. In an exchange of communication the medium
lies between the stimulus and the receptor. The medium transmits the waves generated by the stimulus and
delivers these waves to the receptor. In acoustic sound transmission the primary medium is air; in electronic
sound transmission the medium is an electric circuit. Sound waves will not travel through space, although light will.
In space no-one can hear you scream.

Reception/Perception: A receptor must be capable of responding to the waves being transmitted through the
medium in order for information to be perceived. The receptor must be physically configured to sympathetically
tune in to the types of waves it receives. An ear or a microphone is tuned in to sound waves. An eye or a camera is
tuned in to light waves. Our senses respond to the properties or characteristics of waves such as frequency,
amplitude and type of waveform.

2. The Sound Pressure Wave

All sound waves are produced by vibrations of material objects. The object could be the vibrating string of a guitar,
which is very weak on its own but considerably reinforced by the vibrating wooden body of the instrument's soundboard.

Any vibrating object can act as a sound source and produce a sound wave, and the greater the surface area the
object presents to the air, the more air it can move, or the more medium it can displace. All sounds are produced by
mechanical vibration of objects, e.g. rods, diaphragms, strings, reeds, vocal cords and forced airflow.

The vibrations may have wave shapes that are simple or complex. These wave shapes will be determined by the
shape, size and stiffness of the source and the manner in which the vibrations are initiated.

They could be initiated by: Hammering (rods), Plucking (string), Bowing (strings), Forced air flow (vibration of air
column - Organ, voice). Vibrations from any of these sources cause a series of pressure fluctuations of the
medium surrounding the object to travel outwards through the air from the source.

It is important to note that the particles of the medium, in this case molecules of air, do not travel from the source
to the receiver, but vibrate in a direction parallel to the direction of travel of the sound wave. Thus sound is a
longitudinal wave motion, which propagates outwards away from the source. The transmission of sound
energy via molecular collision is termed propagation.


When air molecules are at their random rest positions, i.e. when there is no sound present in the air medium, normal
atmospheric pressure exists. (Sound pressure measurements are referenced to 0.0002 dynes/cm², 0.00002 Pascal or 20 µPa.)

When a sound source is made to vibrate, it causes the air particles surrounding it to be alternately compressed
and rarefied, thereby fluctuating the air pressure between a higher than normal state and a lower than
normal state. This fluctuation is determined by the rate of vibration and the force with which the vibration was
initiated upon the source.

At the initial forward excursion of the vibrating source, the particle nearest to the sound source is thrown
forward to a point where it comes into contact with an adjacent molecule. After this contact, the molecule
moves back along the path of its original travel, where its momentum causes it to bypass its normal rest position
and travel on to its extreme rearward position, from where it will swing back and finally come to rest.

Recap:

• When there is no sound wave present all particles are in a state of equilibrium and normal air
pressure exists throughout the medium.
• Higher pressure occurs where air particles press together, causing a region of higher than
normal pressure called a compression.
• Lower pressure occurs in the medium where adjacent particles are moving apart to create a
partial vacuum, causing a region of lower than normal pressure called a rarefaction.
• The molecules vibrate at the same rate as the source vibration. Each air particle vibrates about
its rest position at the same frequency as the sound source, i.e. there is sympathetic
vibration.
• The molecules are displaced from their mean positions by a distance proportional to the
amount of energy in the wave. This means the higher the energy from the source, the greater the
displacement from the mean position.
• A pressure wave radiates away from the source at a constant speed, i.e. the speed of sound in
air.
• Sound pressure fluctuates between positive and negative amplitudes around a zero pressure
median point, which is in fact the prevailing atmospheric pressure.
• These areas of compression and rarefaction move away from the body in the form of a
longitudinal wave motion - each molecule transferring its energy to another, creating an
expanding spherical sound field.


3. Speed of sound wave

The speed of sound (c or S) is the rate at which the compressions and rarefactions move through the medium,
i.e. the speed at which the sound wave travels. Sound waves need a material medium for transmission; any
medium that has an abundance of molecules can transmit sound.

The speed of sound varies with the density and elasticity of the medium through which it is travelling. Sound is
capable of travelling through liquids and solid bodies, through water or steel and other substances. In air, the
velocity is affected by:

Density: the more closely packed the molecules of a medium, the faster sound will generally travel in that medium.

Temperature: The speed of sound increases as temperature rises according to the formula

v = 331 + 0.6t m/s, where t is the temperature in degrees Celsius.

This is approximately a 1 m/s rise for every degree increase in temperature.

Humidity: Changes in relative humidity affect the absorption of high frequencies as sound travels through air.

At typical room temperature and normal atmospheric pressure the velocity of sound in air is approximately 342 metres per
second (the value used in the examples below). Generally sound moves through water about 4 times as fast as it does
through air, and through iron it moves approximately 14 times faster.

The speed of sound can be determined from:

Speed = Distance travelled/Time taken or Speed = d/t

The speed of sound in another medium is referred to by S. For example, the speed of sound on magnetic tape is
equal to the tape speed at the time of recording, i.e. S = 15 or 30 inches per second (ips). Velocity refers to speed in a
particular direction. As most sound waves move in all directions unless impeded in some way, the velocity of a
sound wave is equivalent to its speed.

It is worth remembering that at normal temperature and pressure at sea level, sound travels approximately 1 metre in 3
milliseconds (more precisely, 1 metre in 2.92 milliseconds).

Example:

How long does it take for sound to travel 1 km?

Time taken = 1000/342 = 2.92 s
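
As a quick check of these relationships, here is a minimal Python sketch (the function names are illustrative only, not part of the course material) applying the temperature formula and the speed = distance/time relation:

def speed_of_sound(temp_celsius):
    # Approximate speed of sound in air (m/s), using v = 331 + 0.6t
    return 331.0 + 0.6 * temp_celsius

def travel_time(distance_m, speed_m_per_s=342.0):
    # Time in seconds for sound to cover a distance at the given speed
    return distance_m / speed_m_per_s

print(speed_of_sound(20))       # ~343 m/s at 20 degrees Celsius
print(travel_time(1000))        # ~2.92 s for 1 km, as in the example above
print(travel_time(1) * 1000)    # ~2.92 ms per metre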

As mentioned above, any vibrating object can act as a sound source and thus produce a sound wave. The greater
the surface area the object presents to the air, the more air it can move. The object could be the vibrating
string of a guitar, which is very weak on its own but considerably reinforced by the vibrating wooden body of the
instrument's soundboard.

The disturbance of air molecules around a sound source is not restricted to a single source; two or more sources
can emit sound waves and the medium around each of the sources would be disturbed by each of them (the
instruments). Air, by virtue of its elasticity, can support a number of independent sound waves and propagate them
simultaneously.

Particle velocity refers to the velocity at which a particle in the path of a sound wave is moved (displaced) by the wave
as it passes.

It should not be confused with the velocity at which the sound wave travels through the medium, which
is constant unless the sound wave encounters a different medium, in which case the sound wave will
refract.

If the sound wave is sinusoidal (sine wave shape), particle velocity will be zero at the peaks of
displacement and will reach a maximum when passing through the normal rest position.

4. Amplitude/Loudness/ Volume/Gain

When describing the energy of a sound wave the term amplitude is used. It is the distance above or below the
centre line of a waveform (such as a pure sine wave).

The greater the displacement of the molecule from its centre position, the more intense the pressure variation or
physical displacement of the particles within the medium. In the case of an air medium it represents the pressure
change in the air as it deviates from the normal state at each instant.

The amplitude of a sound wave in air is measured in Pascals or dynes per square centimetre, both units of air pressure.
However, for audio purposes relative air pressure differences are more meaningful, and these are expressed by
the logarithmic ratio called the Bel or, more commonly, the Decibel (dB).

Waveform amplitudes are measured using various standards.

a. Peak amplitude refers to the positive and negative maximums of the wave.
b. Root Mean Square (RMS) amplitude gives a meaningful average of the peak values and
more closely approximates the signal level perceived by our ears. For a sine wave, RMS amplitude is equal
to 0.707 times the peak value of the wave (see the sketch below).
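
The 0.707 relationship holds only for a pure sine wave. A minimal Python sketch (the helper names are illustrative) showing peak and RMS amplitude computed from a list of samples:

import math

def peak_amplitude(samples):
    # Largest absolute excursion from the zero line
    return max(abs(s) for s in samples)

def rms_amplitude(samples):
    # Root Mean Square: square each sample, average, then take the square root
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# One cycle of a sine wave with a peak amplitude of 1.0
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(peak_amplitude(sine))   # ~1.0
print(rms_amplitude(sine))    # ~0.707, i.e. 0.707 x peak for a sine wave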


Our perception of loudness is not proportional to the energy of the sound wave; the human ear does not
perceive all frequencies at the same intensity. We are most sensitive to tones in the middle frequencies
(3 kHz to 4 kHz), with decreasing sensitivity to those having relatively lower or higher frequencies.

Loudness and volume are not the same: hi-fi systems have both a loudness switch and a volume
control. A volume control is used to adjust the overall sound level over the entire frequency range of
the audio spectrum (20 Hz to 20 kHz). A volume control is not frequency or tone sensitive; when you
advance the volume control, all tones are increased in level. A loudness switch increases the low
frequency and high frequency ranges of the spectrum while not affecting the mid-range tones.

Fletcher & Munson Curves or equal loudness contours show the response of the human ear
throughout the audio range and reveal that more audio sound power is required at the low end and
high end of the sound spectrum to obtain sounds of equal loudness.


5. Frequency

The rapidity with which a cycle of vibration repeats itself is called the frequency.

It is measured in cycles per second (c.p.s.), or Hertz (Hz). One complete excursion of a wave, plotted over the
360-degree axis of a circle, is known as a cycle. The number of cycles that occur over the period of one second
gives the frequency in Hertz.

Frequency = 1 (second)/t (period)


E.g.

1/0.01 (seconds per cycle) = 100 Hz

A cycle can begin at any point on the waveform but to be complete (1 cycle) it must pass through the zero line and
end at a point that has the same value as the starting point.

5.1 Frequency Spectrum

The scope of the audible spectrum of frequencies is from 20 Hz to 20 kHz. This spectrum is defined by
the particular characteristics of human hearing and corresponds to the pitch or frequency ranges of all
commonly used musical instruments.

5.2 Pitch:

Describes the fundamental or basic tone of a sound. It is determined by the frequency of the tone. The
frequency of the tone is a measure of the number of complete vibrations generated per second. The
greater the number of waves per second the higher the frequency or higher the pitch of the sound.

5.3 Wavelength:

The wavelength of a wave is the actual physical distance covered by one complete cycle of the waveform,
or the distance between any two corresponding points of adjacent cycles.

Formula: λ= v/f

Where:

λ = lambda, the wavelength in the medium, in metres (m)

v is the velocity of sound in the medium (m/s)

f is the frequency in Hertz (Hz)


Typical wavelengths encountered in acoustics:

Frequency    Wavelength
20 Hz        17.1 m
1 kHz        34 cm
8 kHz        4.3 cm
20 kHz       1.7 cm
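
These values follow directly from λ = v/f. A short sketch (assuming v = 342 m/s, the value used earlier) reproduces the table:

def wavelength(frequency_hz, speed=342.0):
    # lambda = v / f, in metres
    return speed / frequency_hz

for f in (20, 1000, 8000, 20000):
    print(f, "Hz ->", round(wavelength(f), 3), "m")
# 20 Hz -> 17.1 m, 1000 Hz -> 0.342 m, 8000 Hz -> 0.043 m, 20000 Hz -> 0.017 m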


5.4 Period

Is the amount of time required for one complete cycle of the sound wave, that is, the time required for one
complete progression of the wave. E.g. a 30 Hz sound wave completes 30 cycles each second, or one cycle
every 1/30th of a second (0.033 s).

Formula: P (or time taken for one complete oscillation) = 1/f

E.g.

Period (t) = 1/30

= 0.033 s to complete one cycle of a 30 Hz frequency
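
Since period and frequency are reciprocals of one another, both conversions are one-line calculations (a minimal sketch):

def period(frequency_hz):
    # Time in seconds for one complete cycle: t = 1 / f
    return 1.0 / frequency_hz

def frequency(period_s):
    # Frequency in Hz from the period: f = 1 / t
    return 1.0 / period_s

print(period(30))        # ~0.033 s, as in the 30 Hz example above
print(frequency(0.01))   # 100 Hz, as in the earlier frequency example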

5.5 Phase

The concept of phase is important in describing sound waves. It refers to the relative displacement in
time between waves of the same frequency. The studio engineer must always contend with two
distinct types of waves:

1. Direct waves

2. Reflected Waves

When the direct sound wave strikes a reflective surface, part of that wave passes through the surface,
part of it is absorbed by the surface material, and the rest is reflected as a delayed wave.

The direct and reflected wave may be wholly or partially in phase with each other: the result is that
they will either reinforce each other or cancel each other at the point of the cycle where they converge.

Since a cycle can begin at any point on a waveform, it is possible to have 2 waves interacting with each
other. If 2 waveforms, which have the same frequency and peak amplitude, are offset in time, they will
have different signed amplitudes at each instant in time. These two waves are said to be out of phase
with respect to each other.

When two waveforms that are completely in phase (0 degrees phase difference) and of the same frequency and
peak amplitude are added, the resulting waveform is of the same frequency and phase but will have
twice the original amplitude.

If two waveforms are completely out of phase (180 degrees phase difference), they will cancel each other
out when added, resulting in zero amplitude.


5.6 Phase Shift

Phase shift (Ø) is often used to describe the relationship between two waves. When two sound sources produce two
waves in close proximity to each other they will interact or interfere with each other. The type of
interference these two waves produce depends upon the phase relationship, or phase shift,
between them. Time delays between waveforms introduce different degrees of phase shift.

Phase relationship characteristics for 2 identical waves:

0 degrees phase shift - The waves are said to be completely in phase, correlated, or 100% coherent.
They interfere constructively, with their amplitudes being added together.

180 degrees phase shift - The waves are completely out of phase or uncorrelated (zero coherence). They
interfere destructively with each other, with their amplitudes cancelling each other to produce zero
signal.

Other degrees of phase shift - The waves are partially in phase. Both additions and cancellations will
occur. E.g.

90/270 degrees - equal addition and cancellation - 50% coherent

< 90 degrees - more constructive interference

> 90 degrees - more destructive interference.

If one wave is offset in time, it will be out of phase with the other. If two waveforms which have the
same frequency and peak amplitude are offset in time, they will interfere constructively or
destructively with each other at certain points of the waveform. The result is that certain
frequencies will be boosted and others will be attenuated.

In music we deal with complex waveforms, so it is difficult to perceive the actual phase addition or
subtraction. But the result of out-of-phase conditions will be cancellation of certain frequencies, with
bass frequencies being affected the most. An experiment with a home hi-fi can best demonstrate the
phenomenon: reversing the + and - connections of one of the speakers will result in a phase
difference of 180 degrees between the two speakers.

The result will be:

Loss of bass

A change of the mid + high frequencies response, this could be a boost or cut

Loss of stereo image-instrument placement, depth

Overall loss of amplitude

Acoustical phase cancellation is the most destructive condition that can arise during recording. Care
must be taken during recording to position stereo mics at equal distances from the source.


Phase shift can be calculated by the formula:

Ø = T x Fr x 360

Ø is the phase-shift in degrees

T= time delay in seconds

Fr = Frequency in Hertz

E.g. what will be the phase shift if a 100 Hz wave is delayed by 5 milliseconds?

Ø = 0.005 x 100 x 360

= 180 degrees, i.e. total phase shift.

The second wave will arrive 180 degrees out of phase and the result will be zero amplitude.

What would be the degree of phase shift if a 100Hz wave was delayed by 2.5 milliseconds?

When two waveforms that are completely in phase (zero phase difference) and of the same frequency and
peak amplitude are added, the resulting waveform is of the same frequency and phase but will have
twice the original amplitude.

If two waves are completely out of phase (180 degrees phase difference) they will cancel each other when added,
resulting in zero amplitude and hence no output.
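
A minimal sketch of the phase-shift formula (the function name is illustrative), confirming the 5 ms example above:

def phase_shift_degrees(delay_s, frequency_hz):
    # Phase shift between two identical waves: phi = T x f x 360
    # (for longer delays the result can be reduced modulo 360)
    return delay_s * frequency_hz * 360.0

print(phase_shift_degrees(0.005, 100))   # 180.0 degrees -> complete cancellation
# Try a 2.5 millisecond delay to answer the question posed above.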

6. Difference between musical sound and noise

Sound carries an implication that it is something we can hear or that is audible. However, sound also exists above the
threshold of our hearing, called ultrasound (20 kHz and above), and below our hearing range, called infrasound
(20 Hz and below).

Regular vibrations produce musical tones. The series of tones/frequencies are vibrating in tune with one another.
Because of their orderliness, we find these tones pleasing.

Noise consists of irregular vibrations; the air pressure variations are random and we perceive them as unpleasant.

A musical note consists of a fundamental wave and a number of overtones called harmonics. These harmonics
are known as the harmonic series.

6.1 Harmonic content:

Is the tonal quality or timbre of the sound. Sound waves are made up of a fundamental tone and
several different frequencies called harmonics or overtones. Harmonic modes: when strings are struck,
as in a piano, plucked (guitar) or bowed (violin), they tend to vibrate in quite a complex manner. In the
fundamental mode (1st harmonic) the string vibrates or oscillates as a whole with respect to the two
fixed ends. In the case of middle C the frequency generated will be 261.63 Hz.


But there is also a tendency for the two halves of the string to oscillate, thus producing a second harmonic
at about twice the frequency of the fundamental note.

Harmonics are whole numbered multiples of the fundamental. Overtones may or may not be
harmonically related to the fundamental.

The composite waveform comprising the fundamental and its numerous harmonics can look quite
irregular with many sharp peaks and dips. Yet the fundamental tone and each harmonic is made up of a
very regular shape waveform.

6.2 Timbre:

The first mode of vibration has the whole string vibrating at the fundamental frequency between its fixed ends.
There is also a tendency for the string to vibrate at 1/2, 1/3 and 1/4 of its length. These are described as its
2nd mode of vibration, 3rd mode of vibration, etc. These subsequent vibrations, along with the fundamental,
constitute the timbre or tonal characteristic of a particular instrument.

The factor that enables us to differentiate the same note being played by several instruments is the
harmonic/overtone relationship between the two instruments playing the same note.

A Violin playing 440Hz has a completely different fundamental/ harmonic relationship to a viola. This is
the factor that allows us to recognise the difference between the two instruments although they may
be playing the same note. If the fundamental frequency is 440 Hz its second harmonic will be 880Hz or
twice the fundamental (440 x 2 = 880). The third harmonic will be 1320Hz (440x3=1320)

No matter how complex a waveform is, it can be shown to be the sum of sine waves whose frequencies
are whole numbered multiples of the fundamental.

6.3 Octaves

The octave is a musical term which refers to the distance between one note and its recurrence higher
or lower in a musical scale. An octave distance is always a doubling of frequency, so the octave above
the A at 440 Hz is 880 Hz, or the second harmonic. However, the next octave above 880 Hz is 1760 Hz, or
four times the fundamental. Therefore the octave scale and the harmonic scale are different. The
octave scale is said to be logarithmic while the harmonic scale is linear. The octave range relates
directly to the propensity of human hearing to judge relative pitch or frequency on a 2:1 ratio.
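
The difference between the logarithmic octave scale and the linear harmonic series can be seen by generating both from A = 440 Hz (a small sketch, not part of the source text):

fundamental = 440.0  # A above middle C

# Harmonic series: linear, whole-number multiples of the fundamental
harmonics = [fundamental * n for n in range(1, 5)]     # 440, 880, 1320, 1760 Hz

# Octave series: logarithmic, each step doubles the previous frequency
octaves = [fundamental * 2 ** n for n in range(4)]     # 440, 880, 1760, 3520 Hz

print(harmonics)
print(octaves)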


7. Waveforms Types:

Waveforms are the building blocks of sound. By combining raw waveforms one can simulate an acoustic
instrument. This is how synthesisers make sound: they have waveform generators, which create all types of
waves, and these are combined into composite waveforms which approximate real instruments.

Generally musical waveforms can be divided into two categories.

1. Simple
2. Complex

Simple - The most basic wave is the sine wave, which traces a simple harmonic motion, e.g.
tuning forks, pendulums, the flute. These waveforms are called simple because they are continuous and
repetitive. One cycle looks exactly like the next and they are perfectly symmetrical around the zero line.
A simple sine wave contains no harmonics, only its fundamental frequency.

Complex: Speech and music depart from the simple sine form. We can break down a complex
waveform into a combination of sine waves. This method is called Fourier Analysis, named after the
19th century Frenchman who proposed this method. Wave synthesis combines simple waves into
complex waveforms e.g. the synthesiser. The ear mechanism also distinguishes frequency in complex
waves by breaking them down into sine wave components.

Noise - Noise is a random mixture of sine waves continuously shifting in frequency, amplitude and
phase. There are generally two types of synthetic noise:

i. White Noise - equal energy per frequency


ii. Pink noise: equal energy per octave.

8. Wave shape

Wave shape is created by the amplitude and harmonic components in the wave. To create a square wave, the sine
wave fundamental is combined with a series of its odd harmonics at progressively decreasing amplitudes.


Adding only odd harmonic components to the fundamental, with amplitudes that fall away more rapidly, creates a
triangle wave shape. A sawtooth wave is made of both odd and even harmonics added to the fundamental frequency.
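
A sketch of this kind of additive synthesis, using the standard Fourier recipes for these shapes (the 1/n and 1/n² amplitude weightings and the 19-harmonic limit are assumptions for illustration, not values from the text):

import math

def additive_wave(t, fundamental_hz, harmonic_numbers, amp_of):
    # Sum sine-wave harmonics of a fundamental at time t (seconds)
    return sum(amp_of(n) * math.sin(2 * math.pi * n * fundamental_hz * t)
               for n in harmonic_numbers)

def square(t, f):    # odd harmonics only, amplitudes falling off as 1/n
    return additive_wave(t, f, range(1, 20, 2), lambda n: 1.0 / n)

def sawtooth(t, f):  # odd and even harmonics, amplitudes falling off as 1/n
    return additive_wave(t, f, range(1, 20), lambda n: 1.0 / n)

def triangle(t, f):  # odd harmonics only, 1/n^2 amplitudes with alternating signs
    return additive_wave(t, f, range(1, 20, 2),
                         lambda n: ((-1) ** ((n - 1) // 2)) / (n * n))

# A few samples across one cycle of a 100 Hz square-wave approximation
for i in range(8):
    print(round(square(i / 800.0, 100.0), 3))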


9. Acoustic Envelope:

An important aspect influencing the waveform of a sound is its envelope. Every instrument produces its own
envelope, which works in combination with its timbre to determine the subjective sound of the instrument.

The envelope of a waveform describes the way its intensity varies over the time that the sound is produced and dies
away. The envelope therefore describes a relationship between time and amplitude. This can be viewed on a
graph by connecting a wave's peak points of the same polarity over a series of cycles. An acoustic envelope
has 4 basic sections: attack, decay, sustain and release.

1. Attack time is the time it takes for the sound to rise to its maximum level.
2. Decay time is the time it takes for the sound to fall from that maximum towards the sustain level; the internal
dynamics of the instrument (e.g. the resonance of a tom drum) can extend this, so the tom can ring for a duration.
3. Sustain time is the time during which the sound source is maintained, from maximum levels down to mid levels.
4. Release time is the time it takes for the sound to die away and fall below the noise floor (see the sketch below).
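
A minimal sketch of how such an envelope could be represented as a piecewise amplitude-versus-time function (the times and levels are arbitrary illustrations, not values from the text):

def adsr_envelope(t, attack=0.01, decay=0.1, sustain_level=0.6,
                  sustain_time=0.5, release=0.3):
    # Piecewise-linear amplitude (0..1) of a note at time t seconds
    if t < attack:                        # rise to maximum level
        return t / attack
    t -= attack
    if t < decay:                         # fall towards the sustain level
        return 1.0 - (1.0 - sustain_level) * (t / decay)
    t -= decay
    if t < sustain_time:                  # held at the sustain level
        return sustain_level
    t -= sustain_time
    if t < release:                       # die away to silence
        return sustain_level * (1.0 - t / release)
    return 0.0

for ms in (5, 50, 300, 800, 2000):
    print(ms, "ms ->", round(adsr_envelope(ms / 1000.0), 3))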

THE HUMAN EAR

The organ of hearing, the ear, operates as a transducer, i.e. it translates wave movement through several media:
air pressure variations into mechanical action, then into fluid movements and finally into electrical/neural impulses.

1. The Outer Ear

Consists of the Pinna and the ear canal (external meatus). Here sound waves are collected and directed toward
the middle ear. Both the pinna and the ear canal increase the loudness of a sound we hear by concentrating or
focusing the sound waves e.g. the old ear trumpet as a hearing aid.

The ear canal is often compared to an organ pipe in that certain frequencies (around 3 kHz) will resonate within it
because of its dimensions (3 cm x 0.7 cm), i.e. frequencies whose quarter wavelengths are similar in size to the
length of the canal. The ear will perceive this frequency band as louder, and it corresponds to the critical bandwidth
for speech intelligibility. “Pipe resonance” amplifies the sound pressure falling on the outer ear by around 10 dB by
the time it strikes the eardrum, peaking in the 2-4 kHz region.


Wiener and Ross have found that diffraction around the head results in a further amplification effect adding a
further 10dB in the same bandwidth.

2. The Middle Ear

The mechanical movements of the tympanic membrane are transmitted through three small bones known as
ossicles, comprising the malleus, incus and stapes – more commonly known as the hammer, anvil and stirrup – to
the oval window of the cochlea. The oval window forms the boundary between the middle and inner ears.

The malleus (hammer) is fixed to the middle fibrous layer of the tympanic membrane in such a way that when the
membrane is at rest, it is pulled inwards. Thus the tympanic membrane, when viewed down the auditory canal
from outside, appears concave and conical in shape. One end of the stapes (stirrup), the stapes footplate, is
attached to the oval window of the cochlea. The malleus and incus (hammer and anvil) are joined quite firmly
such that at normal intensity levels they act as a single unit, rotating together as the tympanic membrane vibrates
to move the stapes, via a ball and socket joint, in a piston-like manner. Thus acoustic vibrations are transmitted via
the tympanic membrane and ossicles as mechanical movements to the cochlea of the inner ear.

The function of the middle ear is twofold:

• To transmit the movements of the tympanic membrane to the fluid which fills the cochlea without
significant loss of energy, and
• To protect the hearing system to some extent from the effects of loud sounds, whether from external
sources or generated by the individual concerned.

In order to achieve efficient transfer of energy from the tympanic membrane to the oval window, the effective
pressure acting on the oval window is arranged by mechanical means to be greater than that acting on the
tympanic membrane. This is to overcome the higher resistance to movement of the cochlear fluid compared to
that of air at the input to the ear. Resistance to movement can be thought of as impedance to movement, and the
impedance of fluid to movement is high compared to that of air. The ossicles act as a mechanical impedance
converter or impedance transformer, and this is achieved by two means:

• The lever effect of the malleus (hammer) and incus (anvil)


• The area difference between the tympanic membrane and the stirrup foot plate.

A third aspect of the middle ear which appears relevant to the impedance conversion process is the buckling
movement of the tympanic membrane itself as it moves, resulting in a twofold increase in the force applied to the
malleus.

In humans, the area of the tympanic membrane is approximately 13 times larger than the area of the stapes
footplate, and the malleus is approximately 1.3 times the length of the incus. The buckling effect of the tympanic
membrane provides a force increase by a factor of 2. Thus the pressure at the stapes footplate is about
(13 x 1.3 x 2 = 33.8) times larger than the pressure at the tympanic membrane.
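
Expressed as a worked calculation (the decibel figure below is my own conversion of the x33.8 pressure ratio, not a number given in the text):

import math

area_ratio = 13.0       # tympanic membrane area / stapes footplate area
lever_ratio = 1.3       # malleus length / incus length
buckling_factor = 2.0   # force increase from the buckling of the membrane

pressure_gain = area_ratio * lever_ratio * buckling_factor
print(pressure_gain)                    # 33.8
print(20 * math.log10(pressure_gain))   # ~30.6 dB equivalent pressure gain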

The second function of the middle ear is to provide some protection for the hearing system from the effects of
loud sounds, whether from external sources or generated by the individual concerned. This occurs as a result of
the action of two muscles in the middle ear: the tensor tympani and the stapedius muscle. These muscles contract
automatically in response to sounds with levels greater than approximately 75 dB SPL, and they have the effect of
increasing the impedance of the middle ear by stiffening the ossicular chain. This reduces the efficiency with
which vibrations are transmitted from the tympanic membrane to the inner ear and thus protects the inner ear to
some extent from loud sounds. Approximately 12 to 14 dB of attenuation is provided by this protection
mechanism, but this is for frequencies below 1 kHz only. The names of these muscles derive from where they
connect with the ossicular chain: the tensor tympani is attached near the tympanic membrane and the stapedius
muscle is attached to the stapes.

This stiffening of the muscles is known as the acoustic reflex. It takes some 60 ms to 120 ms for the muscles to
contract in response to a loud sound. In the case of a loud impulsive sound, such as the firing of a large gun, it has
been suggested that the acoustic reflex is too slow to protect the hearing system.

3. The Inner Ear

The inner ear consists of 2 fluid-filled structures:

The vestibular system, consisting of 3 semi-circular canals, the utricle and the sacculus - these are concerned with
balance and posture.

The Cochlea is the organ of hearing. It is about the size of a pea and encased in solid bone. It is coiled up like a
seashell, filled with fluid and divided into an upper and a lower part by a pair of membranes (Basilar Membrane
and Tectorial Membrane). The Oval Window opens into the upper part of the cochlea, and the pressure releasing
Round Window into the lower part.

The rocking motion of the oval window caused by the ossicles sets up sound waves in the fluid. Amplitude peaks
for different frequencies occur along the basilar membrane in different parts of the cochlea, with lower
frequencies (e.g. 50 Hz) towards the far end and higher frequencies (e.g. 1500 Hz) at the beginning. High frequencies
cause maximal vibration at the stapes end of the basilar membrane, where it is narrow and thick. Low frequencies
cause a greater effect at the apical end, where the membrane is thin and wide.

The waves act on hair-like nerve terminals bunched under the basilar membrane in the organ of Corti. These
nerves convey signals in the form of neuron discharges to the brain. These potentials are proportional to the
sound pressure falling on the ear over an 80 dB range. These so-called “microphonic” potentials were actually
picked up and amplified from the cortex of an anaesthetised cat.

4. Neural Processing

Nerve signals consist of a number of electrochemical impulses, which pass along the fibres at about 10 m/s.
Intensity is conveyed by the mean rate of the impulses. Each fibre in the cochlear nerve responds most
sensitively to its own characteristic frequency (CF), requiring a minimum SPL at this frequency to stimulate it or
raise it detectably. The CF is directly related to the part of the basilar membrane from which the stimulus arises.

While the microphonic signals are analogue, the neuron discharges are caused by the cochlear nerves either firing
(on) or not firing (off), producing a type of binary code which the brain interprets. The loudness of a sound is
related to the number of nerve fibres excited (3,000 maximum) and the repetition rates of such excitation. A
single fibre firing would represent the threshold of sensitivity.

In this way component frequencies (partials) of a signal are separated and their amplitudes measured. The ear
interprets this information as a ratio of amplitudes, deciphering it to give a picture of harmonic richness or Timbre
of the sound.

5. The Ear and Frequency Perception


This section considers how well the hearing system can discriminate between individual frequency components of
an input sound. This will provide the basis for understanding the resolution of the hearing system and it will
underpin discussions relating to the psychoacoustics of how we hear music, speech and other sounds.

Each component of an input sound will give rise to a displacement of the basilar membrane at a particular place.
The displacement due to each individual component is spread to some extent on either side of the peak. Whether
or not two components that are of similar amplitude and close together in frequency can be discriminated
depends on the extent to which the basilar membrane displacements due to each of the two components are
clearly separated.

5.1 Critical Bandwidth and Beats

Suppose two pure tones, or sine waves, with amplitudes A1 and A2 and frequencies F1 and F2 are
sounded together. If F1 is fixed and F2 is changed slowly, starting from unison with F1 and moving either
upwards or downwards in frequency, the following is generally heard. When F1 is equal to F2 a single
note is heard. As soon as F2 is moved higher (or lower) than F1, a sound with a clearly undulating amplitude,
known as beats, is heard. The frequency of the beats is equal to (F2 - F1), or (F1 - F2) if F1 is greater than F2,
and the amplitude varies between (A1 + A2) and (A1 - A2), or (A1 + A2) and (A2 - A1) if A2 is greater than A1.
Note that when the amplitudes are equal (A1 = A2), the amplitude of the beats varies between 2 x A1 and 0.
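
A small sketch that adds two sine waves and illustrates the beating described above (the frequencies, amplitudes and sample times are arbitrary choices for illustration):

import math

def two_tone(t, f1, f2, a1=1.0, a2=1.0):
    # Sum of two pure tones (amplitudes a1, a2) at time t in seconds
    return a1 * math.sin(2 * math.pi * f1 * t) + a2 * math.sin(2 * math.pi * f2 * t)

f1, f2 = 440.0, 444.0                  # 4 Hz apart, so 4 beats per second
print("beat frequency:", abs(f2 - f1), "Hz")

# The envelope of the sum swells between (a1 + a2) and |a1 - a2|;
# sample the signal in steps of half a beat period (beat period = 0.25 s)
for ms in range(0, 501, 125):
    print(ms, "ms ->", round(two_tone(ms / 1000.0, f1, f2), 3))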

For the majority of listeners, beats are usually heard when the frequency difference between the tones
is less than about 12.5 Hz, and the sensation of beats generally gives way to one of a fused tone which
sounds rough when the frequency difference is increased above 15 Hz. As the frequency difference is
increased further there is a point where the fused tone gives way to two separate tones, but still with
the sensation of roughness, and a further increase in frequency difference is needed for the rough
sensation to become smooth. The smooth, separate sensation persists while the two tones remain
within the frequency range of the listener's hearing.

There is no exact frequency difference at which the changes from fused to separate and from beats to
rough to smooth occur for every listener. However, the approximate frequencies and the order in which they
occur are common to all listeners, and in common with all psychoacoustic effects, average values are
quoted which are based on measurements made for a large number of listeners.

The point where the two tones are heard as separate as opposed to fused when the frequency
difference is increased can be thought of as the point where two peak displacements on the basilar
membrane begin to emerge from a single maximum displacement on the membrane. However, at this
point the underlying motion of the membrane which gives rise to the two peaks causes them to
interfere with each other, giving the rough sensation, and it is only when the rough sensation becomes
smooth that the separation of the places on the membrane is sufficient to fully resolve the two tones.
The frequency difference between the pure tones at the point where a listener’s perception changes
from rough and separate to smooth and separate is known as the critical bandwidth. A more formal
definition is given by Scharf (1970): ‘the critical bandwidth is that bandwidth at which subjective
responses rather abruptly change.’

The critical bandwidth changes according to frequency. In practice, critical bandwidth is usually
measured by an effect known as masking, in which the rather abrupt change is more clearly perceived
by listeners. Masking is when one frequency cannot be heard as a result of another frequency that is
louder and close to it.

6. Frequency range and pressure sensitivity of the ear


The frequency range of the human ear: the human ear is usually quoted as having a frequency range of 20 Hz to
20,000 Hz (20 kHz), but this is not necessarily the case for every individual. This range changes as part of the human
ageing process, particularly in terms of the upper limit, which tends to reduce. Healthy young children may have a
full hearing range up to 20 kHz, but by the age of 20 the upper limit may have dropped to 16 kHz. It
continues to reduce gradually to about 8 kHz by retirement age. This is known as presbyacusis (or presbycusis) and
is a function of the normal ageing process. This reduction in the upper frequency limit of the hearing range is
accompanied by a decline in hearing sensitivity at all frequencies with age, the decline being less for low
frequencies than for high. Hearing losses can also be induced by prolonged exposure to loud sounds.

The ear’s sensitivity to sounds of different frequencies varies over a vast sound pressure level range. On average,
the minimum sound pressure variation which can be detected by the human hearing system around 4 kHz is
approximately 10 micropascals (10⁻⁵ Pa). This is the threshold of hearing.

The maximum average sound pressure level which is heard, rather than perceived as being painful, is 20 Pa. This is
the threshold of pain.
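
The enormous span between these two thresholds is one reason sound pressure levels are normally quoted in decibels. A sketch using the standard 20 µPa reference for dB SPL (the reference value is an assumption not spelled out at this point in the text):

import math

REFERENCE_PRESSURE = 20e-6   # Pa, the standard 0 dB SPL reference

def db_spl(pressure_pa):
    # Sound pressure level in dB relative to 20 micropascals
    return 20.0 * math.log10(pressure_pa / REFERENCE_PRESSURE)

print(db_spl(10e-6))   # ~-6 dB SPL: threshold of hearing around 4 kHz
print(db_spl(20.0))    # 120 dB SPL: threshold of pain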

7. Noise-induced hearing loss

The ear is a sensitive and accurate organ of sound transduction and analysis. However, the ear can be damaged
by exposure to excessive levels of sound or noise. This damage can manifest itself in two major forms:

A loss of hearing sensitivity

The effect of noise exposure is to reduce the efficiency of the transduction of sound into nerve impulses.
This is due to damage to the hair cells in each of the organs of Corti. Note this is different from
the threshold shift due to the acoustic reflex, which occurs over a much shorter time period and is a
form of built-in hearing protection. This loss of sensitivity manifests itself as a shift in the listener's
threshold of hearing. This shift in the threshold can be temporary for short exposures,
but ultimately it becomes permanent as the hair cells are permanently flattened as a result of the
damage, due to long-term exposure, which does not allow them time to recover.

A loss of hearing acuity

This is a more subtle effect but in many ways is more severe than the first effect. We have seen that a
crucial part of our ability to hear and analyse sounds is our ability to separate out the sounds into
distinct frequency bands, called critical bands. These bands are very narrow. Their narrowness is due
to an active mechanism of positive feedback in the cochlea which enhances the standing wave effects
mentioned earlier. This enhancement mechanism is very easily damaged; it appears to be more
sensitive to excessive noise than the main transduction system. The effect of the damage though is not
just to reduce the threshold but also to increase the bandwidth of our acoustic filters. This has two
main effects:

• Firstly, our ability to separate out the different components of the sound is impaired, and this
will reduce our ability to understand speech or separate out desired sound from competing
noise. Interestingly it may well make musical sounds which were consonant more dissonant
because of the presence of more than one frequency harmonic in a critical band.
• The second effect is a reduction in the hearing sensitivity, because the enhancement
mechanism also increases the amplitude sensitivity of the ear. This effect is more insidious
because the effect is less easy to measure and perceive; it manifests itself as a difficulty in
interpreting sounds rather than a mere reduction in their perceived level.


Another related effect due to damage to the hair cells is noise-induced tinnitus. Tinnitus is the name
given to a condition in which the cochlea spontaneously generates noise, which can be tonal, random
noise, or a mixture of the two. In noise-induced tinnitus, exposure to loud noise triggers the condition, and as
well as being disturbing, there is some evidence that people who suffer from this complaint may be
more sensitive to noise-induced hearing damage.

Because the damage is caused by excessive noise exposure it is more likely at the frequencies at which
the acoustic level at the ear is enhanced. The ear is most sensitive at the first resonance of the ear
canal, or about 4 kHz, and this is the frequency at which most hearing damage first shows up. Hearing
damage in this region is usually referred to as an audiometric notch. This distinctive pattern is evidence
that the hearing loss measured is due to noise exposure rather than some other condition, such as the
inevitable high-frequency loss due to ageing.

How much noise exposure is acceptable? There is some evidence that the normal noise in Western society has
some long-term effects, because measurements on the hearing of other cultures have shown that there is a much
lower threshold of hearing at a given age compared with Westerners. However, this may be due to other factors
as well, for example the level of pollution, etc. There is strong evidence, however, that exposure to noise with
amplitudes of greater than 90 dB(SPL) will cause permanent hearing damage. This fact is recognised by legislation
which requires that the noise exposure of workers be less than this limit. Note that if the work environment has a
noise level greater than this, then hearing protection of a sufficient standard should be used to bring the noise
level at the ear below this limit.

8. Protecting your hearing

Hearing loss is insidious and permanent, and by the time it is measurable it is too late. Therefore, in order to
protect hearing sensitivity and acuity, one must be proactive. The first strategy is to avoid exposure to excessive
noise. Although 90 dB(SPL) is taken as a damage threshold, if noise exposure causes ringing in the ears,
especially if the ringing lasts longer than the length of the exposure, damage may be occurring even if
the sound level is less than 90 dB(SPL).

There are a few situations where potential damage is more likely.

• The first is when listening to recorded music over headphones, as even small ones are capable of
producing damaging sound levels.
• The second is when one is playing music, with either acoustic or electric instruments, as these are also
capable of producing damaging sound levels, especially in small rooms with a ‘live’ acoustic.

In both cases the levels are under your control and so can be reduced. However, the acoustic reflex reduces the
sensitivity of your hearing when loud sounds occur. This effect, combined with the effects of temporary threshold
shifts, can result in a sound level increase spiral, where there is a tendency to increase the sound level ‘to hear it
better’, which results in further dulling, etc. The only real solution is to avoid the loud sounds in the first place.
However, if this situation does occur then a rest away from the excessive noise will allow some sensitivity to
return.

There are sound sources over which one has no control, such as bands, discos, night clubs, and power tools. In
these situations it is a good idea either to limit the noise dose or, better still, use some hearing protection. For
example, one can keep a reasonable distance away from the speakers at a concert or disco. It takes a few days, or
even weeks in the case of hearing acuity, to recover from a large noise dose so one should avoid going to a loud
concert, or nightclub, every day of the week! The authors regularly use small ‘in-ear’ hearing protectors when
they know they are going to be exposed to high sound levels, and many professional sound engineers also do the
same. These have the advantage of being unobtrusive and reduce the sound level by a modest, but useful,
amount (15-20 dB) while still allowing conversation to take place at the speech levels required to compete with
the noise! These devices are also available with a ‘flat’ attenuation characteristic with frequency and so do not
alter the sound balance too much, and cost less than a CD recording. For very loud sounds, such as power tools,
a more extreme form of hearing protection may be required, such as headphone-style ear defenders.

Your hearing is essential, and irreplaceable, both for the enjoyment of music and for communicating and socialising
with other people. Now and in the future, it is worth taking care of.

9. Perception of sound source direction

How do we perceive the direction that a sound arrives from?

The answer is that we make use of our two ears, but how? Because our two ears are separated by our head, the head
has an acoustic effect which is a function of the direction of the sound. There are two effects of the separation of
our ears on the sound wave: firstly the sounds arrive at different times, and secondly they have different
intensities. These two effects are quite different, so let us consider them in turn.

9.1 Interaural time difference (ITD)

Because the ears are separated by about 18 cm there will be a time difference between the sound
arriving at the ear nearest the source and at the one further away. So when the sound is off to the left, the
left ear will receive the sound first, and when it is off to the right, the right ear will hear it first. If the sound is
directly in front, or behind, or anywhere on the median plane, the sound will arrive at both ears
simultaneously. The time difference between the two ears will depend on the difference in the lengths
that the two sounds have to travel. The maximum ITD occurs at 90° and is 6.73 × 10⁻⁴ s (673 µs).

Note that there is no difference in the delay between front and back positions at the same angle. This
means that we must use different mechanisms and strategies to differentiate between front and back
sounds. There is also a frequency limit to the way in which sound direction can be resolved by the ear
in this way. This is due to the fact that the ear appears to use the phase shift in the wave caused by the
interaural time difference to resolve the direction.

When the phase shift is greater than 180° there will be an unresolvable ambiguity in the direction,
because there are two possible angles (one to the left and one to the right) that could cause such a
phase shift. This sets a maximum frequency, at a particular angle, for this method of sound localisation.

Thus for sounds at 90° the maximum frequency that can have its direction determined by phase is 743
Hz. However, the ambiguous frequency limit would be higher at smaller angles.
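
A sketch of how the figures quoted above can be reproduced. It uses the common path-length model for a spherical head of radius 9 cm; this formula and the 344 m/s speed are assumptions chosen to match the 673 µs and 743 Hz values, they are not stated explicitly in the text:

import math

SPEED_OF_SOUND = 344.0   # m/s - the value that reproduces the 673 microsecond figure
HEAD_RADIUS = 0.09       # m (ears separated by roughly 18 cm)

def itd(angle_degrees):
    # Interaural time difference in seconds for a source at the given azimuth
    theta = math.radians(angle_degrees)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

max_itd = itd(90)
print(max_itd)                 # ~6.73e-04 s, i.e. about 673 microseconds

# Phase becomes ambiguous once the delay equals half a period (180 degrees)
print(1.0 / (2.0 * max_itd))   # ~743 Hz, the limit quoted above for sounds at 90 degrees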

9.2 Interaural intensity difference (IID)

The other cue that is used to detect the direction of a sound is the differing levels of intensity that
result at each ear due to the shading effect of the head. The levels at the two ears are equal when the sound
source is on the median plane, but the level at one ear progressively reduces, and increases at the
other, as the source moves away from the median plane. The level reduces in the ear that is furthest
away from the source.

This means that there will be a minimum frequency below which the effect of intensity is less useful for
localisation, corresponding to the point where the head is about one third of a wavelength in size (λ/3).
For a head with a diameter of 18 cm, this corresponds to a minimum frequency of about 637 Hz.
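
The quoted figure can be checked with the same one-third-of-a-wavelength rule (a sketch assuming an 18 cm head diameter and a speed of sound of 344 m/s):

SPEED_OF_SOUND = 344.0   # m/s
HEAD_DIAMETER = 0.18     # m

# Head shading becomes significant once the head is about one third of a wavelength
minimum_wavelength = 3.0 * HEAD_DIAMETER
print(SPEED_OF_SOUND / minimum_wavelength)   # ~637 Hz, as quoted above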


Thus the interaural intensity difference is a cue for direction at high frequencies, whereas the
interaural time difference is a cue for direction at low frequencies. Note that the cross-over
between the two techniques starts at about 700 Hz and would be complete at about four times this
frequency, at 2.8 kHz. In between these two frequencies the ability of our ears to resolve direction is
not as good as at other frequencies.

9.3 Pinnae and head movement effects

The above models of directional hearing do not explain how we can resolve front to back ambiguities or
the elevation of the source. There are in fact two ways which are used by the human being to perform
these tasks.

The first is to use the effect of our ears on the sounds we receive to resolve the angle and direction of
the sound. This is due to the fact that sounds striking the pinnae are reflected into the ear canal by the
complex set of ridges that exist on the ear. These pinna reflections will be delayed, by a very small but
significant amount, and so will form comb-filter interference effects on the sound the ear receives. The
delay that a sound wave experiences will be a function of its direction of arrival, in all three dimensions,
and we can use these cues to help resolve the ambiguities in direction that are not resolved by the
main directional hearing mechanism. The delays are very small and so these effects occur at high audio
frequencies, typically above 5 kHz. The effect is also person specific, as we all have differently shaped
ears and learn these cues as we grow up. Thus we get confused for a while when we change our
acoustic head shape radically, by cutting very long hair short for example. We also find that if we hear
sound recorded through other people’s ears we have a poorer ability to localise the sound,
because the interference patterns are not the same as those for our own ears.

The second, and powerful, means of resolving directional ambiguities is to move our heads. When we
hear a sound that we wish to attend to, or whose direction we wish to resolve, we move our head towards the sound
and may even attempt to place it in front of us in the normal direction, where all the delays and
intensities will be the same. The act of moving our head will change the direction of the sound's arrival,
and this change of direction will depend on the sound source's position relative to us. Thus a sound from
the rear will move in a different direction compared to a sound in front of or above the listener. This
movement cue is one of the reasons that we perceive the sound from headphones as being ‘in the
head’: because the sound source tracks our head movement it cannot be outside and hence must be in
the head. There is also an effect due to the fact that headphones do not model the effect of
the head. Experiments with headphone listening which correctly model the head and keep the source
direction constant as the head moves give a much more convincing illusion.

9.4 The Haas effect

The effect can be summarised as follows:

• The ear will attend to the direction of the sound that arrives first and will not attend to the
reflections, providing they arrive within 30 ms of the first sound.
• Reflections arriving within 30 ms are fused into the perception of the first arrival.
However, if they arrive after 30 ms they will be perceived as echoes.

These results have important implications for studios, concert halls and sound reinforcement systems.
In essence it is important to ensure that the first reflections arrive at the audience within 30 ms of the direct
sound to avoid them being perceived as echoes. In fact it seems that our preference is for a delay gap of less
than 20 ms if the sound of the hall is to be classed as ‘intimate’. In sound reinforcement systems the
output of the speakers will often be delayed with respect to the acoustic sound but, because of this
effect, we perceive the sound as coming from the acoustic source, unless the level of sound from the
speakers is very high.

10. Ear Training

The basic requirement of a creative sound engineer is to be able to listen well and analyse what they hear. There
are no golden ears, just educated ears. A person develops his or her awareness of sound through years of
education and practice.

We have to constantly work at training our ears by developing good listening habits. As an engineer, we can
concentrate our ear training around three basic practices - music, microphones and mixing.

Listening to Music

Try and dedicate at least half an hour per day to listening to well recorded and mixed acoustic and
electric music. Listen to direct-to-two track mixes and compare with heavily produced mixes. Listen to
different styles of music, including complex musical forms. Note the basic ensembles used, production
clichés and mix set-ups.

Also attend live music concerts. The engineer must learn the true timbral sound of an instrument and
its timbral balances. The engineer must be able to identify the timbral nuances and the characteristic of
particular instruments.

Learn the structuring of orchestral balance. There can be an ensemble chord created by the string
section, the reeds and the brass all working together. Listen to an orchestra live, stand in front of each
section and hear its overall balance and how it layers with other sections.

For small ensemble work, listen to how a rhythm section works together. How bass, drums, percussion,
guitar and piano interlock. Learn the structure of various song forms such as verse, chorus, break etc.
Learn how lead instrument and lead vocals interact with this song structure. Notice how instrumentals
differ from vocal tracks.

Listen to sound design in a movie or TV show. Notice how the music underscores the action and the
choice of sound effects builds a mood and a soundscape. Notice how tension is built up and how
different characters are supported by the sound design. Notice the conventions for scoring for different
genres of film and different types of TV.

For heavily produced music, listen for production tricks. Identify the use of different signal processing
FX. Listen for panning tricks, doubling of instruments and voices.

Analyse a musical mix into the various components of the sound stage. Notice the spread of
instruments from left to right, front to back and up and down. Notice how different stereo systems and
listening rooms influence the sound of the same piece of music.


Listening with Microphones

Mic placement relative to the instrument can provide a totally different timbral colour, e.g. proximity boost
on closely placed cardioid mics. A mic can be positioned to capture just a portion of the frequency
spectrum of an instrument to suit a particular “sound” or genre. E.g. a rock acoustic piano sound
may favour the piano's high end and require close miking near the hammers to accent percussive
attack; a sax may be miked near the top to accent higher notes, or an acoustic guitar across the sound
hole for more bass.

The way an engineer mics an instrument is influenced by:

• Type of music
• Type of instrument
• Creative Production
• Acoustics of the hall or studio
• Type of mic
• Leakage considerations

Always make A/B comparisons between different mics and different positions. The ear can only make
good judgements by making comparisons.

In the studio reflections from stands, baffles, walls, floor and ceiling can affect the timbre of
instruments. This can cause timbre changes, which can be problematic or used to capture an “artistic”
modified spectrum. When miking sections, improper miking can cause timbre changes due to
instrument leakage. The minimum 3:1 mic spacing rule helps control cross-leakage.

Diffuser walls placed around acoustic instruments can provide an openness and a blend of the direct
/reflected sound field. Mic placement and the number of diffusers and their placement can greatly
enhance the “air” of the instrument.

An Engineer should be able to recognise characteristics of the main frequency bands with their ears.

Hz           Band            Characteristics              Positive               Negative

16-160       Extreme lows    Felt more than heard         Warmth                 Muddiness
160-250      Bass            No stereo information        Fatness                Boominess, boxiness
250-2000     Low mid-range   Harmonics start to occur     Body                   Horn-like (500-1000 Hz),
                                                                                 ear fatigue (1-2 kHz)
2000-4000    High mid-range  Vocal intelligibility        Gives definition       Tinny, thin
4000-6000    Presence        Loudness and closeness;      Definition, energy,    Brash
                             spatial information          closeness
6000-20000   Highs           Depth of field;              Air, crispness         Noise
                             boosting/cutting helps
                             create a sense of distance

Listening in Foldback and Mixdown


A balanced cue mix captures the natural blend of the musicians. Good foldback makes musicians play
with each other instead of fighting to be heard. If reed and brass players can't hear themselves and the
rest of the group, they tend to overplay. A singer will back off from the mic if their voice in the
headphone mix is too loud, or swallow the mic if it is too soft. They will not stay in tune if they
cannot hear the backing instruments. Musicians will aim their instruments at music stands or walls for
added reflection to help them overcome a hearing problem.

Dimensional Mixing: The final 10% of a mix picture is spatial placement and layering of instruments or
sounds. Dimensional mixing encompasses timbral balancing and layering of spectral content and effects
with the basic instrumentation. For this, always think of sound in dimensional space: left/right, front/back,
up/down.
Think of a mix in Three Levels:

Level A 0 to 1 meter
Level B 1 to 6 meters
Level C 6 meters and further

Instruments which are tracked in the studio are all recorded at roughly the same level (SOL) and are
often close miked. If an instrument is to stand further back in the mix it has to change in volume and
frequency. Most instruments remain on level B so you can hear them all the time. Their dynamics must
be kept relatively stable so their position does not change. Level A instruments will be lead and solo
instruments. Level C can be background instruments, loud instruments drifting in the background,
sounds that are felt rather than heard, and reverb.

Control Room Acoustics: The studio control room must be as neutral as possible if we are to judge the
accuracy of what we have miked or what we are mixing. The control room is an entire system that
includes:

1. Room Acoustics (modes, absorption, diffusion, isolation)


2. Early Reflections: early, diffused energy fills in the time delay gap and enlarges the
perceived size and depth of the listening environment.
3. Shell stiffness and mass
4. Mixing areas
5. Loudspeakers
6. Loudspeaker decoupling
7. Loudspeaker placement referenced to mix position
8. System gain structure
9. Electronics
10. Grounding
11. Mechanical noise (air con, equipment etc)
12. Equipment and cabinet placement

Every effort should be made during mixdown to listen to the mix on near field and big monitors.

Work at around 85 dB SPL but listen at many levels for brief periods of time. Listen in mono. Compare
with mixes of the same production style. Take rests and don't let your ears get fatigued.
