Annu. Rev. Neurosci. 1997. 20:331–53
Copyright © 1997 by Annual Reviews Inc. All rights reserved
NEUROBIOLOGY OF SPEECH
PERCEPTION
R. Holly Fitch, Steve Miller, and Paula Tallal
Center for Molecular and Behavioral Neuroscience, Rutgers, The State University of
New Jersey, 197 University Avenue, Newark, New Jersey 07102
KEY WORDS:
auditory system, temporal processing, acoustic cues, timing, Wernicke’s area
ABSTRACT
The mechanisms by which human speech is processed in the brain are reviewed
from both behavioral and neurobiological perspectives. Special consideration is
given to the separation of speech processing as a complex acoustic-processing
task versus a linguistic task. Relevant animal research is reviewed, insofar as
these data provide insight into the neurobiological basis of complex acoustic
processing in the brain.
Introduction
The mechanisms through which the human brain can perceive and discriminate
complex and rapidly changing components of human speech are not, as yet,
well understood. At a very basic level, research has failed to determine the neural mechanisms that encode simple high-frequency sounds in less time than the
refractory periods of individual neurons. On a larger scale, scientists have not
yet deciphered the mechanisms by which the temporally complex acoustic signals of speech, composed of multiple frequencies (i.e. formants) changing over
times as short as 10 ms, are encoded in auditory cortex and interpreted with the
rich complexity of language—all within the time constraints of ongoing speech.
Nevertheless, a variety of research approaches have been employed to address
these questions, and resulting data—with special emphasis on data relevant to
the neurobiological bases of speech perception—are reviewed here. The extent
to which existing data support neurobiological representation of speech as a
function of complex acoustic properties versus linguistic content is also examined. Discussion of research avenues is roughly divided into neurological and
behavioral studies of impaired populations; behavioral studies focusing on the
relative importance of different acoustic cues to normal and abnormal speech
perception; neuroimaging studies performed on intact or speech-impaired humans during speech perception and/or auditory processing tasks; and animal
studies of discrimination for species-specific communicative stimuli or complex auditory stimuli, including speech. We do not cover issues pertaining to
the neural bases of higher-order aspects of language function such as syntax or
semantics (but see Garrett 1995, Petersen & Fiez 1993 for further discussion).
Rather, we focus here on the elemental neural mechanisms for perceiving and
discriminating the complex acoustic signals comprising the individual sounds
of speech (phonemes).
The inclusion of studies on central auditory processing in nonhuman species
within a paper on speech perception may be regarded by some as unjustly reductionistic, particularly by those who maintain the special nature of speech as
compared to other forms of acoustic information processing. Indeed, it can be
argued that the most fundamental controversy in the area of speech research pertains to whether human speech and language abilities emerged from a language
“module” unique to the human brain (Wilkins & Wakefield 1995, Liberman
& Mattingly 1985), or through the elaboration of more basic sensory, motor,
and cognitive neural mechanisms common to human and nonhuman species
(e.g. see Fitch et al 1993). At the very least, studies that focus on auditory-processing mechanisms for complex signals, including speech, in nonhuman
species provide essential comparative data that allow theories pertaining to the
evolution of human speech perception to be empirically assessed.
What Is Speech?
Speech is an acoustic signal comprised of multiple co-occurring frequencies,
called formants. Whereas vowel sounds consist of specific combinations of
temporally static, steady-state frequencies (see Figure 1), consonants contain
variable onset times and rapid frequency transitions that sweep within a syllable between the frequencies set by the place of articulation (determined by the position of the speech apparatus) and those required to produce the component vowels (see
Figure 2).
Although speakers vary widely in the size and shape of their vocal tract and
thus the fundamental frequency (pitch) of their speech, the relative combinations
of frequency required to produce speech signals are consistent and replicable
across speakers. Thus an /æ/ sound can be consistently identified by a normal
listener regardless of the pitch of the speaker (e.g. female, male, or child).
This phenomenon indicates that absolute frequency per se is not critical to
speech recognition. Rather, recognition depends on the relative combination
of co-occurring static or transient frequencies. If a specific combination of
frequencies consistently produced a specific speech sound, regardless of the
Figure 1 Spectrograph for vowel stimuli /æ/ and /a/.
Figure 2 Spectrograph for consonant-vowel (CV) syllables /ba/ and /da/.
preceding and ensuing sounds, then speech could be mapped according to
relatively simple acoustic codes. However, the situation is more complex.
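As a concrete (toy) illustration of "co-occurring frequencies," the sketch below synthesizes a crude vowel-like signal as a sum of sinusoids at illustrative formant values. A real vowel is a harmonic glottal source filtered by vocal-tract resonances, so this additive model is only a caricature; the formant frequencies used are approximate textbook values for /a/, not measurements from the figures above.

```python
import numpy as np

def synthesize_vowel(formants_hz, dur_s=0.25, fs=16000):
    """Toy vowel: a sum of sinusoids at the given formant frequencies.

    Real vowels are a glottal source shaped by vocal-tract resonances;
    this additive sketch only illustrates the idea of temporally static,
    co-occurring frequencies."""
    t = np.arange(int(dur_s * fs)) / fs
    sig = sum(np.sin(2 * np.pi * f * t) for f in formants_hz)
    return sig / len(formants_hz)  # keep amplitude within [-1, 1]

# Approximate (illustrative) formant values for /a/: F1, F2, F3 in Hz.
ah = synthesize_vowel([710, 1100, 2540])
```

A consonant-vowel syllable would instead sweep each formant from its consonant starting frequency to the vowel value over roughly 40 ms, as in Figure 3.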
Figure 3 shows spectrographs for a series of consonant-vowel (CV) syllables
beginning with the same consonant. Note that the specific formant transitions
produced when moving the articulators from the starting place for generation of
the consonant to that of the ensuing vowel vary considerably depending on the
initial frequencies, as well as those of the subsequent vowel. This is because
the articulators “anticipate” the vowel even while the initial consonant is being
produced. Thus, frequencies that comprise a consonant sound in a specific
context also carry information indicating in advance which vowel is “coming
up.” This process is called co-articulation.
Interestingly, despite such significant variations in acoustic temporal and
spectral characteristics, normal listeners are able to consistently identify a given
speech sound (phoneme) regardless of the context of other adjacent phonemes.
To compound this processing problem, many of the most significant cues needed
to distinguish similar speech sounds (e.g. the cue that differentiates /ba/ from
/da/) occur within extremely brief time windows. For example, the duration of
formant transitions shown in Figure 3 is approximately 40 ms, and these brief
components of acoustic information must be encoded and identified within the
time constraints of ongoing speech. How does the human brain accomplish
this feat? In order to address this question, we need to understand the role of
spectral and temporal acoustic structure in speech perception, as well as the
mechanisms by which acoustic signals with considerable variance in acoustic
structure come to be represented as the same speech sound (phoneme) in the
brain.
Basic Overview of the Auditory System
Acoustic information is encoded by physical transduction that occurs when
sound waves (vibrations) are passed from the tympanic membrane into the
cochlea. Vibrations within the organ of Corti cause hair cells to bend at the frequency of incoming sound, and this transduction excites the contacting spiral ganglion neurons, which pass the signal along the auditory nerve. The auditory nerve projects
to the cochlear nucleus, at which point subsets of ascending fibers cross to the
contralateral superior olive and inferior colliculus, and other fibers synapse on
the ipsilateral superior olive. Projections from the superior olive move through
the lateral lemniscus, reach the inferior colliculus (IC), and continue through
the medial geniculate nucleus (MGN) of the thalamus, primary auditory cortex
(A1), and secondary auditory cortex (A2) (see Aitken et al 1984 or Miller &
Towe 1979 for more detailed discussion of the auditory system). Within secondary auditory cortex, on the superior temporal gyrus, lies Wernicke’s area,
a region traditionally associated with the perception of speech (see Figure 4).
Figure 3 Acoustic structural differences in consonant formants as a function of ensuing vowel. [Reprinted with permission
from Delattre et al (1955). Copyright © 1955 Acoustical Society of America.]
Figure 4 Wernicke’s and Broca’s areas in human temporal cortex. [From Geschwind (1979).
Copyright © 1979 by Scientific American, Inc. All rights reserved.]
More recent studies have also demonstrated activation in frontal regions of the
brain (e.g. Broca’s area) during speech and auditory perception tasks.
Cumulative studies on the functional organization of the auditory system
for the processing of spectral (frequency) cues have delineated a highly organized pattern of tonotopic mapping throughout the primary ascending stations
of the auditory system (e.g. Imig et al 1977, Imig & Morel 1983, Kelly 1980,
Merzenich & Reid 1974). This pattern is reflected all the way from the functional organization of the cochlea itself to the central nucleus of the IC, the
ventral nucleus of the MGN, and A1. In each of these regions, laminar tonotopic organization (or layered organization on the basis of preferred frequency)
has been demonstrated.
Organization for temporal encoding within the auditory system is less well
understood. Research suggests that temporal organization within the primary
auditory stations may reflect a differential specialization of subregions that
are based on temporal resolution (e.g. subregions of auditory structures may
be specialized for different rates of temporal encoding) (Schreiner et al 1983;
Schreiner & Urbas 1984, 1986, 1988; Schreiner & Langner 1988). Evidence
also suggests that organization within auditory relay centers appears to be topographically organized, much like for spectral cues, on the basis of sensitivity
to ranges in frequency modulation (Schreiner & Langner 1988).
Moreover, temporal information appears to be encoded with the highest degree of resolution at lower levels of the auditory system. As one progresses up the
ascending pathway, auditory centers appear to respond to increasingly “segmented” components of temporal information and to correspondingly lose temporal resolution for individual bits (Rees & Moller 1983; Schreiner & Langner
1988). Thus at the level of the auditory nerve, temporal information may be
encoded on a virtual millisecond-to-millisecond basis (Palmer 1982), while in
the medial geniculate nucleus or primary auditory cortex, neural responses as
measured by electrophysiology appear to occur at the onset of segments of
temporal change (Schreiner & Langner 1988). Such organization may render
the cortex specialized for responding to sequences of events within complex
signals. This phenomenon may have special significance for speech perception, which requires transformation from a complex acoustic signal that
is characterized by significant variation into a specific representation of a given
individual’s speech repertoire. These higher-level questions of neural representation can be addressed, at least in part, by neuroimaging studies of cortical
activity during speech processing tasks—a topic discussed below.
Neurologic and Behavioral Studies
in Language-Impaired Populations
Studies of speech perception in humans with quantifiable brain damage, as well
as morphometric and behavioral studies on individuals with impaired language
processing abilities, have historically provided the foundation of knowledge
pertaining to the involvement of specific neural regions in speech processing.
The classic view of the neuroanatomical representation of speech perception
in the brain was derived from clinical data obtained from adults with acquired
lesions (gunshot wounds or stroke), which involved tissue damage to rather large
and ill-defined cortical brain areas. Based on such neurological studies, speech
production functions were ascribed to frontal regions anterior and superior to
the sylvian fissure (i.e. Broca’s area, Brodmann’s area 44), whereas speech
perception was thought to reside in temporal regions posterior and inferior to
the sylvian fissure (i.e. Wernicke’s area, Brodmann’s area 22; see Figure 4).
Little or no functional significance for speech processing was attributed to
subcortical brain regions. In addition, neural substrates of speech were thought
to reside in the left hemisphere, with little or no interference in speech and
language processes following damage to homologous areas in the right hemisphere.
Ongoing research conducted over the past decade has, however, produced an
overwhelming amount of data that has substantially modified our views on the
neurobiology of human speech and language. For example, errors in phonological perception and production are seen when either Broca’s or Wernicke’s area
is damaged (Blumstein 1995). Phonological perception errors are frequently restricted to patients with damage to the left hemisphere. Further, several studies
have shown that left hemisphere damage interferes more with the perception of
place of articulation (place) than with voice onset time (voicing) cues, whereas patients with right hemisphere damage and nondamaged controls process place and voicing cues equally well (Oscar-Berman et al 1975, Blumstein et al 1977, Miceli
et al 1978, Perecman & Kellar 1981). These findings support, at least in part,
a common pathway or representation for both the perception and production
of phonological information. They also demonstrate that widespread cortical
regions (both anterior and posterior), specifically in the left hemisphere, are
involved in speech perception.
Moreover, recent research has revealed a heretofore unrecognized role for
subcortical structures in the processing of speech and language. Data from
extracellularly recorded neuronal activity in adults undergoing surgery for intractable epilepsy (Ojemann 1991) and behavioral deficits in individuals with
acquired language syndromes (Damasio et al 1982, Robin & Schienberg 1990)
have suggested more than a secondary role for subcortical brain structures,
particularly the basal ganglia and thalamic nuclei, in language processes (see
Crosson 1992 for a review of this literature). Importantly, the extent to which
these subcortical brain areas are abnormal may directly reflect both the outcome and the nature of language impairments in children (Aram et al 1990,
Ludlow et al 1986). For example, studies have demonstrated that children with
left neocortical damage show language deficits that appear to be recoverable,
whereas children with damage that extends into the caudate nucleus show more
pervasive and lasting language deficits (Aram et al 1985). Research has also
shown that damage to the thalamus, particularly the left ventro-lateral and pulvinar thalamic nuclei, impairs language processing (Crosson 1992, Mateer &
Ojemann 1983). Moreover, Hugdahl and colleagues (1990) found that stimulation of the left thalamus increased the speech-processing ability of the right ear in patients undergoing surgery for Parkinsonian tremor, while left-sided lesions produced a marked decrease in this right-ear advantage (see
discussion of dichotic listening paradigm, below). Hugdahl et al (1990) suggest that the thalamus may act as a gating way station for relevant speech and
language information en route to target cortical areas and that these thalamic
mechanisms may be activated by stimulation and deactivated by lesions. Such
studies have substantially enhanced our view of the diverse neural regions that
contribute to speech perception.
Evidence has also shown concurrent deficits in processing rapidly changing acoustic cues and speech following left hemisphere damage (Efron 1963,
Tallal & Newcombe 1978), suggesting a direct association between the perception of acoustic temporal cues and speech perception. Also, preliminary evidence suggests that circumscribed caudate damage may impair nonlingual auditory
temporal processing (Tallal et al 1994), a finding consistent with reports that
caudate volume as measured by MRI is reduced in language-learning impaired
(LLI) children with severe auditory temporal processing deficits (Jernigan et al
1991). Combined with direct evidence from lesion studies showing that caudate damage impairs speech and language functions, these cumulative findings
provide some neuropathological evidence linking basic auditory-processing
mechanisms and speech-processing mechanisms at both subcortical and cortical levels.
MRI studies of dyslexic brains have also revealed consistent anomalies,
such as atypical patterns of cerebral lateralization (e.g. Jernigan et al 1991,
Larson et al 1990, Leonard et al 1993, Hynd & Semrud-Clikeman 1989). Neuropathological studies reveal cortical cellular anomalies (i.e. focal developmental cortical neuropathologies, including microgyric lesions, dysplasias, and ectopias) in the brains of human dyslexics (Galaburda & Kemper 1979, Galaburda
et al 1985, Humphreys et al 1990). Research using animal models suggests that
these anomalies arise as the consequence of interference with critical periods
of neuromigration, possibly resulting from focal ischemic damage (e.g. Dvorak
et al 1978, Humphreys et al 1991, Rosen et al 1992). Animal studies examining
the behavioral consequences of developmental neuropathologies (specifically,
cortical microgyric lesions) have shown auditory temporal processing impairments in microgyric rats (Fitch et al 1994), and these processing deficits are
highly similar to auditory processing deficits seen in language-impaired children (Tallal & Piercy 1973, Tallal et al 1993).
Combined findings suggest that anomalies evident in the brains of dyslexics
may act, in part, to impair the encoding and consequent perception of rapidly
changing auditory cues, such as those that occur in speech phonemes. The hypothesis that acoustic-processing deficits could impair speech perception and
lead to consequent disabilities in the development of phonics necessary for language and reading development has strong historical support from studies of
LLI children. These studies have shown that LLI children are profoundly impaired on rapid auditory processing tasks, even when nonlingual stimuli are used
(Tallal & Piercy 1973, Tallal et al 1993). Recent evidence has shown that specific auditory temporal training can significantly speed up auditory-processing
rates in these children and that improved processing rates are correlated with
improved speech processing (Merzenich et al 1996, Tallal et al 1996). This field
of research highlights the critical dependence of speech perception upon more
basic prerequisite mechanisms of auditory temporal-encoding and perception.
Further, the ability to assess auditory temporal processing abilities in infants
(Benasich & Tallal 1996) may provide a valuable measure of an infant’s risk
for developing later speech and language difficulties.
Neuroanatomical evidence showing specific anomalies in the magnocellular
neurons of the auditory thalamic nuclei (MGN) of dyslexic brains (Galaburda
et al 1994) may also relate to auditory temporal processing deficits in this
population. Galaburda et al (1994) speculate that these anomalies may parallel similar defects in the magnocellular subdivision of the visual thalamic
nuclei (lateral geniculate nucleus, LGN) of dyslexics (Livingstone et al 1991).
Specifically, magnocellular anomalies of the LGN are correlated with deficits in
processing rapidly changing visual information (Livingstone et al 1991). The
behavioral significance of magnocellular anomalies in the MGN, however, remained unclear until a recent series of animal studies showed that male rats with
induced neocortical microgyria (like those seen in dyslexic brains) exhibited
significant auditory temporal-processing deficits and also specific anomalies
in magnocellular cells of the MGN (Herman et al 1995). Moreover, behavioral performance on the auditory task was correlated with MGN morphology in
sham, but not lesioned, males. These results suggested that cortico-thalamic
sensory-processing systems are anatomically aberrant in subjects with neonatal cortical injury and that these anatomic defects are behaviorally expressed as
sensory-processing deficits. This relationship could explain the coincidence of
focal cortical anomalies and MGN morphological anomalies in human dyslexic
brains, as well as the auditory-processing deficits seen in LLI populations.
We anticipate that this animal model will provide an exciting new avenue
for studying the neurobiological substrates of normal speech perception in
humans.
Psychophysical Studies of Speech Perception in Intact Humans
Behavioral studies on intact, healthy adults have also provided important insights into how speech is processed by the brain, including the mechanisms
by which psychophysical parameters of speech relate to discrimination and
perception.
From a psychophysical perspective, speech can be regarded as a tertiary structure wherein frequency and amplitude cues are “wrapped up” inside a temporal
structure or envelope. As noted above, evidence strongly suggests that these
temporal changes in acoustic structure play a critical role in speech perception.
Consistent with this assertion is the surprising demonstration that the temporal envelopes of human speech could be extracted and applied to bands of noise (thus virtually eliminating spectral cues), and that the resulting signals were still accurately recognized as speech by normal listeners (Shannon et al 1995). The authors conclude that
“the presentation of a dynamic temporal pattern in only a few broad spectral
regions is sufficient for the recognition of speech” (Shannon et al 1995).
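The core of this manipulation can be sketched in a few lines: extract the slow amplitude envelope of a signal and impose it on noise, discarding spectral detail. This is a single-band caricature under stated assumptions; the published study used several bandpass analysis bands with filtered-noise carriers, and the moving average below is only a crude stand-in for a proper low-pass filter.

```python
import numpy as np

def noise_vocode_band(signal, fs=16000, env_cutoff_hz=50):
    """Single-band sketch of temporal-envelope speech: rectify the input,
    smooth it with a moving average (a crude low-pass filter), and use
    the result to modulate white noise. The noise carrier removes the
    original spectral (frequency) cues while preserving the temporal
    envelope."""
    win = max(1, int(fs / env_cutoff_hz))  # ~20 ms smoothing window
    envelope = np.convolve(np.abs(signal), np.ones(win) / win, mode="same")
    noise = np.random.default_rng(0).uniform(-1.0, 1.0, len(signal))
    return envelope * noise
```

With only a few such bands covering broad spectral regions, listeners recognize speech accurately, per the quotation above.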
Behavioral studies on normal humans have also been used to address the issue
of cerebral laterality for processing speech and language (see Bryden 1982, for
review). As noted above, clinical evidence from neurologically impaired populations has provided long-standing evidence that speech and language functions
are primarily (though not exclusively) lateralized to the left hemisphere in most
adults. In order to study this phenomenon from a behavioral approach, scientists
have employed the dichotic listening method, wherein competing information
is presented simultaneously to the two ears and discrimination or recall is assessed separately for each ear (e.g. Kimura 1967). Since auditory pathways
are primarily crossed, this method allows a relative comparison of performance
of each ear separately, from which relative performance of the contralateral
hemisphere can be inferred.
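Ear differences in such experiments are commonly summarized with a laterality index. The version below, (R − L)/(R + L), is a standard convention rather than the specific metric of the studies cited, and the example scores are hypothetical.

```python
def laterality_index(right_correct, left_correct):
    """Conventional ear-advantage score: (R - L) / (R + L).

    Positive values indicate a right-ear (hence left-hemisphere)
    advantage; negative values, a left-ear advantage. This is a common
    convention, not necessarily the metric used in the studies above."""
    total = right_correct + left_correct
    if total == 0:
        return 0.0
    return (right_correct - left_correct) / total

# Hypothetical scores: 42/60 dichotic syllables correct in the right ear,
# 30/60 in the left ear -> a positive index, i.e. an REA.
rea = laterality_index(42, 30)
```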
Using the dichotic listening method, it has consistently been shown that most
people exhibit a right-ear (left hemisphere) advantage (REA) for discriminating
speech sounds (see Bryden 1982, for review). Interestingly, this REA appears
to be strongly influenced by temporal parameters. Specifically, slowing down
or speeding up the formant transitions within a speech syllable alters the magnitude of the REA for speech (Schwartz & Tallal 1980). These results indicate
that specialization of the left hemisphere for rapid acoustic change may underlie
specialization for speech perception. This hypothesis is further supported by
evidence that intact human listeners exhibit an REA not only for speech, but also
for tone sequences that change within the time frame critical to speech (Brown
et al 1995). These findings are consistent with the hierarchical dependence of
speech perception upon the basic ability to process rapid acoustic change, and
lead to a provocative hypothesis that the left hemisphere regions that subserve
speech may be fundamentally specialized for the processing of rapidly changing acoustic information—an assertion consistent with the observation of left
hemisphere specialization for complex auditory discrimination in nonhuman
species (e.g. Dewson 1977, Ehret 1987, Fitch et al 1993, Gaffan & Harrison
1991, Heffner & Heffner 1986, Petersen et al 1978). If these mammalian left
hemisphere regions are indeed fundamentally specialized for processing rapid
acoustic change, then it stands to reason that these regions would be recruited
as the primary locus for higher-order speech perception and language-related
functions in humans. This hypothesis requires further study from both developmental and comparative perspectives.
In sum, combined neuropathological and behavioral studies show that (a)
speech perception can occur with limited spectral, but intact temporal cues;
(b) hemispheric specialization for speech appears to reflect processing of temporal acoustic cues; (c) children unable to process temporal acoustic cues exhibit concomitant language-processing deficits; (d) neurological damage to the
left cerebral hemisphere and caudate nucleus appears to result in concurrent
auditory temporal-processing and speech-perception deficits; and (e) anatomic
anomalies are seen in the auditory thalamic nucleus (MGN) of human dyslexics.
In male rats, these anomalies are correlated with deficiencies in rapid auditory
processing. These diverse lines of research converge on the single notion that
temporal change, or temporal structure, overlaid on frequency and intensity
cues, is a necessary and perhaps sufficient acoustic cue in the processing of
human speech. Moreover, defects in the neuroanatomic systems subserving
these sensory functions appear to result in severely impaired auditory temporal processing—including, in humans, speech processing. In a developmental
context such deficits could lead to the impaired phonological perception and
language development evident in clinically language-disabled populations.
Behavioral Studies and Categorical Perception of Speech
Research has shown that a given acoustic pattern is not directly translated,
point-to-point, into a singular speech sound. Rather, the brain processes the
complex acoustic information and assigns a label, based on known categories of
speech signals, to represent each phoneme category. Although these phoneme
categories may be innately predisposed, research with infants shows that the
psychophysical boundaries of these categories are largely determined from experience beginning at birth (see Kuhl 1992, for review). Apparently, infants
learn by listening to ongoing speech where to set up acoustic “boundaries,”
which categorize the speech phonemes critical to their native language. This
ability to create perceptual boundaries between speech sounds is termed categorical perception and results in a sharply defined (categorical) response pattern when an individual is presented with speech sounds that gradually change
acoustically from one phoneme to another (Figure 5). This phenomenon was
historically thought to distinguish the discrimination of human speech from
other types of environmental or musical stimuli, where such categorical boundaries were not expected. Categorical perception by humans for speech thus
became a defining hallmark for a unique “speech module” in the human brain,
a module that did not apply to other complex auditory processes and had no
homologous substrate in any nonhuman species.
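The sharply defined response pattern of Figure 5 can be modeled as a logistic identification function over the stimulus continuum; despite linear acoustic change from step to step, the labeling probability jumps abruptly near the category boundary. The boundary and slope values below are illustrative, not fitted to any of the cited data.

```python
import math

def p_identify_da(step, boundary=4.5, sharpness=2.5):
    """Logistic model of categorical identification along an 8-step
    /ba/-/da/ continuum: the probability of reporting /da/ changes
    abruptly near `boundary` even though the acoustic change across
    steps is linear. Parameter values are illustrative only."""
    return 1.0 / (1.0 + math.exp(-sharpness * (step - boundary)))

# Despite linearly changing acoustics, labeling jumps near the boundary:
probs = [p_identify_da(s) for s in range(1, 9)]
```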
The hypothesis that categorical perception provided psychophysical evidence
for the unique nature of human speech processing was severely challenged
in 1975, when Kuhl and Miller demonstrated that a nonhuman species (the
chinchilla) showed categorical perception of human speech sounds [consonant-vowel (CV) syllables] (Kuhl 1981, 1987; Kuhl & Miller 1975, 1978). In an
Figure 5 Categorical perception of /ba/ versus /da/ as a function of linear change in stimulus
acoustic properties (using computer-generated stimuli).
equally fatal blow to the “speech is special” hypothesis, psychophysicists began
to show categorical perception for certain types of complex nonspeech acoustic
signals in humans (Cutting & Rosner 1974, Miller et al 1976, Pisoni 1977).
Since the 1970s, ongoing research has demonstrated categorical perception by
monkeys of species-specific coos (May et al 1989) and by avian species for
species-specific bird calls (e.g. see Dooling et al 1990). These results have
severely weakened the argument that human speech perception is a unique
process, and they highlight the critical value of animal research, as well as
studies of central auditory processing of complex temporal and spectral acoustic
signals, in providing comprehensive understanding of the neurobiological basis
of human speech perception.
Neuroimaging and Electrophysiological Recording During
Speech Perception and Auditory Processing Tasks
The advent of in vivo imaging in awake, behaving subjects has revolutionized
our ability to relate structure to function in the human brain. These new tools
have particularly important implications for the study of higher cortical processes such as speech perception. Structurally, three-dimensional magnetic resonance imaging (MRI) provides detailed static images of the in vivo brain, with
spatial resolution on the order of cubic millimeters (Damasio & Frank 1992).
Functionally, several imaging approaches currently provide images of in vivo
brain activity during sensory stimulation, cognitive processing, or motor action. These approaches can be evaluated based on the extent to which they provide fine-grained temporal and/or spatial resolution. On the one hand, surface
recordings of the electrical [electroencephalography (EEG), evoked potential
(EP), event-related potential (ERP)] and magnetic [magnetoencephalography
(MEG)] fields of the brain provide precise temporal resolution (1 ms) of brain
activity, but relatively poor spatial resolution regarding localization or source
(for review, see Hillyard 1993). In contrast, positron emission tomography
(PET) and functional magnetic resonance imaging (fMRI) provide excellent
spatial resolution for recording changes in regional cerebral blood flow and
cellular metabolism during cognition, but these methods have poor temporal
resolution as compared with that of MEG and ERPs, which is a significant
drawback for studies of speech and other central auditory processes (Petersen
& Fiez 1993). Despite the relative limitations of each of these individual functional imaging approaches, converging results are nevertheless improving our
understanding of the neural processes subserving speech.
Consistent with the long-standing localization of Wernicke’s area in the superior temporal gyrus, bilateral activation of superior temporal gyrus has been
reported during performance of an acoustic phonemic discrimination task using fMRI (Binder et al 1994a,b). Interestingly, however, Binder et al (1994a,b)
also found activation of superior temporal gyrus during the presentation of
noise, and found no consistent semantic-specific differences in activation in
response to different types of speech stimuli, including pseudo-words. Other
recent studies utilizing PET have shown colocalization of cortical activity with
the presentation of rapidly changing tone sequences, as well as CV syllables
and CVC words (Fiez et al 1995). Specifically, Brodmann area 45, a frontal
region whose damage produces Broca's aphasia, showed activation for stimuli that incorporated rapid change (CV syllables, CVC words, and nonverbal
tone triplets). Steady-state vowels, which are verbal, but do not incorporate
transient acoustic information, failed to significantly activate this area. These
results pinpoint a functional organization of speech-processing regions that is
based, at least in part, on acoustic structure, and not only linguistic relevance
or meaning. They show that similar secondary auditory cortical regions subserve the discrimination of complex acoustic stimuli, including speech and
nonspeech.
Zatorre et al (1992), using PET, also observed activation of this same left frontal region (Broca's area) during a phonemic discrimination task. Moreover, right hemisphere activation of this region occurred for pitch discrimination, an acoustic-processing task that does not require the integration of information changing within brief time windows. More recently,
Shaywitz et al (1995), using fMRI, observed activation of the left hemisphere’s
frontal area in men during a phonemic discrimination task, whereas they observed a bilateral pattern of activation of this area in women performing the same
task. The latter result is consistent with the results of dichotic listening studies,
which show a stronger REA in men than in women for discriminating speech
sounds (Kimura & Harshman 1984, McGlone 1980). Although studies by
Zatorre et al (1992) and Shaywitz et al (1995) utilized phonemic stimuli carrying both acoustic and linguistic relevance, making it difficult to separate the features underlying region-specific brain activation, the PET studies
by Fiez and colleagues (1995) specifically demonstrated activity in the same
frontal cortical regions with speech and nonspeech auditory stimuli that incorporated rapid acoustic change. The latter result suggests that the results of
Zatorre et al (1992) and Shaywitz et al (1995) could be interpreted to reflect
activation by rapidly changing acoustic spectra (including speech) rather than
by phonological stimuli per se.
Functional imaging studies with LLI populations have also provided insight
into how speech is processed in the brain under normal and anomalous conditions. For example, a series of imaging studies on LLI children supports the
view that anomalies in anatomic asymmetry in frontal and posterior temporal
regions may relate to the inability of LLI individuals to activate these same
areas (Neville et al 1993, Rumsey et al 1992, Hagman et al 1992). Research
by Stefanatos and colleagues (1989) found that children with primary receptive language problems showed reduced auditory-evoked responses specific
to frequency-modulated tones. Neville and colleagues (1993) examined electrophysiological recordings from both language-impaired and control subjects during visual and auditory sensory-processing tasks and observed abnormal
patterns of hemispheric activation in the language-impaired group. Further,
auditory ERP components—considered to represent activity in the perisylvian
area (particularly the superior temporal sulcus)—were abnormal in a subset of
children who had difficulties in rapid auditory processing (Neville et al 1993).
Although many studies have been performed to assess the functional activity
of the brain during speech perception, only a few have investigated activity as
a function of categorical perception. The ability of MEG and scalp-recorded
ERP measures to provide temporal resolution on the order of milliseconds
makes them ideal tools for investigating the neurophysiological basis for the
categorical perception of speech stimuli. Initial investigations utilizing signal-averaging techniques have provided little evidence for a unique electrical or
magnetic signature for the perception of phonetic categories that cannot be
attributed to differences in temporal or spectral stimulus parameters (Lawson
& Gaillard 1981; Aaltonen et al 1987; Sams et al 1990; Kraus et al 1992, 1993;
Sharma et al 1993).1
In summary, several researchers using imaging techniques have examined
the brain’s functional activity in response to speech stimuli. Cumulative results
support the critical relevance of temporal acoustic cues to functional speech perception and, moreover, suggest that defects in the underlying ability to process rapid acoustic change are associated with anomalous patterns of cortical activity during speech processing. Finally, although we must presume that higher cortical activity of the brain reflects, at some level, the transformation of complex acoustic signals into meaningful speech representations, evidence obtained from imaging studies still fails to strongly support the presence of a unique pattern of cortical activity that reflects processing of linguistic, but not other acoustically complex, forms of information. Such a result is consistent with psychophysical data from categorical perception studies, which failed to support the idea of unique processing of speech in humans. In sum, little empirical evidence supports the idea of specialized brain activity in humans for processing speech as compared with other acoustic signals of similar spectral and temporal complexity.

1 See Molfese & Betz (1988) for a review of ERP experiments (analyzed using factor-analytic approaches) examining the hemispheric specialization for speech processing.
Neurobiology of Central Auditory Processing in Animals
Our discussion thus far has addressed results from clinical studies of neuropathological populations, behavioral studies of normal and language-impaired
subjects, and neuroimaging studies performed during speech perception tasks in
normal and language-impaired populations. Each of these research approaches
has contributed to our understanding of the mechanisms by which the human
brain processes speech—however, these methods are also limited in their ability to probe the fundamental functions of the nervous system. For this reason,
our basic neurobiological understanding of sensory and cognitive systems has
traditionally been derived from the use of animal models. However, in contrast to the prolific animal research on visual processing, spatial navigation, and memory, which has contributed to our understanding of these functions in the human brain, the field of speech and language research has suffered from a long-standing unwillingness to use animal models in the study of mechanisms that may subserve speech perception. This reluctance reflects a widely held belief that human speech and language are unique to humans and, as such, are not amenable to study in nonhuman species. Yet, as reviewed above, both human
and animal research largely fails to support this view. Empirical evidence suggests that humans do not, in fact, possess a neural module that is activated only
by speech and that makes humans distinct from all other animals. Rather, like
every other sensory and cognitive process studied to date, critical precursors
to the functions that underlie speech perception (specifically, complex acoustic
processing) appear to be found in nonhuman species.
For example, deficits in complex auditory discrimination, including species-specific calls, have been observed following temporal cortical ablation in monkeys (Dewson et al 1970; Dewson 1977; Gaffan & Harrison 1991; Heffner &
Heffner 1986, 1989) and cats (Diamond & Neff 1957). Such findings parallel
evidence of human aphasia and complex acoustic-processing deficits following
temporal cortical damage. Interestingly, animal studies also support left hemisphere specialization for the processing of complex acoustic stimuli in nonhuman species (Dewson et al 1970, Dewson 1977, Fitch et al 1993, Gaffan & Harrison 1991) and further suggest that this specialization may underlie the left hemisphere advantage for discriminating species-specific communicative signals (Ehret 1987, Heffner & Heffner 1986, Petersen et al 1978). By extension, these results suggest that left hemisphere specialization for the perception of complex acoustic stimuli may underlie the well-known left hemisphere specialization for human speech perception.
Moreover, at a basic level, animal studies can shed light on the mechanisms
whereby temporal acoustic information is encoded in the auditory system. Research by Schreiner and colleagues (e.g. Schreiner & Langner 1988, described above) specifically shows that (a) the fidelity of point-to-point temporal encoding is gradually replaced by segmented responses to units of complex
temporal cues as one moves up the ascending auditory system and (b) auditory
relay structures may contain subregions that are specialized for encoding of
high-resolution temporal information. The latter assertion is consistent with
findings of magnocellular anomalies in the LGN and MGN of human dyslexics (Livingstone et al 1991, Galaburda et al 1994), particularly since animal
studies have linked the presence of magnocellular anomalies in the MGN to
auditory discrimination deficits (Herman et al 1995). The notion of temporally
specialized subregions is also consistent with findings by Kraus et al (1994b),
which demonstrated a neurophysiologic response to acoustic stimulus change
(as measured by mismatch negativity, or MMN, to a deviant tone burst) in the
caudo-medial region of the guinea pig auditory thalamic nucleus (MGN). Kraus
et al (1994a) obtained similar results from the guinea pig thalamus by using CV
syllables as stimuli. These results are of particular interest because neurophysiologic response to temporal change was not elicited in the ventral MGN, the
tonotopically organized central relay station of the auditory thalamus. These
findings suggest that regions of auditory structures traditionally considered secondary because they are not strongly involved in spectral analysis may actually
be primary for the temporal analysis critical to speech perception.
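The deviant-minus-standard logic behind these MMN findings can be sketched numerically. The following Python toy simulation uses entirely synthetic waveforms (amplitudes, latencies, noise level, and trial counts are all hypothetical choices, not values from Kraus et al): averaging many standard and few deviant trials and subtracting yields a difference wave whose negative peak marks the mismatch response.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                       # sampling rate in Hz (hypothetical)
t = np.arange(0, 0.4, 1 / fs)   # one 400-ms epoch

def deflection(amplitude, latency_s):
    """Toy evoked component: a negative Gaussian deflection."""
    return -amplitude * np.exp(-((t - latency_s) ** 2) / (2 * 0.02 ** 2))

def simulate_trials(n, deviant):
    """Single trials: obligatory response (+ mismatch response for deviants) + noise."""
    base = deflection(1.0, 0.10)                        # obligatory onset response
    extra = deflection(0.8, 0.20) if deviant else 0.0   # mismatch negativity
    return base + extra + rng.normal(0.0, 0.5, (n, t.size))

standard_avg = simulate_trials(800, deviant=False).mean(axis=0)
deviant_avg = simulate_trials(100, deviant=True).mean(axis=0)

# MMN is conventionally the deviant-minus-standard difference wave.
mmn = deviant_avg - standard_avg
peak_latency = t[np.argmin(mmn)]
print(f"MMN peak {mmn.min():.2f} (arbitrary units) at {peak_latency * 1000:.0f} ms")
```

Because the obligatory response is common to both averages, it cancels in the subtraction, leaving only the response to stimulus change; this is why the MMN is taken as an index of acoustic change detection.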
Animal research utilizing speech stimuli has also shed light on potential neurobiological speech-processing mechanisms in humans. In studies in which
electrophysiological recordings in monkeys were performed during the auditory presentation of phonemes, Steinschneider et al (1994) found that a characteristic “double on” neural response pattern in auditory cortex may reflect the
categorical perception of voiced versus unvoiced consonants. Specifically, the
researchers found that the VOT value at which the second response burst (to voicing onset) dissipated marked the categorical boundary between consonants that differed in
voice onset time (Steinschneider et al 1994). Moreover, results supported assertions of increasingly segmented neural responses to complex acoustic stimuli
as one ascends the auditory system: Electrophysiological responses measured
in thalamocortical fibers reflected an initial transient response to the initiation,
“on,” of a complex acoustic signal (e.g. speech) followed by a phase-locked response to the syllable periodicity, whereas cortical (A1) responses were seen
at the start of periodic and aperiodic segments defining the voice-onset-time
(VOT), thus accentuating the acoustic transients (see also Steinschneider et al
1995). These findings, as well as data obtained from neural modeling of categorical perception (Buonomano & Merzenich, 1995), support an acoustic rather
than linguistic basis for the categorical perception of phonetic stimuli and,
moreover, provide neurobiological information about the mechanisms underlying this response.
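Behaviorally, the categorical perception that these physiological findings address is usually summarized by a sigmoidal identification function over a VOT continuum, with discrimination best for pairs straddling the boundary. A minimal Python sketch of that relationship (the boundary location, slope, and step size here are illustrative assumptions, not fitted values):

```python
import numpy as np

# Hypothetical VOT continuum from 0 to 60 ms in 10-ms steps; boundaries
# near 20-30 ms are typical for alveolar stops, but these numbers are
# illustrative only.
vot = np.arange(0, 70, 10)
boundary, slope = 25.0, 0.5

# Idealized identification function: probability of labeling a token
# "voiceless" rises sigmoidally across the continuum.
p_voiceless = 1 / (1 + np.exp(-slope * (vot - boundary)))

# Categorical perception predicts that discrimination accuracy peaks for
# pairs straddling the boundary: approximate it as the probability that
# two adjacent tokens receive different labels.
p_diff = np.abs(np.diff(p_voiceless))
pair_centers = (vot[:-1] + vot[1:]) / 2
best_pair = pair_centers[np.argmax(p_diff)]
print(f"discrimination peaks near {best_pair:.0f} ms VOT")
```

Tokens within a category receive nearly uniform labels, so label-based discrimination is poor there; only across the boundary do labels, and therefore discriminability, change sharply.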
Summary
Neural representation for speech processing apparently occurs over a more
widely distributed neural system than was once believed. Data have shown that
acoustic information is initially encoded on a point-to-point basis, including
the frequency information underlying the temporal envelope, at the level of the
auditory nerve. As speech signals ascend the auditory pathway, neuronal response patterns to the temporal envelope, which behavioral studies demonstrate
is sufficient for speech perception (Shannon et al 1995), become increasingly
segmented as neurons begin to respond with increasing preference to units of
temporal acoustic information (Schreiner & Langner 1988; Steinschneider et al
1994, 1995). Primary and secondary cortical activity reflected by human in
vivo neuroimaging measures may reflect activation to complex acoustic stimuli
incorporating very rapid acoustic changes, regardless of whether these changes occur within speech (e.g. Fiez et al
1995, Binder et al 1994). These results are consistent with behavioral and neuropathological evidence supporting the critical relevance of rapid auditory processing systems to normal speech perception (e.g. Tallal et al 1993, Galaburda
et al 1994, Herman et al 1995). These results also support the conclusion that
what is selectively damaged by left hemisphere lesions involves mechanisms
critical to the processing of information within a time frame of tens of milliseconds. We suggest that a disruption of this mechanism leads to the phonological
disorders so commonly seen in acquired and developmental aphasias. We hypothesize that these mechanisms are common to both the perception and production of speech information within this time range and point to the work of
Kimura & Archibald (1974) and Ojemann (1984) in support of this hypothesis.
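The sufficiency of the temporal envelope demonstrated by Shannon et al (1995) can be illustrated with a toy simulation: rectifying an amplitude-modulated carrier and smoothing it with a crude low-pass (moving-average) filter recovers the slow modulation. All signal parameters below are arbitrary illustrative choices, not those of the original study:

```python
import numpy as np

fs = 8000                       # sampling rate in Hz (arbitrary)
t = np.arange(0, 0.5, 1 / fs)

# Toy "speech-like" signal: a carrier amplitude-modulated by a slow
# envelope fluctuating at a syllable-like rate (~4 Hz).
envelope = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
signal = envelope * np.sin(2 * np.pi * 500 * t)

# Envelope recovery: full-wave rectify, then smooth with a 20-ms
# moving average acting as a low-pass filter.
rectified = np.abs(signal)
win = int(0.02 * fs)
recovered = np.convolve(rectified, np.ones(win) / win, mode="same")

# The recovered waveform tracks the original slow modulation (up to a
# scale factor from averaging a rectified sinusoid).
corr = np.corrcoef(recovered, envelope)[0, 1]
print(f"correlation with true envelope: {corr:.2f}")
```

In the actual noise-vocoding experiments, envelopes extracted in this spirit from a few frequency bands were used to modulate band-limited noise, and listeners still recognized the speech; that is the behavioral evidence cited above.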
This review focuses on data derived from a wide variety of research approaches that provide insight into the neurobiological basis of speech perception.
Speech perception has not, traditionally, been an area firmly embraced by
modern-day neuroscientists. Historically, this may derive from a belief that
the neurobiological mechanisms that subserve speech are uniquely human and,
as such, not amenable to basic neuroscientific methods of study at the molecular
or systems level of inquiry. However, the data overwhelmingly fail to provide
support for separate or uniquely human neural-processing systems for speech.
Indeed, the data converge to suggest that speech processing is subserved by
neurobiological mechanisms specialized for the representation of temporally
complex acoustic signals, regardless of communicative or linguistic relevance,
in humans and nonhumans alike. Further, evidence obtained from animal studies suggests a neurobiological basis for the final matching of acoustic patterns to
speech templates that is consistent with categorical perception (Steinschneider
et al 1994, 1995). The fact that nonhuman species show categorical perception
for speech phonemes at both the behavioral and neurobiological level undermines the traditional view that categorical perception reflects speech-specific
processing that is unique to the human brain. This is not to say that speech
itself—and indeed, species-specific communications—is not a special form of
acoustic information. It only says that the brain does not, as far as we are
aware, process this information in a unique manner. Rather, it appears that neurobiological systems for processing temporally complex acoustic stimuli are
recruited, or even exploited, for processing acoustically rich species-specific
communications such as human speech.
As stated at the outset, we have not addressed the neurobiological mechanisms that underlie higher-order processing of speech signals that ultimately
represent language. We discuss data relating to the neurobiological processing
of subunits of human speech (phonemes) but in no way suggest that this review
reflects the final steps in the processing of meaningful language. We do suggest,
however, that in order to fully understand the neurobiological mechanisms underlying language processing as it relates to semantics, syntax, grammar, and ultimately conceptual thought, we first need to better understand how the building
blocks of sentences and words—that is, phonemes—are processed. Phonology
offers an important link between neurobiological systems (i.e. sensory, perceptual, and motor) and higher aspects of language (semantics and syntax). By
studying speech processing at the level of the acoustic mechanisms that subserve
it, neuroscientists are now in the position to make rapid and substantial advances
in understanding the fundamental neurobiological underpinnings of language.
ACKNOWLEDGMENTS
The authors wish to thank Illya Shell for technical assistance with this manuscript
and the NIDCD, Charles A. Dana Foundation, McDonnell-Pew Foundation, and
March of Dimes for research funding.
Visit the Annual Reviews home page at
http://www.annurev.org.
Literature Cited
Aaltonen O, Niemi P, Nyrke T, Tuhkanen M.
1987. Event-related brain potentials and the
perception of a phonetic continuum. Biol.
Psychol. 24(3):197–207
Aitkin LM, Irvine DRF, Webster WR. 1984.
Central neural mechanisms of hearing. In
Handbook of Physiology, Sect. 1: The Nervous System, Sensory Processes, ed. I Darian-Smith, 3:675–737. Bethesda, MD: Am. Physiol. Soc.
Aram DM, Gillespie LL, Yamashita TS. 1990.
Reading among children with left and right
brain lesions. Dev. Neuropsychol. 6(4):301–
17
Aram DM, Ekelman BL, Rose DF, Whitaker HA.
1985. Verbal and cognitive sequelae following unilateral lesions acquired in early childhood. J. Clin. Exp. Neuropsychol. 7(1):55–78
Benasich AA, Tallal P. 1996. Auditory temporal processing thresholds, habituation, and
recognition memory over the first year. Infant Behav. Dev. 19:339–57
Binder JR, Rao SM, Hammeke TA, Yetkin FZ,
Frost JA, et al. 1994a. Effects of stimulus rate
on signal response during functional magnetic resonance imaging of auditory cortex.
Cogn. Brain Res. 2(1):31–8
Binder JR, Rao SM, Hammeke TA, Yetkin
FZ, Jesmonowicz A, et al. 1994b. Functional
magnetic resonance imaging of human auditory cortex. Ann. Neurol. 35(6):662–72
Blumstein SE. 1995. The neurobiology of the
sound structure of language. In The Cognitive
Neurosciences, ed. M Gazzaniga, pp. 915–
29. Cambridge, MA: MIT Press
Blumstein SE, Baker E, Goodglass H. 1977.
Phonological factors in auditory comprehension in aphasia. Neuropsychologia 15:19–30
Brown CP, Fitch RH, Tallal P. 1995. Gender and
hemispheric differences for auditory temporal processing. Soc. Neurosci. Abstr. 1:440
Bryden MP. 1982. Laterality: Functional
Asymmetry in the Intact Brain. New York:
Academic
Buonomano D, Merzenich MM. 1995. Temporal information transformed into a spatial
code by a neural network with realistic properties. Science 267:1028–30
Crosson BA. 1992. Subcortical Functions in
Language and Memory. New York: Guilford
Cutting JE, Rosner BS. 1974. Categorical
boundaries in speech and music. Percept.
Psychophys. 16:564–70
Damasio AR, Damasio H, Rizzo M, Varney N,
Gersh F. 1982. Aphasia with nonhemorrhagic
lesions in the basal ganglia and internal capsule. Arch. Neurol. 39:15–20
Damasio H, Frank R. 1992. Three dimensional
in vivo mapping of brain lesions in humans.
Arch. Neurol. 49:137–43
Delattre P, Liberman AM, Cooper FS. 1955. Acoustic loci and transitional cues for consonants. J. Acoust. Soc. Am. 27(4):769–73
Dewson JH III. 1977. Preliminary evidence of
hemispheric asymmetry of auditory function
in monkeys. In Lateralization in the Nervous
System, ed. S Harnad, RW Doty, L Goldstein, J Jaynes, G Krauthamer, pp. 63–71.
New York: Academic
Dewson JH III, Cowey A, Weiskrantz L. 1970.
Disruptions of auditory sequence discrimination by unilateral and bilateral cortical ablations of superior temporal gyrus in the monkey. Exp. Neurol. 28:529–49
Diamond IT, Neff WD. 1957. Ablation of temporal cortex and discrimination of auditory
patterns. J. Neurophysiol. 20:300–15
Dooling RJ, Brown SD, Park TJ, Okanoya K.
1990. Natural perceptual categories for vocal
signals in budgerigars (Melopsittacus undulatus). In Comparative Perception: Complex Signals, ed. WC Stebbins, MA Berkley,
2:345–74. New York: Wiley
Dvorák K, Feit J, Juránková Z. 1978. Experimentally induced focal microgyria and status
verrucosus deformis in rats—pathogenesis
and interrelation histological and autoradiographical study. Acta Neuropathol. 44:121–
29
Efron R. 1963. Temporal perception, aphasia
and deja vu. Brain 86:403–24
Ehret G. 1987. Left hemisphere advantage in the
mouse brain for recognizing ultrasonic communication calls. Nature 325(6101):249–51
Fiez JA, Raichle ME, Miezin FM, Petersen SE,
Tallal P, Katz WF. 1995. Activation of a left
frontal area near Broca’s area during auditory detection and phonological access tasks.
J. Cogn. Neurosci. 7(3):357–75
Fitch RH, Brown CP, O’Connor K, Tallal P.
1993. Functional lateralization for auditory
temporal processing in male and female rats.
Behav. Neurosci. 107(5):844–50
Fitch RH, Brown CP, Tallal P. 1993. Left hemisphere specialization for auditory temporal
processing in rats. In Temporal Information
Processing in the Nervous System: Special
Reference to Dyslexia and Dysphasia, ed.
P Tallal, AM Galaburda, RR Llinás, C von
Euler, pp. 346–47. New York: NY Acad.
Sci.
Fitch RH, Tallal P, Brown CP, Galaburda AM,
Rosen GD. 1994. Induced microgyria and auditory temporal processing in rats: a model
for language impairment. Cerebr. Cortex
4:260–70
Gaffan D, Harrison S. 1991. Auditory-visual
associations, hemispheric specialization and
temporal-frontal interaction in the rhesus
monkey. Brain 114:2133–44
Galaburda AM, Kemper TL. 1979. Cytoarchitectonic abnormalities in developmental
dyslexia: a case study. Ann. Neurol. 6(2):94–
100
Galaburda AM, Menard MT, Rosen GD, Livingstone MS. 1994. Evidence for abberrant
auditory anatomy in developmental dyslexia.
Proc. Natl. Acad. Sci. USA 91:8010–13
Galaburda AM, Sherman GF, Rosen GD,
Geschwind AF. 1985. Developmental dyslexia: four consecutive patients with cortical
anomalies. Ann. Neurol. 18(2):222–33
Garrett M. 1995. The structure of language processing: neurophysiological evidence. In The
Cognitive Neurosciences, ed. MS Gazzaniga,
pp. 881–99. Cambridge, MA: MIT Press
Geschwind N. 1979. Specialization of the human brain. Sci Am. 241(3):180
Hagman JO, Wood F, Buchsbaum MS, Tallal P, Flowers L, Katz W. 1992. Cerebral
brain metabolism in adult dyslexic subjects
assessed with positron emission tomography
during performance of an auditory task. Arch.
Neurol. 49:734–39
Heffner HE, Heffner RS. 1986. Effects of
unilateral and bilateral auditory cortex lesions on the discrimination of vocalizations by Japanese macaques. J. Neurophysiol.
56:683–701
Heffner HE, Heffner RS. 1989. Effects of restricted lesions on absolute thresholds and
aphasia-like deficits in Japanese macaques.
Behav. Neurosci. 103:158–69
Herman A, Fitch RH, Galaburda AM, Rosen
GD. 1995. Induced microgyria and its effects
on cell size, cell number, and cell packing
density in the medial geniculate nucleus. Soc.
Neurosci. Abstr. 21:1711
Hillyard SA. 1993. Electrical and magnetic brain
recordings: contributions to cognitive neuroscience. Curr. Op. Neurobiol. 3:217–24
Hugdahl K, Wester K, Asbjornsen A. 1990. The
role of the left and right thalamus in language
asymmetry: dichotic listening in Parkinsonian patients undergoing stereotactic thalamotomy. Brain Lang. 39:1–13
Humphreys P, Kaufmann WE, Galaburda AM.
1990. Developmental dyslexia in women:
neuropathological findings in three patients.
Ann. Neurol. 28(6):727–38
Humphreys P, Rosen GD, Press DM, Sherman GF, Galaburda AM. 1991. Freezing lesions
of the newborn rat: a model for cerebrocortical microgyria. J. Neuropathol. Exp. Neurol.
50:145–60
Hynd GW, Semrud-Clikeman M. 1989. Dyslexia and neurodevelopmental pathology: relationships to cognition, intelligence, and reading skill acquisition. J. Learn. Disabil. 22:204–15
Imig TH, Morel A. 1983. Organization of the
thalamocortical auditory system in the cat.
Annu. Rev. Neurosci. 6:95–120
Imig TJ, Ruggero MA, Kitzes LM, Javel E,
Brugge JF. 1977. Organization of auditory
cortex in the owl monkey (Aotus trivirgatus). J. Comp. Neurol. 171:111–28
Jernigan TL, Hesselink JR, Sowell E, Tallal PA.
1991. Cerebral structure on magnetic resonance imaging in language- and learningimpaired children. Arch. Neurol. 48(5):539–
45
Kelly J. 1980. The auditory cortex of the rat.
In Cerebral Cortex of the Rat, ed. B Kolb,
RC Tees, p. 381–405. Cambridge, MA: MIT
Press
Kimura D. 1967. Functional asymmetry of the
brain in dichotic listening. Cortex 3:163–78
Kimura D, Archibald Y. 1974. Motor function
of the left hemisphere. Brain 97:337–50
Kimura D, Harshman R. 1984. Sex differences
in brain organization for verbal and nonverbal functions. In Progress in Brain Research, ed. GJ DeVries, 61:423–41. Amsterdam: Elsevier
Kraus N, McGee T, Carrell T, King C, Littman
T, Nicol T. 1994a. Discrimination of speechlike contrasts in the auditory thalamus and
cortex. J. Acoust. Soc. Am. 96:2758–68
Kraus N, McGee T, Carrell T, Sharma A, Micco
A, Nichol T. 1993. Speech-evoked cortical
potentials in children. J. Am. Acad. Audiol.
4(4):238–48
Kraus N, McGee T, Littman T, Nicol T. 1992.
Reticular formation influences on primary
and non-primary auditory pathways as reflected by the middle latency response. Brain
Res. 587:186–94
Kraus N, McGee T, Littman T, Nicol T, King C.
1994b. Nonprimary auditory thalamic representation of acoustic change. J. Neurophysiol.
72:1270–77
Kuhl PK. 1981. Discrimination of speech by
non-human animals: basic auditory sensitivities conducive to the perception of speechsound categories. J. Acoust. Soc. Am. 70:340–
49
Kuhl PK. 1987. The special-mechanisms debate
in speech research: categorization tests on
animal and infants. In Categorical Perception: The Groundwork of Cognition, ed. S.
Harnad, p. 355–86. Cambridge: Cambridge
Univ. Press
Kuhl PK. 1992. Psychoacoustics and speech
perception: internal standards, perceptual anchors, and prototypes. In Developmental Psychoacoustics, ed. LA Werner, EW Rubel, pp.
293–332. Washington, DC: Am. Psychol. Assoc.
Kuhl PK, Miller JD. 1975. Speech perception
by the chinchilla: voiced-voiceless distinction in alveolar plosive consonants. Science
190:69–72
Kuhl PK, Miller JD. 1978. Speech perception
by the chinchilla: identification functions for
synthetic VOT stimuli. J. Acoust. Soc. Amer.
63:905–17
Larsen JP, Hoien T, Lundberg I, Odegaard H.
1990. MRI evaluation of the size and symmetry of the planum temporale in adolescents
with developmental dyslexia. Brain Lang.
39:289–301
Lawson EA, Gaillard AW. 1981. Evoked potentials to consonant-vowel syllables. Acta Psychol. 49(1):17–25
Leonard CM, Voeller KKS, Lombardino LJ,
Morris MK, Hynd GW, et al. 1993. Anomalous cerebral structure in dyslexia revealed
with magnetic resonance imaging. Arch. Neurol. 50:461–69
Liberman AM, Mattingly IG. 1985. The motor
theory of speech perception revisited. Cognition 21(1):1–36
Livingstone MS, Rosen GD, Drislane FW, Galaburda AM. 1991. Physiological and anatomical evidence for a magnocellular defect in developmental dyslexia. Proc. Natl. Acad. Sci.
USA 88:7943–47
Ludlow CL, Rosenberg J, Fair C, Buck D,
Schesselman S, Salazar A. 1986. Brain lesions associated with nonfluent aphasia fifteen years following penetrating head injury.
Brain 109:55–80
Mateer C, Ojemann GA. 1983. Thalamic mechanisms in language and memory. In Language Function and Brain Organization, ed.
S Segalowitz, pp. 171–91. New York: Academic
May B, Moody DB, Stebbins WC. 1989. Categorical perception of conspecific communication sounds by Japanese macaques, Macaca fuscata. J. Acoust. Soc. Am. 85:837–47
McGlone J. 1980. Sex differences in human
brain asymmetry: a critical review. Behav.
Brain Sci. 3:215–63
Merzenich MM, Jenkins WM, Miller SL,
Schreiner C, Tallal P, et al. 1996. Temporal
processing deficits of language learning impaired children are remediated by training.
Science 271:77–81
Merzenich MM, Reid MD. 1974. Representation of the cochlea within the inferior colliculus of the cat. Brain Res. 77:397–415
Miceli G, Caltagirone C, Gainotti G, Payer-Rigo
P. 1978. Discrimination of voice versus place
contrasts in aphasia. Brain Lang. 6:47–51
Miller JD, Wier CC, Pastore RE, Kelly WJ,
Dooling RJ. 1976. Discrimination and labelling of noise-buzz sequences with varying
noise-lead times: an example of categorical
perception. J. Acoust. Soc. Am. 60:410–17
Miller JM, Towe AL. 1979. Audition: structural
and acoustical properties. In Physiology and
Biophysics, The Brain and Neural Function,
ed. T Ruch, HD Patton, 1:339–75. Philadelphia: Saunders. 20th ed.
Molfese DL, Betz JC. 1988. Electrophysiological indices of the early development of lateralization for language and cognition, and
their implications for predicting later language development. In Brain Lateralization
in Children: Developmental Implications,
ed. DL Molfese, SJ Segalowitz, pp. 171–90.
New York: Guilford
Neville HJ, Coffey SA, Holcomb PJ, Tallal P.
1993. The neurobiology of sensory and language processing in language-impaired children. J. Cogn. Neurosci. 5:235–53
Oscar-Berman M, Zurif E, Blumstein S.
1975. Effects of unilateral brain damage on
the processing of speech sounds. Brain Lang.
2:345–55
Ojemann GA. 1984. Common cortical and thalamic mechanisms for language and motor
function. Am. J. Physiol. 246:R901–3
Ojemann GA. 1991. Cortical organization of
language. J. Neurosci. 11(8):2281–87
Palmer AR. 1982. Encoding of rapid amplitude fluctuations by cochlear nerve fibers
in the guinea pig. Arch. Otorhinolaryngol.
236:197–202
Perecman E, Kellar L. 1981. The effect of voice
and place among aphasia, nonaphasic right
damaged and normal subjects on a metalinguistic task. Brain Lang. 12:213–22
Petersen MR, Beecher MD, Zoloth SR, Moody
DB, Stebbins WC. 1978. Neural lateralization of species-specific vocalizations by
Japanese macaques (Macaca fuscata). Science 202:325–27
Petersen SE, Fiez JA. 1993. The processing of
single words studied with positron emission
tomography. Annu. Rev. Neurosci. 16:509–30
Pisoni DB. 1977. Identification and discrimination of the relative onset time of two component tones: implications for voicing perception in stops. J. Acoust. Soc. Amer. 61:1352–
61
Rees A, Moller AR. 1983. Responses of neurons
in the inferior colliculus of the rat to AM and
FM tones. Hear. Res. 10:301–30
Rosen GD, Press DM, Sherman GF, Galaburda
AM. 1992. The development of induced
cerebrocortical microgyria in the rat. J. Neuropathol. Exp. Neurol. 51(6):601–11
Robin DA, Schienberg S. 1990. Subcortical lesions and aphasia. J. Speech Hear. Disord.
55:90–100
Rumsey JM, Andreason P, Zametkin AJ,
Aquino T, King AC, et al. 1992. Failure to
activate the left temporoparietal cortex in
dyslexia. Arch. Neurol. 49:527–34
Sams M, Aulanko R, Aaltonen O, Naatanen
R. 1990. Event-related potentials to infrequent changes in synthesized phonetic stimuli. J. Cogn. Neurosci. 2(4):344–57
Schreiner CE, Langner G. 1988. Coding of temporal patterns in the central auditory nervous
system. In Functions of the Auditory System:
Neurobiological Bases of Hearing, ed. GM
Edelman, WE Gall, WM Cowan, pp. 337–
61. New York: Wiley
Schreiner CE, Urbas JV. 1984. Functional
differentiation of cat auditory cortex areas
demonstrated using amplitude modulation.
Neurosci. Lett. 14:S334 (Suppl.)
Schreiner CE, Urbas JV. 1986. Representation
of amplitude modulation in the auditory cortex of the cat. I. The anterior auditory field
(AAF). Hear. Res. 21(3):227–41
Schreiner CE, Urbas JV. 1988. Representation
of amplitude modulation in the auditory cortex of the cat. II. Comparison between cortical fields. Hear. Res. 32(1):49–64
Schreiner CE, Urbas JV, Mehrgardt S. 1983.
Temporal resolution of amplitude modulation
and complex signals in the auditory cortex of
the cat. In Hearing: Physiological Bases and
Psychophysics, ed. R Klinke, R Hartman, pp.
169–75. Berlin: Springer-Verlag
Schwartz J, Tallal P. 1980. Rate of acoustic change may underlie hemispheric specialization for speech perception. Science
207:1380–81
Shannon RV, Zeng FG, Kamath V, Wygonski
J, Ekelid M. 1995. Speech recognition with
primarily temporal cues. Science 270(5234):
303–4
Sharma A, Kraus N, McGee T, Carrell T,
Nicol T. 1993. Acoustic versus phonetic representation of speech as reflected by the
mismatch negativity event-related potential.
Electroencephal. Clin. Neurophysiol. 88:64–
71
Shaywitz BA, Shaywitz SE, Pugh KR, Constable RT, et al. 1995. Sex differences in the
functional organization of the brain for language. Nature 373(6515):607–9
Stefanatos GA, Green GGR, Ratcliff GG.
1989. Neurophysiological evidence of auditory channel anomalies in developmental
dysphasia. Arch. Neurol. 46:871–75
Steinschneider M, Schroeder CE, Arezzo JC,
Vaughan HG. 1994. Speech-evoked activity in primary auditory cortex: effects of
voice onset time. Electroencephal. Clin. Neurophysiol. 92(1):30–43
Steinschneider M, Schroeder CE, Arezzo JC,
Vaughan HG. 1995. Physiologic correlates
of the voice onset time boundary in primary
auditory cortex (A1) of the awake monkey.
Brain Lang. 48(3):326–40
Tallal P, Jernigan T, Trauner D. 1994. Developmental bilateral damage to the head of
the caudate nuclei: implications for speechlanguage pathology. J. Med. Speech Lang.
Pathol. 2:23–28
Tallal P, Miller S, Bedi G, Byma G, Wang
X, et al. 1996. Language comprehension
in language-learning impaired children improved with acoustically modified speech.
Science 271:81–84
Tallal P, Miller S, Fitch RH. 1993. Neurobiological basis of speech: a case for the preeminence of temporal processing. Ann. NY Acad.
Sci. 682:27–47
Tallal P, Newcombe F. 1978. Impairment of auditory perception and language comprehension in dysphasia. Brain Lang. 5:13–24
Tallal P, Piercy M. 1973. Defects of non-verbal
auditory perception in children with developmental aphasia. Nature 241:468–69
Wilkins W, Wakefield J. 1995. Brain evolution and neurolinguistic preconditions. Behav. Brain Sci. 18:161–226
Zatorre RJ, Evans AC, Meyer E, Gjedde A.
1992. Lateralization of phonetic and pitch
discrimination in speech processing. Science
256:846–49