Author manuscript; available in PMC: 2020 Sep 1.
Published in final edited form as: J Cogn Neurosci. 2020 May 19;32(9):1735–1748. doi: 10.1162/jocn_a_01581

Syllables in Sync Form a Link: Neural Phase-Locking Reflects Word Knowledge During Language Learning

Laura Batterink 1
PMCID: PMC7395883  NIHMSID: NIHMS1596124  PMID: 32427066

Abstract

Language is composed of small building blocks, which combine to form larger meaningful structures. To understand language, we must process, track and concatenate these building blocks into larger linguistic units as speech unfolds over time. An influential idea is that phase-locking of neural oscillations across different levels of linguistic structure provides a mechanism for this process. Building on this framework, the goal of the current study was to determine whether neural phase-locking occurs more robustly to novel linguistic items that are successfully learned and encoded into memory, compared to items that are not learned. Participants listened to a continuous speech stream composed of repeating nonsense words while their EEG was recorded, and then performed a recognition test on the component words. Neural phase-locking to individual words during the learning period strongly predicted the strength of subsequent word knowledge, suggesting that neural phase-locking indexes the subjective perception of specific linguistic items during real-time language learning. These findings support neural oscillatory models of language, demonstrating that words that are successfully perceived as functional units are tracked by oscillatory activity at the matching word rate. In contrast, words that are not learned are processed merely as a sequence of unrelated syllables, and thus not tracked by corresponding word-rate oscillations.

Introduction

A hallmark of human language is that it is composed of building blocks, which combine hierarchically to form an infinite number of meaningful expressions. For example, phonemes are combined into syllables, which in turn form words, then phrases and finally full sentences. During speech comprehension, the brain must simultaneously track and concatenate these different linguistic structures across time. Recent evidence suggests that this task may be accomplished through phase-locking of endogenous neural oscillations to linguistic units unfolding at different timescales (Giraud and Poeppel, 2012; Gross et al., 2013; Peelle and Davis, 2012). Neural phase-locking is established by demonstrating a consistent phase lag between recorded neural responses and an external stimulus, as measured over time, over trials, or over participants (Peelle and Davis, 2012). According to several prominent models (Giraud and Poeppel, 2012; Peelle and Davis, 2012), ongoing neural oscillations phase-lock to linguistic segments occurring at different rates, and coupling between phase-locked signals at slower and faster frequencies supports the integration of smaller linguistic elements into larger units. The alignment of neural oscillations with a periodic or quasi-periodic stimulus stream has also been described as "neural tracking" or "neural entrainment" (Ding et al., 2016; Peelle and Davis, 2012), and is a general phenomenon that also occurs with non-linguistic stimuli.

Importantly, neural tracking of slow auditory fluctuations (< 10 Hz) does not merely reflect low-level acoustic features of a stimulus, but is sensitive to listeners' abstract knowledge and subjective perceptions of a stimulus, as guided for example by imagined rhythms (Nozaradan et al., 2011, 2016) or syntactic rules (e.g., Ding et al., 2016, 2017). For instance, Ding and colleagues (2016, 2017) demonstrated that sequences of monosyllabic words, hierarchically organized into phrases and sentences, elicit spectral MEG/EEG peaks at frequencies corresponding to syllable, word and phrase presentation rates. Critically, phrasal and word peaks were observed only when sentences were presented in a language known to participants, and not in an unknown foreign language, indicating that neural tracking of higher-level units depends on language-specific knowledge.

Neural tracking of linguistic structures also emerges during language learning, reflecting the moment-by-moment acquisition of new linguistic representations. Getz and colleagues (2018) recorded learners’ MEG while they listened to a miniature artificial language that contained embedded phrases made up of words presented at an isochronous rate. Within 3.5 min of exposure, learners showed a robust spectral peak in MEG power at the phrase structure rate, reflecting phrase-level tracking. We found similar results in two recent statistical learning studies (Batterink and Paller, 2017, 2019), in which participants were exposed to an isochronous artificial speech stream composed of repeating trisyllabic nonsense words, concatenated together without pauses (e.g., tupirogolabu…). In both studies, neural phase-locking to the hidden component words increased over the exposure period and predicted performance on an implicit, reaction-time based measure of statistical word knowledge. Finally, Buiatti and colleagues (2009) also found neural tracking of trisyllabic repeating “AXC” pseudowords with nonadjacent dependencies, although only when words were cued by subliminal 25 ms pauses; this trisyllabic spectral EEG response also correlated with the number of correctly reported words. Taken together, these results indicate that neural tracking of embedded, novel linguistic structures emerges during the learning process and predicts subsequent linguistic knowledge at the behavioural level.

Notably, all of the above studies follow a “frequency-tagging” approach: the experimental stimuli are presented at a steady, isochronous rate, which drives the neural population that codes for the stimulus to oscillate at the same rate. Thus, clear peaks in the frequency spectrum of the recorded neural signal can be detected at the stimulus presentation frequency, providing a “frequency tag” to identify the associated brain response. While this approach offers an excellent signal-to-noise ratio relative to classical event-related potential analysis, it requires averaging the neural signal across continuous stimulation blocks, precluding the isolation of neural responses to individual items (e.g., tupiro versus golabu; from here on, item refers to the combined collection of individual instances of the same word or linguistic unit, at the single participant level). Thus, these past studies are unable to address whether the neural tracking of these isochronous linguistic structures reflects the specific learning of individual items in the artificial language, or whether this rhythmic neural response primarily reflects more general individual differences in statistical learning ability. Disentangling these two possibilities would provide novel and important insight into the functional significance of neural phase-locking responses during language learning. Below, I elaborate further on each of these two possibilities in turn.

According to the first possibility, neural phase-locking during language learning may directly reflect the discovery, perception and encoding of individual linguistic items (e.g., a word in speech). An individual item that is successfully learned should be perceived and categorized as a relevant linguistic segment rather than as an unrelated sequence of syllables, and should be represented by an underlying neural population that codes for this item as a meaningful chunk. Thus, better-learned items should elicit stronger neural phase-locking at the corresponding presentation rate. In contrast, an individual item that has not been learned would be encoded and represented only at the syllabic level, eliciting phase-locking at the syllable rate but not at the word rate. Under this scenario, the previous findings that better learners show higher neural entrainment over the learning period (Getz et al., 2018; Batterink & Paller, 2017, 2019; Buiatti et al., 2009) would be driven by these learners’ successful discovery of more total items in the language and/or their stronger representations for each item.

A second, alternative possibility is that neural tracking of rhythmic linguistic structures is not sensitive to item-level differences in learning, but is primarily driven by differences at the individual level. To illustrate, the neural entrainment response observed in previous studies (Getz et al., 2018; Batterink & Paller, 2017, 2019; Buiatti et al., 2009) may index an individual's general sensitivity to rhythmic temporal patterns, which could represent a stable individual trait that in turn predicts statistical learning ability. In line with this idea, it has been shown that stronger endogenous neural entrainment at the beat frequency of auditory rhythms is associated with superior temporal prediction abilities (Nozaradan et al., 2016). Another recent study found that individuals vary markedly in their sensitivity to external patterns, as assessed by whether they spontaneously synchronize their own speech to an isochronous speech rhythm (Assaneo et al., 2019). This individual predisposition predicts statistical learning performance, and also correlates with neuroanatomical and neurophysiological measures, suggesting that it is a stable individual trait. Given that previous studies showing a link between neural entrainment and statistical language learning all presented isochronous stimuli (Getz et al., 2018; Batterink & Paller, 2017, 2019; Buiatti et al., 2009), better learners may be those individuals who are generally more sensitive to temporal rhythms, or who demonstrate a high degree of spontaneous synchronization to external stimuli. These traits could produce both a higher neural entrainment response and better statistical learning performance on subsequent tests. Under this hypothesis, the contribution of individual items to variance in the neural tracking response would be negligible, and the relationship between neural entrainment and word learning would be driven primarily by stable individual differences (such as temporal prediction ability) that operate similarly across all items.

The goal of the present study was to address the question of whether neural phase-locking is sensitive to the discovery of individual items during language learning, thereby advancing our understanding of the functional significance of neural phase-locking in the context of language. In particular, I tested whether neural phase-locking during learning reflects acquired knowledge of specific words in an artificial language, as opposed to more generally indexing interindividual differences in processing that would operate similarly across all items. Following a classic statistical learning design, participants listened to a continuous stream composed of repeating nonsense words, and then completed a recognition test. I hypothesized that neural phase-locking to a given word during learning should be higher when that word is successfully perceived as a functional unit, such that phase-locking and subsequent recognition performance should correlate at the individual item level. Alternatively, if neural phase-locking during learning and subsequent word recognition correlate at the individual participant level (replicating prior findings), but not at the individual item level, this would provide evidence for the alternative hypothesis: the idea that neural phase-locking during statistical learning is primarily driven by interindividual differences that operate similarly across all items.

Materials and Methods

Participants

A total of 21 participants (10 female) contributed data to this study. Participants were recruited at Northwestern University and were paid $10/h. They were all fluent English speakers between 19 and 23 years old (mean = 20.5 y) and had no history of neurological problems. The study was undertaken with the understanding and written consent of each participant.

Data from other tasks completed by this sample of participants have been reported in a previous publication (implicit training group; Batterink, Reber and Paller, 2015). Briefly, the goal of that study was to test whether the ability to predict incoming stimuli, a key function of statistical learning, can be enhanced through explicit training. That study did not analyze EEG data recorded during the statistical learning exposure period; it also included an additional group of participants assigned to an explicit training condition, whose data are not included here.

Stimuli

Stimuli in the exposure phase were modelled after previous auditory statistical learning studies (e.g., Saffran et al., 1996, 1997). The language consisted of 11 syllables combined to create six trisyllabic nonsense words (babupu, bupada, dutaba, patubi, pidabu, tutibu). Some members of the syllable inventory occur in more words than others, which produces varying transitional probabilities between the syllables within the words, as in natural language. Each nonsense word was repeated 300 times in pseudorandom order, with the restriction that the same word never occurred consecutively. Because the speech stream contained no pauses or other acoustic indications of word onsets, the only cues to word boundaries were transitional probabilities, which were higher within words than across word boundaries (cf. Saffran et al., 1996, 1997).
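For illustration, the pseudorandom ordering constraint can be implemented as follows. This is a minimal Python sketch assuming only the constraints stated above (300 repetitions per word, no immediate repeats); it is not the study's actual randomization code.

```python
# Minimal sketch: generate a word order with 300 tokens per word and no
# consecutive repeats. All names here are illustrative.
import random

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]
REPETITIONS_PER_WORD = 300

def generate_stream_order(words, reps, seed=0):
    """Return a word sequence with `reps` tokens per word and no immediate repeats."""
    rng = random.Random(seed)
    remaining = {w: reps for w in words}
    sequence = []
    while any(remaining.values()):
        # Candidates: words with tokens left, excluding the immediately preceding word.
        candidates = [w for w, n in remaining.items()
                      if n > 0 and (not sequence or w != sequence[-1])]
        if not candidates:  # rare dead end: only the previous word remains; retry
            return generate_stream_order(words, reps, seed + 1)
        # Bias toward words with more tokens remaining to avoid dead ends.
        weights = [remaining[w] for w in candidates]
        choice = rng.choices(candidates, weights=weights, k=1)[0]
        sequence.append(choice)
        remaining[choice] -= 1
    return sequence

order = generate_stream_order(WORDS, REPETITIONS_PER_WORD)
assert all(a != b for a, b in zip(order, order[1:]))  # no consecutive repeats
```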

A speech synthesizer (Mac text-to-speech application, female voice "Victoria") was used to generate a continuous speech stream composed of the six trisyllabic nonsense words. To achieve more natural-sounding speech, speech synthesis technology makes use of automated techniques to produce acoustic variations in the speech output. Thus, as in speech produced by a human talker, individual tokens of a given word type in the speech stream were acoustically variable (e.g., each instance of "babupu" was not uttered in an identical manner). Descriptive statistics summarizing token durations for each of the six words are shown in Table 1. As can be seen from the table, there was considerable acoustic variability across the tokens within each word type. Across the synthesized stream, the average syllable-to-syllable latency was 235 ms (std = 39.8 ms).

Table 1.

Descriptive statistics for the durations of individual tokens within each of the six words in the artificial language. Token durations represent the total time from the onset of a given word to the onset of the subsequent word in the continuous speech stream.

Word   | Mean Duration (ms) | Standard Deviation (ms) | Min Duration (ms) | Max Duration (ms) | Word Presentation Rate for ITPC Calculation (Hz)
babupu | 705 | 54 | 588 | 887 | 1.4
bupada | 749 | 49 | 645 | 836 | 1.3
dutaba | 658 | 48 | 576 | 746 | 1.5
patubi | 694 | 36 | 587 | 818 | 1.4
pidabu | 711 | 52 | 595 | 822 | 1.4
tutibu | 708 | 24 | 644 | 756 | 1.4

The speech stream was edited to include a total of 31 pitch changes. Each pitch change represented either a 20-Hz increase or decrease from the baseline frequency (~160 Hz). Pitch changes occurred randomly, rather than systematically on certain syllables, and thus could not provide a cue for segmentation. Syllables that spanned pitch changes were excluded from EEG analysis. The detection of infrequent pitch changes was used as a cover task during the learning period, in order to ensure adequate attention to the auditory stimuli.

EEG event codes were sent at the onset of each syllable. The timing of syllable onsets in the continuous speech stream was determined by three trained raters using both auditory information and visual inspection of sound spectrographs, with the mean rating used. Any discrepancy >20 ms among the raters was resolved by a fourth independent rater. The first 30 syllables of the stream were not coded and thus not included in the analysis, to avoid auditory onset effects.

For the recognition test phase, six nonword foils were created (batabu, bipabu, butipa, dupitu, pubada, tubuda). The nonwords consisted of syllables from the language’s syllable inventory that never directly followed each other in the speech stream, even across word boundaries. The frequency of individual syllables across words and nonword foils was matched.
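For illustration, this adjacency constraint can be verified mechanically. The sketch below assumes the full stream is available as an ordered list of syllables; all names are illustrative, not taken from the study's materials.

```python
# Minimal sketch: check that no adjacent syllable pair within any nonword foil
# ever occurs adjacently in the speech stream (even across word boundaries).
FOILS = ["batabu", "bipabu", "butipa", "dupitu", "pubada", "tubuda"]

def syllabify(word):
    """Split a trisyllabic CV-CV-CV nonsense word into its three syllables."""
    return [word[i:i + 2] for i in range(0, 6, 2)]

def foils_are_valid(stream_syllables, foils=FOILS):
    """True if no foil contains a syllable bigram attested in the stream."""
    attested = set(zip(stream_syllables, stream_syllables[1:]))
    for foil in foils:
        syls = syllabify(foil)
        if any(pair in attested for pair in zip(syls, syls[1:])):
            return False
    return True
```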

Procedure

At the beginning of the experiment, participants were fitted with an elastic EEG cap embedded with electrodes. After EEG setup, participants were informed that they would listen to a speech stream of nonsense words. They were instructed that the speech stream contained occasional pitch changes and that they should detect these pitch changes using the keypad, with one button indicating a low pitch change and another indicating a high pitch change. To increase interest in the task, participants earned a small amount of additional money (12 cents) for each successfully detected pitch change. Due to technical issues, behavioural data for the pitch-detection task from two participants could not be analyzed. Overall, the remaining participants performed well on the pitch-detection task, detecting 94.6% (SD = 4.2%) of the 31 pitch changes. The total exposure stream of approximately 21 min was divided into three equal blocks and participants were given a brief break between each block.

After finishing the listening phase of the experiment, participants were informed that the nonsense language that they had just listened to was composed of individual words. They were then given a free recall task in which they were asked to recall the six words by writing them down on a piece of paper. Overall performance was very poor on this task (mode of 0 words correct across participants) and was not analyzed further.

Participants then completed a forced-choice recognition judgment task. Each trial included a word and a nonword foil. Participants gave two responses for each trial: (1) indicating which of the two sound strings sounded more like a word from the language, and (2) reporting on their awareness of memory retrieval, with "remember" indicating confidence based on retrieving specific information from the learning episode, "familiar" indicating a vague feeling of familiarity with no specific retrieval, and "guess" indicating no confidence in the selection. The six words and six nonword foils were paired exhaustively for a total of 36 trials. In half of the trials the word was presented first, while in the other half the nonword foil was played first; presentation order for each individual trial (whether presented first/second) was counterbalanced across participants. Participants then completed a final speeded target detection task, designed to assess implicit memory of the syllable patterns. Behavioural and EEG data for the remember/familiar/guess component of the recognition test and for the target detection task have been analyzed previously (Batterink et al., 2015) and are not included in the current paper.

Behavioural Data Analysis (2AFC Recognition Task).

For each participant (1–21) and word type (1–6), I computed a “recognition score,” which represents the total number of correct trials out of six for each word. These 126 values were then analyzed as the dependent variable in the main linear mixed effects model (described below), in order to examine the relationship between recognition and neural phase-locking at the item level.

In addition, a one-sample t-test was used to test whether recognition performance was above chance, with 50% correct representing chance-level performance. Finally, a repeated-measures ANOVA with word type (1–6) as a within-subjects factor was used to test whether recognition performance differed across the different component words of the language.
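For illustration, the recognition scores and the chance-level test can be computed as follows. This minimal sketch assumes a long-format trial table; the column names are illustrative, not from the study's actual data files.

```python
# Minimal sketch of the recognition-score computation and chance-level test.
# `trials` is assumed to have columns: participant (1-21), word (1-6),
# correct (0/1), with 36 rows per participant.
import pandas as pd
from scipy import stats

def recognition_scores(trials: pd.DataFrame) -> pd.DataFrame:
    """One 'recognition score' (0-6) per participant x word: correct trials out of 6."""
    return (trials.groupby(["participant", "word"])["correct"]
                  .sum().rename("recognition_score").reset_index())

def test_above_chance(trials: pd.DataFrame):
    """One-sample t-test of per-participant accuracy against 50% chance."""
    acc = trials.groupby("participant")["correct"].mean()
    return stats.ttest_1samp(acc, popmean=0.5)
```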

EEG Recording and Analysis

Recording and preprocessing.

EEG during the exposure phase was recorded with a sampling rate of 512 Hz from 64 Ag/AgCl-tipped electrodes attached to an electrode cap using the 10/20 system. Recordings were made with the Active-Two system (Biosemi, Amsterdam, The Netherlands), which does not require impedance measurements, an online reference, or gain adjustments. Additional electrodes were placed on the left and right mastoid, at the outer canthi of both eyes, and below both eyes. Scalp signals were recorded relative to the Common Mode Sense (CMS) active electrode and then re-referenced off-line to the algebraic average of the left and right mastoid.

All EEG analyses were carried out using EEGLAB (Delorme and Makeig, 2004). First, EEG data were band-pass filtered from 0.1 to 30 Hz. Sections of data in which no auditory cues were present (i.e., during breaks or pauses in the auditory stimulation) were removed from the continuous dataset. Next, the data were submitted to an automatic artifact correction procedure, based on the Artifact Subspace Reconstruction algorithm developed by Mullen and colleagues (Mullen et al., 2015), which is designed for the removal of occasional large-amplitude noise/artifacts. Critical parameters for the implementation of this algorithm were selected conservatively based on empirical testing and previously established guidelines (Chang et al., 2018; high-pass transition = 0.25 to 0.75 Hz, minimum channel correlation = 0.8, line noise = 'off'; burst criterion = 5; window criterion = 0.25). This step resulted in the removal of the noisiest sections of data (mean = 1.4%, std = 1.6%) and interpolation of an average of 4.47 out of 64 scalp electrodes (std = 4.46). Next, for each participant and for each word type (1–6), epochs time-locked from −2.5 to 2.5 s relative to word onset were extracted from the continuous dataset, and baseline corrected using mean amplitude across the whole epoch, producing a total of 126 separate datasets.
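As a rough illustration of the filtering, re-referencing and item-level epoching steps, a sketch in MNE-Python might look as follows (the study itself used EEGLAB; the Artifact Subspace Reconstruction step is EEGLAB-specific and is omitted here, and the file names, mastoid channel labels and event-coding scheme are assumptions):

```python
# Minimal MNE-Python sketch of the preprocessing pipeline described above.
import mne

raw = mne.io.read_raw_fif("sub01_exposure_raw.fif", preload=True)  # hypothetical file
raw.filter(l_freq=0.1, h_freq=30.0)   # 0.1-30 Hz band-pass
raw.set_eeg_reference(["M1", "M2"])   # re-reference to average of the mastoids

events = mne.find_events(raw)         # syllable-onset event codes
# Assume word-onset events have been recoded per word type as event IDs 1-6.
for word_type in range(1, 7):
    epochs = mne.Epochs(raw, events, event_id=word_type,
                        tmin=-2.5, tmax=2.5,     # -2.5 to 2.5 s around word onset
                        baseline=(None, None),   # mean amplitude of whole epoch
                        preload=True)
    epochs.save(f"sub01_word{word_type}-epo.fif", overwrite=True)
```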

Neural phase-locking analysis.

For each of the 126 item-level datasets, neural phase-locking was quantified by measuring inter-trial phase coherence (ITPC) across all epochs. ITPC is a measure of event-related phase locking. ITPC values range from 0, indicating purely non-phase-locked activity, to 1, indicating strictly phase-locked activity. A significant ITPC indicates that the EEG activity in single trials is phase-locked at a given time and frequency, rather than phase-random with respect to the time-locking experimental event. ITPC was computed using a continuous Morlet wavelet transformation from 0.3 to 6.0 Hz via the newtimef function of EEGLAB. Wavelet transformations were computed in 0.1 Hz steps with 1 cycle at the lowest frequency (0.3 Hz) and increasing by a scaling factor of 0.5, reaching 10 cycles at the highest frequency (6.0 Hz). A scaling factor of 0.5 indicates that the width of the wavelet used for the highest frequency is half (0.5) the width of the wavelet used at the lowest frequency (Dickter & Kieffaber, 2014), allowing better frequency resolution at higher frequencies than wavelet approaches using constant cycle lengths (Delorme & Makeig, 2004). A total of 200 output times was computed for each frequency; timepoints spanned an interval from −643 to 643 ms, separated by an average of 6.5 ms. For each word type, a specific frequency of interest was selected, corresponding to the mean token duration (range across word types = 1.3 – 1.5 Hz; see Table 1 for specific frequency values by word type). Note that the decision to select the most appropriate frequency for each word type, rather than using a constant frequency across word types, had no major impact on the results, as the ITPC values at the different frequency bins of interest (1.3, 1.4, and 1.5 Hz) were highly correlated (mean r = 0.983). After artifact correction, an average of 268 trials contributed to each item-level dataset (range = 147 – 301; standard deviation = 26.5).
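The core ITPC computation can be expressed compactly: each trial's complex wavelet coefficient is normalized to unit length (discarding amplitude), the unit phase vectors are averaged across trials, and the magnitude of that average is the ITPC. The minimal Python sketch below makes this explicit; the cycle settings only approximate EEGLAB's newtimef scaling and are illustrative, not the study's actual code.

```python
# Minimal ITPC sketch, assuming `epochs` is a NumPy array of shape
# (n_trials, n_channels, n_times) sampled at `sfreq`. Note that at the lowest
# frequencies the wavelet must fit within the epoch.
import numpy as np
from mne.time_frequency import tfr_array_morlet

def itpc(epochs, sfreq, freqs, n_cycles):
    """Inter-trial phase coherence: length of the mean unit phase vector
    across trials. Returns (n_channels, n_freqs, n_times); 0 = phase-random,
    1 = perfectly phase-locked."""
    # Complex wavelet coefficients: (n_trials, n_channels, n_freqs, n_times)
    tfr = tfr_array_morlet(epochs, sfreq=sfreq, freqs=freqs,
                           n_cycles=n_cycles, output="complex")
    phase_vectors = tfr / np.abs(tfr)          # keep phase, discard amplitude
    return np.abs(phase_vectors.mean(axis=0))  # resultant length across trials

sfreq = 512.0
freqs = np.arange(0.3, 6.05, 0.1)              # 0.3-6.0 Hz in 0.1-Hz steps
n_cycles = np.linspace(1.0, 10.0, freqs.size)  # 1 cycle at 0.3 Hz up to 10 at 6 Hz
# itpc_vals = itpc(word_epochs, sfreq, freqs, n_cycles)
```

The same MNE function also offers an output="itc" option that returns this trial-averaged quantity directly.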

For each item, ITPC values were averaged from word onset to 576 ms, which corresponds to the minimum token duration across all words (see Table 1), in order to capture neural phase-locking across the full duration of each word. All 64 scalp electrodes were used in this calculation due to the widespread nature of the effect (see Results). These 126 average item-level ITPC values (henceforth referred to as “item-level ITPC”) were used as predictors in the main mixed effects model.

Statistical testing of item-level ITPC.

To test whether neural oscillations at the word presentation rate specifically track word identities, observed item-level ITPC values were compared against a null distribution of ITPC values. The null ITPC distribution – representing the null hypothesis that item-level ITPC is not higher to words than to pseudorandomly selected syllable triplets – was estimated by creating "surrogate" datasets for each participant and each word type. In order to control for word duration (both mean and standard deviation of durations of words in the actual datasets), surrogate datasets were created through an item-level matching procedure. For each "true" word in an actual item-level dataset, a surrogate triplet was selected according to the following criteria: (1) the triplet was not simply another repetition of the true word, and (2) the triplet had not already been selected for assignment to the surrogate dataset. The candidate triplet with the closest duration to the true word was then selected from the pool of all triplets that met these two criteria. In cases where more than one candidate triplet was an equally close match, the surrogate word was selected randomly from the closest candidates. Thus, this procedure ensured that within a given item-level dataset, onsets for each surrogate word occurred pseudorandomly across actual syllable positions and word identities. For each item-level surrogate dataset, ITPC at the corresponding word presentation rate was then computed as in the original analysis. This entire procedure was performed 100 times, producing a surrogate ITPC distribution of 100 group-averaged values for each word type. The critical ITPC value was defined as the value in this surrogate distribution corresponding to the 95th percentile (p < 0.05). If the observed item-level ITPC values within a given condition (i.e., word type and electrode) exceeded this critical value, ITPC was considered significant, providing evidence of word identity-specific phase-locking that exceeds phase-locking to randomly-selected syllable triplets in the stream.
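A minimal sketch of the duration-matched surrogate selection follows, assuming a simple list-of-tuples representation of the candidate syllable triplets in the stream; this data structure and all names below are illustrative rather than taken from the study's code.

```python
# Minimal sketch: pick one duration-matched surrogate triplet per true word
# token. `stream` is assumed to be a list of (onset_index, triplet_label,
# duration_ms) tuples, one per candidate syllable triplet in the stream;
# `true_tokens` has the same layout for the tokens of the true word.
import random

def build_surrogate(true_tokens, stream, true_word, rng=random.Random(0)):
    used = set()
    surrogates = []
    for _, _, true_dur in true_tokens:
        candidates = [c for c in stream
                      if c[1] != true_word      # (1) not a repetition of the true word
                      and c[0] not in used]     # (2) not already assigned
        best_dist = min(abs(c[2] - true_dur) for c in candidates)
        closest = [c for c in candidates if abs(c[2] - true_dur) == best_dist]
        pick = rng.choice(closest)              # random tie-break among closest matches
        used.add(pick[0])
        surrogates.append(pick)
    return surrogates
```

Repeating this procedure (100 iterations in the analysis described above) and recomputing ITPC on each surrogate dataset yields the null distribution.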

Statistical testing of item-level ITPC by word type.

A linear mixed-effects model was used to test whether item-level ITPC differs as a function of word type. The model included word type (1–6) as a fixed effect, participant intercept as a random effect and item-level ITPC as the dependent variable, using maximum likelihood estimation. In addition, to explore potential acoustic factors driving any potential differences in ITPC between word types, a separate linear mixed-effects model tested whether mean word duration and variability in word duration across utterances predicted ITPC. This model included the mean token duration and standard deviation of token durations for each word as fixed effects (see Table 1 for values), participant intercept as a random effect and item-level ITPC as the dependent variable, using maximum likelihood estimation.
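In Python, models of this form could be sketched with statsmodels (the study's statistical software is not specified; the column names below are illustrative):

```python
# Minimal statsmodels sketch of the word-type model: item-level ITPC as the
# dependent variable, word type as a fixed effect, a random intercept per
# participant, fit by maximum likelihood (reml=False).
import pandas as pd
import statsmodels.formula.api as smf

def fit_word_type_model(df: pd.DataFrame):
    # df: one row per item, columns = participant, word_type (1-6), itpc
    model = smf.mixedlm("itpc ~ C(word_type)", data=df, groups=df["participant"])
    return model.fit(reml=False)

def fit_acoustic_model(df: pd.DataFrame):
    # df additionally has mean_duration and sd_duration per word (Table 1 values)
    model = smf.mixedlm("itpc ~ mean_duration + sd_duration",
                        data=df, groups=df["participant"])
    return model.fit(reml=False)
```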

Statistical testing of relationship between item-level ITPC and recognition.

My main hypothesis was that neural phase-locking to each word in the speech stream should predict knowledge of that word, as assessed during the forced-choice recognition task. The underlying logic here is that, to the extent that a word is successfully “learned,” acoustically-variable tokens should be segmented from the continuous speech stream and perceived as a cohesive, functional unit. These units should in turn be tracked by neural oscillations at the word presentation rate, with better learned words eliciting stronger neural phase-locking across word tokens (Figure 1). Thus, ITPC should be higher to words that are better recognized compared to words that are more poorly recognized.

Figure 1.

Figure 1.

Summary of paradigm and analysis approach. (A) During the exposure period, a continuous speech stream made up of six repeating nonsense words was presented while listeners’ EEG was recorded. If statistical learning of a given component word occurs, acoustically-variable tokens should be perceived as functionally equivalent units and thus elicit more similar brain responses across word presentations. This similarity across word presentations is quantified by intertrial phase coherence (ITPC). (B) Knowledge of each of the six component words was assessed in a subsequent 2AFC recognition test. Across 36 trials, each word is pitted against six nonword foils, such that the maximum possible recognition score is 6.

A linear mixed-effects model was used to test whether recognition differs as a function of item-level ITPC. To account for the potential effect of acoustic factors on word learning, an initial model was run with maximum likelihood estimation including token variability (standard deviation of duration across tokens; Table 1), mean token duration (Table 1) and item-level ITPC as fixed effects, and recognition score as the dependent variable. The Wald Z statistic was used to estimate variance at the participant level and to test whether a random intercept for participant should be included in the model, with p < 0.05 indicating that a random effect is needed (Seltman, 2012). Because the Wald Z test was not significant (Wald Z = 0.97, p = 0.33), participant was not included as a random intercept in the model.

As discussed in the Introduction, one theoretical possibility is that an individual’s average neural phase-locking to all words in the language—indexing a general sensitivity to temporal patterns—could entirely account for the previously demonstrated relationship between phase-locking (or neural entrainment) and statistical learning outcomes (Batterink and Paller, 2017, 2019; Buiatti et al., 2009). I therefore tested whether item-level ITPC accounts for item-level word recognition over and above an individual’s average ITPC values. Each participant’s average word-rate ITPC (henceforth referred to as “average ITPC”) was computed by averaging the six item-level ITPC values included in the original analysis. To directly compare the predictive value of item-level ITPC to average ITPC in item-level recognition performance, I ran a linear mixed-effects model with recognition score as the dependent variable and both item-level ITPC and average ITPC as fixed effects. In addition, I tested a linear mixed-effects model with average ITPC as a fixed effect, and compared this model against the winning item-level ITPC model (as described above) using BIC. Finally, Spearman’s correlation was used to test the relationship between average ITPC and overall recognition performance across individuals, as a conceptual replication of previous findings (Batterink and Paller, 2017, 2019; Buiatti et al., 2009).
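Because the random participant intercept was ultimately dropped from the recognition models (see above), this comparison reduces to fixed-effects regressions. A minimal statsmodels sketch, simplified to single-predictor models with illustrative column names:

```python
# Minimal sketch of the item-level vs. average-ITPC model comparison by BIC
# (lower = better). The real models described above also included acoustic
# covariates; they are omitted here for brevity.
import pandas as pd
import statsmodels.formula.api as smf

def compare_itpc_models(df: pd.DataFrame) -> dict:
    # df: one row per item, columns = recognition_score, item_itpc, avg_itpc
    m_item = smf.ols("recognition_score ~ item_itpc", data=df).fit()
    m_avg = smf.ols("recognition_score ~ avg_itpc", data=df).fit()
    return {"BIC_item_level": m_item.bic, "BIC_average": m_avg.bic}
```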

Time course analysis of ITPC.

To examine the time course of ITPC over the course of exposure to the artificial language, a fine-grained time course analysis was carried out. Given previous evidence that statistical learning in the context of artificial speech segmentation paradigms occurs within 2 min in infants (Saffran et al., 1996) and that learning occurs most rapidly during early stages of exposure and follows a logarithmic curve (Siegelman et al., 2018; Choi, Batterink, et al., 2020), I expected the most reliable changes in ITPC to occur relatively early on during exposure. Thus, the time course analysis was restricted to the first block of exposure (corresponding to roughly 7 min, or ~800 total word presentations). One participant was excluded from this analysis, as part of the first exposure block for this participant was not recorded due to experimenter error.

For each word type, single trial wavelet decompositions were computed and stored as complex coefficients using EEGLAB's tfdata output variable, using the same parameters as in the original analysis. To improve the signal-to-noise ratio of single trial estimates, every 10 consecutive trials were then grouped together using a moving window approach (e.g., trials 1–10, 2–11, 3–12, etc.). Following our previous approach (Choi, Batterink et al., 2020), ITPC was then computed for each group of 10 consecutive trials. The key prediction here is that the phase of neural oscillations at the word rate should become more consistent if word learning occurs over the course of exposure, such that ITPC at the word rate should increase over time (see Batterink and Paller, 2017, 2019). A linear mixed-effects model was used to test whether item-level ITPC significantly increased over the course of the first block. The model included the word presentation number of the first trial in a given group of 10 trials (i.e., number of presented items within each word type; 1–80), word type (1–6), and the interaction between word type and number of word presentations as fixed factors, participant intercept as a random effect, and ITPC values for each group of 10 consecutive trials as the dependent variable, using maximum likelihood estimation.
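A minimal sketch of the moving-window computation, reusing the ITPC logic from the earlier sketch and assuming single-trial complex coefficients at the word-rate frequency (array layout is an assumption):

```python
# Minimal sketch: ITPC over overlapping groups of 10 consecutive trials.
# `coefs` holds single-trial complex wavelet coefficients at one frequency,
# shape (n_trials, n_channels, n_times).
import numpy as np

def moving_window_itpc(coefs, window=10):
    """ITPC per overlapping window of `window` consecutive trials
    (trials 1-10, 2-11, 3-12, ...)."""
    phase = coefs / np.abs(coefs)              # unit phase vectors
    n_trials = phase.shape[0]
    out = [np.abs(phase[start:start + window].mean(axis=0))
           for start in range(n_trials - window + 1)]
    return np.stack(out)                       # (n_windows, n_channels, n_times)
```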

Temporal dynamics of ITPC.

Finally, to explore the temporal dynamics of neural phase-locking across the unfolding of a word over time, a separate analysis was run in which item-level ITPC was computed only within the corresponding word presentation rate bins of interest (i.e., 1.3 – 1.5 Hz; see Table 1), as well as at the average syllable presentation rate (4.3 Hz). Discarding lower frequencies from the analysis allowed for computing ITPC across a longer time window (i.e., −1570 to 1570 ms, compared to −643 to 643 ms in the original analysis). All other parameters were matched to the original analysis parameters. The time course of ITPC at both the word and syllable frequencies was plotted across time in order to visualize the temporal trajectory of entrainment. A running paired t-test across all post-word-onset time intervals was used to test at which timepoints ITPC values exceeded the prestimulus value (that is, the value occurring immediately prior to word onset).
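A minimal sketch of such a running paired t-test, assuming one ITPC time course per participant at the word rate (the array layout and names are illustrative):

```python
# Minimal sketch: paired t-test of each post-onset timepoint against the
# prestimulus value. `itpc_time` has shape (n_participants, n_times);
# `pre_idx` is the sample immediately before word onset.
import numpy as np
from scipy import stats

def running_ttest(itpc_time, pre_idx, alpha=0.05):
    """Boolean array: timepoints where ITPC significantly exceeds baseline."""
    baseline = itpc_time[:, pre_idx]
    sig = []
    for t in range(pre_idx + 1, itpc_time.shape[1]):
        t_stat, p = stats.ttest_rel(itpc_time[:, t], baseline)
        sig.append(p < alpha and t_stat > 0)   # one direction: exceeds baseline
    return np.array(sig)
```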

All p values are from two-tailed tests with an alpha of .05. Greenhouse–Geisser corrections are reported for factors with more than two levels.

Results

Behavioural Results (2AFC Recognition Test)

Recognition performance was significantly above chance across participants (mean = 59.3%, SD = 11.7%, t(20) = 3.62, p = 0.002), providing evidence of successful word segmentation due to statistical learning. Recognition performance varied significantly across the six component words of the language (word type: F(5,100) = 2.76, p = 0.049; Figure 2B), indicating that some words were learned better and recognized at higher levels than others.

Figure 2.

Figure 2.

Quantification of neural phase-locking to component words in the speech stream at the word presentation rate. (A) Intertrial phase coherence (ITPC) as a function of frequency, averaged across the six component words of the speech stream. Two peaks are observed, corresponding to the word rate (range of 1.3 – 1.5 Hz across words) and the syllable rate (mean of ~4.3 Hz). Shaded regions represent SEM. (B) ITPC and recognition score as a function of word type, across all participants. Across participants, words that showed higher ITPC values were also better recognized. (C) Distribution of word-rate ITPC across the scalp for each of the six words in the language. Electrode locations with significant word-rate ITPC as contrasted against the shuffled surrogate distribution are denoted with a black circular marker.

EEG Results (Exposure Period)

Item-level ITPC shows peaks at word and syllable frequencies.

The average of all item-level ITPC values is plotted as a function of frequency in Figure 2A. As shown in the figure, there is a clear ITPC peak in the frequency range corresponding to the average word rate of the speech stream, providing evidence of word-level neural phase-locking. A second peak between 4 and 5 Hz can also be seen, corresponding to the average syllabic rate of the speech stream.

Item-level ITPC shows significant tracking of word identities.

Across all electrodes, item-level ITPC within each word type was highly significant when tested against the null distribution of ITPC values, which reflects phase-locking to randomly-selected syllable triplets that were equated for duration (all p values < 0.01, with observed ITPC values exceeding the 99th percentile of the surrogate distribution). This result indicates that neural oscillations at the word rate phase-lock to words in the speech stream over and above phase-locking to random syllable triplets, providing evidence that individual-item ITPC tracks word identities.

Item-level ITPC varies as a function of word type.

Item-level ITPC varied significantly as a function of word type, with some words in the language eliciting significantly higher ITPC than other words (Effect of Word Type on Item-Level ITPC: F(5,30) = 10.6, p < 0.001). As shown in Figure 2B, word types that elicited greater ITPC also appeared to show higher recognition performance. Effects of word type and item-level ITPC on recognition performance were statistically tested through linear mixed effects modelling, with results described below (under "Item-level ITPC predicts item-level word recognition"). In addition, item-level ITPC varied significantly as a function of variability in token duration (Effect of Standard Deviation of Token Duration: F(1,40) = 8.59, p = 0.006) as well as average token duration (F(1,48) = 5.36, p = 0.025). Higher ITPC values were associated with higher duration variability and shorter mean durations across tokens.

The distribution of ITPC across the scalp for each word type is plotted in Figure 2C. As shown in the figure, ITPC values at a large majority of individual electrodes reached statistical significance when tested independently against the null distribution (p < 0.05). Overall, the distribution of ITPC across the scalp was relatively widespread, with a frontocentral maximum, consistent with an auditory response.

Item-level ITPC predicts item-level word recognition.

Critically, supporting my main prediction, the full three-predictor model indicated that item-level ITPC significantly predicted subsequent item-level word recognition (F(1,83) = 8.60, p = 0.004). Figure 3A shows item-level ITPC as a function of recognition score (1–6). The number of items in each recognition score "bin" is as follows: score 1 = 13; score 2 = 22; score 3 = 21; score 4 = 28; score 5 = 24; score 6 = 16.

Figure 3.

Figure 3.

Relationship between ITPC and recognition score pooling across words and participants (A) ITPC predicts recognition score at the item level. (Here, an “item” is defined as the combined collection of individual instances of the same word at the single participant level; thus, each participant contributes 6 items total to the analysis). Items are binned by recognition score (1–6); note that a recognition score of 0 is not plotted because this bin contains only 2 items total. The dotted line represents the best linear fit between ITPC and recognition score. Error bars represent standard error of the mean. (B) ITPC as a function of frequency for high and low recognition items, divided by median split for data visualization purposes. High recognition items (n=68) have a recognition score of 4–6, and low recognition items (n=58) have a recognition score of 0–3. Shaded regions represent SEM. (C) Distribution of item-level ITPC across the scalp for high and low recognition items.

Word variability in token duration also significantly predicted recognition performance (Standard Deviation of Token Duration: F(1,57) = 5.23, p = 0.026). Mean token duration was not a significant predictor in the model (F(1,42) = 2.27, p = 0.14). Together, these results indicate that item-level ITPC and word variability both independently and positively predicted word learning (ITPC parameter estimate = 16.9, 95% CI = 5.44 – 28.4, SE = 5.76; word variability parameter estimate = 0.027, 95% CI = 0.0034 – 0.051, SE = 0.012).

The Wald Z test was not significant (Wald Z = 0.97, p = 0.33), indicating that unmeasured variance at the individual participant level did not significantly contribute to the model. When item-level ITPC was included as a single predictor in the model, the random effect of participant again did not significantly predict word recognition (Wald Z = 0.52, p = 0.60).

For visualization purposes, ITPC for the items with the highest recognition scores (4–6; 68 items) and the lowest recognition scores (0–3; 58 items), divided by median split, are plotted as a function of frequency and scalp distribution in Figure 3B and 3C.

Item-level ITPC predicts item-level word recognition better than average ITPC.

When both item-level ITPC and average ITPC (at the participant level) were included as predictors in the model, only item-level ITPC significantly predicted item-level word recognition (Item-level ITPC: F(1,91) = 5.45, p = 0.022; Average ITPC: F(1,102) = 0.37, p = 0.55). This result indicates that item-level ITPC accounts for item-level word recognition over and above an individual's average ITPC value. A model including average ITPC as a single predictor did significantly predict item-level word recognition (F(1,115) = 8.78, p = 0.004); however, this model performed more poorly than the comparison model in which item-level ITPC was used as a predictor (BIC for model with average ITPC = 504.3; BIC for model with item-level ITPC = 500.7).

Across individuals, average ITPC predicted overall recognition performance (Spearman’s r = 0.44, p = 0.048), which conceptually replicates previous reports of correlations between neural entrainment and statistical learning at the individual level (Batterink and Paller, 2017, 2019; Buiatti et al., 2009). In sum, overall neural phase-locking at the individual level predicts statistical learning performance as measured by word recognition, but is not as good a predictor as item-level neural entrainment.

Item-level ITPC increases over the first block of exposure.

As shown in Figure 4, across all word types, item-level ITPC showed a significant increase over the first block of exposure (Effect of Number of Word Repetitions: F(1, 8854) = 7.32, p = 0.007). The item-level ITPC slope over time varied significantly as a function of word type (Word Type x Number of Word Repetitions: F(5,8852) = 8.08, p < 0.001). Follow-up analyses indicated that the two words associated with the highest recognition performance (i.e., bupada and dutaba) both independently showed significant increases in ITPC over exposure (both p values < 0.010), whereas the two words with the lowest recognition performance (babupu and tutibu) showed significant or marginally significant decreases in ITPC over the first block of exposure (p values = 0.023 – 0.053).

Figure 4.

Figure 4.

Modeled progression of word-rate ITPC by word as a function of exposure. Values are based on parameter estimates of fixed effects in the linear mixed-effects model within the first block (~7 min) of exposure. ITPC showed a significant increase as a function of exposure across all words, reflecting online learning. The change in ITPC over exposure varied significantly by word.

Temporal dynamics of item-level ITPC show early onset and relatively late peak.

As shown in Figure 5A, item-level ITPC at the word presentation rate peaked at approximately 420 ms post word onset, and significantly exceeded the prestimulus value from 86–701 ms (p < 0.05). ITPC during the prestimulus interval also showed a steady increase over time, which may be attributed to reduced temporal jitter relative to word onset over the prestimulus interval. In contrast, ITPC at the syllable presentation rate showed a much earlier peak, at approximately 110 ms (Figure 5B); however, these peak values were not significantly different from the prestimulus value. Again, ITPC during the prestimulus interval increased strongly over time, reflecting reduced temporal variability relative to syllable onset.

Figure 5.

Figure 5.

Temporal dynamics of item-level ITPC over the course of word presentation, relative to word onset. (A) Temporal evolution of item-level ITPC at the word rate. ITPC peaks approximately 418 ms after word onset. Shaded green regions around the line represent SEM. The shaded yellow region (spanning 86–701 ms) shows time intervals when ITPC significantly exceeds prestimulus baseline ITPC. (B) Temporal evolution of item-level ITPC at the syllable rate, demonstrating a much earlier peak (~110 ms).

Discussion

The current findings demonstrate a robust association between neural phase-locking and subsequent linguistic knowledge at the individual word level, providing novel evidence that phase-locking to “hidden” linguistic units in continuous speech delineates perceived linguistic boundaries on a word-by-word basis. Using a classical statistical learning task, participants were passively exposed to a continuous speech stream made up of repeating nonsense words. After the exposure period, learners’ memory of the nonsense words was assessed using an explicit 2AFC recognition test. Words that elicited stronger neural phase-locking during exposure, as quantified by ITPC, were recognized at higher rates on the subsequent memory test. Neural phase-locking at the word rate also significantly increased over the first block of the exposure period. These results indicate that neural phase-locking over repeated word presentations reflects the discovery, encoding and perception of individual linguistic items acquired as a result of statistical learning.

These findings support the hypothesis that continuous speech is segmented into meaningful functional units through nested, hierarchically organized neural oscillations (Giraud and Poeppel, 2012; Gross et al., 2013; Peelle and Davis, 2012). According to these models, speech is parsed into meaningful units by neural oscillations operating across a range of specific frequencies that match the rhythms of relevant linguistic components (e.g., phonemes, syllables, words, and phrases). Consistent with this idea, I found that neural phase-locking is higher to words that are successfully recognized, compared to those that are not. Presumably words with higher recognition performance were perceived as functional units and tracked by oscillatory activity at the matching word rate. In contrast, words with poor recognition performance were processed merely as a sequence of unrelated syllables rather than as a word unit, and thus were not tracked by corresponding word-rate oscillations.

Neural Entrainment and Word Knowledge May Interact Bidirectionally

The current results are correlational in nature and cannot directly disentangle the causal relationship between neural phase-locking and word learning. However, based on previous findings, I propose that there may be bidirectional interactions between phase-locking and linguistic knowledge: (1) neural phase-locking to underlying patterns may influence the formation of high-level linguistic representations; and (2) word representations may exert a top-down influence on phase-locking of ongoing oscillations. The first idea—that modulation of neural phase may influence word learning—is supported by several recent transcranial alternating current stimulation studies (Riecke et al., 2018; Wilsch et al., 2018; Zoefel et al., 2018). By directly manipulating neural oscillations, these studies demonstrated that the phase lag between brain and speech rhythms influenced the neural responses to intelligible speech in superior temporal gyrus (Zoefel et al., 2018) as well as speech comprehension (Riecke et al., 2018; Wilsch et al., 2018). These results indicate that the phase alignment between neural oscillations and an ongoing speech signal plays a causal role in high-level speech processing, and by extension could also (in principle) influence speech segmentation and statistical word learning.

A major mechanism underlying neural phase-locking to speech is phase-resetting of low frequency oscillations in the auditory cortex to “acoustic landmarks” in the speech envelope, such as speech onsets or sharp acoustic transients (Doelling et al., 2014; Gross et al., 2013). Certain words may thus be more learnable because they contain acoustic features that elicit stronger or more consistent phase-resetting across word presentations. Consistent with this idea, in the present study I found that some words elicited higher ITPC than others, and that word-level differences in ITPC accounted for word-level differences in recognition (Figure 2B). Further, these word-level ITPC differences emerged very early on and were relatively stable across exposure; words that showed high ITPC values during early learning continued to elicit relatively higher ITPC values throughout the first block of exposure (Figure 4). These findings suggest that “baseline” acoustic features influence phase-locking, and that degree of phase-locking predicts whether a given word is more learnable. Over multiple word exposures, phase-locked oscillations at the word frequency could mediate the binding of syllables into larger temporal chunks, thereby supporting word learning. This idea is consistent with the proposal that neural entrainment functions to align phases of neural excitability to repeated temporal patterns, providing a mechanism for identifying specific patterns in upcoming sensory input (Schroeder and Lakatos, 2009).

A second possibility, which is not mutually exclusive, is that high-level word knowledge has a top-down influence on neural phase-locking. This idea is supported by the present finding that ITPC significantly increased over exposure, reflecting the gradual acquisition of word knowledge that in turn may facilitate predictive processing. This significant increase in phase-locking over time cannot be accounted for by bottom-up factors alone, given that the stimulus stream did not differ systematically over exposure, and replicates previous findings showing that neural phase-locking to words (or phrases) in an artificial language increases gradually over the course of exposure (Getz et al., 2018; Batterink & Paller, 2017, 2019; Choi, Batterink et al., 2020). Together, these results converge with mounting evidence that neural phase-locking is critically modulated by top-down processes such as selective attention and expectations (e.g., Ding and Simon, 2012; Horton et al., 2013; Lakatos et al., 2008; Rimmele et al., 2015; Zion Golumbic et al., 2013). Mechanistically, a recent MEG study demonstrated that top-down signals from frontal brain areas causally influence the phase of speech-coupled oscillations in auditory cortex, enhancing speech-brain coupling (Park et al., 2015). The idea that neural phase-locking is influenced by top-down processing is also compatible with theoretical proposals that neural phase-resetting provides an instrument for sensory selection by enabling phases of higher neural excitability to align with important stimulus events (Schroeder and Lakatos, 2009; Thut et al., 2012). In the context of speech segmentation, high-level word knowledge enables predictions to be made about upcoming syllables. In turn, these top-down predictions may function to optimally align ongoing neural oscillations with the most important or informative moments of the speech signal, acting to increase sensitivity to relevant acoustic cues and thereby facilitating speech processing (Peelle and Davis, 2012).

The current results also suggest that bottom-up acoustic factors may interact with statistical learning and top-down knowledge, with words that are the most initially "trackable" also benefitting the most from exposure and showing continual gains in learning (Figure 4). The time course analysis demonstrated that words with the highest ITPC estimates at the beginning of the exposure period (i.e., bupada and dutaba) showed significant increases in ITPC over exposure. In contrast, words with low initial phase-locking (i.e., babupu and tutibu) did not show increases in neural phase-locking over this period, suggesting that words that are not initially trackable (as measured by baseline ITPC estimates) do not benefit from exposure. In sum, both bottom-up and top-down factors appear to contribute to the observed relationship between item-level ITPC and subsequent word learning, and further, these different mechanisms are likely to interact with one another.

Word Variability Across Utterances Influences Both ITPC and Word Learning

At the behavioural level, I found a significant impact of word type on recognition performance, indicating that some words were more easily learned than others. This finding aligns well with previous findings that language-specific knowledge influences linguistic statistical learning, with words that more closely follow the phonotactic regularities of a participant's native language being learned better (Finn & Hudson Kam, 2015; Siegelman et al., 2018). A more novel, unexpected finding was that words with more variable durations in the stream were learned better compared to words that had less variability. Some caution is warranted in interpreting this result, given that there were only six words in the language and that a full exploration of acoustic differences between words is beyond the scope of this paper. Nonetheless, this finding is consistent with prior evidence showing that variability facilitates speech learning and generalization to novel instances (e.g., Bradlow & Bent, 2008; Clopper & Pisoni, 2004; Singh, 2008; Greenspan, Nusbaum, & Pisoni, 1988). For example, the perception of new sentences produced with synthetic speech improves when participants are exposed to a larger set of training stimuli compared to a restricted set (Greenspan, Nusbaum, & Pisoni, 1988). Within the context of statistical learning, Gomez (2002) demonstrated that infants' and adults' learning of nonadjacent dependencies (e.g., pel-X-jic) depends on sufficient variability, occurring only when the middle, nonpredictive element of the dependency (i.e., X) is drawn from a sufficiently large pool. Taken together, these results indicate that exposure to a greater variety of exemplars allows learners to better ignore irrelevant features and identify the most predictable, informative or invariant structures in a stimulus stream. In the current study, words with greater variability across utterances may promote the acquisition of more abstract word representations, as opposed to more specific, stimulus-based, acoustic representations (Vouloumanos et al., 2012). This in turn may facilitate generalization and better recognition performance when the same word is presented in a new context (i.e., in isolation during the 2AFC recognition task, rather than embedded in a continuous speech stream as during the exposure phase).

Word variability in duration across utterances also influenced ITPC, with greater word variability predicting higher ITPC values. This finding does not follow from a straightforward bottom-up mechanistic account of neural phase-locking, which would predict that ITPC should be higher to items that have a more stable (i.e., less variable) duration across presentations. Rather, this finding suggests that greater word variability facilitates word learning, which in turn leads to stronger phase-locking to the embedded words, providing additional support for top-down influences on neural phase-locking.

Neural Mechanisms Underlying Statistical Learning

On a more specific note, the current findings also provide new insights into the neural mechanisms that underlie statistical learning in the context of word segmentation, extending previous work in this area. As described in the Introduction, prior studies have shown that neural tracking of repeating nonsense words predicts statistical learning performance on subsequent behavioural tests at the individual level; participants who show stronger neural entrainment responses to the underlying linguistic structures perform better on subsequent learning tests (Batterink and Paller, 2017, 2019; Buiatti et al., 2009). The current results conceptually replicate these results, demonstrating that average ITPC during learning predicts subsequent overall recognition performance across individuals. At the same time, the current findings go beyond a demonstration of interindividual correlations, showing that item-level neural entrainment predicted item-level recognition more strongly than individual-level average entrainment. Further, the effect of individual participant did not significantly account for variability in item-level word recognition when item-level ITPC was accounted for.

Taken together, these results indicate that neural phase-locking in the context of language learning primarily reflects the discovery and perception of individual items in the language inventory, rather than indexing more general interindividual differences that would operate similarly across all items, such as the tendency to “spontaneously synchronize” one’s behaviour to external stimuli (Assaneo et al., 2019). In other words, it appears that the previously documented relationship between neural entrainment and statistical learning performance (Batterink and Paller, 2017, 2019; Buiatti et al., 2009) primarily reflects the specific content of linguistic knowledge, and can be accounted for by better learners’ higher rates of word learning.

ITPC results also hint that learners may have engaged in a suboptimal parsing strategy for words that were not successfully learned (i.e., “lowest recognition items,” with a recognition score of 50% accuracy or below on the 2AFC test). As shown in Figure 3B, lowest recognition items show a peak at approximately 2.1 Hz, which corresponds to the average bigram rate in the speech stream. This finding suggests that poorly learned items may have been parsed as bigrams on some proportion of presentations. For example, for a triplet such as “babupu,” participants may have segmented the bigram “babu” on some occurrences, “bupu” on other occurrences, and neither possible bigram on still other occurrences. Across all trials, this would produce a weak signature of bigram tracking. Because overall ITPC values at the bigram presentation rate are similar across highest recognition and lowest recognition items (Figure 3B), it appears that some (relatively weak) degree of erroneous bigram parsing also occurs for better learned words. This finding highlights that ITPC at the word rate specifically is a signature of statistical word learning, and that phase-locking at other low frequencies (< 10 Hz) does not by itself distinguish between better learned and poorly learned items.
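The correspondence between the ~2.1 Hz peak and the bigram rate follows directly from the timing of the stream. The short calculation below, based on the mean word duration of 704 ms (Table 1) and three syllables per word, illustrates the arithmetic.

```python
# Back-of-the-envelope rates implied by a mean word duration of 704 ms
# for trisyllabic words; purely illustrative arithmetic.
mean_word_duration = 0.704                  # seconds per three-syllable word
syllable_duration = mean_word_duration / 3  # ~0.235 s per syllable
bigram_duration = 2 * syllable_duration     # ~0.469 s per two-syllable chunk

print(f"word rate:     {1 / mean_word_duration:.2f} Hz")  # ~1.42 Hz
print(f"syllable rate: {1 / syllable_duration:.2f} Hz")   # ~4.26 Hz
print(f"bigram rate:   {1 / bigram_duration:.2f} Hz")     # ~2.13 Hz, as in Figure 3B
```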

Temporal Trajectory of Item-Level Neural Entrainment

The temporal dynamics of ITPC provide additional insights into the neural mechanisms that support statistical learning of novel words. As a given word unfolded, ITPC showed a steep increase beginning immediately after word onset (see Figure 5). The rapid nature of this effect converges with previous demonstrations that word onsets modulate ongoing neural responses very quickly. For example, a recent MEG study modelled neural responses to continuous narrative speech and found a highly significant effect of word onsets, with a peak latency of 103 ms (Brodbeck et al., 2018). This finding was interpreted as evidence that word boundaries are detected essentially as they occur, rather than after incorporating cues that follow word onset. Similarly, an ERP study found an early sensory-related N100 effect at the onsets of nonsense words in continuous speech (Sanders et al., 2002). This effect was observed only in learners who showed the strongest behavioural evidence of word knowledge, suggesting that high-level linguistic knowledge is a prerequisite for this early response. In the context of neural entrainment frameworks (e.g., Schroeder and Lakatos, 2009), word onsets may represent privileged sensory events, as syllables occurring at the beginning of a word are relatively information-rich and highly predictive of subsequent syllables. Successful word learning may therefore be accompanied by the rapid alignment of neural oscillations to these informative word onsets.

Although ITPC increased rapidly soon after word onset, the neural entrainment response did not peak until ~420 ms, corresponding to roughly 200 ms after the onset of the second syllable. This peak was followed by a decrease in entrainment, which statistically reached baseline levels by 701 ms, very close to the mean word duration of 704 ms (see Table 1). A similar neural tracking trajectory was reported by Ding and colleagues (2016; see their Figure 4), who found that neural activity reached its peak during the second word of artificial grammar phrases and then progressively decreased with each additional word in the phrase. Taken together, these findings indicate that ITPC tracks the entire time course of a higher-level unit, rather than reflecting a transient response occurring only at unit boundaries (cf. Ding et al., 2016). It is also interesting to note that the observed peak in ITPC is similar to the typical latency of the N400 effect (Kutas and Federmeier, 2011). This similarity suggests that neural tracking of a given word may decline once the word has been processed to the point of recognition.

In contrast to ITPC at the word presentation rate, ITPC at the syllable rate peaked very soon after word onset (~110 ms; Figure 5B). Overall, the trajectory of neural entrainment at the syllable rate resembles a symmetrical, steep parabolic curve centered just after word onset, consistent with a sensory-evoked response that is not strongly modulated by high-level knowledge.
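The time-resolved trajectories described in this section can be approximated with a narrowband filter followed by a Hilbert transform, measuring phase consistency across trials at each sample. The sketch below is a minimal illustration under assumed filter bands, filter order, and sampling rate, not the analysis code used in the study.

```python
# Hedged sketch of time-resolved ITPC: narrowband-filter word-locked epochs,
# extract instantaneous phase via the Hilbert transform, then measure phase
# consistency across trials at every time point.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def time_resolved_itpc(epochs, fs, low, high, order=3):
    """epochs: (n_trials, n_samples). Returns ITPC at each sample."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, epochs, axis=1)     # zero-phase narrowband filter
    phase = np.angle(hilbert(filtered, axis=1))     # instantaneous phase per trial
    return np.abs(np.exp(1j * phase).mean(axis=0))  # consistency across trials

fs = 500.0                                # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
epochs = rng.standard_normal((48, 1024))  # placeholder word-locked data

word_itpc = time_resolved_itpc(epochs, fs, 1.0, 2.0)      # around the ~1.42 Hz word rate
syllable_itpc = time_resolved_itpc(epochs, fs, 3.5, 5.0)  # around the ~4.26 Hz syllable rate
```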

Conclusions

In summary, the main finding of the study is that neural phase-locking accompanies a listener’s subjective perception of an individual word in continuous speech, as acquired in real time during statistical learning. These results indicate that the association between neural phase-locking and statistical learning is not limited to perfectly isochronous syllable sequences (e.g., Batterink and Paller, 2017, 2019; Buiatti et al., 2009; Ding et al., 2016; Getz et al., 2018), but generalizes to continuous speech containing nonidentical word tokens. The demonstration that neural phase-locking is sensitive to the recognition strength of individual words opens up the possibility of tracking the contents of learning in real time. For example, by monitoring the EEG of language learners exposed to a continuous stream of foreign language input, it may be possible to predict which words have been successfully learned and which require additional training. This neural phase-locking approach may also be applied to investigate other aspects of language that involve the concatenation of smaller linguistic elements into larger units, such as the learning and processing of grammatical rules, as well as perceptual aspects of language acquisition, such as phonetic category learning. Thus, new applications of this approach may significantly advance our understanding of the neural mechanisms underlying language acquisition and processing.

Acknowledgments

The data in this paper were collected in the laboratory of Dr. Ken A. Paller, and I am grateful for his support. This work was supported by the National Institutes of Health (NIH grants T32 NS 047987 and F32 HD 078223).

References

1. Assaneo MF, Ripollés P, Orpella J, Lin WM, de Diego-Balaguer R, and Poeppel D (2019). Spontaneous synchronization to speech reveals neural mechanisms facilitating language learning. Nature Neuroscience 22, 627–632.
2. Batterink LJ, and Paller KA (2017). Online neural monitoring of statistical learning. Cortex 90, 31–45.
3. Batterink LJ, and Paller KA (2019). Statistical learning of speech regularities can occur outside the focus of attention. Cortex 115, 56–71.
4. Batterink LJ, Reber PJ, and Paller KA (2015). Functional differences between statistical learning with and without explicit training. Learning & Memory 22, 544–556.
5. Bradlow AR, and Bent T (2008). Perceptual adaptation to non-native speech. Cognition 106, 707–729.
6. Brodbeck C, Hong LE, and Simon JZ (2018). Rapid transformation from auditory to linguistic representations of continuous speech. Current Biology 28, 3976–3983.e5.
7. Buiatti M, Pena M, and Dehaene-Lambertz G (2009). Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. NeuroImage 44, 509–519.
8. Choi D*, Batterink L*, Black AK, Paller K, and Werker JF (2020). Prelingual infants discover statistical word patterns at similar rates as adults: Evidence from neural entrainment. 10.31234/osf.io/fuqd2. (*Equal contributions.)
9. Clopper CG, and Pisoni DB (2004). Effects of talker variability on perceptual learning of dialects. Language and Speech 47, 207–239.
10. Delorme A, and Makeig S (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134, 9–21.
11. Ding N, and Simon JZ (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences 109, 11854–11859.
12. Ding N, Melloni L, Zhang H, Tian X, and Poeppel D (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience 19, 158–164.
13. Ding N, Melloni L, Yang A, Wang Y, Zhang W, and Poeppel D (2017). Characterizing neural entrainment to hierarchical linguistic units using electroencephalography (EEG). Frontiers in Human Neuroscience 11.
14. Doelling KB, Arnal LH, Ghitza O, and Poeppel D (2014). Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage 85, 761–768.
15. Finn AS, and Hudson Kam CL (2015). Why segmentation matters: Experience-driven segmentation errors impair “morpheme” learning. Journal of Experimental Psychology: Learning, Memory, and Cognition 41, 1560–1569.
16. Getz H, Ding N, Newport EL, and Poeppel D (2018). Cortical tracking of constituent structure in language acquisition. Cognition 181, 135–140.
17. Giraud A-L, and Poeppel D (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience 15, 511–517.
18. Gomez R (2002). Variability and detection of invariant structure. Psychological Science 13, 431–436.
19. Greenspan SL, Nusbaum HC, and Pisoni DB (1988). Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory, and Cognition 14, 421–433.
20. Gross J, Hoogenboom N, Thut G, Schyns P, Panzeri S, Belin P, and Garrod S (2013). Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biology 11, e1001752.
21. Horton C, D’Zmura M, and Srinivasan R (2013). Suppression of competing speech through entrainment of cortical oscillations. Journal of Neurophysiology 109, 3082–3093.
22. Kutas M, and Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology 62, 621–647.
23. Lakatos P, Karmos G, Mehta AD, Ulbert I, and Schroeder CE (2008). Entrainment of neuronal oscillations as a mechanism of attentional selection. Science 320, 110–113.
24. Mullen TR, Kothe CAE, Chi YM, Ojeda A, Kerth T, Makeig S, Jung T-P, and Cauwenberghs G (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Transactions on Biomedical Engineering 62, 2553–2567.
25. Nozaradan S, Peretz I, Missal M, and Mouraux A (2011). Tagging the neuronal entrainment to beat and meter. Journal of Neuroscience 31, 10234–10240.
26. Nozaradan S, Peretz I, and Keller PE (2016). Individual differences in rhythmic cortical entrainment correlate with predictive behavior in sensorimotor synchronization. Scientific Reports 6.
27. Park H, Ince RAA, Schyns PG, Thut G, and Gross J (2015). Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners. Current Biology 25, 1649–1653.
28. Peelle JE, and Davis MH (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology 3.
29. Riecke L, Formisano E, Sorger B, Başkent D, and Gaudrain E (2018). Neural entrainment to speech modulates speech intelligibility. Current Biology 28, 161–169.e5.
30. Rimmele JM, Zion Golumbic E, Schröger E, and Poeppel D (2015). The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene. Cortex 68, 144–154.
31. Saffran JR, Newport EL, and Aslin RN (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language 35, 606–621.
32. Saffran JR, Newport EL, Aslin RN, Tunick RA, and Barrueco S (1997). Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science 8, 101–105.
33. Sanders LD, Newport EL, and Neville HJ (2002). Segmenting nonsense: An event-related potential index of perceived onsets in continuous speech. Nature Neuroscience 5, 700–703.
34. Schroeder CE, and Lakatos P (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences 32, 9–18.
35. Seltman H. Experimental Design and Analysis (Carnegie Mellon University).
36. Siegelman N, Bogaerts L, Elazar A, Arciuli J, and Frost R (2018). Linguistic entrenchment: Prior knowledge impacts statistical learning performance. Cognition 177, 198–213.
37. Siegelman N, Bogaerts L, Kronenfeld O, and Frost R (2018). Redefining “learning” in statistical learning: What does an online measure reveal about the assimilation of visual regularities? Cognitive Science 42, 692–727.
38. Singh L (2008). Influences of high and low variability on infant word recognition. Cognition 106, 833–870.
39. Thut G, Miniussi C, and Gross J (2012). The functional importance of rhythmic activity in the brain. Current Biology 22, R658–R663.
40. Vouloumanos A, Brosseau-Liard PE, Balaban E, and Hager A (2012). Are the products of statistical learning abstract or stimulus-specific? Frontiers in Psychology 3, 70.
41. Wilsch A, Neuling T, Obleser J, and Herrmann CS (2018). Transcranial alternating current stimulation with speech envelopes modulates speech comprehension. NeuroImage 172, 766–774.
42. Zion Golumbic EM, Ding N, Bickel S, Lakatos P, Schevon CA, McKhann GM, Goodman RR, Emerson R, Mehta AD, Simon JZ, et al. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party.” Neuron 77, 980–991.
43. Zoefel B, Archer-Boyd A, and Davis MH (2018). Phase entrainment of brain oscillations causally modulates neural responses to intelligible speech. Current Biology 28, 401–408.e5.
