Highlights
• Bilinguals and monolinguals heard two speech sounds in two language contexts.
• Speech sounds represented two phonemic categories in English and one in Spanish.
• Bilinguals exhibited interference from the non-target language.
• The Executive Control Network (ECN) was active in both language contexts.
• The ECN regulates parallel activation and adjusts processing pathways flexibly.
1. Introduction
The concept of language activation and selection is central to bilingualism research, as it involves the ability to actively use one language while inhibiting another. The present investigation examines bilinguals’ brain mechanisms in language selection during the perception of speech sounds that have no lexical meaning but compete for phonological representations across languages.
Speech perception centers on the mechanisms of activation, processing, and representation in the brain. In the auditory domain, speech perception typically follows a sequential pathway driven purely by the speech signal (bottom-up), but prior knowledge or expectations can refine the perception of individual sounds (top-down) (McClelland & Elman, 1986; see Norris et al., 2000 for an opposing view). Within a purely signal-driven approach, activation is understood as the process by which certain auditory dimensions or cues are prioritized during the categorization of sounds (Holt & Lotto, 2006; Poeppel et al., 2008). Processing refers to how the auditory system interprets and categorizes input, that is, the neural and computational mechanisms that operate on representations (acoustic, phonetic, phonological, and lexical) to support speech perception and production (Li et al., 2010; Obleser et al., 2005; Poeppel et al., 2008). Finally, representation refers to the mental encoding and storage of phonetic, acoustic, phonological, and lexical information. It describes how different aspects of the speech signal are encoded in the brain, and it plays a crucial role in transforming acoustic input into meaningful linguistic information (Hickok & Poeppel, 2007). These representations, stored and organized in the brain, provide the foundation for recognizing and producing language and are shaped by linguistic experience.
Language activation and selection entail the ability to use one language while inhibiting the other, which introduces the notion of language control. This implies that, alongside speech perception processes, language selection mechanisms operate concurrently in the bilingual brain and must be regulated by a language control mechanism. The “Adaptive Control Hypothesis” proposed by Green and Abutalebi (2013) describes activation, processing, and representation in the context of bilingual language use. Activation refers to the engagement of specific neural areas during language selection. Both languages are simultaneously active in the bilingual mind, even when only one is being used. This concurrent activation creates competition between the languages and requires control processes (e.g., the Executive Control Network [ECN]) to select the appropriate language for a given context. Processing is defined as the mechanism by which bilinguals select and control representations in working memory so that they align with communicative goals. It involves several cognitive control processes, including conflict monitoring, executive control, interference suppression, and task switching, all of which adapt to the interactional context. Finally, representation in bilinguals pertains to the verbal and nonverbal representations maintained in working memory to achieve communicative goals. These span the full range of both languages’ elements, from words and syntax to concepts.
The present investigation explores speech sounds that have different mental representations across languages (e.g., the same sound represents the phoneme /k/ in Spanish and /g/ in English). Because the representations of these sounds compete for phonemic membership, we will present them in two language contexts (Spanish and English) to determine whether control mechanisms are involved in their perception, even when only one language is being used. We will rely on Event-Related Potentials (ERPs) to assess brain electrical activity associated with perceptual and cognitive processes (Luck, 2014). We will also use standardized Low Resolution Brain Electromagnetic Tomography (sLORETA) (Pascual-Marqui, 2002; Pascual-Marqui et al., 1994) to estimate the brain areas associated with those processes. This approach allows us to infer how phonemic sounds are represented in both languages. Specifically, sLORETA will help determine whether the discrimination of speech sounds with competing phonemic representations is managed by control mechanisms such as the ECN.
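The following Python/NumPy sketch gives a rough illustration of what sLORETA computes. It implements the standardization step for fixed-orientation (scalar) sources. First, a Tikhonov-regularized minimum-norm estimate is computed. Then the squared estimate at each source is divided by the corresponding diagonal element of the resolution matrix. The function name `sloreta_scalar`, the random lead field, and the regularization value `alpha` are hypothetical. The sketch omits the average-reference centering matrix and the three-component dipoles of Pascual-Marqui's (2002) formulation. It is not the analysis pipeline used in this study.

```python
import numpy as np


def sloreta_scalar(leadfield: np.ndarray, data: np.ndarray, alpha: float = 1e-3) -> np.ndarray:
    """Standardized current-density power for fixed-orientation sources.

    leadfield: (n_sensors, n_sources) forward matrix K
    data: (n_sensors,) scalp potentials at one time point
    alpha: Tikhonov regularization parameter (hypothetical value)
    """
    n_sensors = leadfield.shape[0]
    gram = leadfield @ leadfield.T + alpha * np.eye(n_sensors)
    # Minimum-norm inverse operator T = K^T (K K^T + alpha I)^-1 (gram is symmetric)
    inverse_op = np.linalg.solve(gram, leadfield).T
    current = inverse_op @ data            # unstandardized minimum-norm estimate
    resolution = inverse_op @ leadfield    # resolution matrix R = T K
    # sLORETA step: divide each squared estimate by its own resolution variance
    return current ** 2 / np.diag(resolution)


rng = np.random.default_rng(0)
K = rng.standard_normal((32, 200))   # hypothetical lead field: 32 electrodes, 200 sources
true_source = 57
scalp = K[:, true_source]            # noiseless, unit-amplitude source
power = sloreta_scalar(K, scalp)
print(int(np.argmax(power)))         # 57, the simulated source
```

In this noiseless, single-source case the standardized map peaks at the simulated source. This illustrates the exact-localization property reported for sLORETA under ideal conditions. With real EEG, noise, a limited electrode montage, and head-model error all reduce spatial precision, so sLORETA results are best read as estimated source regions.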
Even though there is a well-established body of research showing that bilinguals rely on a control mechanism when accessing competing lexical representations across languages (Abutalebi et al., 2007; Abutalebi et al., 2011; Garbin et al., 2011; Green, 1998; Green & Abutalebi, 2013; Liu et al., 2021; Marian et al., 2014; Marian et al., 2017; Perani et al., 2003; Rodríguez-Pujadas et al., 2013; Shen et al., 2020; Sulpizio et al., 2020), most speech perception models have only recently begun to incorporate this aspect into their frameworks. For instance, the L2LP model (Escudero, 2005; Escudero et al., 2009; Escudero & Boersma, 2004; Van Leussen & Escudero, 2015) and the Speech Learning Model (SLM) (Flege, 1995; Flege et al., 2003), along with its revised version (SLM-r; Bohn & Flege, 2021), provide theoretical frameworks for understanding how second-language (L2) learners perceive and acquire new sounds.
While these models emphasize the influence of a learner’s first language (L1) on L2 perception, they differ in how they explain the formation and stabilization of L2 sound categories. Only recently has the L2LP model explicitly integrated language control mechanisms into its framework, through the Language Mode Activation Hypothesis proposed by Escudero and Yazawa (2024). This hypothesis suggests that, in the final stage of acquisition, both the L1 and L2 perceptual systems remain active. Their relative influence varies with proficiency, language exposure, and cognitive control mechanisms. The present investigation aims to empirically test this hypothesis and to determine whether bilingual language selection operates as this newly proposed framework predicts.
The L2LP model (Escudero, 2005) and the SLM (Flege, 1995; Flege et al., 2003) offer theoretical frameworks to explain how learners acquire phonetic categories in an L2, with a particular focus on the influence of the L1. Both models recognize that L2 learners initially rely heavily on their existing L1 phonetic system when encountering L2 sounds, mapping these new sounds onto the closest L1 equivalents. This reliance often leads to perceptual difficulties, especially when the L2 contains contrasts absent in the L1, because unfamiliar L2 sounds are assimilated into existing L1 categories. For example, English distinguishes /g/ and /k/ by short-lag versus long-lag voicing, but both fall within a single Spanish category, /k/. Consequently, the accuracy of both perception and production in the L2 suffers, because the finer distinctions between L2 sounds may not be captured.
Despite these foundational similarities, the L2LP model and the SLM diverge significantly in how they explain the mechanisms underlying L2 phonetic acquisition and in what they predict about learners’ ultimate attainment. The L2LP model employs the Gradual Learning Algorithm (GLA) (Boersma, 1998; Escudero, 2005) to simulate L2 development, mirroring the sequential processes observed in L1 acquisition. This approach involves perceptual learning, in which learners adjust to new phonetic input, and representational learning, which entails forming new phonological categories. The L2LP model is optimistic about adult learners’ potential to achieve native-like perception given appropriate input and learning mechanisms. It emphasizes that rich and sufficient L2 input can compensate for any reduction in neural plasticity relative to children.
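The Python sketch below illustrates the kind of error-driven learning the GLA performs, using a simplified version of Boersma's Stochastic Optimality Theory learner. At each evaluation, Gaussian noise is added to every constraint's ranking value; the standard deviation of 2.0 is a commonly used default. When the learner's winning category differs from the intended one, two adjustments follow. Constraints that penalize the learner's error are promoted by a small plasticity value, and constraints that penalize the correct form are demoted by the same amount. The two cue constraints, their starting values, the plasticity of 0.1, and the number of training trials are hypothetical choices for illustration. They do not reproduce the L2LP simulations.

```python
import random


def evaluate(ranking, candidates, violations, noise_sd=2.0, rng=random):
    """Return the winning candidate under Stochastic OT evaluation.

    ranking: dict mapping constraint name -> ranking value
    candidates: list of candidate outputs
    violations: dict mapping candidate -> {constraint name: violation count}
    """
    # Each evaluation adds Gaussian noise to every ranking value ("selection points").
    points = {c: value + rng.gauss(0.0, noise_sd) for c, value in ranking.items()}
    order = sorted(points, key=points.get, reverse=True)
    # Strict domination: compare violation profiles from the highest-ranked constraint down.
    return min(candidates, key=lambda cand: tuple(violations[cand].get(c, 0) for c in order))


def gla_update(ranking, learner_output, target, violations, plasticity=1.0):
    """Error-driven GLA step: adjust ranking values only when the learner's output is wrong."""
    if learner_output == target:
        return ranking
    updated = dict(ranking)
    for c in ranking:
        v_learner = violations[learner_output].get(c, 0)
        v_target = violations[target].get(c, 0)
        if v_learner > v_target:    # constraint penalizes the learner's error: promote it
            updated[c] += plasticity
        elif v_target > v_learner:  # constraint penalizes the correct form: demote it
            updated[c] -= plasticity
    return updated


# Hypothetical cue constraints for mapping a +15 ms VOT token onto /g/ or /k/.
candidates = ["/g/", "/k/"]
violations = {"/g/": {"*[+15ms]/g/": 1}, "/k/": {"*[+15ms]/k/": 1}}
# Hypothetical Spanish-like starting point: +15 ms is perceived as /k/.
ranking = {"*[+15ms]/g/": 102.0, "*[+15ms]/k/": 98.0}

rng = random.Random(42)
for _ in range(200):  # English-like input: the intended category is always /g/
    winner = evaluate(ranking, candidates, violations, rng=rng)
    ranking = gla_update(ranking, winner, "/g/", violations, plasticity=0.1)

print(ranking)  # after training, *[+15ms]/k/ should outrank *[+15ms]/g/
```

Every error moves the two ranking values toward each other by the same amount. As a result, the learner's response to the +15 ms token shifts gradually from a Spanish-like /k/ to an English-like /g/. This mirrors, in a highly simplified form, the gradual reranking the L2LP model attributes to L2 perceptual learning.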
In contrast, the SLM proposes two specific mechanisms of interaction between the L1 and L2 phonetic systems: category assimilation and category dissimilation. Category assimilation occurs when L2 sounds are not sufficiently distinct from L1 sounds, leading to merged phonetic categories (Davidian & Flege, 1984; Flege, 1991; Flege, 1992; Flege et al., 2003; Williams, 1977). In this scenario, L2 sounds are consistently mapped onto existing L1 categories, which hinders the formation of new, distinct L2 categories. For example, native speakers of French or Spanish may produce English /t/ with an intermediate voice onset time (VOT) (Lisker & Abramson, 1964; Abramson & Lisker, 1967), that is, a value that blends the characteristics of both languages (Flege et al., 2003). Category dissimilation occurs when new L2 categories are established and diverge from L1 categories in the common phonetic space to maintain perceptual contrast (Lindblom, 1990). This mechanism reflects the learner’s effort to enhance the distinction between L2 sounds and their L1 counterparts by minimizing overlap within the shared phonetic space (Flege & Eefting, 1987a).
Relevant to the present investigation, the two models differ in how they conceptualize language activation and the role of context. The SLM emphasizes integration and interaction within a unified phonetic space (Lindblom, 1990) but does not explicitly address how linguistic context modulates the activation level of each language. The SLM allows for mutual influence in a common phonetic space, where L1 categories affect L2 perception and production and vice versa. However, it does not explore the contextual activation of separate linguistic systems. In contrast, the L2LP model introduces the Language Mode Activation Hypothesis, an extension of Grosjean’s Language Mode Framework (1998, 2001, 2008), which posits that bilinguals modulate the activation levels of their languages along a continuum. This continuum ranges from a monolingual mode, in which one language dominates, to a bilingual mode, in which both languages are integrated based on contextual cues. The latest account of L2LP incorporates cognitive control as a key component of this hypothesis. It proposes that L2 learners can activate their L1 and L2 perceptual systems to varying degrees depending on linguistic demands. This framework allows for parallel or selective activation of two grammars, enabling bilinguals to manage perceptual resources dynamically in real time. The Language Mode Activation Hypothesis also offers an alternative explanation for the merged phonetic categories proposed by the SLM. Rather than viewing assimilation as a fixed process, it suggests that simultaneous activation of both languages can lead to intermediate perceptual representations, reflecting a dynamic interplay between the two languages within the learner’s cognitive system.
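One simple way to picture graded, parallel activation of two perceptual grammars is as a weighted mixture of two identification functions. In the toy Python sketch below, the probability of labeling a token as voiced (/g/) is a weighted average of an English-like and a Spanish-like logistic function. The weight stands in for the relative activation of English. The boundaries (25 ms and 0 ms), the slope, and the linear weighting are hypothetical assumptions. They are not parameters of the L2LP model or of Grosjean's framework.

```python
import math


def p_voiced(vot_ms, boundary_ms, slope=0.5):
    """Logistic identification function: probability of a voiced (/g/) response."""
    return 1.0 / (1.0 + math.exp(slope * (vot_ms - boundary_ms)))


def p_voiced_in_mode(vot_ms, english_activation, english_boundary=25.0, spanish_boundary=0.0):
    """Mix English-like and Spanish-like grammars by relative activation (0 = Spanish, 1 = English).

    The boundary values are hypothetical illustrations, not estimates from any study.
    """
    if not 0.0 <= english_activation <= 1.0:
        raise ValueError("english_activation must be between 0 and 1")
    return (english_activation * p_voiced(vot_ms, english_boundary)
            + (1.0 - english_activation) * p_voiced(vot_ms, spanish_boundary))


# Response probabilities for a +15 ms token across a range of language modes
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, round(p_voiced_in_mode(15.0, w), 3))
```

With intermediate weights, a +15 ms token receives an intermediate probability of a /g/ response. This is one way of expressing the intermediate perceptual representations that the Language Mode Activation Hypothesis attributes to simultaneous activation of both languages.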
There is extensive behavioral research demonstrating that a specific language context can establish a language mode and alter the phonemic categorization of speech sounds with competing representations. For instance, García-Sierra et al. (2009) created a speech continuum ranging from /ga/ to /ka/ by manipulating VOT. They presented it to Spanish-English bilinguals and English monolinguals in separate Spanish and English contexts. The results showed that bilinguals, but not monolinguals, shifted their phonemic boundaries according to the active language (i.e., they perceived more /ga/ sounds in the English context than in the Spanish context). This finding led to the concept of a “double phonemic boundary” or “double phonemic representation” (García-Sierra et al., 2012).
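In identification studies of this kind, a phonemic boundary is commonly taken as the VOT at which the proportion of one response crosses 50%. The Python sketch below estimates that crossover by linear interpolation between adjacent continuum steps; fitting a logistic function is a common alternative. The identification proportions are invented for illustration and are not data from García-Sierra et al. (2009). A “double phonemic boundary” would appear as two different crossover points for the same listeners in the two language contexts.

```python
def category_boundary(vots, prop_voiced):
    """VOT (ms) at which the proportion of /ga/ responses crosses 0.5, by linear interpolation."""
    points = list(zip(vots, prop_voiced))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # The curve crosses 0.5 somewhere between these two continuum steps
        if y0 != y1 and (y0 - 0.5) * (y1 - 0.5) <= 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification curve never crosses 0.5")


# Invented identification data for one listener (proportion of /ga/ responses)
vot_steps = [0, 10, 20, 30, 40, 50]
english_context = [1.0, 0.95, 0.80, 0.30, 0.05, 0.0]
spanish_context = [0.90, 0.40, 0.15, 0.05, 0.0, 0.0]
eng = category_boundary(vot_steps, english_context)
spa = category_boundary(vot_steps, spanish_context)
print(f"English: {eng:.1f} ms, Spanish: {spa:.1f} ms, shift: {eng - spa:.1f} ms")  # 26.0, 8.0, 18.0
```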
The concept has been examined across various language pairs using VOT continua, including English and Spanish (Casillas & Simonet, 2018; Elman et al., 1977; García-Sierra et al., 2021; Gonzales & Lotto, 2013; Lozano-Argüelles et al., 2021; Wig & García-Sierra, 2020; Williams, 1977), English and French (Caramazza et al., 1973; Gonzales et al., 2019; Hazan & Boulakia, 1993), English and Dutch (Flege & Eefting, 1987b), and English and Greek (Antoniou et al., 2010; Antoniou et al., 2012). It has also been explored in studies on vowel perception (Yazawa et al., 2020).
Research on the “double phonemic representation” suggests that bilinguals can modify the phonemic representation of pre-lexical sounds based on linguistic context. However, other models, such as the L2LP model (Van Leussen & Escudero, 2015), propose that bilinguals’ ability to distinguish between sounds or group them into categories can shift without altering the underlying mental representations. In other words, listeners may refine or adjust their perception of the physical characteristics of speech sounds (such as vowels and consonants) before these sounds are associated with any lexical meaning or phonological representation. Spivey and Marian (1999) similarly suggested that phonetic categories may be flexible at the processing level rather than at the representational level. This implies that, when individuals process speech sounds, the brain actively interprets and integrates acoustic information, potentially adjusting or reorganizing phonetic categories as needed for efficient communication. Such adjustment could involve merging previously distinct sounds or differentiating sounds that were once perceived as similar, depending on contextual or language-specific cues.
In the present investigation, by relying on sLORETA and comparing brain activation patterns between groups, we aim to gain insights into how these processing adjustments occur and whether they are associated with specific neural control mechanisms, such as those within the ECN. This approach will help us understand if bilinguals’ brain activity reflects dynamic processing pathways that accommodate linguistic context without necessarily altering the core phonemic representations.
As previously mentioned, speech perception models (McClelland & Elman, 1986) suggest that prior knowledge or expectations shape the way we perceive speech, refining the perception of individual sounds through top-down processes. The studies discussed above further suggest the presence of a language control mechanism that is sensitive to linguistic context. This indicates that bilinguals’ ability to manage competing phonological representations is shaped both by their prior linguistic knowledge and by contextual demands, such as activating only one language in monolingual settings or managing both languages in bilingual or code-switching contexts. However, only a few studies have explored the cognitive processes underlying phonemic representations across language contexts. These studies used ERPs because, unlike behavioral paradigms, ERPs allow continuous monitoring of the processes between stimulus and response. This makes it possible to identify the processing stages affected by experimental manipulations (Luck, 2014; Shtyrov & Pulvermüller, 2007). Because ERPs provide millisecond-level temporal precision, they can reveal how quickly processes associated with auditory discrimination are affected during the perception of speech sounds. Specifically, these studies used the mismatch negativity (MMN), an ERP response commonly used to examine speech discrimination. The MMN is typically elicited by presenting a repetitive sound (the standard) and randomly introducing a different sound (the deviant). The deviant differs in a physical property, such as frequency, intensity, or duration, or in phonetic category (Näätänen, 1992; Shtyrov & Pulvermüller, 2007). The MMN is observed approximately 200 ms after the onset of the deviant sound, and its amplitude increases as discrimination of the two sounds improves (Tiitinen et al., 1994).
Importantly, the MMN is elicited without requiring participants’ active attention or explicit responses to stimuli, thereby offering an advantage in assessing the effects of language contexts.
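The standard/deviant logic described above can be sketched in a few lines of Python with NumPy. The MMN is measured on a difference wave: the average response to standards is subtracted from the average response to deviants. Its peak is then taken as the most negative point in a latency window. The 100–250 ms window, the synthetic epochs, and the function name `mmn_from_epochs` are assumptions for illustration. Real pipelines also filter, baseline-correct, and reject artifacts, and they usually average over fronto-central electrodes.

```python
import numpy as np


def mmn_from_epochs(standard_epochs, deviant_epochs, times_ms, window=(100.0, 250.0)):
    """Deviant-minus-standard difference wave and its most negative point in a latency window.

    standard_epochs, deviant_epochs: arrays of shape (n_trials, n_samples), in microvolts
    times_ms: array of shape (n_samples,) giving the latency of each sample
    """
    difference = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    idx = np.flatnonzero((times_ms >= window[0]) & (times_ms <= window[1]))
    peak = idx[np.argmin(difference[idx])]  # the MMN is a negativity, so take the minimum
    return difference, float(times_ms[peak]), float(difference[peak])


# Synthetic data: a -2 microvolt negativity centred at 200 ms added to deviant trials only
rng = np.random.default_rng(0)
times = np.arange(-100.0, 400.0, 2.0)  # 500 Hz sampling, in ms
mmn_shape = -2.0 * np.exp(-((times - 200.0) ** 2) / (2 * 25.0 ** 2))
standards = rng.normal(0.0, 1.0, size=(800, times.size))
deviants = rng.normal(0.0, 1.0, size=(200, times.size)) + mmn_shape
_, latency, amplitude = mmn_from_epochs(standards, deviants, times)
print(f"MMN peak: {amplitude:.2f} microvolts at {latency:.0f} ms")
```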
García-Sierra et al. (2012) found that the amplitude of the MMN in Spanish-English bilinguals was modulated by language context. When “ga” and “ka” were presented, the English context elicited a larger MMN, indicating more contrastive (phonemic) perception. The Spanish context produced a smaller MMN, suggesting that the two sounds were perceived as allophones of a single phoneme. These findings imply that auditory discrimination processes differentially “weight” VOT information depending on linguistic context. In a separate study, Wig and García-Sierra (2020) interpreted their MMN findings in terms of predictive coding, a theory proposing that the brain continuously generates predictions about incoming sensory information based on prior knowledge (Garrido et al., 2009). Predictive coding posits that a mismatch between these predictions and the actual sensory input produces a prediction error (the MMN), which prompts the brain to update its predictions and minimize future errors. In the Wig and García-Sierra study, Spanish-English bilinguals and English monolinguals heard a series of ten stop consonants with Spanish phonetic characteristics, ranging from prevoiced to short-lag sounds (i.e., /da/ to /ta/, respectively). Prevoiced sounds always served as standards, while short-lag sounds served as deviants. Participants were required to press a button upon hearing /ta/. Perceptual “errors” were generated by presenting speech sounds with Spanish characteristics in a mismatched English language context (short videos in English). These responses were compared with a matching condition in which the same sounds were presented in a Spanish language context (short videos in Spanish).
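Predictive coding is often illustrated with a simple error-correcting update, in which the current prediction moves a fraction of the way toward each new input. The toy Python sketch below applies such a delta rule to a sequence of VOT values. Repeated standards produce no prediction error, a deviant produces a large one, and the prediction is then partly updated. The VOT values (−60 ms for a prevoiced standard, +10 ms for a short-lag deviant) and the learning rate are hypothetical. The optional `initial_prediction` argument is a hypothetical stand-in for a context-driven expectation. This is not the model used by Garrido et al. (2009) or by Wig and García-Sierra (2020).

```python
def prediction_errors(inputs, learning_rate=0.2, initial_prediction=None):
    """Delta-rule toy: return the prediction error (input minus prediction) on each trial."""
    prediction = inputs[0] if initial_prediction is None else initial_prediction
    errors = []
    for value in inputs:
        error = value - prediction
        errors.append(error)
        prediction += learning_rate * error  # move the expectation toward the input
    return errors


# Hypothetical VOTs (ms): eight prevoiced standards, one short-lag deviant, three standards
sequence = [-60] * 8 + [10] + [-60] * 3
for vot, err in zip(sequence, prediction_errors(sequence)):
    print(vot, round(err, 1))
```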
The results revealed that bilinguals, but not monolinguals, adjusted their conceptual expectations during the early stages of phonetic discrimination (a larger MMN in the English language context, i.e., when expecting English sounds but perceiving Spanish sounds). This finding indicates that linguistic contexts can set expectations about the type of phonetic information to anticipate. When the incoming speech sound deviates from these expectations, a mismatch response is triggered. This interpretation aligns with behavioral studies showing that bilinguals, unlike monolinguals, can be conceptually cued to an imagined language context. That is, bilinguals can “activate” a language without direct exposure to it (García-Sierra et al., 2021; Gonzales et al., 2019; Lozano-Argüelles et al., 2021). For instance, in Gonzales et al.’s study, bilinguals were told that the speech sounds to be identified (stop consonants) were produced by either an English or a French speaker. The results demonstrated clear shifts in the perception of stop consonants based on the imagined language context.
Despite substantial evidence that bilinguals can use linguistic context to adjust their phonetic categories, no study has yet examined the language control mechanisms involved in perceiving speech sounds without lexical information but with overlapping phonological representations between languages. While the above ERP studies offer a valuable method for studying the cognitive aspects of auditory discrimination, the observed amplitude shifts only suggest changes in processing without identifying the specific brain regions involved. The present investigation aims to localize the brain areas associated with differences in auditory discrimination across language contexts, which will allow us to determine whether regions related to the ECN are involved. If the ECN is involved in the processing of speech sounds with overlapping phonological representations, it would suggest that bilinguals actively recruit cognitive control mechanisms to manage competing phonemic categories across languages. This would provide crucial insight into how the brain dynamically adjusts to language context and could offer a neural basis for the ability to resolve phonetic competition. Identifying the brain regions involved would enhance our understanding of the interaction between language control and auditory discrimination, offering a more comprehensive view of bilingual language processing.
2. Goals and overview
The primary goal of this study is to investigate the brain mechanisms involved in bilingual auditory discrimination, focusing on how cognitive control processes modulate speech sound discrimination across language contexts. Increased activity in prefrontal areas, such as the dorsolateral prefrontal cortex (DLPFC), the anterior cingulate cortex, and the inferior frontal gyrus (IFG), has been observed in bilingual individuals and is thought to reflect enhanced cognitive control during language processing (Abutalebi et al., 2007; Green & Abutalebi, 2013; Hernandez et al., 2001). In the present study, we examine the MMN response and use sLORETA to localize the brain regions associated with these control mechanisms. The MMN has been linked to two mechanisms. The first is a sensory memory system with temporal generators (Alho et al., 1998; Rinne et al., 2000; Scherg et al., 1989; Tervaniemi et al., 2000). The second is a comparator-based mechanism tied to prefrontal areas (Giard et al., 1990; Gomes et al., 2000; Maess et al., 2007; Opitz et al., 1999; Roland, 1981, 1982).
Previous research suggests that the frontal generators of the MMN are lateralized, with the right hemisphere associated with tone paradigms (Levänen et al., 1996) and the left hemisphere with language paradigms (Näätänen et al., 1997; Tervaniemi et al., 2000). These frontal regions are believed to modulate the deviance detection system in the temporal cortex, supporting auditory change detection (Doeller et al., 2003; Garrido et al., 2009).
The present study employs both within-group and between-group designs, in which bilingual and monolingual participants will passively listen to stop consonants while watching movies in both Spanish and English without making any phonetic judgments. The within-group comparison for bilinguals aims to explore whether a language control mechanism is activated by examining their sLORETA responses across different language contexts. In contrast, the between-group comparison with monolinguals will evaluate the differences in brain activation between the two groups, focusing specifically on how bilinguals process speech sounds with competing phonological representations – an issue that monolinguals do not encounter. By comparing both groups and language contexts, we aim to determine whether (1) bilinguals can adjust how they perceive and categorize speech sounds (such as vowels and consonants) without altering their core mental representations, and this flexibility may be supported by the ECN, or (2) bilinguals can refine their perception of speech sounds based on context or experience, even before these sounds are associated with word meanings or stored in the phonological system, with the ECN playing a role in regulating these pre-lexical adjustments. This approach will offer valuable insights into the neural mechanisms that underlie language control in bilingual individuals.
3. Methods
3.1. Participants
Participants were recruited at the University of Connecticut for a large-scale study of bilinguals’ social language interactions and of cortical and subcortical auditory processing. Of the 74 people recruited for this study, 64 participated in the ERP component. Additionally, 5 participants did not show clear ERP responses; the final sample comprised 57 normal-hearing students aged 18–23 years. All participants passed a hearing screening at or below 20 dB HL at octave frequencies from 500 to 8000 Hz. Participants completed a language-screening questionnaire to determine whether they were monolingual English speakers (n = 30, including 7 males) or Spanish-English bilinguals (n = 27, including 7 males). Monetary incentives were offered to all participants.
Bilingual participants reported that their caregivers originated from various areas of Latin America. The language background questionnaire evaluated exposure to and use of Spanish and English from childhood to adulthood. This report includes 26 bilinguals for the linguistic background questions (see Footnote 1); one bilingual declined to answer the questionnaire. Questions about exposure were presented on a Likert scale ranging from 1 to 5 (1 = 100% Spanish; 2 = 75% Spanish, 25% English; 3 = 50% Spanish, 50% English; 4 = 25% Spanish, 75% English; and 5 = 100% English). Figure 1 displays violin plots illustrating bilinguals’ language exposure and use from infancy to the time of the experiment. The data show a distinct transition from predominantly Spanish to predominantly English exposure and use. English monolingual participants reported being raised entirely in English-speaking families, with only incidental exposure to Spanish.
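Each point on the exposure scale corresponds to a fixed percentage of English, so responses can be converted to an approximate percentage of English exposure before they are summarized. The short Python sketch below performs that conversion and computes the median per age period, the statistic marked by the white dots in Figure 1. The responses in the example are hypothetical and are not participants’ data.

```python
from statistics import median


def percent_english(score):
    """Map the 1-5 exposure scale (1 = 100% Spanish ... 5 = 100% English) to % English."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("Likert score must be an integer from 1 to 5")
    return (score - 1) * 25.0


# Hypothetical responses for two age periods (not participants' data)
responses = {"infancy": [1, 1, 2, 1, 3], "adulthood": [4, 5, 4, 5, 5]}
for period, scores in responses.items():
    print(period, median(percent_english(s) for s in scores))  # infancy 0.0, adulthood 100.0
```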

Figure 1. Violin plots for bilingual participants’ language exposure and use from birth to the date of the experiment. White dots represent the median.
Another series of questions measured bilinguals’ present language confidence. These questions were presented separately for English and Spanish on a 1-to-5 Likert scale (1 = I cannot speak the language, have only a few words or phrases, and cannot create sentences; 5 = I have native-like proficiency with few grammatical errors and a strong vocabulary). Bilinguals’ confidence in speaking English averaged 4.92 (SD = 0.276) and in Spanish 4.58 (SD = 0.634). Their confidence in understanding English averaged 5.0 (SD = 0.00) and in Spanish 4.81 (SD = 0.402). See Supplementary Tables S1–S4 for further descriptive questions assessing confidence in hearing, speaking, and reading in relation to age, and language use with family members and friends.
Bilinguals and monolinguals wore digital recorders for two days as part of this large-scale study. Both groups’ everyday activities and language use are described in another study (Ramírez-Esparza et al., 2024). We used the two-day recordings to verify bilinguals’ responses to the language questionnaire. A total of 27 bilinguals and 30 monolinguals were analyzed. Coders examined pre-selected, randomized speech-active segments of the recorders’ audio files (Ramírez-Esparza et al., 2024). Bilinguals spoke 57% (SD = 16) of the time, and monolinguals spoke 62% (SD = 21) of the time. A t-test showed no significant difference between the groups in the amount of time spent speaking, t(54) = 1.031, p > .05, 95% CI [−5, 15]. Of the sampled time, bilinguals spoke English 50.2% of the time and Spanish 3.7%, and they code-switched 3% of the time. Although they spoke some Spanish, English was the bilinguals’ dominant language at the time of the experiment, as evidenced by both the digital recordings and the linguistic background questionnaire.
Vocabulary was assessed with the English and Spanish versions of the NIH Toolbox Picture Vocabulary Test (TPVT; Gershon et al., 2014). This computer-adaptive receptive vocabulary test selects each item based on prior responses. Participants hear an audio recording of a word while four images are shown on a computer screen, and they select the image that best represents the word. The TPVT evaluates vocabulary abilities that depend largely on past learning experiences and are consistent across the lifespan. Age-adjusted scores are normed so that 100 corresponds to the average for a given age, and scores at or above 100 therefore indicate typical vocabulary (Dunn, 1997). Twenty-five bilinguals were assessed with the TPVT in both languages. The mean age-adjusted score was 107.76 (SD = 13.5) for English and 107.32 (SD = 18.0) for Spanish.
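Age-adjusted scores are easier to interpret once converted to z-scores and percentiles. The Python sketch below assumes a normative mean of 100 and a standard deviation of 15, the usual convention for such standard scores, and a normal distribution of scores. Under those assumptions, a score equal to the English group mean of 107.76 lies about half a standard deviation above the normative mean, near the 70th percentile.

```python
import math


def standard_score_to_percentile(score, mean=100.0, sd=15.0):
    """z-score and normal-distribution percentile for a normed standard score (assumed M = 100, SD = 15)."""
    z = (score - mean) / sd
    percentile = 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF x 100
    return z, percentile


for language, mean_score in (("English", 107.76), ("Spanish", 107.32)):
    z, pct = standard_score_to_percentile(mean_score)
    print(f"{language}: z = {z:.2f}, percentile = {pct:.1f}")
```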
Overall, the bilinguals recruited in this study exhibit a language development pattern common among young bilinguals in the United States: they are exposed to languages other than English throughout infancy and early childhood, but English becomes their dominant language as they grow and attend school (Kohnert et al., 1999). Such bilinguals are referred to as heritage bilinguals (Valdés, 2005).
3.2. Stimuli
We employed a shorter (190 ms) version of the stimuli (“ga” and “ka”) used by García-Sierra et al. (2009, 2012) so that brainstem potentials and ERPs could be captured simultaneously. Here, we report only ERPs; ABRs are reported elsewhere. Stimuli were synthesized with the cascade method (Klatt, 1980) using ASL software from the Computerized Speech Lab (CSL) system. “KA” (standard stimulus) had a +50 ms VOT, whereas “GA” (deviant stimulus) had a +15 ms VOT. Formant transitions were linearly interpolated from velar stop consonant values (F1–F5: 180, 1725, 1725, 3200, and 3500 Hz) to vowel /a/ values (F1–F5: 750, 1200, 2450, 3200, and 3500 Hz). Consonant release was simulated with a 10 ms turbulent noise source at an amplitude of 60 dB. Aspiration was simulated with an aspiration source (AH) at 62 dB between consonant release and vowel onset. The initial 100 ms formant transition was interpolated between 45 and 65 dB from F0. Insert earphones (Etymotic ER3C) delivered the stimuli at approximately 67–68 dB LAeq.
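The linear formant interpolation can be illustrated as follows. This minimal sketch assumes the transition starts at t = 0 and spans the initial 100 ms mentioned above, after which the formants hold at the /a/ values. The text does not say whether the transition is timed from the burst or from voicing onset. It only computes formant targets and does not reproduce the Klatt cascade synthesis performed in the CSL/ASL software. The function name is hypothetical.

```python
# F1-F5 (Hz) from the text: velar stop onset values and vowel /a/ targets.
VELAR_ONSET = (180, 1725, 1725, 3200, 3500)
VOWEL_A = (750, 1200, 2450, 3200, 3500)

def formants_at(t_ms, transition_ms=100.0):
    """F1-F5 (Hz) at t_ms: linear interpolation over the transition, held at /a/ afterwards."""
    frac = min(max(t_ms / transition_ms, 0.0), 1.0)  # clamp progress to [0, 1]
    return tuple(on + frac * (target - on) for on, target in zip(VELAR_ONSET, VOWEL_A))

for t in (0, 25, 50, 100, 150):
    print(t, [round(f) for f in formants_at(t)])
```

At the midpoint of the transition (50 ms), F2 has fallen from 1725 to 1462.5 Hz while F3 has risen to 2087.5 Hz. This separation of the velar F2–F3 convergence is the defining property of a /g, k/ to /a/ transition.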
3.3. Language contexts
Data collection involved presenting movies in the target language: a Spanish-language film during the Spanish-language context and an English-language film during the English-language context. This approach was applied to both bilingual and monolingual participants. To mitigate potential order effects, each experimental session used a single language context, and sessions were separated by at least three days. The order of the language context sessions was also counterbalanced. Cartoons were used to control for the influence of lip movements on speech perception (Yoshida et al., 2010). Each session alternated between movie blocks and stimulation blocks. Each session began with a movie block in which participants watched the film in the target language at a comfortable audio level. During stimulation blocks, the movie audio was muted entirely, and it was restored to a comfortable level for the following movie block. Critically, captions were displayed continuously throughout both movie and stimulation blocks. Movie blocks lasted approximately 90 s and stimulation blocks lasted 60 s. The protocol comprised 12 alternating cycles of the two block types. Each stimulation block consisted of 80 standard and 20 deviant sounds. These were preceded by 10 familiarization stimuli, which were excluded from the final average. Each language context session lasted approximately 30 min. A detailed visual representation of this setup is shown in Figure S1 of the supplementary materials.
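The block structure fixes the session length and stimulus counts, which can be checked with simple arithmetic. The sketch takes the 190 ms stimulus duration from the Stimuli section and the 380 ms offset-to-onset interval from the EEG section. It assumes that interval is constant, so onset-to-onset is 570 ms, and that the familiarization sounds play within the stimulation block. Both are assumptions.

```python
# Values from the text; the movie block duration is approximate.
N_CYCLES = 12
MOVIE_BLOCK_S, STIM_BLOCK_S = 90, 60
N_STD_PER_BLOCK, N_DEV_PER_BLOCK, N_FAMILIARIZATION = 80, 20, 10
STIM_MS, ISI_MS = 190, 380  # stimulus duration; offset-to-onset interval

session_min = N_CYCLES * (MOVIE_BLOCK_S + STIM_BLOCK_S) / 60
n_std = N_CYCLES * N_STD_PER_BLOCK
n_dev = N_CYCLES * N_DEV_PER_BLOCK

# Time needed to play one block's sounds at a fixed onset-to-onset interval (assumed).
soa_ms = STIM_MS + ISI_MS
sounds_per_block = N_STD_PER_BLOCK + N_DEV_PER_BLOCK + N_FAMILIARIZATION
block_audio_s = sounds_per_block * soa_ms / 1000

print(session_min, n_std, n_dev, block_audio_s)  # 30.0 960 240 62.7
```

The totals of 960 standards and 240 deviants (20% deviants) match the counts given in the EEG section. The roughly 63 s needed to play a block's sounds is consistent with the nominal 60 s stimulation block.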
3.4. Electroencephalogram (EEG)
An elastic cap with 64 electrodes positioned according to the international 10/10 system was placed on each participant’s scalp. A camera (CapTrak, Brain Vision) recorded the electrodes’ x, y, and z coordinates, and the nasion and the tragi of both ears served as anatomical landmarks for electrode digitization. The digitized electrode locations were used to create the sLORETA files. The EEG was recorded in DC mode with an FCz reference and re-referenced offline to the average reference for analysis. ActiChamp amplifiers (24-bit A/D converter) recorded the EEG, and StimTrack (Brain Vision) delivered a 1 ms click for each stimulus. Offline, a 0.10 Hz high-pass filter (6 dB/oct, forward) and a 40 Hz low-pass filter (12 dB/oct, zero phase) were applied. Eyeblinks, measured at electrodes Fp1 and Fp2, were corrected using BESA Research 7.1 procedures (BESA GmbH, Gräfelfing, Germany). EEG segments with amplitudes exceeding ±100 μV were excluded from the final average, and electrode impedances were kept below 5 kΩ. ERPs were averaged offline from EEG segments extending to 470 ms after stimulus onset, with a 100 ms pre-stimulus baseline. Baseline correction was performed relative to this pre-stimulus period.
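The epoch-level steps can be sketched for a single channel: baseline correction to the pre-stimulus samples, rejection of epochs exceeding a ±100 µV threshold, sample-wise averaging, and the deviant-minus-standard difference wave from which the MMN is derived. This minimal sketch makes several assumptions. Epochs are already filtered and blink-corrected, amplitudes are in microvolts, and rejection is applied after baseline correction; BESA's internal order of operations may differ. All function names are hypothetical.

```python
from statistics import fmean

def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first `n_baseline` (pre-stimulus) samples from every sample."""
    base = fmean(epoch[:n_baseline])
    return [x - base for x in epoch]

def average_epochs(epochs, n_baseline, reject_uv=100.0):
    """Baseline-correct each epoch, drop any exceeding ±reject_uv, and average the rest.

    Returns (average waveform, number of accepted epochs).
    """
    kept = []
    for epoch in epochs:
        corrected = baseline_correct(epoch, n_baseline)
        if max(abs(x) for x in corrected) <= reject_uv:
            kept.append(corrected)
    if not kept:
        raise ValueError("all epochs were rejected")
    return [fmean(samples) for samples in zip(*kept)], len(kept)

def mismatch_wave(deviant_erp, standard_erp):
    """MMN difference wave: deviant minus standard, sample by sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]
```

In practice, `average_epochs` would be called once per stimulus type (standard and deviant) and per channel. The two averages are then passed to `mismatch_wave`. The accepted-epoch counts it returns correspond to those reported in Section 3.5.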
A monolingual English-speaking experimenter interacted with both groups in English to ensure equal language exposure. Data were collected in a shielded, soundproof booth. The /ka/ and /ga/ sounds always served as standard and deviant stimuli, respectively, and were presented in pseudo-random order (960 standard and 240 deviant sounds in total). Deviants never occurred consecutively, and each deviant was preceded by at least three consecutive standards. The final average included 720 standard sounds, excluding the standards that immediately followed a deviant. The inter-stimulus interval (offset to onset) was 380 ms. Participants were instructed to focus on the movie and ignore the sounds. The MMN was calculated by subtracting the response to standard sounds from the response to deviant sounds (deviant minus standard).
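The ordering constraints can be met with a simple constructive procedure. Each deviant receives a mandatory run of three standards before it, and the remaining standards are scattered at random across the runs. The rule that each block ends on a standard is our assumption, not stated in the text. It guarantees that every deviant is followed by a standard, so excluding post-deviant standards leaves 60 per block and 720 per session, matching the reported count. This is a sketch of one valid pseudo-randomization, not the sequence generator actually used. Familiarization stimuli are ignored, and the function names are hypothetical.

```python
import random

def make_oddball_block(n_std=80, n_dev=20, min_std_before_dev=3, seed=None):
    """Pseudo-random oddball block: no consecutive deviants, each deviant preceded
    by at least `min_std_before_dev` standards, and the block ends on a standard."""
    rng = random.Random(seed)
    # One run of standards before each deviant, plus a final run of at least one standard.
    runs = [min_std_before_dev] * n_dev + [1]
    spare = n_std - sum(runs)
    if spare < 0:
        raise ValueError("too few standards for the ordering constraints")
    for _ in range(spare):
        runs[rng.randrange(len(runs))] += 1  # scatter the remaining standards
    seq = []
    for run in runs[:-1]:
        seq += ["ka"] * run + ["ga"]
    seq += ["ka"] * runs[-1]
    return seq

def standards_to_average(seq):
    """Indices of standards that do not immediately follow a deviant."""
    return [i for i, s in enumerate(seq)
            if s == "ka" and (i == 0 or seq[i - 1] != "ga")]

blocks = [make_oddball_block(seed=b) for b in range(12)]
kept = sum(len(standards_to_average(b)) for b in blocks)
print(kept)  # 720
```

This construction does not sample uniformly from all valid orderings. For an oddball paradigm, the property that matters is that the constraints hold in every block.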
3.5. Accepted epoch count in ERP averages
The number of accepted epochs for monolinguals for the standard sound was 712.94 (SD = 26.55) during the English language context and 718.6 (SD = 9.19) during the Spanish language context. For the deviant sound, the number of accepted epochs for monolinguals was 233.77 (SD = 9.19) during the English language context and 235.48 (SD = 7.51) during the Spanish language context. The number of accepted epochs for bilinguals for the standard sound was 710.10 (SD = 25.06) during the English language context and 702.15 (SD = 46.67) during the Spanish language context. For the deviant sound, the number of accepted epochs for bilinguals was 233.40 (SD = 8.86) during the English language context and 230.77 (SD = 15.03) during the Spanish language context.
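Expressing these means as retention rates makes them easier to compare across cells. The sketch divides each mean count by the epochs available per session: 720 standards after excluding those that follow a deviant, and 240 deviants. It is plain arithmetic on the reported means, with no new data.

```python
# Mean accepted epochs reported above, keyed by (group, language context, stimulus).
accepted = {
    ("monolingual", "English", "standard"): 712.94,
    ("monolingual", "Spanish", "standard"): 718.6,
    ("monolingual", "English", "deviant"): 233.77,
    ("monolingual", "Spanish", "deviant"): 235.48,
    ("bilingual", "English", "standard"): 710.10,
    ("bilingual", "Spanish", "standard"): 702.15,
    ("bilingual", "English", "deviant"): 233.40,
    ("bilingual", "Spanish", "deviant"): 230.77,
}
# Epochs available per session for each stimulus type.
available = {"standard": 720, "deviant": 240}

rates = {key: n / available[key[2]] for key, n in accepted.items()}
for (group, context, stimulus), rate in sorted(rates.items()):
    print(f"{group:12s} {context:8s} {stimulus:9s} {100 * rate:.1f}%")
```

Every cell retains more than 96% of the available epochs. The lowest rate belongs to bilinguals' deviant responses in the Spanish language context.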
3.6. Statistical analysis for ERPs
Data-driven analyses tested the presence of the MMN (standard versus deviant ERP) and the modulation of its amplitude (deviant minus standard) between language contexts. ERP amplitudes were analyzed with permutation testing and data clustering in BESA Statistics 2 (BESA GmbH, Gräfelfing, Germany). This multi-step approach assumes that statistical effects extending over time and across adjacent channels are unlikely to be coincidental. In the first step, a parametric test identifies time periods with pronounced effects. The t-values within each such region are then summed to form a cluster value. Because this is done for every region with a substantial effect, cluster values capture effects in both time and space. A large cluster value therefore indicates a sustained difference across numerous neighboring electrodes, whereas a smaller cluster value indicates a difference confined to one or a few neighboring electrodes or to a shorter period. This study used a channel neighbor distance of 4.5 cm. In the second step, BESA repeats step 1 on 10,000 permutations of the data, in which the assignment of data to experimental conditions (or to participant groups) is randomly exchanged. This is valid under the null hypothesis that the condition labels are interchangeable. The cluster values from all permutations form a distribution that directly yields the α-error (p-value) of each initial cluster value from step 1. In other words, the procedure determines whether an observed cluster value is as probable as the cluster values that arise under random relabeling. This type of analysis controls for Type I errors arising from the large number of data points in ERP responses (see Bullmore et al., 1999; Ernst, 2004; Maris & Oostenveld, 2007). Importantly, because ERP data are reported in time and space, the time range of a significant cluster varies among channels. Therefore, the significant difference observed at electrode “x” will be similar, but not identical, to that at electrode “y.” Unless otherwise specified, we used two-tailed paired and independent t-tests for within- and between-group comparisons.
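The two-step logic can be made concrete with a simplified, single-channel version of a cluster-based permutation test for a paired (within-group) design. It follows Maris and Oostenveld (2007) in comparing each observed cluster against the distribution of the largest permuted cluster. The sketch makes several simplifications:

- It clusters only along time. BESA additionally clusters across neighboring channels within 4.5 cm.
- The cluster-forming threshold (|t| > 2.0) is a hypothetical choice.
- The (1 + count) / (n_perm + 1) p-value convention is a common choice, not necessarily BESA's.

It is not BESA's implementation, and all function names are hypothetical.

```python
import math
import random

def one_sample_t(values):
    """t statistic testing whether the mean of `values` differs from zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 0.0 if var == 0 else mean / math.sqrt(var / n)

def cluster_masses(tvals, thresh):
    """Sum t-values over runs of adjacent supra-threshold time points with the same sign."""
    masses, current, sign = [], 0.0, 0
    for t in tvals:
        s = 1 if t > thresh else -1 if t < -thresh else 0
        if s != 0 and s == sign:
            current += t  # extend the running cluster
            continue
        if sign != 0:
            masses.append(current)  # close the previous cluster
        current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

def cluster_permutation_test(cond_a, cond_b, thresh=2.0, n_perm=1000, seed=0):
    """Paired cluster-based permutation test along time for one channel.

    cond_a, cond_b: per-subject time courses (equal-length lists) for two conditions.
    Returns (cluster_mass, p_value) for each observed cluster.
    """
    diffs = [[a - b for a, b in zip(sa, sb)] for sa, sb in zip(cond_a, cond_b)]
    n_times = len(diffs[0])

    def t_course(d):
        return [one_sample_t([subj[k] for subj in d]) for k in range(n_times)]

    observed = cluster_masses(t_course(diffs), thresh)
    rng = random.Random(seed)
    null_max = []
    for _ in range(n_perm):
        # Exchanging condition labels within a subject flips the sign of that subject's difference.
        flipped = []
        for subj in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * x for x in subj])
        masses = cluster_masses(t_course(flipped), thresh)
        null_max.append(max((abs(m) for m in masses), default=0.0))
    # p-value: share of permutations whose largest cluster is at least as extreme as the observed one.
    return [(m, (1 + sum(nm >= abs(m) for nm in null_max)) / (n_perm + 1)) for m in observed]
```

Comparing every observed cluster against the maximum cluster of each permutation is what controls the family-wise Type I error across all time points. For between-group comparisons, group labels would be shuffled across participants instead of flipping signs within participants.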
3.7. Statistical analysis for sLORETA
ERPs were used to estimate neural generators with standardized low-resolution brain electromagnetic tomography (sLORETA; Pascual-Marqui, 2002; Pascual-Marqui et al., 1994). sLORETA is a distributed source modeling method that makes assumptions about the distribution, rather than the number, of current sources. Unlike dipole-fitting approaches, sLORETA does not require a priori localization information. sLORETA computes the smoothest possible three-dimensional (3D) current distribution in the brain that generates the observed scalp field. The sLORETA algorithm calculates current density values (in amperes per square meter; A/m²) for 6,239 gray matter voxels of the brain compartment, each with a spatial resolution of 5 × 5 × 5 mm. Anatomical regions are labeled according to the probabilistic MNI-152 template from the Brain Imaging Center of the Montreal Neurological Institute (MNI; Mazziotta et al., 2001) and the Co-Planar Stereotaxic Atlas of the Human Brain (Lancaster et al., 2000; Talairach & Tournoux, 1988). The validity of sLORETA has been supported by several studies, including those combining EEG and fMRI (see Vitacco et al., 2002). We used statistical non-parametric mapping (SnPM; Nichols & Holmes, 2002) with 10,000 permutations to perform voxel-by-voxel within- and between-group comparisons. Current density distributions were compared with the log-F-ratio statistic. The results are presented as maps of the log-F-ratio statistic for each voxel, thresholded at p < .05 and corrected for multiple comparisons. Importantly, the SnPM approach corrects for multiple comparisons without requiring Gaussianity assumptions. Correlations were computed with identical protocols, but their results are reported as Pearson r-values.
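The multiple-comparison correction in SnPM can be illustrated with the max-statistic method of Nichols and Holmes (2002). For each permutation, the statistic is computed at every voxel and only the maximum is kept. The (1 − α) quantile of these maxima is a threshold that controls the family-wise error across all voxels. In the sketch below, the log of the ratio of condition means is a hypothetical stand-in for the log-F-ratio statistic; the exact statistic computed by the sLORETA software may differ. The test is one-tailed (condition A greater than B) with a paired design, and all function names are hypothetical.

```python
import math
import random

def log_ratio_stat(a_vals, b_vals):
    """Hypothetical stand-in for the log-F-ratio: log of the ratio of condition means."""
    return math.log(sum(a_vals) / sum(b_vals))

def snpm_max_stat(cond_a, cond_b, n_perm=1000, alpha=0.05, seed=0):
    """One-tailed voxel-wise paired comparison with max-statistic FWE correction.

    cond_a, cond_b: per-subject lists of positive current-density values (one per voxel).
    Returns (observed statistics, critical threshold, indices of significant voxels).
    """
    n_vox = len(cond_a[0])

    def stats(a, b):
        return [log_ratio_stat([s[v] for s in a], [s[v] for s in b]) for v in range(n_vox)]

    observed = stats(cond_a, cond_b)
    rng = random.Random(seed)
    max_null = []
    for _ in range(n_perm):
        # Under H0 the two conditions are exchangeable within each subject.
        pa, pb = [], []
        for sa, sb in zip(cond_a, cond_b):
            if rng.random() < 0.5:
                sa, sb = sb, sa
            pa.append(sa)
            pb.append(sb)
        max_null.append(max(stats(pa, pb)))  # keep only the largest voxel statistic
    max_null.sort()
    # Critical value: the (1 - alpha) quantile of the maximum statistic across voxels.
    critical = max_null[min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)]
    significant = [v for v, s in enumerate(observed) if s > critical]
    return observed, critical, significant
```

Because the threshold is taken from the distribution of the maximum over all voxels, a single critical value corrects all 6,239 voxels at once without assuming Gaussian data.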
4. Results
Initial analyses confirmed the presence of the MMN. We compared standard and deviant ERP responses and examined MMN polarity inversion between mastoid and frontal electrodes (Alho, 1995). Both groups exhibited significant MMN responses and polarity inversion in both language contexts. Bilingual ERP figures for both language contexts are shown in the supplementary section (Figure S2) with the corresponding analyses (Analysis 1). Monolingual ERP figures for both language contexts are presented likewise (Figure S3 and Analysis 2).
The analyses of MMN amplitude modulation and sLORETA are organized as follows. First, MMN amplitude modulation is analyzed across the entire epoch (−100 to 470 ms) and compared between language contexts, separately for bilinguals and monolinguals. Next, the MMNs of the two groups are compared within each language context. The sLORETA analyses focus on the time windows in which the MMN differed significantly. sLORETA is first compared between language contexts for each group and then between groups within each language context. Finally, the sLORETA values are correlated with bilinguals’ shifts in language dominance.
4.1. MMN responses in different language contexts
4.1.1. Comparison of bilinguals’ MMN response between language contexts
Two clusters emerged in the bilinguals’ MMN comparison. Cluster 1 had a cluster value of −2284.83 between 138 and 333 ms after stimulus onset at electrodes Fz, F3, F7, FC5, FC1, C3, T7, CP5, Cz, C4, FC2, F4, AF7, AF3, F1, F5, FT7, FC3, C1, C5, CP3, CPz, C2, and FC4. The permutation test showed that this cluster value differed significantly between language contexts (p = 0.02). The MMN response was more negative in the English language context (mean = −.405 μV, SD = .351) than in the Spanish language context (mean = −.021 μV, SD = .354). The data-driven analysis of Cluster 1 is illustrated on the left side of Figure 2A. The voltage map shows a negative voltage distribution over left frontal and central electrodes.

Figure 2. Visualization of the data-driven analysis between language contexts for bilinguals (A) and monolinguals (B). The left side of panel A shows bilinguals’ first cluster. The blue shaded areas mark the time intervals in which the English mismatch negativities (MMNs) were significantly more negative than the Spanish MMNs. The right side of panel A shows bilinguals’ second cluster. The red shaded areas mark the time intervals in which the English MMNs were significantly more positive than the Spanish MMNs. Electrodes showing significant differences are marked with rectangular boxes (*p < .02 for cluster 1; +p < .03 for cluster 2). The voltage maps in panel A show the voltage distribution of the difference between the MMNs obtained in the two language contexts (English MMN minus Spanish MMN). They are shown at approximately 200 ms for cluster 1 (red line) and at approximately 400 ms for cluster 2. The data-driven analysis showed no significant differences for monolinguals (B).
Cluster 2 was observed between 355 and 469 ms after stimulus onset, with a cluster value of 1832, at electrodes Fz, FC1, Cz, C4, T8, FT10, FC6, FC2, F4, F8, AFz, F1, CP4, C6, C2, FC4, FT8, F6, AF8, AF4, and F2. The permutation test showed that this cluster value differed significantly between language contexts (p = 0.03). In this time range, bilinguals’ difference waveform was more positive in the English language context (mean = .143 μV, SD = .455) than in the Spanish language context (mean = −.314 μV, SD = .416). The data-driven comparison of bilinguals’ difference waveforms between language contexts is depicted on the right side of Figure 2A. The voltage map shows a positive voltage distribution over right frontal and central electrodes.
4.1.2. Comparison of monolinguals’ MMN response between language contexts
No significant differences in cluster values between language contexts were observed. Figure 2B presents a visualization of the data-driven analysis of monolinguals’ MMN between language contexts.
4.2. Comparison between bilinguals’ and monolinguals’ MMN responses in different language contexts
4.2.1. English-language context
A cluster value of −3518.01 was found between 186 and 454 ms at electrodes F3, F7, FC5, FC1, C3, T7, CP5, CP1, Pz, CP2, Cz, C4, FC2, F4, AF3, F1, F5, FT7, FC3, C1, C5, CP3, CPz, C2, FC4, and F2. The permutation test showed that this cluster value differed significantly between bilinguals and monolinguals (p = .026). Bilinguals’ MMN was more negative (mean = −.530 μV, SD = .332) than monolinguals’ MMN (mean = −.110 μV, SD = .300). The data-driven analyses for bilinguals and monolinguals in the English language context are visualized on the left side of Figure 3. The voltage map shows negative voltage over left frontal and central electrodes.

Figure 3. Data-driven analyses comparing the two groups’ mismatch negativities (MMNs) within each language context. In the English language context (left side), bilinguals show a larger MMN than monolinguals. The voltage map shows the difference between the two groups’ MMNs (bilinguals minus monolinguals) at approximately 200 ms (red line), with negative values over central and left frontal electrodes. In the Spanish language context (right side), bilinguals show a more positive difference waveform than monolinguals. The voltage map shows the difference between the two groups’ MMNs (bilinguals minus monolinguals) at approximately 300 ms (red line), with positive values over left frontal electrodes.
4.2.2. Spanish-language context
A cluster value of 2768.9 was observed between 237 and 469 ms at electrodes Fp1, F3, F7, FT9, FC5, AF7, AF3, F5, FT7, and C5. The permutation test showed that this cluster value differed significantly between bilinguals and monolinguals (p = .026). In this time range, bilinguals’ difference waveform was more positive (mean = .329 μV, SD = .417) than monolinguals’ difference waveform (mean = −.133 μV, SD = .307). The right side of Figure 3 visualizes the data-driven analysis for bilinguals and monolinguals in the Spanish language context. The voltage map shows a positive voltage distribution over left frontal electrodes.
4.3. sLORETA between language contexts
We analyzed the current source density of the MMN using sLORETA. We explored the MMN time regions that showed significant differences between language contexts and between groups.
4.3.1. Bilinguals
Bilinguals’ MMN responses in different language contexts showed two clusters. Cluster 1 was observed in the MMN time window between 138 and 333 ms. sLORETA analysis showed a significant difference in the averaged time region between 200 and 333 ms following stimulus onset. The MMN response involved three left frontal brain areas with more cortical activity during the English language context than during the Spanish language context. These areas were the Superior Frontal Gyrus (SFG) (BA 11), Medial Frontal Gyrus (MFG) (BA 10), and Orbital Gyrus (BA 10). Figure 4A displays the coordinates with the highest log-F-ratio values (1.23, p < .05; 1.34, p < .01). Please refer to Table S5 in the supplementary materials for the full list of MNI and Talairach coordinates.

Figure 4. Bilinguals’ current source densities in the two language contexts. Panel A shows significant differences in BA 10 and 11 between 200 and 333 ms after stimulus onset. Panel B shows significant differences in BA 11 between 355 and 465 ms after stimulus onset. Brain areas with statistically higher activation in the English language context than in the Spanish language context are shown in yellow.
Cluster 2 was observed in the MMN time window between 355 and 469 ms. sLORETA showed a significant difference in the average time region between 355 and 465 ms after stimulus onset. The MMN response engaged one left frontal brain area with more cortical activity in the English language context than in the Spanish language context. The brain region was located in the left SFG (BA 11). Figure 4B shows the coordinates with the highest log-F-ratio values (1.35, p < .05; 1.47, p < .01). Please refer to Table S6 in the supplementary materials for a full list of MNI coordinates and Talairach coordinates.
4.3.2. Monolinguals
Although monolinguals did not exhibit significant MMN amplitude differences between language contexts, we conducted sLORETA analyses in the time window in which the MMN is typically observed (200 to 300 ms after stimulus onset). No significant differences emerged, suggesting that the brain regions involved in speech processing were activated to a similar extent in both language contexts.
4.3.3. Bilinguals’ versus monolinguals’ sLORETA in the English language context
We explored the MMN time interval in which the groups differed significantly (186–454 ms). Because our prior findings showed that only bilinguals exhibited significant differences between language contexts, we conducted a one-tailed log-F-ratio analysis for independent groups. A significant difference emerged in the averaged time region between 200 and 260 ms after stimulus onset. In this window, the current source densities associated with the MMN involved frontal structures with more pronounced cortical activation in bilinguals than in monolinguals (MFG; BA 10, p < .05). Figure 5A shows the log-F-ratio values with significant differences (1.37, p < .05; 1.52, p < .01) and depicts the coordinates with the highest log-F-ratio value. Please refer to Table S7 in the supplementary materials for a full list of MNI and Talairach coordinates associated with the MFG.

Figure 5. (A) Current source densities between bilinguals and monolinguals during the English language context. Significant differences were found in the averaged time region between 200 and 260 ms after stimulus onset for BA 10. (B) Current source densities between bilinguals and monolinguals during the Spanish language context. Significant differences were found in the averaged time region between 350 and 450 ms after stimulus onset. The frontal activation represents BA 6 and BA 8. The right posterior activation represents BA 3 and BA 4. Yellow coloring depicts brain structures with statistically larger activation in bilinguals when compared to monolinguals.
4.3.4. Bilinguals’ versus monolinguals’ sLORETA in the Spanish language context
We examined the MMN time intervals in which the groups differed significantly (237–469 ms). We detected significant differences between 350 and 450 ms after the stimulus onset. MMN source densities involved frontal and parietal brain areas with higher cortical activation in bilinguals than in monolinguals. These brain regions were the MFG (BA 8), Postcentral Gyrus (BA 3), Precentral Gyrus (BA 4), and SFG (BA 6). Figure 5B displays the coordinates with the highest log-F-ratio value (1.12, p < .05; 1.25, p < .01). Please refer to Table S8 in the supplementary materials for a full list of MNI coordinates and Talairach coordinates.
4.3.5. Correlating sLORETA values with language shift across the lifespan
We investigated the correlation between bilinguals’ source densities during speech processing and their shift in language use from early childhood to the time of the experiment. To quantify the reduction in Spanish use, we averaged the reported percentages of speaking and hearing Spanish separately for ages 0–3 and for the age at participation in the experiment. We then computed the difference between the two averages (early childhood minus present). A positive value indicates a reduction in Spanish use from early childhood to the time of the experiment.
Paralleling our prior sLORETA analysis, we examined the MMN time interval (138–333 ms) in which the MMN response differed significantly between language contexts. This analysis included 26 bilinguals. The results revealed a significant positive correlation between reduced Spanish use and source densities in the left inferior frontal gyrus (IFG). The correlated regions were BA 44 (MNI −60, 5, 20; Talairach −59, 6, 18) and BA 9 (MNI −60, 5, 25; Talairach −59, 6, 23). This correlation emerged between 267 and 305 ms after stimulus onset. It is illustrated in Figure S4 in the Supplementary Materials, which displays the Pearson r-values (.69, p < .05). In other words, bilinguals who shifted more strongly from primarily using Spanish in early childhood to predominantly using English in young adulthood exhibited more pronounced IFG activation in the English language context than in the Spanish context.
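The language-shift score and its correlation with source density can be sketched as below. The shift follows the definition above: the mean of % speaking and % hearing Spanish at ages 0–3 minus the same mean at the time of testing. The participant values are hypothetical illustration data. In the study, the correlation was computed voxel by voxel with SnPM permutation correction, whereas this sketch only computes the Pearson coefficient for one voxel. Function names are hypothetical.

```python
import math

def spanish_shift(speak_0_3, hear_0_3, speak_now, hear_now):
    """Reduction in Spanish use: mean(% speaking, % hearing) at ages 0-3 minus the same mean now.
    Positive values indicate less Spanish use now than in early childhood."""
    return (speak_0_3 + hear_0_3) / 2 - (speak_now + hear_now) / 2

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical participants: (% speak, % hear Spanish at ages 0-3, % speak, % hear now).
reports = [(80, 90, 10, 30), (50, 60, 20, 30), (90, 95, 5, 10), (60, 70, 30, 40)]
shifts = [spanish_shift(*r) for r in reports]
# Hypothetical source-density values for one voxel (one per participant).
density = [0.62, 0.35, 0.80, 0.30]
print(shifts, round(pearson_r(shifts, density), 2))
```

A positive r means that participants with a larger drop in Spanish use show higher current density at that voxel, which is the direction reported above for the left IFG.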
5. Discussion
The present study investigated how bilinguals perceive speech sounds with competing phonemic representations across languages, such as the same sound representing different phonemes in Spanish and English. By presenting these sounds in both Spanish and English language contexts, we aimed to determine whether control mechanisms, such as those related to the ECN, were involved in their perception, even when only one language was being used. Using ERPs to measure brain activity and sLORETA to localize the brain regions associated with these processes, this study sought to provide insight into how phonemic competition is managed in bilinguals. While behavioral research has demonstrated that language context influences phonemic categorization, no study had previously investigated whether active language control mechanisms are engaged during the perception of speech sounds with competing phonological representations in the absence of lexical meaning. This investigation aimed to clarify the role of neural control mechanisms in managing competing phonological representations across languages.
5.1. Brain regions involved in phonetic categorization between language contexts
Our sLORETA results show that bilinguals exhibit significantly greater activation in the left frontal MMN generators during the English language context than during the Spanish context. These generators are Brodmann areas 10 and 11 (frontopolar cortex and orbitofrontal cortex, respectively). The observed left frontal activation aligns with previous studies linking prefrontal MMN generators to a comparator-based mechanism (Giard et al., 1990; Gomes et al., 2000; Maess et al., 2007). This mechanism plays a critical role in modulating the deviance detection system in the temporal cortices, thereby enhancing auditory change detection and control (Doeller et al., 2003; Garrido et al., 2009). In contrast, monolinguals did not show a significant difference in brain activation between language contexts. This suggests that the heightened activation in bilinguals may reflect the additional demands of managing two phonological systems during speech perception. We propose that activation of the frontal MMN generators is linked to the involvement of the ECN in regulating language-specific processing.
Language selection models differ in their assumptions about the mechanisms underlying language selection. Some models propose that the non-target language must be inhibited (Green, 1998; Green & Abutalebi, 2013; Kroll et al., 2008; van Heuven et al., 2008). Others propose raising the activation of the selected language (Blanco-Elorrieta & Caramazza, 2021; Grosjean, 2001) or maintaining distinct resting levels of activation for each language (Dijkstra & van Heuven, 2002; Dijkstra et al., 1998). The heightened left frontal activity observed in bilinguals in our study can therefore be interpreted in several ways. It may stem from an inhibitory process controlled by frontal areas, or it may function as a mechanism that elevates the English phonetic category (Blanco-Elorrieta & Caramazza, 2021; Grosjean, 2001).
However, frontal activation of Brodmann areas 10 and 11 falls within the domain of executive control, which serves to minimize interference from the non-target language. The prefrontal cortex (PFC), part of the ECN, is involved in various executive functions. These include working memory (Smith & Jonides, 1999), controlled semantic retrieval (Badre et al., 2005; Brian et al., 2006; Gold & Buckner, 2002), phonological retrieval (Poldrack et al., 1999), inhibition of automatic responses, attentional control, planning, and the cognitive flexibility to switch between different goals (see Niendam et al., 2012). The ECN comprises the left middle and superior frontal gyri, inferior frontal and orbitofrontal gyri, superior and inferior parietal regions, angular gyri, precuneus, inferior and middle temporal gyri, left thalamus, and right crus (Botvinick et al., 2001; Ridderinkhof et al., 2004; Shen et al., 2020; Koechlin et al., 2003; Miller & Cohen, 2001; Rodriguez-Fornells et al., 2002). Given these perspectives, we posit that the left frontal MMN mechanisms observed here are linked to the ECN.
5.2. Integrating the executive control network into speech perception models of second language acquisition
Our findings extend, rather than contradict, existing models of second language speech perception, particularly the SLM (Flege, 1995; Flege et al., 2003) and the L2 Linguistic Perception (L2LP) model (Escudero, 2005). While both models address the interaction between a learner’s L1 and L2, they differ in how they conceptualize language activation and contextual influences. The SLM emphasizes the coexistence of L1 and L2 phonetic categories within a shared phonetic space (Lindblom, 1990). Within this space, assimilation or dissimilation processes help accommodate new sounds while maintaining phonetic distinctions (Flege et al., 1992; Flege & Eefting, 1988; Flege et al., 2003; Mack, 1990). However, the SLM does not explicitly account for how linguistic context modulates the activation levels of each language.
In contrast, the L2LP model introduces the Language Mode Activation Hypothesis (Escudero & Yazawa, 2024), which builds on Grosjean’s Language Mode Framework (1998, 2001, 2008). This hypothesis posits that L2 learners can activate both L1 and L2 perceptual systems to varying degrees depending on contextual demands, supporting a flexible, dynamic interplay between the two languages. The present investigation provides empirical evidence that bilinguals rely on control mechanisms (i.e., the ECN) even when processing speech sounds without lexical information, which supports the L2LP framework. These findings indicate that language control is not confined to the lexical stage but also operates pre-lexically. This reinforces the idea that bilingual selection operates dynamically, independent of word meaning.
Further supporting this perspective, Van Leussen and Escudero (2015) distinguished between pre-lexical (processing) and lexical (representational) stages in speech perception. They argued that shifts in phonetic categorization occur at the processing level rather than through permanent changes in representation. This aligns with findings from Spivey and Marian (1999), who showed that spoken input in one of a bilingual’s languages can automatically activate both mental lexicons in parallel, even in monolingual contexts. Similarly, Ju and Luce (2004) found that bilinguals adjust their processing pathways dynamically, activating lexical representations from both languages when encountering language-specific VOTs.
Expanding on this, Marian and Spivey (2003) proposed that bilingual language processing involves simultaneous activation of both languages, allowing bilinguals to flexibly manage activation levels and competition effects. Their account suggests that processing pathways adapt dynamically to both bottom-up (stimulus-driven) and top-down (contextual or task-related) factors. This flexibility is further explained by the Adaptive Control Hypothesis (Green & Abutalebi, 2013), which outlines a spectrum of cognitive control processes, such as goal maintenance, selective inhibition, and task switching. These processes operate across multiple levels, from sub-lexical phonetic elements to full lexical representations, enabling bilinguals to suppress interference from the non-target language when needed.
5.3. Brain regions involved in phonetic categorization between groups and language contexts
5.3.1. English language context
The sLORETA analysis comparing bilinguals and monolinguals in the English language context revealed stronger cortical activation in bilinguals. Specifically, bilinguals showed significantly greater activation in the MFG between 200 and 260 ms after stimulus onset. This suggests that bilinguals recruit more neural resources in frontal regions when processing English speech sounds, consistent with engagement of the ECN. Such enhanced activation in bilinguals relative to monolinguals has been reported in previous studies and is thought to reflect increased cognitive effort (Kovelman et al., 2008a; Kovelman et al., 2008b; Parker Jones et al., 2012; Palomar-García et al., 2015; Román et al., 2015; Wang et al., 2011).
This pattern of activation is particularly noteworthy because it aligns with concepts from the SLM (Flege, 1995; Flege et al., 2003), which posits a shared phonetic space in which assimilation or dissimilation can occur. The greater frontal activation in bilinguals than in monolinguals in the English language context may reflect dissimilation, whereby a new phonetic category for an L2 sound is fully established to maintain contrast with the corresponding L1 category. This separation, driven by the tendency of phonetic categories to minimize overlap (Lindblom, 1990), enhances the distinction between L2 sounds and their L1 counterparts. Crucially, the heightened ECN activation observed in bilinguals is consistent with Flege’s concept of overshoot, in which bilinguals may exaggerate or overrealize certain L2 phonetic contrasts as they work to differentiate them from L1 categories (Flege & Eefting, 1988; Mack, 1990). This increased neural engagement suggests that bilinguals recruit additional cognitive control mechanisms to reinforce phonetic distinctions, and hence that dissimilation may be not only a perceptual or articulatory phenomenon but also one that engages executive control processes at the neural level.
Our findings align with Kovelman et al. (2008b), who reported greater activation in frontal regions, such as the DLPFC and inferior frontal cortex (IFC), in bilinguals than in monolinguals. Their study of highly proficient Spanish-English bilinguals showed that bilinguals recruit additional cognitive control mechanisms to navigate dual-language contexts. This “bilingual signature” is consistent with the MFG activity observed here, reinforcing the idea that bilinguals engage additional neural resources to process speech sounds with competing phonological representations. These findings also fit the SLM notion of dissimilation (Flege, 1995; Flege et al., 2003), in which distinct phonetic categories emerge to minimize overlap between languages.
5.3.2. Spanish language context
The comparison between bilinguals and monolinguals in the Spanish language context is particularly informative, because it contrasts within-category perception in bilinguals with between-category perception in monolinguals. This comparison allows us to distinguish between two accounts: (1) the ECN provides the flexibility needed to sustain and regulate parallel activation and adjustments in processing pathways, or (2) the ECN is responsible for managing competition between overlapping phonological representations.
The results differed from those observed in the English language context. In the Spanish context, sLORETA analyses revealed enhanced activity in bilinguals compared to monolinguals in several regions: the SFG (BA 6), MFG (BA 8), Precentral Gyrus (PreCG; BA 4), and Postcentral Gyrus (PCG; BA 3). Although these activation sites differ from those observed in the English context, the regions identified in the Spanish context are integral to the ECN. In particular, activation of the SFG, a component of the ECN, has been documented in studies of bilingual language control (Abutalebi et al., 2007; Abutalebi et al., 2011; Garbin et al., 2011; Liu et al., 2021; Marian et al., 2017; Marian et al., 2014; Perani et al., 2003; Rodríguez-Pujadas et al., 2013; Shen et al., 2020; Sulpizio et al., 2020; Geng et al., 2023).
Similarly, MFG involvement in bilingual language control and decision-making has been well documented (Garbin et al., 2011; Perani et al., 2003; Shen et al., 2020; Sulpizio et al., 2020). Moreover, Burzynska et al. (2012) found that cortical thickness in the executive network, particularly in the left and right MFG (right BA 9 and 46; left BA 8 and 9), the right PreCG (BA 4 and 6), and the left and right PCG (BA 2), among other regions, significantly predicted executive function as measured by the computerized Wisconsin Card Sorting Test (Heaton et al., 1993). Using source localization analysis, De Sanctis et al. (2009) also identified the PreCG (BA 4), MFG (BA 6), and SFG (BA 6), among other regions, as associated with the preservation of high levels of executive functioning. PCG activation has also been observed in visual world paradigms. For example, Righi et al. (2010) examined the neural underpinnings of phonological onset competition by combining eye tracking with fMRI and found enhanced activation in typical executive control areas, including the left PCG (BA 3, 40, and 22), in the target versus competitor condition.
We propose that the differing patterns of brain activation observed between groups in the Spanish language context arise from the ECN engaging distinct processes in response to varying perceptual demands. Specifically, when perceptual elements need to be enhanced for more contrastive perception or attenuated for less contrastive perception, the ECN may employ different processing strategies. These findings suggest that bilinguals can dynamically adjust processing pathways according to the linguistic context. The results therefore favor the first account: the ECN provides the flexibility needed to maintain and modulate parallel activation and adjustments in processing pathways. If the ECN’s function were limited to managing competition between phonological representations, we would expect bilinguals and monolinguals to show similar activation patterns in the Spanish language context, because bilinguals’ processing would then parallel that of monolinguals, who do not experience competition between overlapping phonological systems. In other words, if bilinguals did not need to manage multiple language systems simultaneously or adjust their processing to accommodate different languages, there would be little demand for flexibility or additional ECN engagement. Similar activation in the two groups would therefore have indicated that bilinguals were not facing the added cognitive challenge of handling sounds from both languages concurrently.
Evidence indicates that distinct regions within the ECN are selectively engaged by different executive tasks, such as updating, inhibition, switching, and dual-tasking (Saylik et al., 2022). Similarly, Geng et al. (2023) showed that bilinguals engage overlapping yet functionally distinct neural populations for their native and second languages, with different but overlapping patterns recruited according to task demands and the language being processed. This differential engagement of brain networks according to linguistic or perceptual contrasts aligns with our finding that bilinguals adjust processing pathways based on the linguistic context. Our results thus support the concept of dynamic neural engagement driven by linguistic context and task-specific demands.
In sum, the differences in brain activation across language contexts suggest that the ECN deploys different strategies depending on whether perceptual contrasts must be enhanced or reduced. This flexibility allows bilinguals to adapt dynamically to the linguistic context while maintaining and modulating parallel activation. These results extend and support existing models, such as the SLM and the L2LP model, by highlighting the role of cognitive control in distinguishing language-specific phonemic categories.
5.4. Bilinguals’ double phonemic representation
As outlined in the introduction, the concept of the bilingual double phonemic boundary posits that bilinguals maintain dual phonemic representations for the same speech sounds. Because most of the supporting evidence comes from behavioral measures, the brain regions that underpin the auditory discrimination of two phonemic categories with competing representations have not been clearly identified. Our findings show that ECN activation persists across both language contexts, indicating that both languages remain active even when only one is ostensibly in use, in line with prior research on bilingual speech processing (Abutalebi et al., 2007; Abutalebi et al., 2011; Hernandez et al., 2001; Ju & Luce, 2004; Marian & Spivey, 2003; Spivey & Marian, 1999). Rather than reflecting a static “double representation,” these results point to a context-driven, dynamically recalibrated process at the level of phonemic perception, orchestrated by the ECN.
5.5. Language proficiency and its impact on results
It is essential to consider how language proficiency and language use influenced the observed results. Because these variables inevitably shape the findings, the key question is how their influence is most plausibly reflected in the results. For example, the reduced MMN in the Spanish language context relative to the English context could be attributed to weaker language control in a less proficient or less frequently used language (Green & Abutalebi, 2013). However, our findings do not necessarily support this interpretation, for two reasons. First, the present study replicated previous findings (García-Sierra et al., 2012), showing the expected MMN amplitude pattern: a larger MMN in the English context, indicating greater phonemic contrast, and a reduced MMN in the Spanish context, indicating less contrastive perception of the same sounds. This consistency implies that the bilingual participants were proficient enough in Spanish to adjust their perception of VOT across contexts. Second, the comparison with monolinguals in the Spanish context highlights the degree of perceptual flexibility shown by bilinguals. In the following section, we discuss the brain regions associated with these perceptual adjustments and their correlation with language use at the time of the experiment.
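To make concrete what a “larger” or “reduced” MMN means, the sketch below computes the MMN in its conventional form: the deviant-minus-standard difference wave, averaged over a latency window. It is a minimal illustration, not the study’s analysis pipeline; the function name `mmn_amplitude`, the 150 to 250 ms window, and all amplitude values are hypothetical and invented for illustration. A more negative value corresponds to a larger MMN.

```python
from statistics import mean


def mmn_amplitude(standard, deviant, times_ms, window=(150, 250)):
    """Mean amplitude of the deviant-minus-standard difference wave.

    standard, deviant: averaged ERP amplitudes (e.g., in microvolts) at one
    electrode, sampled at the latencies in times_ms. The default window is a
    hypothetical choice for illustration, not the study's analysis window.
    """
    lo, hi = window
    # Difference wave: subtract the standard ERP from the deviant ERP.
    diff = [d - s for s, d in zip(standard, deviant)]
    # Keep only samples whose latency falls inside the window (inclusive).
    in_window = [v for t, v in zip(times_ms, diff) if lo <= t <= hi]
    if not in_window:
        raise ValueError("no samples fall inside the latency window")
    return mean(in_window)


# Toy values (invented, not study data).
times = [0, 100, 150, 200, 250, 300]
standard = [0.0, 1.0, 1.0, 1.0, 0.5, 0.0]
deviant_english = [0.0, 1.0, -0.5, -1.0, -0.5, 0.0]  # strong mismatch
deviant_spanish = [0.0, 1.0, 0.5, 0.5, 0.0, 0.0]     # weak mismatch

print(mmn_amplitude(standard, deviant_english, times))  # -1.5 (larger MMN)
print(mmn_amplitude(standard, deviant_spanish, times))  # -0.5 (reduced MMN)
```

With these toy values, the English-context difference wave yields the more negative mean amplitude, mirroring the pattern of a larger MMN for a more contrastive distinction. Real analyses would average over many trials and participants and test the difference statistically; peak-based measures are a common alternative to window means.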
5.6. Language use shifts across the lifespan: patterns and influences
Bilingual language activation is complex. For instance, research on bilingual language production indicates that cognitive effort increases when bilinguals must either adhere strictly to one language or switch dynamically between both (Green, 1998; Hernandez et al., 2001; Abutalebi et al., 2011). However, when bilinguals can select their language freely, without strict adherence or forced switching, cognitive effort does not increase significantly (Blanco-Elorrieta & Pylkkänen, 2017; Kleinman & Gollan, 2016; Zhang et al., 2015; Zhu et al., 2022). Although this suggests that a control mechanism may not always be essential for language selection (Costa & Santesteban, 2004; Costa et al., 2006), there are real-life situations in which bilinguals must use only one language. In such situations, highly proficient bilinguals have been proposed to exert less inhibitory control than less proficient bilinguals (Green, 1998; Green & Abutalebi, 2013). Other researchers, however, propose that once bilinguals learn to activate their languages in a language-specific manner, they can apply this ability in language-switching tasks regardless of the proficiency levels of the languages involved (Costa et al., 2006).
The variation often seen in research on bilingualism may partly stem from the diverse methods used to assess bilingual proficiency and from the interplay between the first-learned language (Birdsong, 1999; Johnson & Newport, 1989) and the frequency of language use (Dufour & Kroll, 1995; Schreuder & Weltens, 1993). Numerous studies have demonstrated IFG involvement in lexico-semantic processing and lexical control, attributed to the increased attentional and verbal working memory demands of dual-language processing and the cross-linguistic integration of semantic information (Gabrieli et al., 1998; Kovelman et al., 2008a; Kovelman et al., 2008b; Petrides, 2005). Notably, the IFG often shows greater activation for L2 than for L1. In early bilinguals, however, many studies have reported the reverse pattern, with greater activation for L1 than for L2 (for a comprehensive review, see Sulpizio et al., 2020). This variation in IFG activation may reflect language proficiency, age of acquisition, frequency of use, and the specifics of the language task, underscoring the complexity of language processing in bilinguals.
Our methodology was designed to simulate a real-life scenario: watching a movie in either Spanish or English. Specifically, bilingual participants adhered strictly to one language by passively attending to a movie without making any task-related responses. Regarding language background, participants showed the well-documented dominance shift of heritage bilinguals (Kohnert et al., 1999; Valdés, 2005): they were exposed primarily to Spanish in early childhood, with English becoming more prominent later in both academic and non-academic settings. The frequency of English use in daily activities was confirmed with digital recorders worn by participants for two days. Concerning language proficiency, their age-adjusted PPVT scores were within the normal range in both languages, leaving little variability in proficiency and pointing to potential ceiling effects.
Given the minimal variability in proficiency between Spanish and English and the predominant use of English at the time of the experiment, we explored the relationship between the well-documented dominance shift in heritage bilinguals and brain activation during speech discrimination. Bilinguals who had shifted markedly from predominantly using Spanish in early childhood to primarily using English in adulthood exhibited stronger IFG activation in the English language context than in the Spanish context. This heightened activation likely reflects increased engagement of executive control mechanisms within the IFG to manage contrastive phonemic distinctions and to optimize speech processing in the dominant language through adjustments in processing pathways.
Overall, the correlational results highlight the complex relationship between language proficiency and brain activation in heritage bilinguals. In bilinguals who transitioned from Spanish (L1) to English (L2) as their dominant language, we found stronger IFG activation in the English language context despite high proficiency in both languages. This aligns with existing research on the role of the IFG in lexico-semantic processing in bilinguals, where language use, proficiency, and age of acquisition influence activation patterns (Sulpizio et al., 2020). Our study thus reinforces current accounts of the IFG’s role in bilingual lexico-semantic processing and adds to our understanding of the neurocognitive processes involved in shifts in language dominance.
6. Conclusion
This study provides novel insights into how bilinguals perceive and process speech sounds that have competing phonemic representations across languages. Using ERPs and sLORETA to measure and localize brain activity, we found that bilinguals exhibit greater activation than monolinguals in regions associated with the ECN when processing these sounds in both language contexts, with different regions engaged in each context. Specifically, increased activation in the left frontal cortex in the English context suggests that the ECN plays a crucial role in adjusting processing pathways to accommodate language-specific phonemic contrasts.
Our findings extend existing models like the SLM and the L2LP model by emphasizing the importance of cognitive control mechanisms in differentiating language-specific phonemic categories. The ability of bilinguals to dynamically adjust their processing pathways based on linguistic context underscores the flexibility and adaptability of the ECN in managing parallel activation across languages.
Additionally, the observed shifts in language dominance among heritage bilinguals highlight the complex interplay between language proficiency, language use, and neural activation patterns. The stronger IFG activation in the English language context reflects the neurocognitive adjustments associated with changes in language dominance over time.
Overall, our study enhances the understanding of the neural mechanisms underlying bilingual speech perception. It emphasizes the pivotal role of the ECN in enabling bilinguals to navigate between languages efficiently, thereby contributing to the broader knowledge of bilingual language processing and cognitive control.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S1366728925000148.
Data availability statement
The data supporting this study are available from the main author upon request.
Acknowledgments
We thank Noelle Wig, Eilis Welsh, Calli Smith, Sarah Polcaro, Lexi Arcomano, Molly Barnett, Sydney Bates, Christine Cammisa, Kaleigh Constantine, Leiah Cutkomp, Tayla Duntz, Kristen Fagan, Crystal Flores, Lina Kane, Ashley Lombardi, Alondra Marmolejos, Amy O’Rourke, and Allison Tozzi for their assistance with data collection.
Competing interest
The authors declare no competing interests.