Friday, 1 January 2016

Listeners’ understanding of comprehensive speech in a non-native language

It can therefore be argued that the majority of speech comprehension studies have solely focussed on localising anatomical brain areas, while few studies went beyond listeners’ perception of speech in their first language and considered listeners’ understanding of comprehensive speech in a non-native language (Inui et al., 1998; Kim, Relkin, Lee, & Hirsch, 1997; Nakai et al., 1999; Perani et al., 1996). The neuroimaging study by Nakai et al. (1999) investigated Japanese speakers’ listening comprehension of their native language (Japanese), a comprehensive non-native language (English) and a non-comprehensive non-native language (Hungarian). Nakai et al. (1999) were particularly interested in detecting distinct activations of separate language regions that respond to processing comprehensive and non-comprehensive languages. In contrast to prior research (e.g. Perani et al., 1996), they found no expansive responses of the inferior frontal gyrus (IFG) and angular gyrus when listeners passively listened to their native language. Nonetheless, similar to previous investigations (Mazoyer et al., 1993; Perani et al., 1996), Nakai et al. (1999) found that both comprehensive and non-comprehensive languages elicited activations from the posterior part of the superior temporal gyrus (STG). In line with prior research showing that the IFG is activated to a high degree during passive listening to words (Mazoyer et al., 1993; Perani et al., 1996) and during perception of syntactically complex speech (Inui et al., 1998), Nakai et al. (1999) observed the IFG to respond to the comprehensive languages, Japanese and English. These languages were also reported to activate the supplementary motor area (SMA) and the pre-motor area (PMA), indicating a role for these regions in perceived comprehensibility (Nakai et al., 1999). Finally, all languages were observed to elicit responses in the transverse temporal gyri and the primary auditory cortex (PAC) (Nakai et al., 1999).

However, the task in Nakai et al.’s (1999) study was a passive listening task and did not measure the neural correlates of participants’ active comprehension of the speech material. Additionally, their experiment dealt with sentence comprehension and did not consider listening comprehension at the word level. Moreover, they did not focus on revealing the possible linguistic benefit of a particular speech modification for listeners with varying levels of proficiency within one language (Nakai et al., 1999). Furthermore, measurements were taken from only four participants. Similar to Nakai et al. (1999), the functional magnetic resonance imaging (fMRI) study by Inui et al. (1998) investigated Japanese listeners’ speech comprehension. However, they also did not investigate comprehension at the word level, as their speech material included sentences only (Inui et al., 1998).


It can therefore be said that no neuroimaging study to date has investigated the neural basis of native and non-native listeners’ comprehension of speech that includes a speech modification such as vowel space expansion, by using an active listening comprehensibility task with word stimuli that were produced in a naturalistic setting with a communicative purpose. Moreover, no neuroimaging study has examined the neural mechanism of the possible perceptual and cognitive advantage that vowel space expansion might provide listeners. Vowel space expansion results from changes in the first two formants, F1 and F2, and has been shown by prior behavioural studies to enhance listeners’ perception of speech (Ferguson & Kewley-Port, 2007; Uther, Knoll, & Burnham, 2007). This kind of speech modification has been found to yield a large speech intelligibility benefit for native speakers and for early learners of a second language (L2), and a small intelligibility benefit for late L2 learners (Bradlow & Bent, 2002); however, this has not been investigated by a neuroimaging study either.

Spoken word comprehension


In contrast to speech intelligibility, speech comprehension entails numerous cognitive activities, such as the integration of the physical speech signal over time and the access to and selection of appropriate semantic representations using decision strategies to operate on semantic information (Davis & Johnsrude, 2003). Most studies on speech comprehension showed that extensively distributed systems in both hemispheres are involved in speech comprehension (Benson et al., 2001; Binder et al., 1997; Chee, O'Craven, Bergida, Rosen, & Savoy, 1999; Demonet et al., 1992; Demonet, Price, Wise, & Frackowiak, 1994a; Nakai et al., 1999; Newman, Pancheva, Ozawa, Neville, & Ullman, 2001; Scott, Leff, & Wise, 2003; Spitsyna, Warren, Scott, Turkheimer, & Wise, 2006; Visser, Jefferies, & Lambon Ralph, 2010). Many investigations have been carried out to uncover the neural basis of speech comprehension (Crinion & Price, 2005; Humphries, Willard, Buchsbaum, & Hickok, 2001; Obleser et al., 2007a; Obleser, Eisner, & Kotz, 2008; Peelle et al., 2010). However, most of these studies focused on comprehension of spoken sentences.

Neuroimaging studies that focussed on the phonological and semantic processing of words demonstrated that, together with the left parietal angular gyri, the left middle and inferior temporal gyri are involved in semantic processing, while the left posterior inferior gyrus of the frontal lobe and the supramarginal gyri of the parietal lobe enable listeners to phonologically resolve sound information of words (Demonet et al., 1992, 1994a). The observation that the structures relevant for semantically processing auditory words are identical to those important for semantic operations on visually presented words indicated a semantic system for words irrespective of presentation mode (Vandenberghe, Price, Wise, Josephs, & Frackowiak, 1996). This amodal processing of semantic information, through which broadly distributed temporal, frontal and parietal regions for speech comprehension of both auditory and visual words are engaged, has been further supported (Chee et al., 1999; Newman et al., 2001). Specifically, right frontal and temporal areas have been related to speech comprehension as well (Newman et al., 2001).

Other speech comprehension studies showed a stronger involvement of prefrontal and angular gyri when more effort was required to retrieve semantic associations, for example when sentences in a foreign language were presented to listeners (Nakai et al., 1999). While the determination of word meaning through semantic context has been observed to elicit activation from the left superior frontal gyrus (Scott et al., 2003), tasks in which listeners were required to pay particular attention during speech comprehension were reported to activate the dorsal posterior frontal regions (Giraud et al., 1994). Moreover, the posterior middle temporal region, which was shown to respond to semantic requirements, was reported to become more active as executive needs intensified (Whitney, Jefferies, & Kircher, 2011). Although the ventral inferior frontal cortex was related to planned semantic operations (Adams & Janata, 2002; Badre, Poldrack, Pare-Blagoev, Insler, & Wagner, 2005), it, like the angular gyri and the left fusiform gyrus, also responded to elevated difficulty in accessing semantic knowledge when speech was presented in both visual and auditory modes (Adams & Janata, 2002; Rodd, Davis, & Johnsrude, 2005; Schmithorst, Holland, & Plante, 2006; Spitsyna et al., 2006). The angular gyri were related to top-down processing in forecasting semantic information and to the recovery and combination of concepts (Binder, Desai, Graves, & Conant, 2009; Brownsett & Wise, 2010; Obleser & Kotz, 2010).

More recently, the broadly dispersed system of semantic representation was suggested to include the posterior temporoparietal cortex, the precuneus and the left angular gyrus in the parietal areas, the middle and superior frontal gyri and the left frontal pars orbitalis, as well as the posterior inferior temporal gyrus, the middle temporal gyrus and the anterior temporal fusiform (Rogalsky, Matchin, & Hickok, 2008; Visser et al., 2010; Visser & Lambon Ralph, 2011). Prior research on speech comprehension has reported a considerable overlap between structures significant for speech articulation, such as the pars opercularis and triangularis of the IFG and the inferior and lateral areas of the right cerebellar cortex, and structures essential for speech comprehension (Papathanassiou et al., 2000). This observation has been suggested to reflect activities shared by both articulation and comprehension of speech, including, for instance, articulatory strategies, short-term auditory memory and semantic processing (Bookheimer, 2002; Papathanassiou et al., 2000; Wise et al., 2001).

Sunday, 27 December 2015

Auditory speech processing

The primary auditory cortex (PAC) on the superior temporal gyrus (STG) is known to play a fundamental role in the processing of hearing, as well as of language and speech (Flechsig, 1896; Seldon, 1985). The STG encloses the transverse temporal gyrus, known as Heschl's gyrus (HG) (Brodmann area 41/42), and has been established to be involved in auditory speech processing.

References:

Flechsig, P. (1896). Die Lokalisation der geistigen Vorgänge insbesondere der Sinnesempfindungen des Menschen. Leipzig: Verlag von Veit.

Seldon, H.L. (1985). The anatomy of speech perception. In A. Peters & E.G. Jones (Eds.), Cerebral Cortex: Association and auditory cortices, Vol. 4 (pp. 273-326). New York: Plenum.

Speech word intelligibility


Previous research into brain regions that respond to speech intelligibility employed subtractive designs, comparing intelligible speech with conditions that included nonspeech baselines (Binder et al., 2000; Scott, Blank, Rosen, & Wise, 2000). One PET study, for instance, revealed that intelligible speech activates the anterior left superior temporal sulcus (aSTS) (Scott et al., 2000).

Figure: The superior temporal sulcus (STS) in the left hemisphere.

The study carefully matched stimuli for acoustic complexity while they differed in clarity. It demonstrated that the left posterior STS responded to auditory signals at a phonetic level regardless of whether these stimuli were intelligible or not (Scott et al., 2000). In contrast, the anterolateral stream from the PAC showed a response to intelligible speech. Specifically, it was shown that, anterior and ventral to the PAC, the STS responded to clear speech only. This study can therefore be considered to have shown that the response of the anterior section of the left STS evidently differs from that of the posterior section (Scott et al., 2000). The study is consistent with prior research that showed intelligible speech, such as connected speech (e.g. stories), to activate anterior temporal lobe areas (Mazoyer et al., 1993; Schlosser, Aoyagi, Fulbright, Gore, & McCarthy, 1998). The findings in Scott et al.’s (2000) study are also in line with observations on patients with semantic dementia, which was linked to loss of grey matter within the temporal lobe of the left hemisphere (Chan et al., 2001).

These findings were extended by a correlational fMRI study in which acoustically degraded speech stimuli, which differed in three distinct ways and in their degree of intelligibility, were rated for intelligibility (Davis & Johnsrude, 2003). Brain regions identified as responding to intelligibility included, in addition to the bilateral anterior middle temporal gyrus, the left anterior hippocampus, the left inferior frontal gyrus, the left angular gyrus and the left pSTG (Davis & Johnsrude, 2003). It has been suggested that the involvement of the left anterior hippocampus represents a response to meaningful speech stimuli, which had previously been shown to be normally encoded and maintained in parts of the left medial temporal lobe (Strange, Otten, Josephs, Rugg, & Dolan, 2002). Activations of the left pSTS and the left angular gyrus related to intelligibility have been suggested to provide evidence of additional processing streams (Davis & Johnsrude, 2003).

Moreover, regarding brain regions that are sensitive to degraded speech, it was shown that sentences heard in noisy conditions may be separated from noise by low-level auditory operations (Davis & Johnsrude, 2003). The observation of an especially marked response in regions close to the PAC, which increased with degradation, has been suggested to reflect the increased allocation of attentional resources to the masked speech stimulus (Davis & Johnsrude, 2003). The intelligibility of degraded speech was also reported to modulate increased responses to degraded speech within the frontal operculum (Davis & Johnsrude, 2003). The observed elevated recruitment of attention during the perception of degraded speech is in line with the suggestion that perceiving degraded speech requires more attention than perceiving clear speech (Rabbitt, 1990). Recent evidence, for example, implies that the extent to which listeners pay attention to speech determines both the comprehension of sentences that differ in speech clarity and the involvement of those brain regions that support speech processing (Wild, Davis, & Johnsrude, 2012). Specifically, attention was found to improve the processing of unclear speech in the STS and the left IFG.

In addition to the finding of a path directed to the anterolateral temporal cortex involved in stimulus intelligibility (Scott et al., 2000), another PET study revealed a path towards the posterior superior temporal cortex specialised in processes involved in repetition (Wise et al., 2001). The results from both PET studies were confirmed by a further fMRI study (Narain et al., 2003). Using a passive language-listening task, that study indicated robustly left-lateralised activation for intelligible speech, including the posterior STG and the anterior STS (Narain et al., 2003). However, when directly compared to the study it is based on (Scott et al., 2000), the study by Narain et al. (2003) showed a more intense posterior response relative to the anterior activation on the STS. This difference in results was attributed to the difference in methods used, which resulted in an altered power of analysis; this was confirmed by a reanalysis of the data from Scott et al.’s original PET study (Narain et al., 2003). More recent research has also found elevated spectral information in the speech signal to activate the anterior STS in both hemispheres, in addition to the IFG (Obleser, Wise, Alex Dresner, & Scott, 2007a).

It was shown that when speech clarity decreases under adverse listening conditions, speech intelligibility is assisted through raised functional connectivity across auditory cortical areas that include the posterior cingulate cortex, the dorsolateral prefrontal cortex and the angular gyrus (Obleser et al., 2007a). It has been suggested that the reported functional connectivity between the angular gyrus and the left IFG is supported by links between these areas through the superior longitudinal fasciculus (Eisner, McGettigan, Faulkner, Rosen, & Scott, 2010; Frey, Campbell, Pike, & Petrides, 2008; Obleser et al., 2007a). It can therefore be said that, under unfavourable listening conditions, the processing of intelligible speech is supported by higher-level cortical regions distant from the PAC, and that the functional connectivity between these areas is strengthened when speech intelligibility is aided by semantic context (Obleser et al., 2007a).