Grossberg & Myers

This is a dynamical model of neural processes, developed within a broader theory called Adaptive Resonance Theory. The model posits an ongoing dynamic competition in which top-down feedback to phonemic item representations creates a slowly developing resonance between the item and list levels; the feedback then sustains that resonance.

Speech is characterized by 4 types of acoustic segments:

  1. formants (energy concentrated in narrow frequency bands)
  2. transitions (linking formants to other acoustic segments)
  3. higher frequency spectrally shaped noise
  4. silent gaps (associated with stop and affricate consonants)

Context Effects occur when the perception of one phoneme is altered by changing the acoustic characteristics of nearby sound segments. Trading Relations occur when a phonemic percept remains unchanged even though more than one acoustic feature of the signal is changed at the same time: the changes "trade against each other."

Consider the sentence:

Did anyone see the gray ship?

from Repp, Liberman, Eccardt and Pesetsky (1978, JEP:HPP 4(4):621-637). Perception varies with the duration of the silent gap between "gray" and "ship" and with the duration of the fricative noise in "ship."

Perception of GRAY and GREAT

gray

The original utterance lies in area 1, with no silence between the "ay" and "sh" and a fricative noise of about 122 msec.

great

When a silent interval was inserted between "gray" and "ship," listeners assimilated the silence and the "sh" into cues for a stop consonant, perceiving "gray" as "great." Given a noise duration of 160 msec, the "t" was perceived even after relatively long silent gaps of 100 msec. The "t" was then grouped with the preceding word "gray" rather than with the temporally contiguous "chip" signal.

Shift from SHIP to CHIP

chip

Shortening the duration of the fricative noise switches the percept from "gray ship" or "great ship" to "gray chip." Looking at areas 2 and 3, for a given silence duration, shortening the noise duration caused the perceived stop "t" to leave the first syllable "grea" and latch onto the fricative "sh" to form the affricate consonant "ch" ("tsh"). Without changing the amount of silence separating the words, a variation in the initial segment of the second word can alter the perception of the first word!

Further, the boundary between areas 2 and 3 shows a trading relation between silence and noise durations. At longer silence durations longer noise durations are required to cue the switch.
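
The trading relation along the boundary between areas 2 and 3 can be sketched as a toy rule. This is only an illustration: the linear boundary, the names BASE_NOISE_MS and SLOPE, and every number in it are hypothetical placeholders, not values from Repp et al. (1978). It covers only the area 2 / area 3 boundary and ignores areas 1 and 4.

```python
# All numbers below are hypothetical placeholders, not measured values.
BASE_NOISE_MS = 80.0   # assumed noise boundary at zero silence
SLOPE = 0.5            # assumed extra noise (ms) needed per ms of added silence


def noise_boundary_ms(silence_ms):
    """Noise duration at which the percept switches, for a given silence."""
    return BASE_NOISE_MS + SLOPE * silence_ms


def percept(silence_ms, noise_ms):
    """Classify a stimulus on either side of the area 2 / area 3 boundary."""
    if noise_ms > noise_boundary_ms(silence_ms):
        return "great ship"   # long noise: the "t" stays with the first word
    return "gray chip"        # short noise: the "t" joins "sh" to form "ch"


# Same noise duration, two different silence durations.
short_gap = percept(silence_ms=20, noise_ms=110)
long_gap = percept(silence_ms=80, noise_ms=110)
# Lengthening the noise along with the silence restores the first percept.
traded = percept(silence_ms=80, noise_ms=140)
```

With these assumed settings, the stimulus with the longer silence needs a longer noise to stay on the "great ship" side, which is the sense in which the two durations trade against each other.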

Perception of GREAT CHIP

In area 4, a stop consonant is perceived in both words: the "t" in "great" and the "ch" in "chip." Notice that the transition from area 3 to area 4 shows that increasing the silence separating "chip" from "gray" can change "gray" to "great."

Three questions:

  1. How does the brain generate perceptual representations such that coherent groupings like "gray" and "chip" can influence each other across long time spans?
  2. How can the representations emerge such that "t" can leap over a preceding interval of silence without filling the interval?
  3. How does the brain deal with these context sensitive representations without altering the order in which the groupings are perceived?

One possible answer is a hierarchy of processing levels linked by bi-directional pathways:

At the lowest level, peripheral auditory neurons send signals to higher-order neurons that encode iconic sensory features. A pattern of activation across these feature detectors within a small time interval activates an item representation; item representations are stored in working memory as a temporal succession of sounds. The working memory transforms the sequence of sounds into an evolving pattern of activation.
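
One simple way a sequence of sounds could become an evolving pattern of activation is sketched below. It is an assumption made for illustration, not the working memory used in the paper: each new item enters at full activity while stored items are scaled down, so order is carried by relative activity. The function name store_sequence and the retention parameter are hypothetical.

```python
def store_sequence(sounds, retention=0.7):
    """Return the activity pattern after each sound arrives.

    Assumed rule (illustrative only): stored items are multiplied by
    `retention` whenever a new item arrives, and the new item is set to 1.0.
    A repeated sound simply resets that item's activity to 1.0.
    """
    pattern = {}
    snapshots = []
    for sound in sounds:
        for name in pattern:
            pattern[name] *= retention   # older items fade a little
        pattern[sound] = 1.0             # newest item is most active
        snapshots.append(dict(pattern))  # copy, so later updates do not alter it
    return snapshots


snapshots = store_sequence(["g", "r", "ay"])
final_pattern = snapshots[-1]
```

In the final pattern the earliest item is the least active and the latest the most active, so the pattern across items encodes the order in which they arrived. Choosing a recency gradient here is purely for simplicity.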

The activity patterns across these working memory items in turn activate list chunks (representing phonemes, syllables, or words), which are context-sensitive representations of a particular temporal sequence of items. Since sequence is encoded, these are actually list sequences. Active list chunks feed back down to the item working memories to support their neural representations while suppressing items in the working memories that are not represented by the active list (an inhibitory process).

This would explain phonemic restoration experiments where broadband noise is perceived as different phonemes depending upon context.
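
The top-down step described above can be sketched as a single update, under assumed rules. The function apply_top_down, its support and inhibition parameters, and the item labels are hypothetical; the paper's own equations are not reproduced here. Broadband noise is modeled as equal weak activation of several candidate items.

```python
def apply_top_down(items, chunk_items, support=0.5, inhibition=0.5):
    """Boost items the active list chunk represents and suppress the rest.

    Assumed rule (illustrative only): supported items move a fraction of
    the way toward 1.0, other items are scaled toward 0.0, so activities
    that start in [0, 1] stay in [0, 1].
    """
    updated = {}
    for name, activity in items.items():
        if name in chunk_items:
            updated[name] = activity + support * (1.0 - activity)
        else:
            updated[name] = activity * (1.0 - inhibition)
    return updated


# Broadband noise: every candidate item is weakly and equally active.
noise_items = {"t": 0.3, "sh": 0.3, "k": 0.3}
# A list chunk that represents "t" is active because of the context.
restored = apply_top_down(noise_items, chunk_items={"t"})
```

After the update the item supported by the active chunk (about 0.65) dominates the unsupported ones (about 0.15), even though the input itself did not favor any of them. A different active chunk would select a different item from the same noise.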

When a phonemic sequence in working memory excites and then receives confirmatory top-down feedback from a list, the positive feedback loop enhances the activity in both fields through resonance. This model proposes that when listeners perceive fluent speech, a wave of resonant activity plays across the working memory, binding the phonemic items into larger language units and raising them into the listener's conscious perception.
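
A minimal numerical sketch of such a positive feedback loop follows. It assumes one item unit and one list unit coupled by simple shunting equations integrated with Euler steps. The equations, gains, and step sizes are illustrative assumptions, not the equations of the Grossberg and Myers model, and the sketch includes nothing that would ever end the resonance.

```python
def simulate(feedback_gain, steps=400, dt=0.01, input_off_step=200):
    """Euler-integrate a two-unit item/list loop; return (item_trace, list_trace)."""
    decay = 1.0           # passive decay rate (hypothetical)
    ceiling = 1.0         # maximum activity (hypothetical)
    bottom_up_gain = 2.0  # item-to-list excitation (hypothetical)
    item, chunk = 0.0, 0.0
    item_trace, list_trace = [], []
    for step in range(steps):
        # External input drives the item only during the first part of the run.
        external = 1.0 if step < input_off_step else 0.0
        # Shunting form: excitation is scaled by the distance to the ceiling.
        d_item = -decay * item + (ceiling - item) * (external + feedback_gain * chunk)
        d_chunk = -decay * chunk + (ceiling - chunk) * bottom_up_gain * item
        item += dt * d_item
        chunk += dt * d_chunk
        item_trace.append(item)
        list_trace.append(chunk)
    return item_trace, list_trace


with_feedback, _ = simulate(feedback_gain=3.0)
without_feedback, _ = simulate(feedback_gain=0.0)
```

With these assumed settings, the item activity grows larger when the list unit feeds back, and it stays high after the external input is switched off. Without feedback the item activity decays toward zero once the input ends.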

The key problem is that these levels operate on different time scales, which must be coordinated in order to form a unified speech percept. The rate of conscious speech is equal to the time scale of the resonance between processing levels.


from The resonant dynamics of speech perception: Interword integration and duration-dependent backward effects by S. Grossberg and C. W. Myers, 2000, Psychological Review, 107(4), 735-767.

Comments to author: David Bozak
All contents copyright © 2000, SUNY Oswego, All rights reserved.
Revised: February 26, 2001
URL: http://www.cs.oswego.edu/~dab/310/grossberg.html