Acoustic signal
produced by air that is pushed up from the lungs past the vocal cords and into the vocal tract.
vowels
produced by vibration of the vocal cords
formants
The frequencies at which peaks of acoustic energy occur. Each vowel sound has a characteristic series of formants.
consonants
Consonants are produced by a constriction, or closing, of the vocal tract and by air flow around the articulators.
Every other sound (such as consonants) is created by the movement of air and the shape of the articulators (the tongue, lips, teeth, jaw, and soft palate).
phonemes
The smallest unit of speech that, when changed, changes the meaning of a word.
In English there are 47 phonemes:
spectrogram
Dark smudges on a spectrogram correspond to formants.
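A spectrogram can be approximated by cutting the signal into short frames and measuring how much energy each frequency carries in each frame. The sketch below is only an illustration: the sample rate, the frame sizes, and the two pure tones at 500 Hz and 1500 Hz are assumptions standing in for a vowel's energy peaks, and the function names are made up for this example. Real formants are broader resonance peaks of the vocal tract, not pure tones.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len, hop):
    """Slice the signal into overlapping frames and compute one spectrum per frame."""
    return [
        magnitude_spectrum(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

SAMPLE_RATE = 8000  # Hz (assumed for this sketch)
FRAME_LEN = 128     # samples per frame (assumed)

# Synthetic stand-in for a vowel: energy concentrated at 500 Hz and 1500 Hz.
signal = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
    for t in range(512)
]

frames = spectrogram(signal, frame_len=FRAME_LEN, hop=64)
bin_hz = SAMPLE_RATE / FRAME_LEN  # width of one frequency bin in Hz

# The two strongest bins in the first frame play the role of the "dark smudges".
first = frames[0]
top_two = sorted(sorted(range(len(first)), key=first.__getitem__, reverse=True)[:2])
peak_freqs = [k * bin_hz for k in top_two]
print(peak_freqs)
```

On a plotted spectrogram, the bins with the largest magnitudes would be drawn darkest, which is why energy peaks show up as dark bands.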
lack of invariance (the variability problem)
no simple relationship between a particular phoneme and the acoustic signal
acoustic signal for a particular phoneme is variable.
variability from different speakers
Speakers differ in pitch, accent, speaking rate, and pronunciation. Despite this variation, the listener must transform the acoustic signal into familiar words.
variability from context
even though we perceive the same /d/ sound in /di/ and /du/, the formant transitions, which are the acoustic signals associated with these sounds, are very different.
-Thus, the context in which a specific phoneme occurs can influence the acoustic signal that is associated with that phoneme.
Categorical perception
a wide range of acoustic cues results in the perception of a limited number of sound categories
multimodal
speech perception is multimodal; our perception of speech can be influenced by information from a number of different senses.
McGurk effect
although auditory information is the major source of information for speech perception, visual information can also exert a strong influence on what we hear
audio-visual speech perception
The McGurk effect is one example of audio-visual speech perception. (E.g., people routinely use information provided by a speaker’s lip movements to help understand speech in a noisy environment.)
Experiment
The McGurk effect
-Visual stimulus shows a speaker saying “ga-ga” while the audio track plays “ba-ba”; the listener typically reports hearing “da-da.”
“top-down” processing affects speech perception
-speech perception is determined both by the nature of the acoustic signal (bottom-up processing) and by context that produces expectations in the listener (top-down processing).
phonemic restoration effect:
The perceptual filling-in of part of a word that has been obscured. The effect was experienced even by students and staff in the psychology department who knew that the /s/ was missing.
- can be influenced by the meaning of words following the missing phoneme
The segmentation problem
there are no physical breaks in the continuous acoustic signal.
speech segmentation
The perception of individual words in a conversation is called speech segmentation.
How we perceive breaks in words
-knowledge: Top-down processing, including knowledge a listener has about a language, affects perception of the incoming speech stimulus
-perceptual organization: knowing the meaning of the sounds changes how they are perceptually organized into words.
transitional probabilities
the chances that one sound will follow another sound.
statistical learning
The process of learning about transitional probabilities and about other characteristics of language is called statistical learning. Research has shown that infants as young as 8 months of age are capable of statistical learning.
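As a rough illustration of transitional probabilities, the sketch below estimates the chance that one syllable follows another by counting adjacent pairs in a stream. The syllable stream and the two made-up "words" (bi-da-ku and pa-do-ti) are hypothetical, and the function name is invented for this example; this is not a model of how infants actually learn.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) for each adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor (exclude the last one).
    first_counts = Counter(syllables[:-1])
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }

# Hypothetical continuous stream built from two made-up words,
# "bidaku" and "padoti", with no pauses between them.
stream = (
    "bi da ku pa do ti bi da ku bi da ku "
    "pa do ti pa do ti bi da ku"
).split()

tp = transitional_probabilities(stream)

# Within-word transitions are fully predictable in this stream...
print(tp[("bi", "da")])
# ...while transitions across a word boundary are less predictable.
print(tp[("ku", "pa")], tp[("ku", "bi")])
```

In this toy stream, syllable pairs inside a word always occur together, while pairs that span a word boundary have lower transitional probabilities. A drop in probability of this kind is one cue a listener could use to locate word boundaries.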
The pop-out effect
shows that higher-level information such as listeners’ knowledge can improve speech perception.
- After experiencing the pop-out effect, subjects became better at understanding other degraded sentences that they were hearing for the first time.
Broca’s aphasia
-