Songs as an Aid for Language Acquisition
Daniele Schön, Maud Boyer, Sylvain Moreno, Mireille Besson, Isabelle Peretz, Régine Kolinsky
In Cognition (2008) 106, 975-983, ScienceDirect
Review and Response by John Picone
As a longtime lover of both music and language, I have recently become intrigued by the possible relationship between the two, specifically by the role music might play in linguistic development.
Previous research studies have shown that adults and infants can use the statistical properties of syllable sequences to extract words from continuous speech. They have also shown that a similar learning mechanism operates with music stimuli.
In this work we combined linguistic and musical information and we compared language learning based on speech sequences to language learning based on sung sequences. We hypothesized that, compared to speech sequences, a consistent mapping of linguistic and musical information would enhance learning. Results confirmed the hypothesis showing a strong learning facilitation of song compared to speech. Most importantly, the present results show that learning a new language, especially in the first learning phase wherein one needs to segment new words, may largely benefit of the motivational and structuring properties of music in song (p. 975).
Indeed, songs may contribute to language acquisition in several ways. First, the emotion aspects of a song may increase the level of arousal and attention. Second, from a perceptual point of view, the presence of pitch contours may enhance phonological discrimination, since syllable change is often accompanied by a change in pitch. Third, the consistent mapping of musical and linguistic structure may optimize the operation of learning mechanisms (p. 976).
The researchers point out that one of the first challenges in learning a new language is segmenting speech into words. One becomes acutely aware of this challenge when learning a new language, which at first sounds like an uninterrupted stream of meaningless sounds. While word units in printed text are separated by spaces, word boundaries in speech are not consistently marked by acoustic cues such as pauses. So how do we learn where these boundaries lie?
We accomplish this by unconsciously tracking the transitional probabilities between syllables. In essence, syllables that belong to the same word follow one another more reliably than syllables that straddle a word boundary, so some syllables tend to occur word-internally while others tend to begin or end a word. For example, given the phonological sequence prettybaby, the transitional probability is greater from pre to ty than from ty to ba. Both adults and infants use these statistical properties of syllable sequences to extract word units from continuous speech.
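To make the idea concrete, here is a minimal sketch, assuming the usual definition TP(X→Y) = frequency of the pair XY divided by the frequency of X. The toy stream, the helper names, and the 0.75 boundary threshold are illustrative assumptions; the study does not propose this segmentation procedure, and listeners are not assumed to compute anything so explicit.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(x -> y) = count of the pair (x, y) / count of x as the first member of a pair."""
    pairs = list(zip(syllables, syllables[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(x for x, _ in pairs)
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

def segment(syllables, tps, threshold=0.75):
    """Hypothetical rule: posit a word boundary wherever the TP between adjacent syllables dips below threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("-".join(current))
            current = []
        current.append(y)
    words.append("-".join(current))
    return words

# Toy stream: the words pre-ty, ba-by and dog-gy run together with no pauses.
stream = "pre ty ba by dog gy pre ty dog gy ba by pre ty".split()
tps = transitional_probabilities(stream)
print(tps[("pre", "ty")], tps[("ty", "ba")])  # 1.0 0.5
print(segment(stream, tps))
# ['pre-ty', 'ba-by', 'dog-gy', 'pre-ty', 'dog-gy', 'ba-by', 'pre-ty']
```

Within each toy word the TP is 1.0, while across word boundaries it drops to 0.5, which is exactly the dip a statistical learner could exploit.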
Other studies have shown that this statistical learning ability is not specific to language: a similar mechanism operates on non-linguistic stimuli such as tone sequences. For the researchers, this raises the possibility that a common learning device is involved in both language and music. Building on this, the experiments in this study compare learning based on spoken sequences with learning based on sung sequences.
The researchers conducted three experiments, the third being, perhaps, the most intriguing. Each experiment involved 26 native French speakers, with a different group for each experiment. In each experiment, participants listened to a continuous stream of syllables for seven minutes. The choice of duration is significant: the researchers expected that seven minutes would be too short to learn the word units of a spoken stream, but hypothesized that it would be long enough when the stream was sung. This expectation was informed by a previous study (Saffran, Newport, & Aslin, 1996) in which participants needed roughly 20 minutes to learn the word units of a spoken stream.
The researchers created a language of six trisyllabic words: gimysy, mimosi, pogysi, pymiso, sipygy, sysipi. Participants heard 108 repetitions of each of the six words in random order, with the only constraint being that the same word never occurred twice in a row. The stream was produced by a speech synthesizer. No acoustic cues were inserted at word boundaries, resulting in a rather monotone, continuous stream of syllables. Participants were told to listen carefully to the sounds but not to analyze them in any way.
To test how well participants had learned the word units of the new language, they were presented with pairs of items: one a word from the new language, the other a part-word made up of syllables from the new language but not forming one of its words. Each part-word consisted either of the last syllable of a “real” word followed by the first two syllables of another word, or of the last two syllables of a word followed by the first syllable of another word. In other words, the “marker” syllables – the first and last of a word – ended up in the opposite position. Participants had to choose which of the two items was one of the six words of the new language. Each participant was presented with 36 pairs.
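The stream construction can be sketched as follows. This is a minimal illustration, assuming a weighted random draw with restarts to satisfy the no-immediate-repeat constraint; the study's actual randomization procedure is not described in the review, and the names `make_stream` and `to_syllables` are hypothetical. Each word is split into three two-letter syllables, and word boundaries are simply discarded when the syllables are concatenated.

```python
import random

WORDS = ["gimysy", "mimosi", "pogysi", "pymiso", "sipygy", "sysipi"]

def make_stream(words, reps, rng):
    """Order `reps` tokens of each word at random, never the same word twice in a row."""
    while True:  # restart on the rare dead end where only the previous word is left
        remaining = {w: reps for w in words}
        order, prev = [], None
        for _ in range(reps * len(words)):
            choices = [w for w in words if remaining[w] > 0 and w != prev]
            if not choices:
                break  # dead end: start over from scratch
            # Weight by remaining count so no single word piles up at the end.
            prev = rng.choices(choices, weights=[remaining[w] for w in choices])[0]
            remaining[prev] -= 1
            order.append(prev)
        else:
            return order

def to_syllables(order):
    # Flatten words into one syllable stream: nothing marks where a word ends.
    return [w[i:i + 2] for w in order for i in range(0, len(w), 2)]

rng = random.Random(0)
order = make_stream(WORDS, 108, rng)
stream = to_syllables(order)
print(len(order), len(stream))  # 648 1944
```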
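The construction of part-words and test pairs can be sketched as follows. The function names are hypothetical, and pairing six part-words with each of the six words is only one assumed way to arrive at 36 pairs; the review does not say which part-words the study used. The last lines simulate a listener who learned nothing and simply guesses, whose score should hover around 50%.

```python
import itertools
import random

WORDS = ["gimysy", "mimosi", "pogysi", "pymiso", "sipygy", "sysipi"]

def syllabify(word):
    # Every syllable in this artificial language is two letters (consonant + vowel).
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def part_words(words):
    """All part-words that can straddle the boundary between two different words."""
    found = set()
    for a, b in itertools.permutations(words, 2):  # a != b: no word follows itself
        sa, sb = syllabify(a), syllabify(b)
        found.add("".join(sa[2:] + sb[:2]))  # last syllable of a + first two of b
        found.add("".join(sa[1:] + sb[:1]))  # last two syllables of a + first of b
    return sorted(found - set(words))        # safeguard: never use a real word as a foil

def make_test(words, foils, rng):
    """Pair every word with every foil, shuffle, and randomize which item comes first."""
    pairs = [(w, f) for w in words for f in foils]
    rng.shuffle(pairs)
    return [p if rng.random() < 0.5 else p[::-1] for p in pairs]

def percent_correct(choices, words):
    # A response is correct when the listener picked the real word of the pair.
    return 100 * sum(c in words for c in choices) / len(choices)

rng = random.Random(0)
foils = rng.sample(part_words(WORDS), 6)  # hypothetical: six foils x six words = 36 pairs
test = make_test(WORDS, foils, rng)
guesses = [rng.choice(pair) for pair in test]  # a listener who learned nothing
print(len(test), percent_correct(guesses, WORDS))
```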
The results of the first experiment showed that participants’ performance did not differ significantly from chance: 48% correct. After seven minutes of exposure to the new language, they were unable to discriminate words from part-words.
The second experiment was identical to the first except that the syllables of the continuous stream were sung by the synthesizer rather than spoken. Importantly, the test phase was also identical to that of the first experiment: the test items were spoken, not sung. The difference was that each syllable was associated with a distinct tone, so each word was always sung on the same melodic contour.
In this second experiment, correct choices in the test phase rose to 64%. The researchers concluded that simply adding musical information allowed participants to discriminate words from part-words.
In addressing how language learning may benefit from musical information, the study proposes the following:
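One way to see why a consistent syllable-to-tone mapping might help is to compare transitional probabilities in the syllable stream and in the pitch stream. In this sketch the pitches (MIDI note numbers) and the short word order are hypothetical; only the one-tone-per-syllable rule comes from the description above. Because the mapping is one-to-one, every melodic transitional probability equals the corresponding syllabic one, so the two cues agree on where words begin and end.

```python
from collections import Counter

WORDS = ["gimysy", "mimosi", "pogysi", "pymiso", "sipygy", "sysipi"]

def syllabify(word):
    # Each syllable of this artificial language is two letters long.
    return [word[i:i + 2] for i in range(0, len(word), 2)]

# Hypothetical mapping: each of the 11 distinct syllables gets its own pitch,
# given as MIDI note numbers counting up from middle C (60).
SYLLABLES = sorted({s for w in WORDS for s in syllabify(w)})
PITCH = {s: 60 + i for i, s in enumerate(SYLLABLES)}

def sing(word):
    """Melodic contour of a word: the same pitches every time it occurs."""
    return [PITCH[s] for s in syllabify(word)]

def tps(seq):
    # Transitional probability: count(a, b) / count(a as the first member of a pair).
    pairs = list(zip(seq, seq[1:]))
    firsts = Counter(a for a, _ in pairs)
    return {p: n / firsts[p[0]] for p, n in Counter(pairs).items()}

order = ["gimysy", "pogysi", "sysipi", "gimysy", "mimosi", "pymiso", "sipygy", "pogysi"]
syl_stream = [s for w in order for s in syllabify(w)]
pitch_stream = [PITCH[s] for s in syl_stream]

syl_tp, pitch_tp = tps(syl_stream), tps(pitch_stream)
# One pitch per syllable means every melodic TP matches its syllabic TP.
assert all(pitch_tp[(PITCH[a], PITCH[b])] == v for (a, b), v in syl_tp.items())
print(sing("gimysy"))  # [60, 64, 70]
```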
First, a general increase in the level of arousal or attention might increase overall performance. Second, the presence of tonal and discrete pitch changes between syllables may enhance phonological boundaries and therefore increase phonological discrimination. Indeed, syllables may be distinguished not only on the basis of their phonetic properties, but also on the basis of pitch information, and may also benefit of the gestalt properties of pitch, especially of grouping. Third, the consistent mapping of linguistic and melodic boundaries may enhance global transitional probabilities, thereby increasing the efficacy of the statistical learning mechanism (p. 980).
The goal of the third experiment was to sort out which of these explanations best accounts for the musical facilitation effect.
In the third experiment, the syllables were still sung, but linguistic and musical boundaries no longer matched. More precisely, while the second and third syllables of each word were always sung on the same pitches, the first syllable could be sung on any of six different pitches. The test phase was identical to that of the previous two experiments, using spoken items.
This allowed us to (1) keep arousal constant because music had exactly the same structure as in Experiment 2, and (2) preserve phonological boundaries enhancement, because each syllable was still sung on a precise pitch. However, by decorrelating linguistic and musical boundaries, we eliminated the superposition of linguistic and melodic transitional probabilities. If we were to find the same facilitation effect as in the second experiment, then the effect should be due to arousal/attention or boundary enhancement. By contrast, if the effect were to disappear, then it would mostly be due to superposition of transitional probabilities (p. 980).
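The effect of this decorrelation can be illustrated by extending the same kind of transitional-probability sketch. Again the pitch values, the six first-syllable pitches, and the word order are hypothetical; only the rule that the first syllable varies over six pitches while the second and third stay fixed is taken from the description. Syllabically nothing changes, but the melodic transitional probabilities no longer line up with the word boundaries.

```python
import random
from collections import Counter

WORDS = ["gimysy", "mimosi", "pogysi", "pymiso", "sipygy", "sysipi"]

def syllabify(word):
    return [word[i:i + 2] for i in range(0, len(word), 2)]

# Hypothetical pitches (MIDI numbers): second and third syllables keep one fixed
# pitch per syllable, while the first syllable is sung on any of six other pitches.
SYLLABLES = sorted({s for w in WORDS for s in syllabify(w)})
FIXED_PITCH = {s: 60 + i for i, s in enumerate(SYLLABLES)}
FIRST_PITCHES = [48, 50, 52, 53, 55, 57]

def sing_decorrelated(word, rng):
    _, second, third = syllabify(word)
    return [rng.choice(FIRST_PITCHES), FIXED_PITCH[second], FIXED_PITCH[third]]

def tps(seq):
    pairs = list(zip(seq, seq[1:]))
    firsts = Counter(a for a, _ in pairs)
    return {p: n / firsts[p[0]] for p, n in Counter(pairs).items()}

rng = random.Random(0)
order, prev = [], None
for _ in range(648):  # random word order without immediate repeats
    prev = rng.choice([w for w in WORDS if w != prev])
    order.append(prev)

syl_tp = tps([s for w in order for s in syllabify(w)])
pitch_tp = tps([p for w in order for p in sing_decorrelated(w, rng)])

print(syl_tp[("gi", "my")])  # 1.0: syllabically, "gi" is always followed by "my"
# Melodically, each first-syllable pitch now leads into all six words,
# so the TPs into the pitch of "my" fall to roughly 1/6.
into_my = [v for (a, b), v in pitch_tp.items() if a in FIRST_PITCHES and b == FIXED_PITCH["my"]]
print(round(max(into_my), 2))
```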
The study reports that performance in this third experiment was significantly above chance: 56% correct. For the researchers, the implication is that arousal and/or boundary enhancement play a role in learning.
This is in line with previous results with babies showing that infant-directed speech increases infants’ attention to fluent speech and consequently to the statistical relationship between syllables (Thiessen, Hill, & Saffran, 2005). Moreover, if we were to consider that music is akin to prosody, these results would be in line with previous findings showing that prosodic information is important for segmentation…. Another interesting finding is that our results seem to point to the fact that, in the presence of multiple statistical cues, linguistic statistical cues take precedence over musical statistical cues (p. 981).
The researchers attribute this last finding to the fact that the participants were all adults and non-musicians. They concede that different results might be found in musicians and in infants, for whom prosodic cues are not only relevant but may even be more important.
Overall, our results are clear in pointing to the fact that learning is optimal when the conditions for both the emotional/arousal and linguistic functions are fulfilled. Therefore, learning a foreign language, especially in the first learning phase wherein one needs to segment new words, may largely benefit from the motivational and structuring properties of music in song. Whether this learning gain will extend to language acquisition in infants would be interesting to explore in future work. Indeed, if it were the case, it would support the idea that lullabies and children’s songs may have not only an emotional (communicative and reassuring) function, but would also facilitate linguistic processing due to their simple and repetitive nature (p. 982).
Initially, this study seemed difficult to respond to. Then, considering the study’s findings about music and arousal, I popped a CD of The Hannaford Street Silver Band into the stereo to see if that would help.
First, I do not find the result of the third experiment – 56% correct – to be earth-shattering. However, I will give the researchers the benefit of the doubt in concluding that it is, indeed, significant. It is certainly markedly better than the result of the first experiment.
I must also say that including the third experiment was quite a responsible undertaking. Prosody in normal speech is, indeed, variable, as are the melodic contours of infant-directed speech. A baby is not likely to hear “Time for beddy-bye!” sung the same way every time. I also have a new appreciation for “baby talk.” It was something I rather loathed when other people used it with my own children when they were infants. What I find intriguing in this regard is that mothers and grandmothers seem to speak to infants this way naturally. Are mothers’ brains hardwired – unbeknownst to them – to enhance babies’ linguistic development through the exaggerated musical contours of baby talk? Is this another miracle of the brain? Thankfully, my own children didn’t suffer from exposure to their father’s “adult” talk, as I often sang to them while they sat perched on my lap and I played the piano.
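For a sense of scale, the reported group means can be converted into items out of the 36 test pairs. This is a back-of-the-envelope sketch only, and `items_correct` is a hypothetical helper: the percentages are the averages quoted above, and whether a mean is reliably above chance also depends on how much the 26 participants varied, which these figures do not show.

```python
N_PAIRS = 36  # two-alternative test pairs per participant, as described above

def items_correct(percent, n_pairs=N_PAIRS):
    """Convert a mean percentage correct into the equivalent number of test pairs."""
    return percent * n_pairs / 100

chance = items_correct(50)  # guessing between two items is right half the time
for label, pct in [("Experiment 1 (spoken)", 48),
                   ("Experiment 2 (sung, matched boundaries)", 64),
                   ("Experiment 3 (sung, decorrelated)", 56)]:
    print(f"{label}: {items_correct(pct):.1f} of {N_PAIRS} (chance: {chance:.0f})")
# Experiment 1 (spoken): 17.3 of 36 (chance: 18)
# Experiment 2 (sung, matched boundaries): 23.0 of 36 (chance: 18)
# Experiment 3 (sung, decorrelated): 20.2 of 36 (chance: 18)
```

On average, then, the third group got roughly two more pairs right than guessing would, against about five more for the second group.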
The study points to the statistical probability of syllable placement as significant in learning spoken word units. One aspect of syllabification that the study did not explore is the role of accented syllables. To what degree does the statistical placement of accented syllables contribute to learning to segment words by ear? This also leads me to wonder where accented, or stressed, syllables come from. Why is it, for example, that French tends not to stress the first syllable of a word, whereas English often stresses the first syllable of its counterpart? Mi - chel’ becomes Mi’ - chael; o - range’ becomes o’ - range. Accented syllables are expressed, at least in part, through pitch contours. Would this make it more difficult for a native English speaker to learn French than German or another language whose accented-syllable placement and attendant pitch contours more closely mirror English?
I also wonder whether spoken Italian, whose melodic contours are much more “musical” than those of English, would be easier to learn than a less musical-sounding language.
My final response to this study concerns music and the pleasurable arousal it appears to cause, which may facilitate learning in general. As a secondary school teacher, I often hear complaints from parents that their children insist on doing their homework with music blaring in the background. Is this a distraction, or does the arousal help them learn the geometry concepts that will be on tomorrow’s test?
I seriously wondered whether playing music while I reflected on this study would spark insights into it and questions about it. Truly, I think it did.
Perhaps I’ve discovered the “Brass Band Effect!”