Musical Rhythms, Memory, and Human Expression

Ryan Davis, Angie Fuentes, and Kyle Yoder

Yale University, Cognition of Musical Rhythm, Virtual Lab



1.1  Introduction

The emotional properties of music, long recognized by music theorists, composers, and casual listeners alike, have yet to be fully explored by cognitive scientists. We do know that minuscule variations in timing between notes, called microtiming, are used by musicians to make their music sound more expressive; indeed, people listening to music played without microtiming often report that it sounds mechanical. Memory researchers have also demonstrated that emotional valence and social context strongly impact individuals’ ability to recall events. Our research explores the intersection of these two lines of research. [Kyle]

1.2  Previous Research

In 2008, Swedish researchers Juslin and Västfjäll conducted a large review of research into the connections between music and emotion. Despite the widely accepted belief that the two are inextricably linked, they found the evidence insufficient to describe a mechanism by which music could elicit the same emotions in different people. They proposed a multipart mechanism that they believed could account for these emotional responses; one aspect of this mechanism was musical expectancy and rhythm.

Research has revealed that one major component of listeners’ ability to ascribe emotional valence to music is subtle variation in the timing between notes. These variations, called microtiming, are employed by musicians (consciously and unconsciously) to give their performances an expressive quality (Ashley, 2002; Repp, 1999). Indeed, most “humanization” software, meant to make computer-generated music sound more human, operates by inserting microtiming variations into the piece in order to make it less than perfectly regular and, hopefully, more expressive.

Much research into memory has also focused on the effect of emotion. Research has found that not only are memories with some sort of emotional content more likely to be retained and more easily recalled in the future, but also that memories with a social context show this effect even more robustly (Coppola et al., 2014; Jhean-Larose et al., 2014; Watts et al., 2014). In fact, researchers have found that direct administration of oxytocin, a neuropeptide often associated with feelings of attachment and prosociality, can provide participants with enhanced memory for otherwise non-emotional information (Weigand et al., 2013). Furthermore, memories of neutral events are often overshadowed by those of closely occurring emotional events (Watts et al., 2014).

Some research has examined the intersection of musical rhythm and memory. Balch and Lewis (1996) found that hearing a familiar rhythm could facilitate participants’ memories of events that were happening when they last heard that rhythm. Drake et al. (2000) compared how well musicians and nonmusicians could synchronize with human-generated pieces containing microtiming versus computer-generated pieces played precisely as written. While participants were better at synchronizing with the computer-generated pieces, they synchronized with the human-generated (that is, expressive) pieces at slower levels, within a narrower range of levels, and in closer correspondence with the theoretically correct metrical hierarchy. The authors concluded that microtiming might transmit a particular metrical interpretation to the listener and enable the perceptual organization of events over a longer time span (Drake et al., 2000).

The present study seeks to build on this research by exploring whether the microtiming variations and the expressive quality of a performance are sufficient to elicit these differences in cognitive processing, or whether participants’ beliefs about the social context of the music may mediate these effects. [Kyle]

1.3  Present Research

In this study, we examine whether the ease with which participants can recall a musical rhythm is affected by their belief as to whether that rhythm was produced by a human or a computer. By testing participants in three separate belief groups (told the rhythms were created by a human, told they were created by a computer, or given no information about the rhythms’ origin), we hope to detect differences in the accuracy of rhythmic memory across belief groups. We predict that those who believe the rhythms were created by a human will perform better on the rhythmic memory task. [Angie]


2.1  Participants

In total, 42 participants (25 female and 17 male) completed the study. They ranged in age from 19 to 59 years, with a mean age of 28.8 years (standard deviation = 11.8 years). All but three participants reported English as their first language (two reported Spanish and one French). Thirty-two participants had at least 1 year of musical training, 13 of whom had at least 10 years of training. Most participants also played at least one instrument. Four participants reported some form of hearing deficiency, either ringing in the ears or mild to moderate hearing loss. [Angie]

2.2  Stimuli

Our stimuli were brief, three-bar rhythmic samples in 4/4 time. We divided our rhythms into two difficulty groups, which we named Simple and Complex. To accommodate our desired number of participants, each participant would undergo eight trials, so four Simple rhythms and four Complex rhythms were constructed. Each rhythm also had its own subtly altered alternate version, yielding 16 rhythms in total. Each rhythmic sample was assigned a randomized tempo (using an online random number generator) between 70 bpm and 90 bpm, with each alternate version carrying the exact tempo of its original. This tempo range was chosen because it is commonly regarded as a middle ground between slow and fast.
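As an illustration, the tempo-assignment procedure above can be sketched as follows. This is a minimal sketch, not the actual script we used; the `assign_tempi` function and the rhythm labels are hypothetical.

```python
import random

def assign_tempi(rhythm_ids, low=70, high=90, seed=None):
    """Assign each original rhythm a random whole-number tempo in [low, high] bpm;
    its alternate version inherits exactly the same tempo."""
    rng = random.Random(seed)
    tempi = {}
    for rid in rhythm_ids:
        bpm = rng.randint(low, high)
        tempi[rid] = bpm            # original version
        tempi[rid + "_alt"] = bpm   # alternate version shares the tempo
    return tempi

# Four Simple and four Complex rhythms, as in the design above.
rhythms = [f"simple_{i}" for i in range(1, 5)] + [f"complex_{i}" for i in range(1, 5)]
tempi = assign_tempi(rhythms, seed=1)

assert len(tempi) == 16                                     # 8 originals + 8 alternates
assert all(70 <= t <= 90 for t in tempi.values())           # within the chosen range
assert all(tempi[r] == tempi[r + "_alt"] for r in rhythms)  # alternates match originals
```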

The Simple rhythms were constructed using only dotted half notes, half notes, quarter notes, and eighth notes, with no syncopations. The Complex rhythms added sixteenth notes, dotted eighth notes, dotted quarter notes, and ties, thus creating syncopations. The rhythms were designed to vary in content, and the location of each alternate version’s subtle change was distributed evenly across the rhythmic samples to avoid predictability. The subtle changes were made either by changing a rhythmic value (e.g., a quarter note becoming two eighth notes) or by flipping a rhythmic cell (e.g., a quarter note followed by two eighth notes becoming two eighth notes followed by a quarter note).

The rhythmic stimuli were recorded by Michael Laurello, a composition student at the Yale School of Music, using Apple Logic Pro 9.1.8 and a “roto tom” sample sound from the Vienna Symphonic Library. Michael recorded each rhythm at 0%, 50%, and 100% quantization; we judged that 50% struck the best balance between rhythmic strictness and performance flexibility, so the 50% quantization recordings were used for every rhythm throughout the experiment. [Ryan]

2.3  Task & Procedure

Participants were randomly presented with eight of the rhythms (either simple or complex), each played once, and were asked to try to memorize what they heard. Each participant was told either (1) nothing about the recording, (2) that the recording was made by a human percussionist, or (3) that the recording was generated by a computer. Following a distractor task (word puzzles), the participant was played either the identical rhythm heard before the distractor task or its alternate version, and was then asked whether what they heard the second time was the same as or different from the first rhythm. [Ryan]

2.4  Data Collection & Analysis

Data were collected through the Qualtrics survey website and exported into Microsoft Excel for analysis. The data were analyzed for potential effects of each participant’s belief condition on their ability to correctly identify whether the two rhythms within each trial were the same or different. We also conducted limited analyses to discover any effects that demographics may have had on correct identification. [Kyle]


3.1 Population Sample

Forty-two participants (25 female and 17 male) were recruited via email and Facebook posts advertising the study. Participants were all between 19 and 59 years of age (mean age = 28.79, standard deviation = 11.95, median age = 23.00), and all had completed at least a high school level of education. Ten participants reported being unable to play a musical instrument, while the remaining thirty-two reported at least one year of experience playing: ten (23.80% of the total sample) reported playing primarily the piano, seventeen (40.47%) reported playing a string instrument (i.e., cello, violin, viola, or guitar), and four (9.52%) reported playing a woodwind or brass instrument. Only one participant reported playing percussion. The number of years of training varied widely among these participants (mean = 7.38, standard deviation = 6.35, median = 7.00). On a five-point scale (1 = no training, 5 = professional training), participants generally reported average familiarity with Western music training in instrumental performance, vocal performance, or music theory (mean = 2.38, standard deviation = 1.41), while five participants reported a professional level of overall training. Of the forty-two participants, four reported some kind of mild hearing deficiency (two reported ringing, two reported mild hearing loss); however, all four reported being able to hear the stimuli used in this study clearly. [Kyle]

3.2  Analysis & Figure 1

Across all belief groups, participants performed better when the rhythm presented after the distraction was the same than when it was different. In other words, participants more often reported that the rhythm following the distraction was the same rather than different. This held for all belief groups, as shown in Figure 1. Combining all belief groups, 65.25% of participants answered correctly when the rhythm was the same (standard deviation = .0654), while 54.7% answered correctly when the rhythm was different (standard deviation = .0314). This may be evidence that people tend to assume rhythms are the same and are not particularly good at detecting minor differences between them. It may also be evidence that the word puzzle distraction was too time-consuming or demanding. [Angie]
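The per-trial-type accuracy comparison above can be computed from raw responses along the following lines. This is an illustrative sketch with toy data, not our actual Qualtrics/Excel pipeline; the `accuracy_by_trial_type` function and the record format are our own inventions.

```python
def accuracy_by_trial_type(records):
    """Given (trial_type, correct) pairs, return the proportion of correct
    responses for each trial type ('same' or 'different')."""
    totals = {}
    for trial_type, correct in records:
        n, k = totals.get(trial_type, (0, 0))
        totals[trial_type] = (n + 1, k + int(correct))
    return {t: k / n for t, (n, k) in totals.items()}

# Toy data that roughly mirrors the reported pattern (65% vs. 55% correct).
records = ([("same", True)] * 65 + [("same", False)] * 35
           + [("different", True)] * 55 + [("different", False)] * 45)
acc = accuracy_by_trial_type(records)

assert acc["same"] > acc["different"]  # same-trials easier, as in Figure 1
```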


Figure 1.

3.3  Analysis & Figure 2

Figure 2 shows the Simple Rhythms and Complex Rhythms used in the experiment. The top rhythm of each grouping is the original form, while the bottom is its subtly altered version. Within a single trial, participants either heard the top rhythm of a grouping twice (the two playings separated by word puzzle distractions), in which case the correct answer was that the rhythms were identical, or heard the top rhythm first and the bottom rhythm second (again separated by word puzzle distractions), in which case the correct answer was that the rhythms were not identical.



Figure 2.

From a visual standpoint, it is immediately clear that the Complex Rhythms are denser than the Simple Rhythms, owing to the increased number of audible attack points. The Simple Rhythms ranged from 12 to 15 audible attacks, with an average of 13.125. The Complex Rhythms ranged from 17 to 21 audible attacks, with an average of 18.875. The increased number of attack points would naturally lead one to expect them to be more difficult to remember, especially given that our participants heard each rhythm played only once. In general, however, our participants did not score especially well at identifying whether the second rhythm played (simple or complex) was the same as or different from the first. There are many possible reasons for this outcome, though with our sample size it is impossible to determine the exact cause. The most obvious candidate is that the rhythmic information was simply too long to retain after only one playing, a difficulty compounded by the intervening series of word puzzle distractions. In addition, the alternate versions of each rhythm were intentionally designed to be only subtly different. The rhythmic differences were by no means large, and according to our analysis, even those participants who identified themselves as musical experts were not remarkably better in their trials. [Ryan]

3.4  Analysis & Figure 3

As mentioned in section 3.1, five participants (4 male and 1 female) identified themselves as having a professional level of overall music training. These “expert” participants ranged in age from 22 to 36 (mean = 26.4, standard deviation = 5.68), each reported a different instrument as their primary (respectively: cello, clarinet, piano, viola, and violin), and all reported a minimum of ten years experience playing their instrument. We decided to examine whether these “experts” were significantly better at the task of identifying the rhythms than the general pool of participants.

Significance across conditions cannot be shown in this analysis, as three of the expert participants were randomly assigned to the computer-belief condition, while only one each was assigned to the human-belief and no-belief conditions. Taken as a whole, it appears that experts may be better than the general group of participants at correctly identifying the rhythms; however, due to the relatively small size of this group, the difference is not significant (p > 0.05). This can be seen in Figure 3 below, which shows the average rate of correct responses on the rhythm identification task in the expert and general samples. [Kyle]


Figure 3.


Our results do not reveal any impact of belief group on participants’ ability to recall a rhythm. We predicted that participants would recall a rhythm better if they believed it was performed by a human. Although there were minor differences in recall accuracy between the three groups, no significant effect was found. Participants performed slightly better in the “no belief” group than in the other two groups, while the “computer-generated” belief group performed slightly worse than the others.

Similarly, no significant effect of music training on participants’ ability to complete the rhythm recognition task correctly was found. Nevertheless, the data trend in that direction, providing a basis for the hypothesis that, were more participants included in the study, this effect could reach significance. This distinction is important because it bears on whether the rhythms used in this study were too complex for the average person to remember after a single listening. Perhaps further research will reveal a “complexity threshold” for musical memory.

An unexpected finding from this study was that people tended to perform better in determining that a rhythm was the same rather than determining that a rhythm was different. However, further experimentation is necessary to determine whether this finding reflects an actual facet of human cognition. In this pilot study, it is possible that the changes in rhythms were simply too subtle for participants to detect.  Another possibility is that participants defaulted to saying that rhythms were the same, producing a “false positive” for this effect.

Although the findings of this pilot study did not provide major evidence toward answering our question about the interplay of emotion, belief, and memory, they did provide guidance for future experimentation on the same topic. One limitation of using Qualtrics to collect data was that, instead of being asked to replicate the rhythm, our participants were given a task using a “same-different” paradigm. In other words, participants had a 50% chance of guessing the correct answer, potentially allowing correct guesses to skew our results. If subjects were required to recreate the rhythm, perhaps by tapping it, one could more accurately determine whether they had remembered it correctly.
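The guessing concern can be made concrete with a quick simulation: participants answering entirely at random would still average about 50% correct, and with only eight trials per person their individual scores would vary widely. This is an illustrative sketch (the `simulate_guessers` function is our own, hypothetical), not part of our analysis.

```python
import random

def simulate_guessers(n_participants=42, n_trials=8, n_sims=1000, seed=0):
    """Simulate many replications of the study in which every participant
    guesses at random on each same/different trial; return the per-replication
    mean proportion correct."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        scores = [sum(rng.random() < 0.5 for _ in range(n_trials)) / n_trials
                  for _ in range(n_participants)]
        means.append(sum(scores) / n_participants)
    return means

means = simulate_guessers()
grand_mean = sum(means) / len(means)
assert abs(grand_mean - 0.5) < 0.02  # pure guessing centers near 50% correct
```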

A similar study in the future might yield more revealing data if the selected rhythms were shorter. It would also be of interest to determine whether participants’ success at judging whether a rhythm was the same or different could be influenced by the actual percussive sound(s) used. For example, would it be easier to distinguish the rhythms if the chosen stimulus sound had a discernible pitch, or even multiple pitches? In addition, combinations of different time signatures could provide further insight.

Another limitation of this study was that whether the second rhythm presented in a trial was the same or different was not randomized but predetermined. We tried to minimize bias by randomizing the order in which participants encountered the trials; however, we were unable to randomly assign the rhythm after the word puzzle to be the same or different. This further randomization would have eliminated any possible bias arising from certain rhythms being more distinctive, with differences that were easier to detect.

The subject of belief and memory is an interesting topic that still requires much experimentation to be fully understood. With this study, we hoped to provide a foundation and springboard for future endeavors in this area. In moving forward in researching belief and memory, it is necessary to run more experiments testing their relationship and think of new methods in which one can examine how belief affects memory. Suggestions for future studies would include replication, rather than recognition, of a rhythm, and varying the distraction difficulty and length between rhythm recognition. [Angie, Kyle, Ryan]



Ashley, R. (2002).  Do[n’t] Change a Hair for Me: The Art of Jazz Rubato. Music Perception, 19:3, 311–332.

Balch, W.R., & Lewis, B.S. (1996). Music-Dependent Memory: The Roles of Tempo Change and Mood Mediation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22:6, 1354-1363.

Coppola, G., Ponzetti, S., & Vaughn, B.E. (2014). Reminiscing Style During Conversations About Emotion-laden Events and Effects of Attachment Security Among Italian Mother–Child Dyads. Social Development, 23:4, 702–718. DOI: 10.1111/sode.12066.

Drake, C., Penel, A., & Bigand, E. (2000). Tapping in Time with Mechanically and Expressively Performed Music. Music Perception, 18:1, 1-23.

Jhean-Larose, S., Leveau, N., & Denhière, G. (2014). Influence of emotional valence and arousal on the spread of activation in memory. Cognitive Processing, 15, 515–522. DOI: 10.1007/s10339-014-0613-5.

Juslin, P.N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences, 31, 559–621. DOI: 10.1017/S0140525X08005293.

Repp, B. (1999). Individual differences in the expressive shaping of a musical phrase: The opening of Chopin’s Etude in E major. In Suk Won Yi (Ed.), Music, Mind, and Science, 239-270.

Watts, S., Buratto, L.G., Brotherhood, E.V., Barnacle, G.E., & Schaefer, A. (2014). The neural fate of neutral information in emotion-enhanced memory. Psychophysiology, 51, 673–684. DOI: 10.1111/psyp.12211.

Weigand, A., Feeser, M., Gärtner, M., Brandt, E., Fan, Y., Fuge, P., Böker, H., Bajbouj, M., & Grimm, S. (2013). Effects of intranasal oxytocin prior to encoding and retrieval on recognition memory. Psychopharmacology, 227, 321–329. DOI: 10.1007/s00213-012-2962-z.

Balch, W.R., & Lewis, B.S. (1996). Music-dependent memory: The roles of tempo change and mood mediation. Journal of Experimental Psychology: Learning, Memory, and Cognition. Vol.22(6), pp. 1354-1363.


Music-dependent memory was obtained in previous literature by changing from 1 musical piece to another. Here, the phenomenon was induced by changing only the tempo of the same musical selection. After being presented with a list of words, along with a piece of background music, listeners recalled more words when the selection was played at the same tempo than when it was played at a different tempo. However, no significant reduction in memory was produced by recall contexts with a changed timbre, a different musical selection, or no music (Experiments 1 and 2). Tempo was found to influence the arousal dimension of mood (Experiment 3), and recall was higher in a mood context consistent (as compared with inconsistent) with a given tempo (Experiment 4). The results support the mood-mediation hypothesis of music-dependent memory.

Balch & Lewis 1996

Juslin, P.N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences, 31(5), pp. 559-­621.


Research indicates that people value music primarily because of the emotions it evokes. Yet, the notion of musical emotions remains controversial, and researchers have so far been unable to offer a satisfactory account of such emotions. We argue that the study of musical emotions has suffered from a neglect of underlying mechanisms. Specifically, researchers have studied musical emotions without regard to how they were evoked, or have assumed that the emotions must be based on the “default” mechanism for emotion induction, a cognitive appraisal. Here, we present a novel theoretical framework featuring six additional mechanisms through which music listening may induce emotions: (1) brain stem reflexes, (2) evaluative conditioning, (3) emotional contagion, (4) visual imagery, (5) episodic memory, and (6) musical expectancy. We propose that these mechanisms differ regarding such characteristics as their information focus, ontogenetic development, key brain regions, cultural impact, induction speed, degree of volitional influence, modularity, and dependence on musical structure. By synthesizing theory and findings from different domains, we are able to provide the first set of hypotheses that can help researchers to distinguish among the mechanisms. We show that failure to control for the underlying mechanism may lead to inconsistent or non-interpretable findings. Thus, we argue that the new framework may guide future research and help to resolve previous disagreements in the field. We conclude that music evokes emotions through mechanisms that are not unique to music, and that the study of musical emotions could benefit the emotion field as a whole by providing novel paradigms for emotion induction.


Drake, C., Penel, A., & Bigand, E. (2000). Tapping in Time With Mechanically and Expressively Performed Music. Music Perception, 18(1), 1-23.


We investigate how the presence of performance microstructure (small variations in timing, intensity, and articulation) influences listeners’ perception of musical excerpts, by measuring the way in which listeners synchronize with the excerpts. Musicians and nonmusicians tapped on a drum in synchrony with six musical excerpts, each presented in three versions: mechanical (synthesized from the score, without microstructure), accented (mechanical, with intensity accents), and expressive (performed by a concert pianist, with all types of microstructure). Participants’ synchronizations with these excerpts were characterized in terms of three processes described in Mari Riess Jones’s Dynamic Attending Theory: attunement (ease of synchronization), use of a referent level (spontaneous synchronization rate), and focal attending (range of synchronization levels). As predicted by beat induction models, synchronization was better with the temporally regular mechanical and accented versions than with the expressive versions. However, synchronization with expressive versions occurred at higher (slower) levels, within a narrower range of synchronization levels, and corresponded more frequently to the theoretically correct metrical hierarchy. We conclude that performance microstructure transmits a particular metrical interpretation to the listener and enables the perceptual organization of events over longer time spans. Compared with nonmusicians, musicians synchronized more accurately (heightened attunement), tapped more slowly (slower referent level), and used a wider range of hierarchical levels when instructed (enhanced focal attending), more often corresponding to the theoretically correct metrical hierarchy. We conclude that musicians perceptually organize events over longer time spans and have a more complete hierarchical representation of the music than do nonmusicians.

This source compares how well people can synchronize with expressive versus mechanical excerpts, giving us knowledge of prior work comparing human-like performances against computer-like performances. The results show that people were better at synchronizing with the mechanical excerpts, which is the opposite of our hypothesis. However, this study also showed that people synchronized with the expressive excerpts at higher levels, within a narrower range of levels, and in closer correspondence with the correct metrical hierarchy, which suggests that expressive, human-like performance may enhance certain aspects of synchrony that mechanical performances do not.

Drake, Penel, & Bigand (2000)

Refined Idea


Is the ease with which participants can recall and replicate a musical rhythm impacted by their beliefs as to whether that rhythm was produced by a human or a computer?

Possible Theory, Conjecture, and Hypothesis:

Human beings find it easier to remember events containing emotional information than those that do not because we prefer social conditions, environments, and interactions to asocial ones.  Under this theory, we hypothesize that participants will be better at recalling and replicating rhythms they believe to be produced by humans than those produced by computers, as they will ascribe more emotional context to the human-produced piece.

Alternatively, a computer can produce a work that does not carry variations in microtiming, expressive or otherwise.  Under this, we hypothesize that participants will find it easier to recall and replicate those rhythms they believe were produced by a computer, as they will interpret such rhythms as exact and absolute.

Operationalization of First Hypothesis:
“… we hypothesize that participants will be better at recalling and replicating rhythms they believe to be produced by humans than those produced by computers…”

Participants = Students from Yale College and the Yale School of Music aged 18-25

Better Recall and Replication = We will ask participants to reproduce the rhythm exactly (to the best of their ability) after hearing it.  We anticipate a significantly higher accuracy in this task in the “human belief” condition(s).  Further refinement of our method will happen after we feel more confident in knowing what others have done.

Believe and Produced = Two recordings will be created: one by having a human being play a rhythm, and one by having a computer generate the piece using the exact written timing of that rhythm. Participants will be told either nothing about the origin of the piece (control), that a computer produced the rhythm, or that a human produced the rhythm.

Group Questions!

1) Is there a “sweet spot” in terms of musical repetition relating to the enjoyment of the listener?

2) How quickly can we identify the meter of a song, specifically when it is in a complex meter (5/8, 7/8, alternating bars of 3 and 4, etc)? Is there a great difference between musicians and non-musicians in identification? Or between types of musicians?


Possible Group Research Questions

1) Because of the entanglement of music and emotions, some research has suggested that background music can enhance memory.  What effect does this music’s “groove” have on the listener’s ability to retain information in working and/or long-term memory?

2) What effect does the “grooviness” of a song have on its “earworm” quality?