DAVIS, Harrison: The Emotional Effect of a Phrase’s Strength and its Metrical Placement in an Expressive Performance



There are many elements in the vast realm of music, but few are as widely assumed and celebrated as music’s intimate connection with the emotional spectrum. The link that a performer and a listener develop through musical expression is essential both to the ethical aspect of enriching societal and personal bonds and perhaps even to the more practical matter of facilitating emotional and mental healing. However, the exact details of how rhythmic elements in music create the empathic emotional bond characteristic of expressive performances remain vague. Research thus far has established the importance of the interaction between rhythm and tonality in emotional expression (Madison, 2000) and has provided evidence that neural reward centers activate during expressive performances (Chapin, Jantzen, Kelso, Steinberg, & Large, 2010), but has not ventured into the specifics of the interactions that generate these emotional and neural responses. In my research, I aim to investigate the specific elements behind these phenomena and to approach the issue of perceiving emotion in music through the specific expressive microtiming of accented phrases.

The most salient concept in my research proposal is that of microtiming. A functional definition is given by Vijay Iyer in his article, “Embodied Mind, Situated Cognition, and Expressive Microtiming in African-American Music”: “miniscule timing deviations in performed music” (Iyer, 2002, p. 397). Microtiming variances can arise in a number of contexts, but I am interested in their production when a performer plays pieces in an expressive manner, intentionally introducing these structural variants. I believe that microtiming relates to emotion in that, when one entrains to a meter, one forms both psychological and physiological anticipations of accented time points. When a performer inserts expressive variations in the timing of these accents, those expectations are violated, and the slight asynchrony between entrainment and actual onsets creates a tension within the listener that is translated, within a musical context, into a specific emotion.

In investigating this idea I aim to use the concept of “phrase strength,” introduced by Professors Cheng and Chew of the University of Southern California Viterbi School of Engineering in a study in which they developed computational methods for the analysis of phrasing strategies. Specifically, the term signifies “the average loudness difference between a (representative sound wave’s) local maximum and the two adjacent local minima.” The terms “phrase volatility” and “phrase typicality” refer to “the degree and quantity of variance from the average phrase strength” and the “popularity of (a specific metrical location) as a phrase peak.” I will use these concepts to investigate my intuitions about microtiming’s relation to emotion through musical situations in which the phrase strength of an accent is of high phrase volatility (the accents vary significantly from the average phrase strength) and its phrase typicality is low (its specific location falling either slightly before or after the entrained accent). In performances that contain such variations (either “laid back” accents or “stiff” accents) as a consistent structural feature, I hope to see a predictable pattern of emotional perception in listeners.

My research question is thus: How do volatile accented phrases which feature a characteristic and consistent phrase typicality of either delayed or preemptive microtiming within a performance affect the perception of emotion in said piece?

The goal of my experiment will be to investigate the specific elements behind how microtiming in a performance leads to the perception of general emotions in listeners. The methodology introduced by Cheng and Chew offers a way to quantify the specific qualities of microtiming variances and phrase/accent qualities that will be manipulated. By applying the terminology they created, I can directly relate the prominence of an accent and its metrical characteristics to its general emotional interpretation. More specifically, the “perception of emotion” would be approached experimentally through subjective evaluations: I will ask participants to evaluate and describe their internal emotional responses and to analyze, through description, the emotional features of the performance.

Bibliography & Annotations –

Cheng, E., & Chew, E. (2008). Quantitative analysis of phrasing strategies in expressive performance: Computational methods and analysis of performances of unaccompanied Bach for solo violin. Journal of New Music Research, 37(4), 325-338.

Abstract – This paper presents computational methods for quantitative description and analysis of expressive performance strategies in violin performances. We present general techniques for extracting beat-level tempo and loudness data, and the Local Maximum Phrase Detection (LMPD) method. The LMPD method equates local maxima in the loudness curve with interpreted phrases, and defines measures of phrase strength (clarity), phrase volatility (standard deviation), and phrase typicality (concurrence with norm), for characterizing each phrase. The methods are developed in the context of, and applied to, eleven recorded performances of the Andante movement from Bach’s Sonata No. 2 in A minor BWV 1003 for solo violin by master violinists. For each performance, we present tempo and loudness summary statistics of the entire piece, its sections, and each phrase. In our experiments, we find that loudness is a more consistent indicator of phrasing strategies, suggesting that phrase structure may impose stricter constraints on dynamic than on tempo variation. The results of the LMPD method show that Kremer’s performance exhibits the highest, and Enescu’s the lowest, phrase volatility; Milstein’s shows the highest average phrase typicality, and Enescu’s the lowest; and, Grumiaux plays with the highest, and Menuhin the lowest, average phrase strength.

I found this article particularly insightful with regard to my question in that it substantially informs my methodology. The whole investigation was based on the development of technology that can be used to investigate my question, and the idea of “phrase strength” derives from this article. (RILM Abstracts of Music Literature)

Madison, G. (2000). Properties of expressive variability patterns in music performances. Journal of New Music Research, 29(4), 335-356.

Abstract – Common variability patterns for timing, articulation, and loudness were extracted by means of principal component analysis from music performances intended to express anger, fear, happiness, or sadness. Synthetic performances were generated with either timing, articulation, loudness, or no variability, which were rated by 10 musically experienced listeners on 10 adjectives, including their original emotions. Correlations were found between the ratings and two mathematical properties of the patterns, namely fractal dimension and durational contrast. The results suggest that both these properties describe relevant characteristics of the variability patterns, and that they play a role in emotional expression. Different roles for these properties in timing, articulation, and loudness variability were indicated. These findings may facilitate comparisons between different expressive domains, such as music, dance, speech, and body motion.

This article informs my question in that it begins the exploration of the same field I wish to investigate. Furthermore, its specific results can help me decide which characteristics to examine in my research. (RILM Abstracts of Music Literature)

SCHELLENBERG, E., Krysciak, A., & Campbell, R. (2000). Perceiving emotion in melody: Interactive effects of pitch and rhythm. Music Perception: An Interdisciplinary Journal, 18(2), 155-171.

Abstract – Examined the degree to which pitch and rhythm affect perceived emotional content of short melodies, and whether such effects are interactive or additive. Short melodies consistently judged to convey a single emotion were manipulated to derive the 3 altered melodic versions of pitches only, rhythm only, or equal pitches and duration throughout. 30 undergraduate psychology students with traditional Western music backgrounds rated the degree to which original and altered versions conveyed emotions. Results show that the effects of pitch and rhythm varied across melodies, including those whose original melodies expressed the same emotion. In all cases, ratings were influenced more by differences in pitch than by differences in rhythm. Whenever rhythm affected ratings, it interacted with pitch.

With regard to my research question, this article and study provide a good sense of context for my focus on “phrase strength”. It lays out researched ground for the contributions of rhythm to emotional perception in music and emphasizes the interaction between a piece’s melodic properties and its rhythmic features. This is highly relevant to my research, as I will attempt to elaborate on this relationship by examining how manipulating highly specific metrical attributes alters the affect of a phrase. Furthermore, the discussion of the relative relevance of pitch and rhythm informs future decisions in choosing a piece for an experiment, giving me some insight into how to manipulate a performance moment to moment. (PsycINFO Database)


Literature Review

Of all the principles of music to be lauded and examined, few are as widely hailed as the purest representation of music’s significance as the emotions it can convey to a listener. In some ways, the entire world of music theory is ultimately dedicated to the analysis and celebration of how sounds can become a physical manifestation of the feelings that have long dominated human existence. As Guy Madison indicates in his article, ‘Properties of Expressive Variability Patterns in Music Performances’: “One function of musical expressivity is to induce and represent emotions” (Madison, 2000). Despite the idealism behind this musical quality, after years of rigorous examination in the field of music theory, and many experiments revolving around the production of emotion through music, we still know relatively little about the concrete mechanisms behind the effects of expressive performance. We know especially little, in light of the frequent focus on the modal influences of emotion, about the specific interaction between rhythm and emotion. This lack of knowledge must be remedied. Beyond the obligation we have to seek the secrets behind music’s most salient contributions, we face another obligation in the many societal benefits that could be attained with an intimate knowledge of the emotional language of music. Music is used in various media, such as television, advertising, and entertainment, that have intimate and important consequences for the psyche of a society.

The research that has focused on the salient rhythmic properties of expressive performance has laid solid ground for more specifics-oriented studies. I aim to use this ground to investigate the minute details of how rhythm, through expressive microtiming, influences the perception of emotion in a musical piece. My research question is thus: How do volatile accented phrases – featuring a characteristic and consistent phrase typicality and notably high relative phrase strength – within a performance affect the perception of emotion in said piece when they involve microtiming slightly preceding or following the primary accent? This question naturally employs terms that require elaboration, which will be supplied in the subsequent analysis of preceding research after a brief survey of relevant studies in the field of emotion in musical rhythm.

The general influences of rhythm on the perception of emotion in music have been explored to a respectable degree. Perhaps the most objective evidence for the prevalence of rhythm in the communication of emotion was found in a study conducted by Chapin, Jantzen, Kelso, Steinberg, and Large. In their article, ‘Dynamic Emotional and Neural Responses to Music Depend on Performance Expression and Listener Experience,’ they detail an experiment in which participants moved a mouse over a two-dimensional emotional response interface that reported arousal and valence (emotional character) responses in real time. The input received through this process, along with subsequent neural imaging through fMRI (both implemented during an expressive performance of Chopin’s Etude in E Major and a computer-generated control performance), revealed remarkable results. Both arousal and valence patterns (the latter to a lesser degree) were positively correlated with the tempo curve characteristic of the expressive performance at optimal time lag (significantly correlated in terms of arousal). Furthermore, increased BOLD (blood oxygenation) signal was noted for the expressive performance in various brain areas, such as the right posterior and anterior parahippocampal gyrus, the fusiform gyrus, the inferior parietal lobule, the left medial and right dorsal medial prefrontal cortex, and the bilateral ventral anterior cingulate (Chapin, Jantzen, Kelso, Steinberg, & Large, 2010). These results point clearly to the effect of expressive performance on emotional perception in music. These implications are essential to the function of my research in that they lay out grounds for the effect of microtiming on emotional perception, the basic premise upon which my question stands.

In another landmark study on the prevalence of rhythm in musical emotion, detailed in the article ‘Perceiving Emotion in Melody: Interactive Effects of Pitch and Rhythm,’ the role of rhythm in the overall presence of emotion in music is clarified. In their study, E. Glenn Schellenberg, Ania M. Krysciak, and R. Jane Campbell sought to identify whether pitch and rhythm in a musical piece are additive or interactive in the production of emotion. Their method involved collecting musical pieces from various sources (some were traditional Eastern European melodies, others were written for the purpose of the study), divided into three categories based on three basic and generally unambiguous musical emotions: happy, sad, and scary. They then made various pitch and rhythmic alterations to the pieces (both combined and isolated), which participants heard and evaluated on an emotional scale from one to seven. The results of this experiment point to several profound conclusions. Unsurprisingly, pitch was found to have significant effects on emotion in music. Rhythm, in comparison, seemed slightly less impactful and was relevant mostly when interacting with the pitch characteristics of the music (Schellenberg, Krysciak, & Campbell, 2000). This is highly significant to the nature of my question, as the context of my curiosity is the manipulation of precise rhythmic location in relatively strong phrases and accented locations. Another interesting result from this study was that the effects of pitch and rhythm differed across contexts, even between melodies that expressed the same emotion (Schellenberg, Krysciak, & Campbell, 2000), which is important to note for conclusions to be drawn in my own research. However, this discrepancy might be best explained by a different approach to emotional presence in music.

In his article, ‘Embodied Mind, Situated Cognition, and Expressive Microtiming in African-American Music’, Vijay Iyer takes a unique perspective that emphasizes the motion-oriented nature of music. He explores the idea of “embodied mind”, in which cognition is based upon the experience of having a body with sensorimotor capacities. Thus, “perception is understood as perceptually guided action” (Iyer, 2002). His theory is also centered on the idea of “situatedness”, which asserts that, given the body’s prevalence in the projection of motion in music, that projection is immersed, as the body is, in an environment that shapes its experience. Iyer’s study went on to analyze several hypotheses of how these ideas relate to the “discrepancies” or “inaccuracies,” as others refer to them, of microtiming variations. He uses actual performances within the realm of African-American music to analyze how microtiming highlights structural aspects of the music, reflects the specific temporal constraints imposed by physical embodiment, and/or fulfills an aesthetic function. This emphasis on motion might provide insight into how pitch and rhythm have varying effects in different contexts, as variables such as metric type (e.g., duple vs. triple) may lead to different forms of bodily movement, reflecting differing states and realms of emotional expression. In the specific space of my research, this study provides useful insight into the workings of expressive microtiming in musical perception, with special emphasis on the structural characteristics of a musical piece (a feature I intend to focus on in terms of specific patterns of manipulation). It is notable that the evidence of physical significance in the perceptual manifestation of musical rhythm is backed by evidence presented in ‘Dimensions of Emotion in Expressive Musical Performance’ (Vines, Krumhansl, Wanderley, Levitin, & Dalca, 2005). This study indicated that the visual experience of a performer’s physical actions is a key influence on the perception of emotion in a musical performance.

In the realm of musical rhythm and emotion research, an article that contributes substantially to the nature of the discussion is ‘Properties of Expressive Variability Patterns in Music Performances’ (Madison, 2000). In this article, a unique model of the overall metric qualities of a piece was used to inform the perceptions of said piece. Specifically, Madison employs the Hurst exponent, which, put simply, is used to “describe the roughness of a pattern,” along with the correlation r between the structure of the melody and the variability patterns, to characterize musical pieces that were subsequently evaluated by listeners. The results in turn indicated that these mathematical properties of the variability patterns play a role in emotional expression.
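For readers unfamiliar with the measure, the Hurst exponent is commonly estimated with the classic rescaled-range (R/S) procedure. The sketch below is a generic, simplified illustration of that procedure applied to a timing or loudness series; the window sizes and function name are my own choices, and this is not Madison’s actual analysis code:

```python
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64)):
    """Crude rescaled-range (R/S) estimate of the Hurst exponent."""
    x = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        # split the series into non-overlapping windows of length n
        for start in range(0, len(x) - n + 1, n):
            w = x[start:start + n]
            z = np.cumsum(w - w.mean())   # cumulative deviation from the mean
            r = z.max() - z.min()         # range of the cumulative deviation
            s = w.std()                   # standard deviation of the window
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(rs_vals)))
    # the slope of log(R/S) against log(n) approximates H
    return float(np.polyfit(log_n, log_rs, 1)[0])
```

A slope near 0.5 corresponds to an uncorrelated (“white”) pattern, while values approaching 1 indicate a smoother, more persistent pattern – the sense in which the exponent “describes the roughness” of a variability pattern.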

However, the theoretical model that is most relevant to my research question is that presented in an article by Eric Cheng and Elaine Chew, ‘Quantitative Analysis of Phrasing Strategies in Expressive Performance: Computational Methods and Analysis of Performances of Unaccompanied Bach for Solo Violin’. The authors present the “Local Maximum Phrase Detection (LMPD) method”, a computational method for the quantitative description and analysis of expressive performance. This method involves three key terms: (1) phrase strength (Sj) – the average loudness difference between a phrase’s local maximum and the two adjacent local minima; (2) phrase volatility – the standard deviation of all Sj values in the performance, which measures the degree and quantity of variance from the average phrase strength; and (3) phrase typicality (Tk) – for a phrase with a local maximum at location k, the popularity of that location as a phrase peak, i.e., the proportion of other performers who also place a local maximum at location k (for the formal mathematical definitions of these terms, see Cheng & Chew, 2008). All three of these terms are vital to the functioning of my research question, which is based on the premise that particularly strong phrases that are consistently featured in similar locations across different performers of a given piece, and are particularly volatile in that piece, will have a profound impact on the perception of emotions and emotional intensity when manipulated on the scale of microtiming.
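To make these three measures concrete, the following is a minimal illustrative sketch of how phrase strength, volatility, and typicality could be computed from a beat-level loudness curve. It is not Cheng and Chew’s actual implementation; the function names and the treatment of endpoints are my own simplifying assumptions:

```python
import numpy as np

def phrase_metrics(loudness):
    """Toy LMPD-style metrics from a beat-level loudness curve.

    Interior local maxima are treated as phrase peaks; each peak's
    strength S_j is the mean loudness difference between the peak and
    its two neighbouring local minima.
    """
    x = np.asarray(loudness, dtype=float)
    maxima = [i for i in range(1, len(x) - 1) if x[i] > x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] < x[i + 1]]
    minima = [0] + minima + [len(x) - 1]  # simplification: endpoints count as minima

    strengths = []
    for m in maxima:
        left = max(i for i in minima if i < m)    # nearest minimum before the peak
        right = min(i for i in minima if i > m)   # nearest minimum after the peak
        strengths.append(((x[m] - x[left]) + (x[m] - x[right])) / 2.0)

    avg_strength = float(np.mean(strengths))   # average phrase strength
    volatility = float(np.std(strengths))      # phrase volatility (std of S_j)
    return maxima, avg_strength, volatility

def typicality(location, peak_sets):
    """Phrase typicality T_k: share of performances whose peaks include k."""
    return sum(location in peaks for peaks in peak_sets) / len(peak_sets)
```

For example, a toy loudness curve [0, 2, 0, 5, 0] has peaks at positions 1 and 3 with strengths 2 and 5, giving an average phrase strength of 3.5; and a peak location shared by two of three performances has a typicality of 2/3.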

A discussion that applies greatly to the structure and design of my research interests can be found in the article ‘A Scientific View of Musical Rhythm’. In many ways, this article lays out the basic foundations of the study of musical rhythm. It discusses many features of the psychological perception of rhythm; most notably, it explores the specifics behind the actual recognition of onset times relative to the actual onset of events within a stimulus. It also provides a definition of microtiming of sorts: “Perceptual Attack Time” (PAT) is defined as the “perceived moment of rhythmic placement”, and the author asserts that “Any detailed empirical study of musician’s timing, whether alone or in groups, must take PAT into account (choosing stimuli & analyzing response data)”. The article also describes microtiming as a property of expressive performance that can involve an irregular series of pulses, with a tempo curve correlated to the structure of a piece, and defines it as an attack time that does not coincide exactly with the time of a pulse but may precede or follow it by tens of milliseconds (Wright, 2011).
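Wright’s last characterization – an attack time that precedes or follows the nominal pulse by tens of milliseconds – can be expressed in a few lines. This is a hypothetical sketch of my own (assuming onset times in milliseconds have already been extracted, e.g., with annotation software), not code from the article:

```python
def microtiming_deviations(onsets_ms, tempo_bpm, start_ms=0.0):
    """Deviation of each performed onset from its nearest nominal beat.

    Positive values mean the onset falls after the beat ("laid back"),
    negative values mean it falls before the beat ("pushed"/"stiff").
    """
    period = 60000.0 / tempo_bpm  # beat period in milliseconds
    deviations = []
    for t in onsets_ms:
        beat_index = round((t - start_ms) / period)          # nearest beat
        deviations.append(t - (start_ms + beat_index * period))
    return deviations
```

At 120 BPM (a 500 ms beat period), onsets at 0, 520, 990, and 1500 ms would yield deviations of 0, +20, −10, and 0 ms, the middle two being laid back and pushed respectively – squarely within the “tens of milliseconds” range Wright describes.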

The application of these musical models necessitates specific, effective music analysis software and a focused direction. To this end, two articles have proven particularly helpful. ‘Expressive Asynchrony in a Recording of Chopin’s Prelude No. 6 in B Minor by Vladimir de Pachmann’ explored asynchrony (i.e., “an aspect of microtiming that occurs when the constituent voices of a notationally solid chord are brought out of alignment in performance”). The results indicate the particular importance of asynchronies within the “subliminal range” (20-50 ms). Most helpfully, the study pointed to a free performance analysis program called “Sonic Visualiser”, which allows precise identification of onset times (aided by slowed-down playback). This capability is an essential requirement for the analysis of the musical stimuli certain to be part of my experiment (Dodson, 2011). In another extremely useful article, ‘A Review of Music and Emotion Studies: Approaches, Emotion Models, and Stimuli’, Eerola and Vuoskoski gathered a representative sample of the research on music and emotion and quantified and analyzed its prominent methods, models, and approaches to suggest logical future courses for the field. In its conclusion the article asserts that subsequent studies should aim for theoretical consistency, the simultaneous use of multiple approaches, the use of ecologically valid (and varied) stimulus material, and careful awareness of participants’ backgrounds (Eerola & Vuoskoski, 2013).

As the field of research stands, the basic, general concepts of rhythmic influence on emotional perception have been established (Schellenberg, Krysciak, & Campbell, 2000), and the relevance of expressive performance to the intensity of the emotions perceived has been indicated (Chapin, Jantzen, Kelso, Steinberg, & Large, 2010). Furthermore, several other approaches have been developed; for example, some recent studies have shown interest in the idea of the physical embodiment of music and musical emotion (e.g., Iyer, 2002). Amid this research, the technology of rhythmic models for the study of rhythm has thrived, opening the doors to new means of exploring musical temporality (Cheng & Chew, 2008). In this setting, I plan to use the avenues opened by past theoretical models to explore how specific manipulations of microtiming affect the perception of emotion in music. In doing so, I hope to open the door to the consideration of universal, or more likely contextual, effects of expressive performance.



—  EEROLA, T., & Vuoskoski, J. (2013). A review of music and emotion studies: Approaches, emotion models, and stimuli. Music Perception: An Interdisciplinary Journal, 30(3), 307-340. doi:10.1525/mp.2012.30.3.307

—  WRIGHT, M. (2011). A scientific view of musical rhythm. In J. Berger & G. Turow (Eds.), Music, science, and the rhythmic brain: Cultural and clinical implications (pp. 73-85). New York, NY: Routledge.

—  CHAPIN, H., Jantzen, K., Kelso, J. A. S., Steinberg, F., & Large, E. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE, 5(12), e13812. doi:10.1371/journal.pone.0013812

—  SCHELLENBERG, E., Krysciak, A., & Campbell, R. (2000). Perceiving emotion in melody: Interactive effects of pitch and rhythm. Music Perception: An Interdisciplinary Journal, 18(2), 155-171.

—  VINES, B., Krumhansl, C., Wanderley, M., Levitin, D., & Dalca, I. (2005). Dimensions of emotion in expressive musical performance. Annals of the New York Academy of Sciences, 12, 462-466.

—  CHENG, E., & Chew, E. (2008). Quantitative analysis of phrasing strategies in expressive performance: Computational methods and analysis of performances of unaccompanied Bach for solo violin. Journal of New Music Research, 37(4), 325-338.

—  IYER, V. (2002). Embodied mind, situated cognition, and expressive microtiming in African-American music. Music Perception: An Interdisciplinary Journal, 19(3), 387-414.

—  MADISON, G. (2000). Properties of expressive variability patterns in music performances. Journal of New Music Research, 29(4), 335-356.

—  DODSON, A. (2011). Expressive asynchrony in a recording of Chopin’s prelude no. 6 in B minor by Vladimir de Pachmann. Music Theory Spectrum: The Journal of the Society for Music Theory, 33(1), 59-64.