Asymmetrical Rhythms

Hi everyone. I’m sorry about the delayed post; I’m still furiously catching up from Charlotte, as I’m sure many of you are.

Please read TOUSSAINT, G. et al. 2011 “Computational models of symbolic rhythm similarity: Correlation with human judgments” pages 380-402 & 418-424 only (pdf 1-23 & 39-45). (Feel free to skip experiments 2 and 3; I will summarize them in class.)

In preparation for class, I would like everyone to ruminate on the relationships between different mathematical measures of rhythm similarity (symbolic) and human judgment of rhythms (heard) as similar. Specifically, what is the perceptual difference between swap and edit distances? Why do you believe edit distances performed better? Are there any rhythmic circumstances in which you might expect swap distances to correlate better with perception than edit distances? Do you have any thoughts on how to “change the edit distance so that it is impervious to counter examples” (421)? Might it be beneficial to reconstruct “distance” in a different way than as the minimum number of required mutations? If so, how would we proceed?


My aim in these questions is to foster a discussion on: symbolic metrics, what different distances “mean,” experimental design, representation of results, and potential follow-up studies.


3 thoughts on “Asymmetrical Rhythms”

  1. Anyone familiar with Toussaint’s ongoing work on rhythm (typically to the exclusion of meter) and timelines is likely to recognize more of the same in this article. Yet it is somehow refreshing that on this occasion we engage Toussaint in an experimental setting. A brief summary: Toussaint et al. (2011) empirically investigate the validity of two general claims, one abstractly theoretical and the other musicologically theoretical, both of which attempt to express similarity relationships among various rhythms; the motivation for such formal models is, presumably, to capture some salient intuition. The article tests Toussaint’s swap and edit distances and Mario Rey’s putative Afro-Cuban ‘parent rhythms’. As I recall, swap distance is the earlier of Toussaint’s ideas, and initially this theory was limited by its requiring equal cardinality among the objects compared, although according to this article something like a multiset is permissible in swap operations. I suspect that Toussaint developed edit distance in response to this initial limitation, in order to deal more generally with comparative situations involving differing cardinalities (whether in terms of cycle length or attacks). Ultimately, Toussaint et al. find that experimental trials tend to support the idea that edit distance more closely represents or models the (un)conscious thought processes of a given listener asked to assay similarity.

    I can appreciate the impulse behind this research (to test the empirical validity of a conceptual model) as much as I enjoy unbounded theory building, which may itself enjoy a more abstract elegance or validity. The occasional juxtaposition of the two can indicate new research directions, synthetic or not. By its admitted problems, Toussaint et al.’s article points toward interesting and more complicated further study. The most significant problem with the present study, and surely not one easy to resolve, is that it assumes away meter and its almost certainly confounding effects. Peter suggestively asks whether there might be some other way to construct ‘distance’; probably, we must do so upon re-introducing meter in tandem with rhythm. Here’s an example problem sketch. The ancestral tree (BioNJ) that Toussaint constructs using edit distance calculations for the Afro-Cuban rhythms from Rey’s work (‘parents’ and thus ‘progeny’) does not make intuitive sense, even if it seems to agree with the specifics of Rey’s phylogeny argument. For instance, the distance of the cinquillo from the tresillo appears rather large, while in my own experience the former is a kind of decoration of the latter (upbeats attach to the second and third attacks of the tresillo). So what underpins my own intuition? I suggest that it at least partially considers rhythms as related if they project a common meter; to me, the cinquillo is a rather straightforward, if denser, expression of a 332 meter, as is, for that matter, the so-called cinquillo variant. By contrast, edit distance, as a sort of metrically indifferent algorithm, places the cinquillo close to the contradanza, presumably and unknowingly due to the structural fact that both are denser attack patterns and differ by only one in terms of cardinality.

    The authors write, ‘Our goal here is not to study the edit distance in the context of a metrical theory of rhythm, a problem of great interest in itself, but rather to test the robustness of its correlation with human perception, when the listener is free to create any metrical interpretation he or she provides’ (382). Yet how robust can these similarity judgments be when metrically abstract? Resulting trends may certainly sublimate into their characteristics the tendencies of rhythms to project one meter or another and for listeners to induce in common ways without ever facing them directly. But in the name of robustness: do abstractly similar durational rhythms continue to be construed as similar across several fixed and enforced metric environments? But then again, perhaps such a direction only makes unnecessary mess. That is, would an imposed meter also impose the (structural) similarity that must otherwise be induced by the listener?

    – S P G

  2. It seems worth noting that the insertion and deletion operations used when measuring edit distance allow for the preservation of large rhythmic chunks without much “cost”, even when two rhythms are metrically dissimilar (perhaps not the best wording, and, granted, Toussaint is not concerned with meter here). This is not true of the swap measure. Take, for example, the Gahu rhythm, but move all of the attacks over one square. Then compare it to the original Gahu: the swap distance is 5; the edit distance is 2. And then consider the distance between the Gahu and Bossa-Nova, which share a large rhythmic chunk in a metrically similar setting (edit 2; swap 1).

    This is another instance in which the point I raised last week seems relevant: how well do these abstract models perform when confronted with real music? It seems we always strip stimulus down to its bare essentials, often out of necessity, yet rarely consider the consequences of doing so. Why should I care which of two abstract sets of operations (i.e., swap or edit) better represents human perception under such highly constrained conditions? It’s fascinating from a mathematical perspective and makes for some nice foliage, but I’m not sure what I’m to take from the experiments.

    This isn’t to criticize Toussaint et al. in any way. What are they (or anyone) to do in a study like this when real music contains so many confounding factors? I’m not really sure.

    • Just to clarify, one of the points in comparing the new-Gahu, Gahu, and Bossa-Nova is to point out the following: under the edit measure, the new-Gahu/Gahu pair and the Gahu/Bossa-Nova pair are equally similar, which is (in my mind) counterintuitive.
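The figures quoted in the second comment can be checked with a short Python sketch. It rests on several assumptions that are not in the original thread: rhythms are written as 16-pulse box-notation strings ("x" for an attack, "." for a rest); the Gahu and Bossa-Nova patterns are the usual timelines with attacks at pulses 0, 3, 6, 10, 14 and 0, 3, 6, 10, 13; swap distance is computed without wrap-around as the summed displacement of the i-th attacks; and edit distance is plain Levenshtein distance with unit costs for insertion, deletion, and substitution. The function names are illustrative only and are not taken from Toussaint et al.

```python
from typing import List


def onsets(pattern: str) -> List[int]:
    """Return the pulse indices of the attacks ('x') in a box-notation string."""
    return [i for i, c in enumerate(pattern) if c == "x"]


def swap_distance(a: str, b: str) -> int:
    """Swap distance for rhythms with equal length and equal attack count.

    Assumption: no wrap-around, so the minimum number of adjacent swaps
    equals the summed distance between the i-th attacks of each rhythm.
    """
    pa, pb = onsets(a), onsets(b)
    if len(a) != len(b) or len(pa) != len(pb):
        raise ValueError("this sketch needs equal length and equal attack count")
    return sum(abs(i - j) for i, j in zip(pa, pb))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


# Assumed 16-pulse timelines (not spelled out in the thread itself).
GAHU = "x..x..x...x...x."
BOSSA_NOVA = "x..x..x...x..x.."
SHIFTED_GAHU = "." + GAHU[:-1]  # every attack moved one square later

if __name__ == "__main__":
    for name, other in [("shifted Gahu", SHIFTED_GAHU), ("Bossa-Nova", BOSSA_NOVA)]:
        print(f"Gahu vs {name}: swap={swap_distance(GAHU, other)}, "
              f"edit={edit_distance(GAHU, other)}")
```

Under these assumptions the sketch gives swap 5 and edit 2 for the Gahu against its shifted copy, and swap 1 and edit 2 for the Gahu against the Bossa-Nova, which matches the numbers in the comment: the edit measure rates both pairs as equally similar, while the swap measure separates them sharply.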
