with point distances between elements of 3, 1, and 4, respectively. Associate the alphabetic elements with the pitch classes of the ordering <e, 2, 3, 7>, so that the ordered pitch-class intervals produced by adjacent pitch classes correspond to the point distances between alphabetic labels. The number of points traversed in moving from a starting element to a goal element determines interval size. Generate the cyclic permutation corresponding to the RNF ordering <7, e, 2, 3> in the same way, and normalize it to read in a left-to-right direction simply to facilitate point count comparisons. Cyclic permutations A1 and A2 produce identical first and second perimeter intervals of eight and four points, respectively. The final perimeter interval of A1 and A2 is 3 points and 1 point, respectively. A2, the RNF <7, e, 2, 3>, ultimately produces the smallest perimeter interval, so it is the normal form.
Example 4. Uninterpreted Mod 12 space
Normal Form, Successive Interval Arrays, Transformations and Set Classes
Examining an uninterpreted version of the mod-12 pitch-class space reveals that the asymmetrical bias in the original algorithm results from its interpretation of the space.[10] A circle containing twelve equally spaced points represents the more general symmetrical structure of an uninterpreted mod-12 pitch-class space (see Example 4). Two rules determine movement around the space: 1) the move from one point to any other point must be made through points adjacent to the current point, and 2) once a direction is chosen, subsequent moves continue in the chosen direction until the goal point is reached.[11] The important and relevant feature of this model is that movement can proceed in two directions, clockwise or counterclockwise. Calculating ordered pitch-class intervals in the original algorithm limits movement to the clockwise direction in its comparisons of perimeter intervals. Therefore, it does not examine the perimeter intervals of cyclic permutations generated by moving around the circle counterclockwise.
a) Ascending series
0135: i<0, 5> = 5
1350: i<1, 0> = –1 = 11
3501: i<3, 1> = –2 = 10
5013: i<5, 3> = –2 = 10
b) Descending series
5310: i<5, 0> = 7
3105: i<3, 5> = 2
1053: i<1, 3> = 2
0531: i<0, 1> = 1
The asymmetrical bias of ordered pitch-class intervals is the source of the ordering misalignments. The first step in the original algorithm, placing the pitch classes in ascending numerical order, is really a convention determined by the formula for calculating ordered pitch-class intervals, because the normal form for the pitch-class set {0, 1, 3, 5}, for example, could be calculated by ordering the pitch classes as either an ascending or a descending series (see Example 5). The smallest perimeter interval can be calculated from within the permutation as the smallest interval between its first and last pitch classes, or it can be calculated from outside the permutation as the largest interval between its first and last pitch classes. Each method produces equivalent results, and the normal form in each case is identical to within retrogression.[12] The ascending series corresponds to the clockwise motion produced by calculating ordered pitch-class intervals. The ordered pitch-class interval i<0, 5> is 5, so with pitch-class 0 as the first element of the ordered pair, the only direction to pitch-class 5 that traverses exactly 5 points is clockwise.
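The perimeter intervals listed in Example 5 can be checked with a few lines of Python. This is only a minimal sketch: the function names `ordered_pc_interval` and `perimeter_intervals` are hypothetical labels chosen for illustration, and the code assumes the usual reading of the ordered pitch-class interval i<a, b> as b – a reduced mod 12.

```python
def ordered_pc_interval(a: int, b: int) -> int:
    """Ordered pitch-class interval i<a, b> = (b - a) mod 12 (clockwise count)."""
    return (b - a) % 12


def perimeter_intervals(series):
    """Pair every cyclic permutation of `series` with its perimeter interval,
    i.e. the ordered interval from its first to its last pitch class."""
    result = []
    for r in range(len(series)):
        perm = series[r:] + series[:r]
        result.append((perm, ordered_pc_interval(perm[0], perm[-1])))
    return result


ascending = perimeter_intervals([0, 1, 3, 5])
descending = perimeter_intervals([5, 3, 1, 0])

for perm, span in ascending + descending:
    print(perm, span)

# The smallest "inside" interval of the ascending series is the complement
# of the largest "outside" interval of the descending series.
smallest_inside = min(span for _, span in ascending)
largest_outside = max(span for _, span in descending)
print(smallest_inside, 12 - largest_outside)
```

Both series are measured clockwise here, which is the bias the text describes: the descending series yields 7 rather than 5 for its first permutation because the negative difference is folded back into a clockwise count.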
10. Another way of stating this is that the circle in Example 4 is similar to an uninterpreted model of the group Z12. That is, the representation models the structure of the group, but does not have any particular tokens associated with the elements in the representation.
11. If moves along the diameter or radii through the center of the circle were allowed, the familiar concept of interval would be lost. We calculate our intervals by the number of steps it takes to move from one point to another through intervening points.
12. In the latter method, the smallest perimeter interval is calculated indirectly, since the complement of the largest interval outside the permutation will be the smallest perimeter interval inside the permutation. Calculating the normal form by the outside interval appears in Rahn’s shortcut for normal form.
C. Scotto
Example 5. The asymmetrical bias of ordered pitch-class intervals
The descending series, however, reveals the clockwise bias inherent in calculating ordered pitch-class intervals. A descending series of pitch classes may intuitively imply a counterclockwise motion, but it actually produces a complementary clockwise motion. The ordered pitch-class interval i<5, 0> is the perimeter interval of the descending series, and subtracting pitch-class 5 from 0 initially produces the value –5. Interpreting the value literally would mean moving from pitch-class 5 counterclockwise to pitch-class 0, since counterclockwise is the only direction in which pitch-class 0 is five points from pitch-class 5. Negative intervals are, of course, converted to their mod 12 equivalents, so the –5 counterclockwise motion is converted to the complementary clockwise path between pitch-classes 5 and 0, which traverses 7 points. Obviously, the normalization preserves many important concepts, such as complementary intervals. However, it also limits other concepts, such as normal form, since the bias those concepts inherit produces the inconsistencies in normal form comparisons. If mod 12 normalizes movement around the space to the clockwise direction, then mod –12 normalizes movement around the space to the counterclockwise direction (see Example 6a). The mod –12 space is just a reinterpretation of the group “Z12” with
a) {2, 3, 7, e} mod –12 → {–10, –9, –5, –1}
b)
Step 1: {–10, –9, –5, –1}
Step 2: <–10, –9, –5, –1>
Step 3 (reading right to left):
<–10, –9, –5, –1>: <–10, –1>i = –10 – (–1) = –9
<–1, –10, –9, –5>: <–1, –5>i = –1 – (–5) = 4 mod –12 = –8
<–5, –1, –10, –9>: <–5, –9>i = –5 – (–9) = 4 mod –12 = –8 (NF)
<–9, –5, –1, –10>: <–9, –10>i = –9 – (–10) = 1 mod –12 = –11
Step 2a: Remove last pitch class to create next container interval: <–10, –9, –5>, <–1, –10, –9>
Step 3 repeated:
<–10, –5>i = –10 – (–5) = –5
<–1, –9>i = –1 – (–9) = 8 mod –12 = –4
Step 4:
LNF <e, 2, 3, 7>
i<e, 7> = 7 – 11 = –4 mod 12 = 8
i<e, 3> = 3 – 11 = –8 mod 12 = 4
i<e, 2> = 2 – 11 = –9 mod 12 = 3
RNF <–5, –1, –10, –9>
<–5, –9>i = –5 – (–9) = 4 mod –12 = –8, ABS = 8
<–1, –9>i = –1 – (–9) = 8 mod –12 = –4, ABS = 4
<–10, –9>i = –10 – (–9) = –1, ABS = 1
Step 5: <–5, –1, –10, –9> RNF mod 12 → <7, e, 2, 3> RNF
NF generated by original algorithm = <e, 2, 3, 7>
Example 6. a) Mod –12 and mod 12 spaces, b) algorithmic steps for generating a right and left normal form
different tokens.[13] Calculating the ordered interval i<–4, –1>, for example, by subtracting –4 from –1, [–1 – (–4)], produces a value of 3. Applying mod –12 produces a value of –9, normalizing the movement to the counterclockwise direction. The full computational version of the new algorithm examines the intervals of a pitch-class set from both the clockwise and counterclockwise perspectives to determine the smallest perimeter interval. It generates the additional counterclockwise cyclic permutations by applying mod –12 to the integers of a pitch-class set. The intervals of the counterclockwise cyclic permutations are calculated in the “Z–12” space. The clockwise cyclic permutation producing the smallest “Z12” perimeter interval is compared to the counterclockwise cyclic permutation producing the smallest “Z–12” perimeter interval, and whichever of the two produces the smaller perimeter interval is the normal form. Clockwise and counterclockwise cyclic permutations are members of the left normal form (LNF) and right normal form (RNF) classes, respectively. To generate the counterclockwise cyclic permutations for pitch-class set Z, transform the pitch classes {2, 3, 7, e} into their “Z–12” counterparts, {–10, –9, –5, –1} (see Examples 6a and b).[14] Place the pitch classes in descending order from right to left starting with –1, and produce all the cyclic permutations. Calculate the first container interval for each permutation to find the smallest perimeter interval. Two orderings emerge in step 3: <–1, –10, –9, –5> and <–5, –1, –10, –9>. Step 2a removes the
13. John Fraleigh describes the isomorphism with regard to token exchange: “Suppose that a set has three elements. As before, we may as well let the set be {e, a, b}. For e to be an identity, a binary operation * on this set has to have a table [where] …each row and each column are to contain each element exactly once…so * does give a group structure on G = {e, a, b}. Now suppose that G’ is any other group of three elements and imagine a table for G’ with identity element appearing first. Since our filling out of the table for G = {e, a, b} could be done in only one way, we see that if we take the table for G’ and rename the identity e, the next element listed a, and the last element b, the resulting table for G’ must be the same as the one we had for G. As explained…this renaming gives an isomorphism of the group G’ with the group G” (Fraleigh 1999, 60).
14. The pitch-class set is transformed by mod –12 rather than by simply taking the inverse of each pitch class (i.e., {–2, –3, –7, –11}), because transforming the pitch classes by mod –12 preserves the spatial relationships on the circle of their mod 12 counterparts. Taking the inverse of each pitch class would flip the spatial relationships 180 degrees, which would yield results in the calculations for normal form equivalent to those produced by calculating the LNF.
last pitch class from each permutation, and repeating step 3 calculates the new perimeter interval. Since the interval –4 is smaller than –5, the ordering <–5, –1, –10, –9> produces the smallest perimeter interval. Step 4 determines whether the clockwise or counterclockwise cyclic permutation produces the smallest perimeter interval. Comparing positive and negative intervals from the two spaces is not a problem, since the signs are indicators of direction, not size. Nevertheless, taking the absolute value of the intervals facilitates comparisons. The first two intervals in both the clockwise and counterclockwise permutations tie with values of 8 and 4, respectively. The next ordered pitch-class intervals for the clockwise and counterclockwise permutations are 3 and –1 (absolute value 1), respectively. The permutation <–5, –1, –10, –9> produces the smallest perimeter interval. Step 5 transforms the pitch classes back to their positive counterparts, producing the RNF <7, e, 2, 3>.
The complete computational algorithm presented in Figure 1 is an adaptation of Morris’s algorithm (Morris 1991, 40). Although the complete algorithm is computationally intensive, the shortcut version demonstrated earlier produces equivalent results with a minimum of computational overhead in about three easy steps. The complete algorithm is presented as the theoretical counterpart of the pragmatic shortcut in the interest of completeness. Spans represented by the formula SK1(X) are another name for perimeter intervals, and span calculation is the formal version of the pruning procedure. Calculating spans for RNFs requires another set of ordered pitch-class intervals calculated from right to left, that is, moving counterclockwise around the circle. The combination of clockwise and counterclockwise intervals will be called bidirectional ordered pitch-class intervals:
DEF: ic<a, b> = b – a mod 12 and <–a, –b>icc = –a – (–b) mod –12 (for any two pitch classes a, b, the bidirectional ordered pitch-class interval ic between a and b in that order equals the number b – a (mod 12), or the bidirectional ordered pitch-class interval icc between –b and –a in that order equals the number –a – (–b) (mod –12)).[15]
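The definition can be tried out directly in Python, whose `%` operator returns a result with the sign of the divisor, so `x % -12` falls in the range –11 to 0. Treating that behavior as the "mod –12" of the text is an assumption of this sketch, as are the function names `ic`, `icc`, and `to_z_minus_12`, which are hypothetical labels for the operations described above.

```python
def ic(a: int, b: int) -> int:
    """Clockwise interval ic<a, b> = (b - a) mod 12."""
    return (b - a) % 12


def icc(x: int, y: int) -> int:
    """Counterclockwise interval <x, y>icc = (x - y) mod -12.

    Python's % takes the sign of the divisor, so the result lies in -11..0.
    """
    return (x - y) % -12


def to_z_minus_12(pcs):
    """Map pitch classes 0..11 to their counterparts in the "Z-12" space."""
    return [p % -12 for p in pcs]


# Pitch-class set {2, 3, 7, e} with e = 11, as in Example 6a.
print(to_z_minus_12([2, 3, 7, 11]))

# Container intervals from step 3 of Example 6b.
print(icc(-10, -1), icc(-1, -5), icc(-5, -9), icc(-9, -10))
```

Under this reading the absolute value of an icc result is the number of points traversed counterclockwise, which is what step 4 compares against the clockwise ic values.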
Some pitch-class sets produce both an LNF and an RNF, so conditions 1, 2, or 3 in step 12 of the algorithm determine the normal form in these cases. The steps in the procedure are illustrated by running pitch-class set {0, 1, 3} through the algorithm (see Example 7a). Calculating the counterclockwise intervals from the pitch-class set DL and applying mod –12 to the resulting intervals visually simplifies the illustration (see Example 7b). Lr has one member and Rr has no members in step 11, so LNF<0, 1, 3> is the normal form.
15. This definition is an expanded version of Rahn’s definition of ordered pitch-class interval (Rahn 1980, 25). The interval type icc allows the calculation of Z–12 intervals without having to first flip the pitch classes into the “Z–12” space. Although it might appear that these new intervals no longer have complements, because ic and icc might seem to be complements of each other, this is not the case. The complement of ic is still ic, because these intervals belong to the Z12 group, and in that group they are still complements, since they are both taken mod 12.
a)
1) DL = {0, 1, 3}, DR = {0, –11, –9}
2) DL = <0, 1, 3>, DR = <–11, –9, 0>
3) k = 3 – 1 = 2
4-5) Lr = {
A0 <0, 1, 3> S2(DL) = ic<0, 3> 3 – 0 = 3
A1 <1, 3, 0> S2(DL) = ic<1, 0> 0 – 1 = –1 mod 12 = 11
A2 <3, 0, 1> S2(DL) = ic<3, 1> 1 – 3 = –2 mod 12 = 10}
Rr = {
A3 <–11, –9, 0> S2(DR) = <–11, 0>icc –11 – 0 = –11
A4 <–9, 0, –11> S2(DR) = <–9, –11>icc –9 – (–11) = 2 mod –12 = –10
A5 <0, –11, –9> S2(DR) = <0, –9>icc 0 – (–9) = 9 mod –12 = –3}
6) {A0 = 3, A1 = 11, A2 = 10, A3 = 11, A4 = 10, A5 = 3}
7) m = 3
8) Delete A1, A2, A3, A4 from Lr and Rr.
9) k = 2 – 1 = 1
10) 1 > 0, go to step 5
5²) Lr = {A0 <0, 1, 3> S1(DL) = ic<0, 1> 1 – 0 = 1}
Rr = {A5 <0, –11, –9> S1(DR) = <3, 1>icc –11 – (–9) = –2}
6²) {A0 = 1, A5 = 2}
7²) m = 1
8²) Delete A5
9²) k = 1 – 1 = 0
10²) If k = 0, go to step 11
11) If k = 0 and either Lr or Rr, but not both, has only one member, it is the NF, and it will be either an LNF or RNF: NF = LNF<0, 1, 3>
b)
4-5) Rr = {
A3 <–11, –9, 0> S2(DR) = <–11, 0>icc –11 – 0 = –11
A4 <–9, 0, –11> S2(DR) = <–9, –11>icc –9 – (–11) = 2 mod –12 = –10
A5 <0, –11, –9> S2(DR) = <0, –9>icc 0 – (–9) = 9 mod –12 = –3}
Rr = {
A3 <1, 3, 0> S2(DR) = <0, 1>icc = 1 – 0 = 1 mod –12 = –11
A4 <3, 0, 1> S2(DR) = <1, 3>icc = 3 – 1 = 2 mod –12 = –10
A5 <0, 1, 3> S2(DR) = <3, 0>icc = 0 – 3 = –3}
Example 7. a) Illustration of steps in the algorithm, b) calculating the counterclockwise intervals from the pitch-class set DL
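The pruning walked through in Example 7 can be approximated in a few lines. The sketch below is not the algorithm of Figure 1, which is not reproduced in this section; it assumes that pruning by successively smaller spans is equivalent to comparing span tuples lexicographically, and it works with absolute values in the mod 12 space as Example 7b does. The names `left_spans`, `right_spans`, and `normal_forms` are hypothetical, and the tie-breaking conditions of step 12 are not implemented: every tied candidate is simply returned.

```python
def rotations(seq):
    """All cyclic permutations of a list."""
    return [seq[r:] + seq[:r] for r in range(len(seq))]


def left_spans(perm):
    """Clockwise spans from the first pc to each later pc, widest first."""
    return tuple((p - perm[0]) % 12 for p in reversed(perm[1:]))


def right_spans(perm):
    """Counterclockwise spans (absolute values) from the last pc to each
    earlier pc, widest first."""
    return tuple((perm[-1] - p) % 12 for p in perm[:-1])


def normal_forms(pcs):
    """Return every (label, ordering) whose span tuple is minimal.

    Assumes a non-empty collection of integers; e = 11 and t = 10.
    """
    ordered = sorted({p % 12 for p in pcs})
    candidates = []
    for perm in rotations(ordered):
        candidates.append((left_spans(perm), "LNF", tuple(perm)))
        candidates.append((right_spans(perm), "RNF", tuple(perm)))
    best = min(spans for spans, _, _ in candidates)
    return [(label, perm) for spans, label, perm in candidates if spans == best]


print(normal_forms([0, 1, 3]))
print(normal_forms([2, 3, 7, 11]))
```

For {0, 1, 3} only the left-anchored reading of <0, 1, 3> survives, matching step 11 of Example 7a, and for {2, 3, 7, e} the right-anchored reading of <7, e, 2, 3> survives, matching step 5 of Example 6b.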
Only the new algorithm can produce normal form INT1 equivalence classes that partition the domain of all possible pitch-class sets into a partition isomorphic to the Tn/TnI equivalence classes, because it corrects the misaligned orderings that misidentify inversional relationships among pitch-class sets. Identifying a pitch-class set’s Tn/TnI type usually involves generating the Tn types for a pitch-class set and its inversion, and the representative in “most normal form” becomes the set’s Tn/TnI type (Rahn 1980, 81-2). The representative in most normal form has its intervals “most packed from the right,” which means the larger intervals are on the right, proceeding to smaller intervals at the left.[16] Interval distribution relative to size and direction distinguishes the representative in “most normal form.” The Tn representative that is not in “most normal form” has the larger intervals on the left proceeding to the smaller intervals on the right, which means it is “most packed from the left.” Interval distribution also identifies the class membership of a normal form ordering. The intervals in the INT1s of all LNFs are “most packed from the right,” which distinguishes an LNF from an RNF, where the intervals in the INT1s are “most packed from the left.” For example, the normal forms of pitch-class sets {0, e, 9} and {0, 1, 3} are RNF<9, e, 0> and LNF<0, 1, 3>, generating INT1s <2-1> and <1-2>, respectively. Since the INT1 of the representative in “most normal form” always has the smaller intervals on the left proceeding to larger intervals at the right, the INT1 of the Tn/TnI type representative is always an LNF. The new algorithm guarantees the INT1s of the normal
16. The original phrase that Rahn (1980, 38) uses is “most packed to the left.” It is a little unfortunate, since it often causes confusion with Forte’s normal form algorithm, which breaks ties by choosing the cyclic permutation that has the smallest initial interval. Straus (2005) uses the phrase “most packed from the right,” which avoids confusion and is closer in meaning to the normal form criterion.
forms for all the members of a set class will be identical to within retrogression, so the INT1 of an LNF can always be generated from an RNF simply by retrograding the INT1 of the RNF. For example, the normal form for the pitch-class set {0, t, 4, 7, e} is <4, 7, t, e, 0>, and the INT1 of the normal form ordering is <3-3-1-1>. Since the smaller intervals are on the right proceeding to the larger intervals on the left, the interval series is a member of the RNF class, and retrograding it produces the INT1 type <1-1-3-3>. Starting with pitch-class 0 and applying the intervals of the INT1 type generates the remaining pitch classes of the set class representative. Pitch-class set {0, t, 4, 7, e} is a member of set class 5-Z38[0, 1, 2, 5, 8]. Since the INT1 of an LNF always uniquely identifies the set class of any pitch-class set, and an LNF can always be generated from an RNF simply by retrograding the INT1 of the RNF, the INT1s of LNFs are equivalence classes that exhaustively partition all pitch-class sets into exclusive classes. Furthermore, INT1 types easily distinguish between Z-related pitch-class sets. The new algorithm generates the LNF<3, 4, 7, 8, t> for pitch-class set {3, 4, 7, 8, t}, so the INT1 <1-3-1-2> of the LNF is also an INT1 type. Comparing the INT1s of pitch-class sets {0, t, 4, 7, e} and {3, 4, 7, 8, t} reveals that they are neither transpositions nor inversions of each other, nor are they members of the same set class, since their INT1s are not identical to within retrogression. The INT1 type <1-3-1-2> indicates that pitch-class set {3, 4, 7, 8, t} is a member of set class 5-Z18[0, 1, 4, 5, 7], which is the Z partner of pitch-class set {0, t, 4, 7, e}.[17] Although interval vectors cannot reliably identify set class membership and distinguish between Z-related pitch-class sets, INT1 types can perform both functions.
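The step from a normal form ordering to its set class representative is simple enough to sketch in code. The helper names `int1`, `int1_type`, and `representative` are hypothetical, and the sketch assumes the normal form ordering and its LNF or RNF label are already known, for instance from the examples in the text.

```python
def int1(ordering):
    """INT1: ordered pitch-class intervals between adjacent pitch classes."""
    return tuple((b - a) % 12 for a, b in zip(ordering, ordering[1:]))


def int1_type(series, is_rnf):
    """An RNF's INT1 is retrograded to obtain the INT1 type; an LNF's is kept."""
    return tuple(reversed(series)) if is_rnf else tuple(series)


def representative(type_series):
    """Build the set class representative by starting on pitch-class 0."""
    pcs = [0]
    for step in type_series:
        pcs.append((pcs[-1] + step) % 12)
    return pcs


# RNF <4, 7, t, e, 0> from the text, with t = 10 and e = 11.
rnf_type = int1_type(int1((4, 7, 10, 11, 0)), is_rnf=True)
print(rnf_type, representative(rnf_type))

# LNF <3, 4, 7, 8, t> from the text.
lnf_type = int1_type(int1((3, 4, 7, 8, 10)), is_rnf=False)
print(lnf_type, representative(lnf_type))
```

The two INT1 types differ and are not retrogrades of each other, which is the comparison the text uses to separate the Z-related pair.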
Therefore, intervals can partition the domain of all possible pitch-class sets into equivalence classes, provided that a specific specialized subset of intervals, rather than the total intervallic content of pitch-class sets, generates the equivalence classes. The interval succession criterion determines a pitch-class set’s normal form in the shortcut algorithm. The interval distribution relative to size and direction that distinguishes the LNF class from the RNF class is the basis for the criterion. The order of interval sizes in an INT1 of an LNF is from smaller intervals on the left proceeding to the larger intervals on the right, while the order of interval sizes is reversed for the INT1 of an RNF. When reading the interval series produced by a pitch-class set in
17. Rahn (1979-80) demonstrates that Lewin’s EMB(A, B) and his extension of embed, MEMBn(X, A, B), both distinguish between Z-related sets (483-498). The pitch-class sets generated by the INT1 types of a pair of Z-related sets are in a relationship determined by a special case of EMB(A, B): if #A = #B and /A/ ≠ /B/ then EMB(A, B) = EMB(B, A) = 0. In other words, if the cardinalities of sets A and B are equal and if A and B are not members of the same set class, then pitch-class set A will not be embedded in B and pitch-class set B will not be embedded in A. This works whether pitch-class sets A and B are ordered or unordered. With ordered pitch-class sets taken from a pair of Z-related sets, a corollary about intervals follows. If the ordered pitch-class set A is not embedded in B, and if the ordered pitch-class set B is not embedded in A, then the INT1 of either A or B cannot be reproduced in the other pitch-class set, even though the interval vectors of pitch-class sets A and B are identical. Reproducing the INT1 of pitch-class set A in B or reproducing the INT1 of B in A would be equivalent to embedding pitch-class set A in B or embedding pitch-class set B in A, or it would be equivalent to demonstrating the pitch-class sets are transformationally related.
ascending order in the direction opposite the normal form class (i.e., reading an interval series from right to left for an LNF), the potential perimeter intervals in the series become anti-perimeter intervals. The cyclic permutation whose interval sizes decrease from largest to smallest has the smallest perimeter intervals, because a large anti-perimeter interval decreases the size of its complement, a perimeter interval. This is why the intervals in the INT1s of all LNFs are “most packed from the right,” and the intervals in the INT1s of all RNFs are “most packed from the left.” When more than one cyclic permutation produces the largest anti-perimeter interval, the remaining anti-perimeter intervals determine the normal form. This is why the other anti-perimeter intervals in the series must be ordered from largest to smallest. For example, the interval series <1-2-2-7> for the pitch-class set {3, 4, 6, 8} contains two cyclic permutations producing the largest anti-perimeter interval: the cyclic permutation reading right to left for the LNF is <7-2-2-1>, and reading left to right for the RNF it is <7-1-2-2> (see Example 8).[18] The next anti-perimeter interval for the
Example 8. Left and right cyclic permutations
18. Reading the interval series from left to right for the RNF is equivalent to reading the pitch classes that produced the series backwards. Reading the pitch classes in the reverse order produces the mod 12 complements of the intervals in the series, <5, 1, 2, 2>. However, the intervals of the RNF space are taken mod –12, so the intervals in the series become <–7, –1, –2, –2>, and the shortcut algorithm is simply taking the absolute values of the intervals in the series.
RNF is 1, while the next anti-perimeter interval for the LNF is 2. Since interval 2 is larger than interval 1, pitch-class 6 will be closer to pitch-class 3 than pitch-class 4 is to pitch-class 8.[19] Therefore, only the interval series <7, 2, 2, 1> meets the interval succession criterion, and the normal form is LNF<3, 4, 6, 8>. The interval succession criterion makes generating the normal form of a pitch-class set easier and more efficient. For example, pitch-class set {1, 0, t, 7, 4} is one of the problematic pitch-class sets in Table 1 (see Example 9). The interval series generated by the initial steps of the shortcut contains three 3s, which means the pitch-class set has six cyclic permutations, three from the right and three from the left, capable of producing the smallest perimeter interval. However, the interval succession criterion requires examining only the permutation reading right to left beginning with interval 3 between pitch-class adjacency <7, t> and the permutation reading left to right beginning with interval 3 between pitch-class adjacency <1, 4>. Of these two permutations, only RNF<4, 7, t, 0, 1>, whose INT1 is <3-3-2-1>, meets the interval succession criterion. The set class is always generated from an LNF, so retrograding the INT1 <3-3-2-1> produces the INT1 type, indicating that pitch-class set {1, 0, t, 7, 4} is a member of set class 5-31[0, 1, 3, 6, 9].
A = {1, 0, t, 7, 4}
1) <0, 1, 4, 7, t>
2) Pitch classes: 0 1 4 7 t (0)
Intervals: 1 3 3 3 2
3) 1←33321
t→33312
12) RNF<4, 7, t, 0, 1> INT<3, 3, 2, 1>
13) Set class:
a) Retrograde RNF INT <1, 2, 3, 3>
b) Begin with pitch-class 0: 5-31[0, 1, 3, 6, 9]
Example 9. Shortcut algorithm for normal form
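The first two steps of Example 9 can be mirrored in code. The sketch below covers only those opening steps, computing the cyclic interval series and locating the adjacencies that hold the largest interval; it does not implement the interval succession criterion itself. The function names `cyclic_interval_series` and `largest_interval_adjacencies` are hypothetical.

```python
def cyclic_interval_series(pcs):
    """Return the ascending ordering and its cyclic interval series,
    including the interval from the last pitch class back to the first."""
    ordered = sorted({p % 12 for p in pcs})
    n = len(ordered)
    series = [(ordered[(i + 1) % n] - ordered[i]) % 12 for i in range(n)]
    return ordered, series


def largest_interval_adjacencies(pcs):
    """Pitch-class adjacencies that span the largest interval in the series."""
    ordered, series = cyclic_interval_series(pcs)
    biggest = max(series)
    n = len(ordered)
    return [
        (ordered[i], ordered[(i + 1) % n])
        for i, step in enumerate(series)
        if step == biggest
    ]


# A = {1, 0, t, 7, 4} with t = 10.
print(cyclic_interval_series([1, 0, 10, 7, 4]))
print(largest_interval_adjacencies([1, 0, 10, 7, 4]))
```

Each adjacency returned is a possible starting point for one right-to-left and one left-to-right reading, which is where the six candidate permutations mentioned in the text come from.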
19. Another way to verify the smallest perimeter interval is to add the two intervals between pitch-classes 3 and 6 and compare the value to the sum of the two intervals between pitch-classes 8 and 4. Adding the intervals for pitch-classes 3 and 6 produces a value of 3, while adding the intervals for pitch-classes 8 and 4 produces a value of 4. The interval 3 is smaller than 4, so the LNF produces the smallest perimeter interval, and it is the normal form for the pitch-class set.
The full algorithm demonstrates how LNFs and RNFs reveal hidden symmetries when the INT1 of the normal form is not R-invariant (see Example 10). For pitch-class set {0, 1, 2, 7}, the fourth pass sets k to 0 and m to 1, leaving Lr and Rr each with one member. Since all perimeter intervals are identical and the orderings of the LNF and RNF are rotations of each other, the pitch-class set produces both an LNF and an RNF. In these cases condition 2 applies: the normal form is determined by context, or the LNF is chosen by convention. The INT1s generated by equivalent LNFs and RNFs are retrograde related, indicating that the pitch-class set is symmetrical and will map into itself under inversion. The sets Lr and Rr will also illustrate transpositional symmetry. The multiplicity of identical normal form types, LNFs or RNFs, represents the number of operations that map the pitch-class set into itself under Tn, so the cardinality of Lr or Rr equals the degree of transpositional symmetry. The number of LNF/RNF pairs represents the number of operations that map the pitch-class set into itself under TnI and equals the degree of inversional symmetry. The sum of both numbers equals the degree of symmetry. For example, when k = 0 for the pitch-class set {0, 1, 2, 7}, the cardinality of Lr is one and the number of LNF/RNF pairs is also one, so the degree of symmetry is 2. Since the pitch-class set {0, 1, 2, 7} generates two normal form orderings, LNF<0, 1, 2, 7> and RNF<7, 0, 1, 2>, that are rotations of each other, it is also a member of two Tn types. The LNF<0, 1, 2, 7> is a member of Tn type (0,1,2,7)Tn, while the RNF<7, 0, 1, 2> is a member of the new Tn type (0,5,6,7)Tn. The addition of another Tn type for this pitch-class set better reflects its symmetrical structure.
1) LNF
A0 0127 ic<0, 7> 7 – 0 = 7
A1 1270 ic<1, 0> 0 – 1 = –1 = 11
A2 2701 ic<2, 1> 1 – 2 = –1 = 11
A3 7012 ic<7, 2> 2 – 7 = –5 = 7
RNF
A4 0127 icc<7, 0> 0 – 7 = –7
A5 7012 icc<2, 7> 7 – 2 = 5 = –7
A6 2701 icc<1, 2> 2 – 1 = 1 = –11
A7 1270 icc<0, 1> 1 – 0 = 1 = –11
2) LNF
A0 0127 ic<0, 2> 2 – 0 = 2
A3 7012 ic<7, 1> 1 – 7 = –6 = 6
RNF
A4 0127 icc<7, 1> 1 – 7 = –6
A5 7012 icc<2, 0> 0 – 2 = –2
3) LNF
A0 0127 ic<0, 1> 1 – 0 = 1
RNF
A5 7012 icc<2, 1> 1 – 2 = –1
LNF<0, 1, 2, 7> INT<1, 1, 5>
RNF<7, 0, 1, 2> INT<5, 1, 1>
Example 10. LNFs and RNFs reveal hidden symmetries
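The degrees of symmetry quoted for the sets in this discussion can be cross-checked by brute force, independently of the normal form machinery. The sketch below simply counts the Tn and TnI operations that map a set into itself; it assumes the common convention TnI(x) = n – x mod 12, and the function names are hypothetical.

```python
def transpose(pcs, n):
    """Tn: add n to every pitch class, mod 12."""
    return frozenset((p + n) % 12 for p in pcs)


def invert(pcs, n):
    """TnI, assuming the convention TnI(x) = (n - x) mod 12."""
    return frozenset((n - p) % 12 for p in pcs)


def degrees_of_symmetry(pcs):
    """Return (transpositional, inversional) degrees of symmetry."""
    s = frozenset(p % 12 for p in pcs)
    t_degree = sum(1 for n in range(12) if transpose(s, n) == s)
    i_degree = sum(1 for n in range(12) if invert(s, n) == s)
    return t_degree, i_degree


for pcs in ([0, 1, 2, 7], [4, 5, 9, 0], [0, 1, 6, 7], [0, 1, 3]):
    t_degree, i_degree = degrees_of_symmetry(pcs)
    print(pcs, t_degree, i_degree, t_degree + i_degree)
```

For {0, 1, 2, 7} the count is one Tn operation and one TnI operation, which agrees with the single LNF/RNF pair the algorithm leaves in Lr and Rr.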
Fortunately, the shortcut algorithm quickly and easily reveals hidden symmetries and a pitch-class set’s degree of symmetry (see Example 11). The interval series <4-1-4-3> generated by pitch-class set {4, 5, 9, 0} contains two 4s. The interval succession criterion requires examining only the permutation reading right to left beginning with interval 4 between pitch-class adjacency <0, 4> and the permutation reading left to right beginning with interval 4 between pitch-class adjacency <5, 9>. All perimeter intervals in the LNF and RNF are identical and the orderings are rotations of each other, so the INT1s generated by the equivalent LNF and RNF will be retrograde related, indicating that the pitch-class set will map into itself under inversion. The LNF<4, 5, 9, 0> is a member of Tn type (0,1,5,8)Tn, while the RNF<9, 0, 4, 5> is a member of the new Tn type (0,3,7,8)Tn. The degree of symmetry for the pitch-class set is 2, since the algorithm produces only one LNF/RNF pair, and the multiplicity of LNFs is also one. In general, the number of identical normal form permutations generated by a pitch-class set equals its degree of symmetry.[20] The normal form function, NFTR[], has at its foundation four possible INT1 relationships and transformational definitions (see Example 12). Since the INT1s of the normal forms generated by the new algorithm always correctly indicate the presence or absence of a transformational relationship in comparative analysis, the function uniquely associates a pair of INT1s with a transformational type. The elements of the domain are sets containing the INT1s generated by a pair of pitch-class sets in normal form. The codomain is the set containing the values 1, 2, 3, and 4, and the range consists of sets whose elements are a pair of INT1s and a value.
The values indicate the transformational relationship:
1 = the pitch-class sets generating INT1<X> and INT1<Y> from normal form orderings are Tn-related;
2 = the pitch-class sets generating INT1<X> and INT1<Y> from normal form orderings are TnI-related;
20. The presence or absence of the cyclic interval creates some interesting disparities between INT1 and CINT1 systems. In the INT1 system, most set classes with a degree of symmetry of 2 or higher will produce an INT1 that is R-invariant. Some set classes, however, need to repeat a pitch class in order to reveal that they are capable of producing an R-invariant INT1 when the unmodified algorithm generated their normal form. These pitch-class sets generate two equivalent normal forms using the modified algorithm, and the INT1s of the normal forms reflect the pitch-class set’s symmetrical structure. The inclusion of the cyclic interval in the CINT1 system essentially reverses the situation. Pitch-class sets capable of generating an R-invariant CINT1 will be the pitch-class sets that generate two equivalent normal forms. A pitch-class set with a degree of symmetry of 2 that produces an R-invariant INT1 in the INT1 system will no longer produce an R-invariant CINT1, due to the inclusion of the cyclic interval. For example, the pitch-class set {0, 1, 6, 7} in normal form is <0, 1, 6, 7>, producing the INT1 1-5-1, while placing the same pitch-class set in ascending order and including the cyclic interval produces the pitch-class set <0, 1, 6, 7, 0>, generating a CINT1 of 1-5-1-5, which is not R-invariant. Although the CINT1 1-5-1-5 is not R-invariant, there is a cyclic permutation of the CINT1, 5-1-5-1, that is related by retrogression to the original order of the CINT1. In the shortcut version of the modified algorithm, the interval series produces four equivalent permutations, indicating that the degree of symmetry of pitch-class set {0, 1, 6, 7} is 4. The smallest initial pitch-class condition eliminates two of the orderings, and the remaining orderings fall under condition 1: if there is one LNF and one RNF, and if the RNF is the LNF read backwards, the NF is the LNF, by convention.
A = {4, 5, 9, 0}
1) <0, 4, 5, 9>
2) Pitch classes: 0 4 5 9 (0)
Intervals: 4 1 4 3
Tn types: (0,1,5,8)Tn/(0,3,7,8)Tn
3) 4: 4341
5: 4341
4) LNF<4, 5, 9, 0> INT<1, 4, 3>
RNF<9, 0, 4, 5> INT<3, 4, 1>
Condition 2 from step 12
5) Set class: 4-20[0, 1, 5, 8]
Pitch classes 9 0 4 5 9 0 with intervals 3 4 1 4 3: LNF = 4 5 9 0, RNF = 9 0 4 5
Example 11. Revealing hidden symmetries with the shortcut algorithm
3 = the unordered pitch-class sets generating INT1<X> and INT1<Y> from normal form orderings are Tn and/or TnI related; and
4 = the pitch-class sets generating INT1<X> and INT1<Y> from normal form orderings are not Tn or TnI related.
Comparing Sets/ Normal Form and Normal Form Comparisons
The INT1s of the two sets in normal form can be in 1 of 4 relationships.
1) The normal forms of pitch-class sets A and B generate INT1s whose intervals are in the same order: INT1(NF(A)) = <1-2-3>, INT1(NF(B)) = <1-2-3>.
2) The normal forms of pitch-class sets A and B generate INT1s whose intervals are retrograde related: INT1(NF(A)) = <1-2-3>, INT1(NF(B)) = <3-2-1>.
3) The normal forms of pitch-class sets A and B generate INT1s whose intervals are in the same order and retrograde related: INT1(NF(A)) = <1-5-1>, INT1(NF(B)) = <1-5-1>.
C. Scotto
4) The normal forms of pitch-class sets A and B generate INT1s whose intervallic contents are not identical, or the intervallic content is identical but the order of the intervals is neither the same nor retrograde related: INT1(NF(A))=<1-3-2>, INT1(NF(B))=<1-4-2> or INT1(NF(A))=<1-2-3>, INT1(NF(B))=<3-1-2>.

Definitions and Transformational Relationships Following from the Five Normal Form INT1 Relationships:

1) Relationship 1, Definition 1: unordered pitch-class sets A and B are related by the operation of transposition if and only if the INT1 generated by pitch-class set A in normal form contains the same ordered pitch-class intervals in the same order as the INT1 generated by pitch-class set B in normal form.
2) Relationship 2, Definition 2: unordered pitch-class sets A and B are related by the operation of inversion if and only if the INT1 generated by pitch-class set A in normal form contains the same ordered pitch-class intervals in the reverse order of the INT1 generated by pitch-class set B in normal form.
3) Relationship 3, Definition 3: unordered pitch-class sets A and B are related by the operation of transposition and inversion if and only if the INT1 generated by pitch-class set A in normal form is R-invariant with the INT1 generated by pitch-class set B in normal form.
   (a) Definition 3a: unordered pitch-class set A will map into itself under inversion if and only if the INT1 generated by pitch-class set A in normal form is R-invariant.
4) Relationship 4, Definition 4: unordered pitch-class sets A and B are not related by the operation of inversion or transposition if and only if the INT1 generated by pitch-class set A in normal form does not contain the same ordered pitch-class intervals as the INT1 generated by pitch-class set B in normal form, or the INT1 generated by pitch-class set A in normal form contains the same ordered pitch-class intervals as the INT1 generated by pitch-class set B in normal form but the order of the intervals is neither the same nor reversed.

Example 12. INT relationships and the normal form function
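The four relationships reduce to two comparisons between interval series. The sketch below is illustrative only: `classify_relationship` is a hypothetical name, and the function assumes both arguments are INT1s that have already been derived from normal form orderings.

```python
def classify_relationship(int_a, int_b):
    """Return the relationship number (1-4) between two normal-form INT1s."""
    a, b = tuple(int_a), tuple(int_b)
    same_order = a == b
    retrograde = a == tuple(reversed(b))
    if same_order and retrograde:
        return 3  # transposition and inversion (R-invariant series)
    if same_order:
        return 1  # transposition
    if retrograde:
        return 2  # inversion
    return 4      # neither Tn nor TnI related


print(classify_relationship((1, 2, 3), (1, 2, 3)))
print(classify_relationship((1, 2, 3), (3, 2, 1)))
print(classify_relationship((1, 5, 1), (1, 5, 1)))
print(classify_relationship((1, 2, 3), (3, 1, 2)))
```

Relationship 3 is tested first because an R-invariant series satisfies both of the other comparisons at once.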
In this paper, I have demonstrated that creating a normal form function produces a consistent system for determining transformational relationships. Simply placing a pitch-class set in normal form is sufficient to determine its transformational type, the specific transformation relating it to any other pitch-class set, its degree of symmetry, and its set class membership. Rahn writes “the ability to take a set and quickly, almost automatically, list it in normal form is absolutely crucial to all subsequent use of nontonal theory” (Rahn 1980, 31). It is my hope that there are now many more reasons to work towards this goal.
References

Chrisman, R.: Identification and Correlation of Pitch-Sets. Journal of Music Theory 15(1-2), 58–83 (1971)
Forte, A.: The Structure of Atonal Music. Yale University Press, New Haven (1973)
Fraleigh, J.: A First Course in Abstract Algebra. Addison-Wesley Publishing Company, Inc., Reading (1999)
Morris, R.: Composition with Pitch-Classes. Yale University Press, New Haven (1987)
Morris, R.: Class Notes for Atonal Music Theory. Frog Peak Music, Lebanon (1991)
Rahn, J.: Basic Atonal Theory. Longman Inc., New York (1980)
Rahn, J.: Relating Sets. Perspectives of New Music 18(1-2), 483–498 (1979-1980)
Straus, J.N.: Introduction to Post-Tonal Theory. Pearson Prentice Hall, Upper Saddle River (2005)
Appendix: Rahn/Morris/Scotto Normal Form Algorithm

Definition: span (sub-k) of ordered set X:
Sk1 (X) = xk – x0 mod 12, where k ≤ #X – 1
Sk2 (X) = xk – x0 mod –12, where k ≤ #X – 1

Examples
Sk1: X = <0, 1, 3>; S2 (X) = x2 – x0 = 3 – 0 = 3; S1 (X) = x1 – x0 = 1 – 0 = 1
Sk2: X = <0, -9, -11>; S2 (X) = x2 – x0 = -11 – 0 = –11; S1 (X) = x1 – x0 = -9 – 0 = –9
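The span definition maps directly onto modular arithmetic. The sketch below is an illustration under two assumptions: the function name `span` is hypothetical, and the mod -12 example uses the descending ordering <0, -9, -11>. It relies on Python's `%` operator, whose result takes the sign of the modulus, so a modulus of -12 keeps values between -11 and 0.

```python
def span(x, k, modulus=12):
    """Span (sub-k) of ordered set x: (x[k] - x[0]) reduced by the modulus."""
    if not 0 <= k <= len(x) - 1:
        raise ValueError("k must satisfy 0 <= k <= #X - 1")
    return (x[k] - x[0]) % modulus


print(span((0, 1, 3), 2), span((0, 1, 3), 1))                  # mod 12 space
print(span((0, -9, -11), 2, -12), span((0, -9, -11), 1, -12))  # mod -12 space
```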
1) DL is a pitch-class set in mod 12 space; DR is a pitch-class set in mod -12 space.
2) Write DL as an ordered set by placing the pitch classes in ascending numerical order. Call this L. Write DR as an ordered set by placing the pitch classes in descending numerical order. Call this R.
3) Let k = #D – 1.
4) Construct the set Lr consisting of all rotations of L. Construct the set Rr consisting of all rotations of R.
5) Find all the values Sk for each member of Lr and Rr.
6) Take the absolute value of each Sk.
7) Find the smallest value of Sk from the members of Lr and Rr. Call it m.
8) Delete all members of Lr and Rr with Sk greater than m.
9) k = k – 1
10) If k = 0, go to step 11; if k > 0, go to step 5.
11) a) If k = 0 and either Lr or Rr, but not both, has only one member, it is the NF, and it will be either an LNF or an RNF.
    b) If k = 0 and Lr and Rr each have one member, go to step 12.
12) Apply conditions 1, 2, or 3.
    Condition 1: If there is one LNF and one RNF, and if the RNF is the LNF read backwards, the NF is the LNF, by convention.
    Condition 2: If there is one LNF and one RNF, and if the RNF is a rotation of the LNF, the choice of NF is context dependent.
    Condition 3: If all the members of Lr and Rr produce the smallest perimeter interval, choose the LNF that begins on the smallest pitch-class integer, by convention.

Fig. 1. Rahn/Morris/Scotto Normal Form Algorithm
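Steps 1 to 10 of the algorithm can be sketched as a filtering loop over rotations. This is a reading of the steps rather than a reference implementation, and it stops before the tie-breaking conditions of steps 11 and 12. Two points are assumptions: a pitch class p is placed in mod -12 space as `p % -12` (so 3 becomes -9), and the helper `written_rnf`, a hypothetical name, notates a surviving Rr member with integers 0 to 11 in reversed order.

```python
def rotations(ordering):
    """All cyclic rotations of an ordering, as tuples."""
    ordering = tuple(ordering)
    return [ordering[i:] + ordering[:i] for i in range(len(ordering))]


def normal_form_candidates(pcs):
    """Steps 1-10: return the surviving rotations (Lr, Rr)."""
    pcs = {p % 12 for p in pcs}
    left = sorted(pcs)                                    # L: ascending, mod 12
    right = sorted((p % -12 for p in pcs), reverse=True)  # R: descending, mod -12
    lr, rr = rotations(left), rotations(right)
    for k in range(len(left) - 1, 0, -1):                 # steps 3, 9 and 10
        span_l = {x: abs((x[k] - x[0]) % 12) for x in lr}   # steps 5 and 6
        span_r = {x: abs((x[k] - x[0]) % -12) for x in rr}
        m = min(list(span_l.values()) + list(span_r.values()))  # step 7
        lr = [x for x in lr if span_l[x] == m]            # step 8
        rr = [x for x in rr if span_r[x] == m]
    return lr, rr


def written_rnf(r_member):
    """ASSUMPTION: notate a mod -12 survivor with integers 0-11, reversed."""
    return tuple(p % 12 for p in reversed(r_member))


lr, rr = normal_form_candidates({4, 5, 9, 0})
print(lr, [written_rnf(x) for x in rr])
```

For the set {4, 5, 9, 0} of Example 11, both Lr and Rr end with one member each, which is the situation that step 11b passes on to the conditions of step 12.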
Table 1. Set Classes where the INTs of a Pitch-class Set and its Inversion in NF are not Retrogrades of Each Other

4-19[0, 1, 4, 8]
5-13[0, 1, 2, 4, 8]
5-31[0, 1, 3, 6, 9]
5-32[0, 1, 4, 6, 9]
6-Z46[0, 1, 2, 4, 6, 9]
6-Z47[0, 1, 2, 4, 7, 9]
6-Z44[0, 1, 2, 5, 6, 9]
6-27[0, 1, 3, 4, 6, 9]
7-10[0, 1, 2, 3, 4, 6, 9]
7-16[0, 1, 2, 3, 5, 6, 9]
7-29[0, 1, 2, 3, 6, 7, 9]
7-21[0, 1, 2, 4, 5, 8, 9]
7-22[0, 1, 2, 5, 6, 8, 9]
9-7[0, 1, 2, 3, 4, 5, 7, 8, t]
9-8[0, 1, 2, 3, 4, 6, 7, 8, t]
9-10[0, 1, 2, 3, 4, 6, 7, 9, t]
9-11[0, 1, 2, 3, 5, 6, 7, 9, t]
A Model of Musical Motifs
Torsten Anders
Interdisciplinary Centre for Computer Music Research, University of Plymouth
Abstract. This paper presents a model of musical motifs for composition. It defines the relation between a motif’s music representation, its distinctive features, and how these features may be varied. Motifs can also depend on non-motivic musical conditions (e.g., harmonic, melodic, or rhythmic rules). The model was implemented as a constraint satisfaction problem.
1 Introduction
Compositional aspects such as harmony and counterpoint have often been formalised and implemented successfully. For example, Pachet and Roy (2001) provide a survey of constraint-based harmonisation systems. A key aspect of such systems is the introduction of formal models of established musical concepts such as note pitches, pitch classes, scale degrees, chord roots and so forth. At the end of their survey, Pachet and Roy (2001) point out: “However, what remains unsolved is the problem of producing musically nice or interesting melodies.” I believe that, in order to formalise melody composition, we need to model important melodic concepts such as motifs and their relations.

A crucial aspect of the motif concept is the diversity of possible motifs and their variations. The motif definition of the New Grove clearly points out this diversity.

A short musical idea, melodic, harmonic, rhythmic, or any combination of these three. A motif may be of any size, and is most commonly regarded as the shortest subdivision of a theme or phrase that still maintains its identity as an idea. [Drabkin]

Motifs have been modelled for music analysis. For example, Buteau and Mazzola (2000) model the similarity of motifs, including motifs of different lengths. However, a motif model for composition is missing (to my knowledge). Löthe (1999) proposes a system creating minuet melodies over a given harmonic progression. The author discusses the importance of motif variations, but does not present a formalisation. The constraint-based composition system OMRC (Sandred, 2003) and its successor PWMC¹ support the composition of pieces from pre-composed motifs. These systems allow the user to apply further constraints on the music (e.g., rhythmic and harmonic rules). However, motif variations are severely restricted (only pitch transpositions are permitted).

¹ Personal communication, PRISMA meeting, January 2007 in Montbéliard, France.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 52–58, 2009. © Springer-Verlag Berlin Heidelberg 2009
This research presents a model of musical motifs for composition. The model expresses the relation between a motif's music representation, its identity (often notated a vs. b, cf. (Schoenberg, 1967)), and how it is varied (a1 vs. a2). Various musical aspects (e.g., the rhythm, melody, or harmony) can define the identity of a motif. The model distinguishes between a motif as an abstract concept and instances of this motif (actual occurrences in the music). The identity of a motif (the abstract concept) is described formally by a set of features. Motif instances can vary these features in many ways, while retaining the motif's identity. In general, the term ‘motif variation’ is often used to indicate how a given prototypical instance of a motif is transformed in order to obtain another instance. By contrast, the present model formalises motif variations by defining the relation between a symbolic description of the motif (the abstract concept) and its instances. The user defines which relations are regarded as variations, and which are not (compare changing the melodic contour with a mere transposition). The model is implemented as part of the constraint-based composition system Strasheela (Anders, 2007).² Users define a set of motifs (by features characterising their identity), and a set of variations on these motifs. Rules on motivic identity and variation can be applied. For example, a rule may constrain a certain phrase to consist of variations of the same motif, where the motif's identity is unknown in the definition. Additionally, users can constrain other aspects of the music. For example, harmonic, rhythmic, and formal rules are defined independently of the motif definition, but directly affect the motifs in the solution. For efficiency, Strasheela uses state-of-the-art constraint programming techniques: a constraint model based on the notion of computational spaces (Schulte, 2002) makes search strategies programmable.
Paper Outline The rest of the paper is organised as follows. The motif model formalism is explained in Sec. 2. Section 3 demonstrates the model with two motifs from Beethoven’s 5th symphony. The text concludes with a discussion (Sec. 4).
2 The Formal Model
The proposed motif model is stated as a constraint satisfaction problem (CSP). A CSP closely resembles a mathematical specification. A CSP imposes constraints (relations) between variables (unknowns), where each variable has a domain (a set of possible values). However, a CSP is also executable: modern constraint solvers efficiently find solutions for a CSP (i.e., determine each variable to a value of its domain which is consistent with all its constraints). In this model, a motif is a tuple of the three variables representation, description, and variation (Fig. 1). The following paragraphs outline how single domain values of these variables are constructed. Formally, this text notates a variable by a disjunction (∨) of its domain values.

² Strasheela is available for download at http://strasheela.sourceforge.net/
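The notions of variable, domain, constraint, and solution can be made concrete with a toy problem. The sketch below is not how Strasheela or any production solver works: it enumerates every combination of domain values (generate and test), whereas real solvers interleave constraint propagation with search. The function `solve` and the three-pitch problem are hypothetical illustrations.

```python
from itertools import product


def solve(domains, constraints):
    """Yield every assignment that satisfies all constraints (generate and test)."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(constraint(assignment) for constraint in constraints):
            yield assignment


# Hypothetical toy problem: three MIDI pitches forming a strictly rising line
# that spans exactly a major third (4 semitones).
domains = {"p1": [60, 62, 64], "p2": [60, 62, 64], "p3": [60, 62, 64]}
constraints = [
    lambda a: a["p1"] < a["p2"] < a["p3"],
    lambda a: a["p3"] - a["p1"] == 4,
]
solutions = list(solve(domains, constraints))
print(solutions)
```

Each solution determines every variable to one value of its domain, consistent with all constraints, which is the sense of "solution" used in the text.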
motif ::= representation, description, variation
representation ::= some hierarchic music representation ∨ ...
description ::= feature₁ : variable list₁, feature₂ : variable list₂, ... ∨ ...
variation ::= motif → (0 ∨ 1) ∨ ...
makeVariation ::= feature₁ : f₁ : motif → variable list₁, feature₂ : f₂ : motif → variable list₂, ... → variation domain value (a function)

Fig. 1. A motif consists of its music representation, a symbolic description, and a variation function
The variable representation basically stores the information recorded in a music notation of an instance of the motif. For example, representation expresses the temporal organisation of notes in the motif and their pitches. Its domain is the set of all motif representation candidates. In an efficient implementation of the model, the representation is not a variable itself but it contains variables (e.g., all note pitches and durations in the representation may be variables). The model abstracts away from the actual music representation format: this information can be encoded in any hierarchic representation format which supports variables and an interface for accessing score information (e.g., a variant of CHARM (Harris et al., 1991), or Smoke (Pope, 1992) supporting variables). The model was implemented using the Strasheela music representation (Anders, 2007). The variable description symbolically states distinctive motif features. Each domain value of this variable describes the features of a motif (an abstract concept, see above) with its own identity (e.g., one domain value describes motif a and another motif b). Because we have no agreed feature set which distinguishes the identity of a motif (cf. the Grove motif definition above), description can contain any information (e.g., the motif’s note durations and its melodic intervals). description can have an arbitrary format, but a consistent format of its domain values simplifies the CSP definition. The following format combines flexibility with convenience: description is a tuple of feature-value pairs (Fig. 1). A feature is a descriptive label (e.g., durations) and its value is a list of (often determined) variables (e.g., the note durations for motif a). The variable variation denotes a specific motif variation. The variation domain consists of functions which map a motif to a Boolean variable (Fig. 1). 
These functions formalise how motif instances vary the description of a motif (e.g., whether the note pitches defined in the description are followed literally by
the representation or in reverse order). When the constraint solver decides for a variation domain value (a function), then and only then does this function return 1 (i.e., true), and it constrains the relation between the motif's representation and its description. This approach is highly generic because arbitrary constraints can be applied by the variation functions. However, these functions can be complex to define. In a still flexible but more convenient approach, variation functions are created by the function makeVariation. makeVariation expects a tuple of feature-value pairs, where the features correspond to the features of the description, and their values are functions mapping a motif instance to a list of variables (e.g., a function returning the note durations of a motif). Please note that makeVariation unifies this list with the corresponding list in the selected motif description. For example, a model instance may constrain the note durations in the motif's representation to be equal to the durations in the description. This affects which domain values are selected for these variables.³ Figure 2 summarises the relations between all variables of the model. myMotif is any motif instance in the score (a subsection or a whole piece). The model's essence is highlighted in bold font.⁴ For brevity, the definition of makeVariation is omitted.

∀ myMotif ∈ score : ∃ representation, description, variation :
    representation = representation₁, ..., representationₙ
  ∧ description = description₁, ..., descriptionₙ
  ∧ variation = variation₁, ..., variationₙ
  ∧ myMotif = representation, description, variation
  ∧ 1 = Σ map(getInitialDomain(variation), f : f(v) := v(myMotif))
  ∧ variation(myMotif) = 1

Fig. 2. Relations between the motif model variables (essence in bold font)
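The role of makeVariation can be illustrated with a check-only sketch. This is not the Strasheela implementation: the real model unifies constraint variables, whereas the code below only tests a fully determined motif. The dictionary layout of a motif, the use of `None` for an undetermined description value, and all function names are assumptions made for the illustration.

```python
def make_variation(accessors):
    """Build a variation function from {feature: accessor(motif) -> list}."""
    def variation(motif):
        description = motif["description"]
        for feature, accessor in accessors.items():
            if feature not in description:
                continue  # only features named in the description are constrained
            expected, actual = description[feature], accessor(motif)
            if len(expected) != len(actual):
                return 0
            # None stands for an undetermined variable and matches anything
            if any(e is not None and e != a for e, a in zip(expected, actual)):
                return 0
        return 1
    return variation


def get_note_durations(motif):
    return [duration for duration, _pitch in motif["representation"]]


def get_reversed_durations(motif):
    return list(reversed(get_note_durations(motif)))


literal = make_variation({"durations": get_note_durations})
reverse = make_variation({"durations": get_reversed_durations})

motif = {
    "representation": [(1, 67), (1, 67), (1, 67), (4, 63)],  # (duration, pitch)
    "description": {"durations": [1, 1, 1, None]},
}
results = [v(motif) for v in (literal, reverse)]
print(results, sum(results) == 1)
```

The final sum mirrors the requirement that exactly one function of the variation domain returns 1 for a given motif instance.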
3 An Example

This section models well-known motifs from the first movement of Beethoven's Fifth Symphony as an example. Figure 3 classifies some motif instances according to motif identity and variation. The presented classification allows for considerable mutability of the first variation of motif a. Note that other classifications can be expressed with this model as well. A set of motifs and their classification is modelled by defining domains for the three variables representation, description, and variation. The set of solutions for a single motif instance includes all motifs shown in Fig. 3, among similar motifs. However, additional rules can further restrict the music (e.g., rhythmic, harmonic, and contrapuntal rules), and many motif instances can be part of a CSP.

³ In the implementation, description and variation are encoded by finite domain integers. They point as indices in the respective domains. Selection constraints (Duchier et al., 1998) care for efficient constraint propagation.
⁴ The function map applies the given function f to every element of the variation's domain and returns the collected results.
Fig. 3. Motifs from Beethoven’s Symphony No. 5 (one possible classification)
The representation domain consists of note sequences, where each note in a sequence has parameter values for its duration and pitch (Fig. 4).⁵ As these parameters can have any value, all shown Beethoven motifs are members of this domain. Please note that instances of motif a and b differ in length: the motif length is not fixed in representation.⁶ Similarly, different variations of the same motif identity can differ in length (e.g., a motif can be reduced or embellished). Because a motif's variation defines the relation between its description and representation, a variation can reduce the representation to, say, the highest or longest note of the description (this is of course an extreme case). The description domain characterises rhythmic and melodic features which distinguish the two Beethoven motifs a and b. Please note that the feature sets differ between motifs: description a specifies the pitchContour (the sequence of pitch interval directions), whereas description b specifies scaleDegreeIntervals (the sequence of distances between note pitches measured in scale degrees). Also, note that description a makes use of variables (e.g., the last note duration is not fixed).

⁵ The pause is not modelled for simplicity. It can be addressed by a note offset parameter (Anders, 2007).
⁶ The implementation encodes all motif instances with the same (maximum) length internally. Notes are marked as ‘non-existing’ by setting their duration to 0 (Anders, 2007).
representation := sequence₁ with notes of specific duration and pitch, sequence₂ with notes of specific duration and pitch, ...
description a := durations : (…), pitchContour : (→, →, …)
description b := durations : (♩, ♩, ♩, ♩, ♩, ♩, ♩, ♩), scaleDegreeIntervals : (3, −1, 1, 1, −3, 0, −1)
description := description a, description b, ...
variation₁ := durations : getNoteDurations, pitchContour : getPitchContour, scaleDegreeIntervals : getScaleDegreeIntervals
variation₂ := durations : getNoteDurations, pitchContour : f : f(myMotif) := getDescription(myMotif) = description a ∧ inverse(getPitchContour(myMotif))
variation := makeVariation(variation₁), makeVariation(variation₂), ...

Fig. 4. Definition of the three variables representation, description, and variation which model the Beethoven motifs (results in classification of Fig. 3)
Finally, the functions in the variation domain constrain the relation between the representation and the description of a motif instance. The functions getNoteDurations, getPitchContour, and getScaleDegreeIntervals access the motif's representation. For example, getNoteDurations can be implemented as shown in (1), where getNotes returns the notes in the motif's representation, and getDuration returns the duration of a note. Please remember that makeVariation unifies the variable list returned by these functions with the corresponding variable list in the description. description values can differ in their set of features (see above): variations only constrain those motif aspects specified by the description of a motif (e.g., variation 1 does not constrain the pitch contour in case the motif's description is motif b). variation 2 inverts the pitch contour of a motif (cf. Fig. 3), but variation 2 is only permitted for motif a.

getNoteDurations(myMotif) := map(getNotes(myMotif), getDuration)    (1)
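Equation (1) and the contour accessor translate almost literally into code. The sketch below is an illustration with assumptions: a motif is taken to be a plain list of notes, the `Note` class is hypothetical, a contour is encoded as +1 (up), 0 (repeat), -1 (down), and the durations (in quarter notes) of the G-G-G-E♭ opening figure are chosen for the example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    duration: float  # in quarter notes (illustrative unit)
    pitch: int       # MIDI note number


def get_notes(motif):
    return list(motif)  # ASSUMPTION: a motif is simply a sequence of notes


def get_duration(note):
    return note.duration


def get_note_durations(motif):
    """Equation (1): map getDuration over the notes of the motif."""
    return list(map(get_duration, get_notes(motif)))


def get_pitch_contour(motif):
    """Direction of each consecutive pitch interval: +1, 0 or -1."""
    pitches = [note.pitch for note in get_notes(motif)]
    return [(b > a) - (b < a) for a, b in zip(pitches, pitches[1:])]


def inverse(contour):
    """Flip every direction of a contour."""
    return [-direction for direction in contour]


motif_a = [Note(0.5, 67), Note(0.5, 67), Note(0.5, 67), Note(2.0, 63)]
print(get_note_durations(motif_a))
print(get_pitch_contour(motif_a), inverse(get_pitch_contour(motif_a)))
```

A variation in the style of variation 2 would compare `inverse(get_pitch_contour(motif))` with the contour stored in the description, instead of comparing the contour itself.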
4 Discussion
This paper presented a motif model as a CSP which specifies the relation between the motif’s music representation, a description of distinctive motif features, and
motif variation definitions. The model was designed for computer-aided composition, but it can also be used as an executable representation of a motivic analysis. This research does not propose a new concept of motivic similarity, but allows for the application of various similarity models (e.g., the pitch contour). The model does not express a degree or genealogy of variations. However, it supports various additional cases. Non-motivic sections can be modelled by a variation function which does not apply any constraint at all.⁷ Contrapuntal motif combinations (e.g., a fugue subject) can be searched for by constraining multiple motif instances to the same description, but leaving feature values in the description itself undetermined in the definition. Overlapping motifs are possible if the music representation supports such nesting. Finally, higher-level formal relations can be expressed by nesting ‘motif’ instances (e.g., a theme may contain a motif sequence, and is specified by the theme's description and constrained by its variation).
References

Anders, T.: Composing Music by Composing Rules: Design and Usage of a Generic Music Constraint System. Ph.D. thesis, School of Music & Sonic Arts, Queen's University Belfast (2007)
Buteau, C., Mazzola, G.: From Contour Similarity to Motivic Topologies. Musicae Scientiae 4(2), 125–149 (2000)
Drabkin, W.M., Macy, L. (ed.): Grove Music Online. Oxford Music Online, http://www.oxfordmusiconline.com/subscriber/article/grove/music/19221 (accessed September 24, 2008)
Duchier, D., Gardent, C., Niehren, J.: Concurrent Constraint Programming in Oz for Natural Language Processing. Programming Systems Lab, Universität des Saarlandes, Germany (1998)
Harris, M., Smaill, A., Wiggins, G.: Representing Music Symbolically. In: IX Colloquio di Informatica Musicale, Genoa, Italy (1991)
Löthe, M.: Knowledge Based Automatic Composition and Variation of Melodies for Minuets in Early Classical Style. In: Burgard, W., Christaller, T., Cremers, A.B. (eds.) KI 1999. LNCS, vol. 1701, pp. 159–170. Springer, Heidelberg (1999)
Pachet, F., Roy, P.: Musical Harmonization with Constraints: A Survey. Constraints Journal 6(1), 7–19 (2001)
Pope, S.T.: The Smoke Music Representation, Description Language, and Interchange Format. In: Proceedings of the International Computer Music Conference, San Jose (1992)
Sandred, Ö.: Searching for a Rhythmical Language. In: PRISMA 01. EuresisEdizioni, Milano (2003)
Schoenberg, A.: Fundamentals of Musical Composition. Faber and Faber, London (1967)
Schulte, C.: Programming Constraint Services. Springer, Heidelberg (2002)
⁷ To eliminate symmetries (i.e., different solutions which are equivalent), this non-motivic variation should determine the motif description to some domain value.
Melodic Clustering within Motivic Spaces: Visualization in OpenMusic and Application to Schumann's Träumerei
Chantal Buteau and John Vipperman
Brock University
Abstract. Based on the concepts of motive contour, gestalt and motive similarity, our model of motivic structure yields topological motivic spaces of a composition in which open neighborhoods correspond to groupings of similar motives. In Buteau 2006 we presented a model extension of an earlier approach in order to integrate the concept of melodic clustering in motivic spaces, demonstrated an application to the soprano voice of Schumann's Träumerei, and provided a comparison with a human-made segmentation (clustering) analysis (Repp 1992) and a machine learning approach (Cambouropoulos and Widmer 2000). In this short paper, we present our novel dynamic visualization of melodic clustering in OpenMusic software, and extend our initial analysis of Träumerei to multi-voice clustering.
1 Introduction

As shown in recent works (such as Cahill and Maidín 2005 and Cambouropoulos and Tsougras 2004), computer-aided analysis and content-based music retrieval are promising research domains that contribute to the development of a better understanding of the concept of melodic similarity. In computer-aided analysis, any reasonable model of a germinal motif, i.e. those short melodies having a germinal function such as the opening motif in Beethoven's Fifth Symphony, necessitates the inclusion of melodies of different lengths into the method. Our topological approach (Buteau 2003; Mazzola 2002) to the modeling of motivic structure includes the concept of contour similarity for different lengths. It is an immanent approach that formalizes Rudolph Réti's (1951) method in which melodic segments are compared with one another in order to determine which melodic segments are germinal motives. Melodic clustering, that is, an organization of melodic segments into ‘significant’ categories, is another important analytical structure offering insight into melodic similarity. Our approach builds on work introduced in Buteau 2006, wherein we presented a model extension to motivic spaces that includes the concept of melodic clustering, its application to the soprano voice of Schumann's Träumerei, the seventh piece of Kinderszenen, op. 15, and a comparison with a human-made clustering analysis (Repp 1992) and a machine learning approach (Cambouropoulos and Widmer 2000). The results were very close to these reference clusterings. In this paper, we briefly present our novel dynamic visualization of melodic clustering in OpenMusic software, called OM-Melos Clustering Tool, and extend our initial analysis of Träumerei to multi-voice clustering¹. Like our results on the soprano voice, our clustering analysis on the ‘primary’ motives (Repp 1992) of Träumerei, appearing in all 4 voices, yields a melodic segmentation very close to the human-made clustering reference. The results on the complete four-voice segmentation are not all as close to the reference as the soprano-only results, but there are significant similarities, and overall the complete segmentations are simply coarser. As a consequence, the resulting melodic clusterings within motivic spaces overall strongly contribute to the validation of our topological model of motivic structure.

We would like to express our gratitude to Carlos Agon (IRCAM, Paris) for his continuous support in the design and implementation of OM-Melos Clustering Tool in OpenMusic.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 59–66, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Topological Model of Motivic Structure

This section will briefly review motivic spaces; for details and examples, see Buteau 2001; Buteau 2003; Mazzola 2002. Tones are parameterized by at least onset and pitch values. Motives $M$ are non-empty finite sets of tones: $M = \{m_1, ..., m_n\}$ such that all onset values in $M$ are different. We set $card(M) = n$. Given a music composition $S$ we consider a (finite) collection of motives in $S$ that we denote $MOT(S)$. We impose that $MOT(S)$ satisfies the Submotif Existence Axiom (SEA), that is, every sub-motif of a motif in $MOT(S)$, down to a minimal cardinality, is also in $MOT(S)$. The shape of a motif $M$ is the image of $M$ by a set mapping² $t : MOT(S) \to \Gamma_t$; for example, $Com(M)$ = the COM matrix of $M$, $Rg(M)$ = projection of $M$ on the onset-pitch plane, and $Dia(M)$ = vector of consecutive pitch intervals. These 3 examples are respectively called COM-matrix, Rigid, and Diastematic types. We consider a group $P$ action on $\Gamma_t$ induced by a group action on $MOT(S)$, e.g., the affine counterpoint paradigmatic group $P = CP$ or the group $P = Tr$ of transpositions and translations in time. We introduce the gestalt of a motif $M$ as $Ges_t^P(M) := t^{-1}(P \cdot t(M))$. We consider pseudo-metrics $d_n$ for shapes of cardinality $n$ that we retract to motives: the distance between motives $M$ and $N$ with same cardinality $n$ is $d_t(M, N) := d_n(t(M), t(N))$, and their gestalt distance is $gd_t^P(M, N) := \inf_{p,q \in P} d_n(p \cdot t(M), q \cdot t(N))$. Examples are the Euclidean distance or the relative Euclidean distance $REd_t$ (Buteau 2003) for $t = Com$, $Rg$, and $Dia$. If $P$ is a group of isometries, then $gd_t^P$ is also a pseudo-metric. Given $\varepsilon > 0$, we introduce the $\varepsilon$-neighborhood of a motif $M$ as $V_\varepsilon^{t,d,P}(M) := \{N \in MOT(S) \mid \exists N^* \subset N \text{ s.t. } gd_t^P(N^*, M) < \varepsilon\}$, or simply denoted $V_\varepsilon(M)$.
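The ε-neighborhood can be illustrated for the simplest setup. The sketch below is an illustration under assumptions, not the Melos implementation: a motif is a frozenset of (onset, pitch) pairs, the shape type is the diastematic one, the metric is the plain Euclidean distance, and the group is P = Tr. Because pitch transposition and time translation leave the vector of consecutive pitch intervals unchanged, the infimum in the gestalt distance reduces here to the distance between the two interval vectors. All function names are hypothetical.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def dia(motif):
    """Diastematic shape: consecutive pitch intervals in onset order."""
    tones = sorted(motif)  # tones are (onset, pitch) pairs
    return tuple(b[1] - a[1] for a, b in zip(tones, tones[1:]))


def gestalt_distance(m, n):
    """Euclidean distance between Dia shapes of equal cardinality (P = Tr)."""
    return dist(dia(m), dia(n))


def submotives(motif, size):
    """All sub-motives of the given cardinality."""
    return combinations(sorted(motif), size)


def neighborhood(m, motives, eps):
    """Motives N that contain a sub-motif closer than eps to M."""
    return [
        n for n in motives
        if len(n) >= len(m)
        and any(gestalt_distance(sub, m) < eps for sub in submotives(n, len(m)))
    ]


M = frozenset({(0, 60), (1, 62), (2, 64)})
N = frozenset({(0, 67), (1, 69), (2, 71), (3, 65)})  # contains a transposed M
K = frozenset({(0, 60), (1, 50), (2, 40)})
print(neighborhood(M, [M, N, K], eps=1.0) == [M, N])
```

The enumeration of sub-motives grows exponentially with motif size, so this form is only practical for short motives.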
If our setup (defined by $t$, $P$, and $d$) fulfills the inheritance property (Buteau 2001; Mazzola 2002), which requires that similar motives have similar associated sub-motives, then these neighborhoods form a basis for a topology $T_{t,P,d}$ on³ $MOT(S)$ (Buteau 2001; Mazzola 2002). The topological space is called the motivic space of $S$. The topology $T_{t,P,d}$ being only of type $T_0$ (not Hausdorff) with no intuitive geometrical representation, we introduce⁴ the following functions:

$pres_\varepsilon(M) := \sum_{N \in MOT(S)} \frac{1}{2^{n-m}} \cdot \#\{N^* \subset N \mid gd_t^P(N^*, M) < \varepsilon\}$, where $m = card(M)$ and $n = card(N)$;

$con_\varepsilon(M) := \sum_{N \in MOT(S)} \frac{1}{2^{m-n}} \cdot \#\{M^* \subset M \mid gd_t^P(M^*, N) < \varepsilon\}$;

and the weight of motif $M$ at radius $\varepsilon$ as $weight_\varepsilon(M) := pres_\varepsilon(M) \cdot con_\varepsilon(M)$.

The motivic topology for $S$ corresponds to the motivic structure of $S$ (Buteau 2003; Mazzola 2002). The germinal function of a motif, i.e. of being omnipresent in a composition given a similarity threshold $\varepsilon$, is formalized by the motives with largest weights at radius $\varepsilon$ (Buteau 2003; Mazzola 2002).

¹ Note that our method does not restrict to monophonic music.
² The exact construction of the model is on the set $MOT$ of all possible motives from which we take a finite collection $MOT(S)$ of motives in $S$; see Buteau 2001 for details.
³ In the exact construction the space of a composition is defined as the relativization to $MOT(S)$ of the topology on $MOT$.

2.1 Melodic Clustering within Motivic Spaces

We now introduce the definition of a melodic cluster in our topological spaces. Given a set $X \subset MOT(S)$ of motives and $\varepsilon > 0$, the $\varepsilon$-variation set of motif $M$ in $X$ is $Var_\varepsilon^X(M) := \{N \in X \mid N \in V_\varepsilon(M) \text{ or } M \in V_\varepsilon(N)\}$, or equivalently $Var_\varepsilon^X(M) = (V_\varepsilon(M) \cup W_\varepsilon(M)) \cap X$, where $W_\varepsilon(M) = \{N \in MOT(S) \mid M \in V_\varepsilon(N)\}$ is a closed set in $MOT(S)$. The intersection with the set $X$ corresponds to considering the relative subspace to $X$. In order to model clustering approaches, such as Cambouropoulos and Widmer 2000, we additionally intersect the variation set with a set $C_M^r$ that depends on the cardinality of the motif $M$. For example, we require that motives in the variation set of $M$ should have a cardinality of at least 70% of the cardinality of $M$. This can be formalized by the cardinality restriction function $r : X \times X \to \{0, 1\}$ with $r(M, N) := 1$ if $\frac{\min(card(M), card(N))}{\max(card(M), card(N))} \geq 70\%$, and $0$ otherwise; together with the set $C_M^r = \{N \in X \mid r(M, N) = 1\}$. We call $X$ a clustering set and we introduce the $\varepsilon$-cluster $Cluster_\varepsilon^X(M)$ of motif $M$ in $X$ (with respect to $r$) as $Cluster_\varepsilon^X(M) := Var_\varepsilon^X(M) \cap C_M^r$. Given a set $X$ of motives and a cardinality restriction function $r$, clustering the set $X$ of motives corresponds to constructing all the $\varepsilon$-clusters $Cluster_\varepsilon^X(M)$, i.e. for all motives $M \in X$ and all similarity thresholds $\varepsilon > 0$.

Note that the introduction of the $\varepsilon$-variation sets of motives involves some kind of new ‘distance’ function between any two motives in $X$, which satisfies the reflexivity and symmetry properties of a pseudo-metric, but in general does not satisfy the triangle inequality; take for example two motives $M$ and $N$ of the same cardinality with a common sub-motif but with distance $gd_t(M, N) \neq 0$. Since weight functions are global functions, it is of interest to introduce local weight functions on a relative space of $MOT(S)$, in particular on the clustering set $X$, and to compare them with the global weight function restricted to $X$, i.e. $weight_\varepsilon|_X$. We define $locWeight_\varepsilon^X(M)$ as the product of $locPres_\varepsilon^X(M)$ with $locCon_\varepsilon^X(M)$, whose definitions remain the same as those of the $pres_\varepsilon$ and $con_\varepsilon$ functions, except for the sum index, which changes to ‘$N \in X$’.
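The construction of an ε-cluster from a variation set and the cardinality restriction can be sketched independently of how neighborhoods are computed. In the sketch below, `in_neighborhood(n, m)` is a caller-supplied predicate meaning "n lies in the ε-neighborhood of m"; the toy predicate used in the example (m occurs as a subsequence of n) and the encoding of motives as interval tuples are assumptions for illustration only, not the gestalt distance of the model.

```python
def r(m, n, ratio=0.7):
    """Cardinality restriction: 1 if the smaller motif has at least `ratio`
    of the cardinality of the larger one, else 0."""
    return 1 if min(len(m), len(n)) / max(len(m), len(n)) >= ratio else 0


def variation_set(m, x, in_neighborhood):
    """Var(M) in X: motives N with N in V(M) or M in V(N)."""
    return [n for n in x if in_neighborhood(n, m) or in_neighborhood(m, n)]


def cluster(m, x, in_neighborhood, ratio=0.7):
    """Cluster(M) in X: the variation set intersected with C_M^r."""
    return [n for n in variation_set(m, x, in_neighborhood) if r(m, n, ratio) == 1]


def is_subsequence(small, big):
    """True if `small` occurs in `big` in order (not necessarily contiguously)."""
    remaining = iter(big)
    return all(item in remaining for item in small)


def toy_neighborhood(n, m):
    # ASSUMPTION: stand-in for "n is in V(m)" at a very small threshold
    return is_subsequence(m, n)


X = [(2, 2, 1), (2, 2, 1, 3), (2, 2), (2, 2, 1, 1, 1), (5, 5, 5)]
M = (2, 2, 1)
print(variation_set(M, X, toy_neighborhood))
print(cluster(M, X, toy_neighborhood))
```

With the 70% restriction, the shorter motif (2, 2) and the longer motif (2, 2, 1, 1, 1) drop out of the cluster of M even though both belong to its variation set.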
3 Model Implementation and Visualization in OpenMusic

The motivic model was first partially implemented by Mazzola and Zahorka (1994) as a module of the software RUBATO. It was completely reimplemented by Buteau (2004) in Java, where the major improvements are the rich diversity of outputs unveiling all details of the topological spaces and a significant enhancement of computational efficiency. The clustering extension was designed in 2006, based on an algorithm for finding maximal cliques in a dynamic graph (Stix 2004). It is implemented in line with the efficiency of the core program: calculations are reduced to motif classes. In addition to our core Java implementation, called Melos, we designed a visualization tool in OpenMusic (Agon and Assayag 2002a), called OM-Melos Clustering Tool and implemented in 2007 by Vipperman. Figure 1 shows the overall flowchart of our implementation: the program input is a score file (MIDI format or text file⁵ for specification of the analysis segmentation), a clustering set file (MIDI or text file), and analysis parameter settings, e.g. topological parameters t, P, and dt. The output⁶ is a text file to be passed to our implementation in OpenMusic for the detailed visualization of (topological) melodic clustering.

⁴ For computational efficiency purposes, these functions can be redefined on corresponding quotient spaces of gestalts, and for analysis purposes they can be generalized (Buteau 2001).
Fig. 1. The overall flowchart of the implementation of our topological melodic clustering model and its visualization (in OpenMusic)
The OM-Melos Clustering Tool automatically exhibits the melodic clustering as a dynamic table (an OM-Maquette interface (Agon and Assayag 2002b)) of labeled colored boxes. It shows the initial state, at which motives (represented by boxes) in the clustering set are linked (same color) to one another if they have the same gestalt; see Figure 2. With a key command, it exhibits the melodic clustering by labeling the boxes with the ε-cluster numbers (possibly more than one number) at each similarity threshold ε. Additionally, the Clustering Motif Info Window (see Figure 3) displays, for each clustering motif box, local and global weight function graphs of the motif and its shape, and shows and plays the notes forming the motif. Two additional visualization functionalities are implemented: the dynamic motif clusters (an OM-Maquette interface) and clustering motif set displays reveal other important details for the melodic clustering analysis.

Fig. 2. The dynamic clustering maquette of OM-Melos Clustering Tool displays the resulting melodic clustering. This example shows the clustering of the soprano voice of Schumann's Träumerei (see Figure 4) constructed in the motivic space with t = Com, P = Tr, and dt = REdt, in which we added a score line to exemplify how this representation relates to the score. The figure shows the initial state, at which clustering motives (boxes) are linked to one another (represented by the same color) if they share the same gestalt. For example, the 'salient' (Repp 1992) ascending motives (motives 1, 6, 10, ... in Figure 4) all have the same COM-matrix (shown in brown). The number of lines (6 in this example) in the table is determined by the user.

Fig. 3. The Clustering Motif Info Window of OM-Melos Clustering Tool displays, for a motif in the clustering set X, local and global weight function graphs of the motif (two bottom left boxes), the motif's shape (third box from the bottom left), and the notes forming the motif (bottom right box; the motif is displayed in the upper key). This example shows topological information about the salient ascending motif 1 (see Figure 4) in the soprano voice melodic segmentation of Schumann's Träumerei (t = Rg, P = Tr, dt = REdt).

⁵ The OM-Melos Score Tool, implemented in OpenMusic, reads and displays a music piece from a MIDI file, allows the user to easily segment the piece for the analysis, and saves it in a text file. More details about the complete OM-Melos tool can be found in Buteau and Vipperman 2008.

⁶ Note that the program returns two files, one of which is passed to the computer algebra system MAPLE for three-dimensional weight graphs and Motivic Evolution Trees (Buteau 2003), while the other is passed to OpenMusic. For more details on OM-Melos, see Buteau and Vipperman 2008.
4 Application to Schumann's Träumerei

In Buteau 2006, our melodic clustering analysis of the soprano voice (28 motives) of Träumerei (see Figure 4) compared well with the melodic/rhythmic segmentation suggested by music theorist Repp (1992) and with a computer-generated clustering (Cambouropoulos and Widmer 2000). In this section we briefly discuss the extension of our clustering analysis to the 'primary' (Repp 1992) motives (36 in total) "which represent the leading voice(s) in the polyphonic quartet" (Repp 1992) and involve all 4 voices. We also address the complete 4-voice segmentation (70 motives), and compare it to the melodic segmentation reference (Repp 1992)⁷.
Fig. 4. The soprano voice of Schumann's Träumerei with the melodic/rhythmic segmentation proposed by Repp (1992)
We constructed the melodic clustering within topological spaces with shape types rigid, COM-matrix, diastematic, and elastic (Mazzola 2002), the paradigmatic groups Tr and CP, and the relative Euclidean distance function dt = REdt, for both the primary motives and the complete 4-voice segmentations, with the 70%-cardinality ratio restriction function. Using the OM-Melos Clustering Tool, we dynamically visualized the melodic clustering and compared it with the segmentation reference. Figure 5 shows the melodic segmentation for the primary motives proposed by Repp and by our topological model at a fixed similarity threshold. The tables should be read as follows: each cell in the table corresponds, in a chronologically consistent manner, to a primary motif in the score. The leftmost column indicates the phrase structure of Träumerei, containing two main phrases, A and B, that appear in some variations (Ai and Bi). The lowercase letters correspond to clustering labels (following Cambouropoulos and Widmer 2000 for the soprano voice clustering). An empty cell in a table corresponds to a monadic category. The primary motives cluster set shares most of its motives with the soprano voice segmentation (motif clusters 'a' to 'g' in Table 1). This contributes to our resulting clusterings of primary motives concording well with the segmentation reference.

⁷ Cambouropoulos and Widmer (2000) confined their melodic clustering approach to the soprano voice.
Table 1. Melodic Segmentation According to Repp
A1: a b c d e i
B1: a b f h g j
B2: a b f h g j
B3: a b f h g j
A1: a b c d e i
A2: a b c d e

Table 2. Melodic Segmentation Within Motivic Spaces
A1: a b c d e i
B1: a b' f i
B2: a b f h g
B3: a b f h g
A1: a b c d e i
A2: a b' c d e

Fig. 5. The melodic/rhythmic segmentation of primary motives of Schumann's Träumerei according to Repp (1992) in Table 1 and according to our topological approach in Table 2 with parameters t = Dia, P = Tr, dt = REdt, and similarity threshold ε = 0.4714. Analyses with other topological parameters yield similar results.
Fig. 6. Motives in Schumann's Träumerei (bars 10-12): 2-note motives B and E are identified together in motivic spaces with the COM-matrix shape type, whereas Repp (1992) distinguishes them, possibly by their onset, but identifies them with simultaneous (super-)motives A and C, respectively, in other voices. Motives F, G and H are similarly identified.
Our resulting 4-voice clusterings⁸ do not compare as closely to the segmentation reference as the soprano and the primary motives segmentations do. For instance, many of the 2-note motives are all identified together in our motivic spaces, e.g. with the COM-matrix shape type, whereas Repp distinguishes them, possibly by their onset, but identifies them with simultaneous (super-)motives in other voices. For example, motives B and E in Figure 6 are not the same 'melodic gesture' according to Repp 1992, whereas they are identified together in motivic spaces with t = Com. Furthermore, motives A and B are regrouped as the same melodic gesture according to Repp 1992, as are motives F, G, and H. Motives C, D, and E are regrouped as related gestures (i.e. primary gesture C and secondary gesture D-E). In motivic spaces with t = Com and P = Tr, the motif onset information is lost and motives B, D, and E are identified together in the same gestalt, as are motives G and H. However, when using the 70%-cardinality ratio restriction function, they cannot, for any ε > 0, be in the same ε-cluster as their super-motives A, C, and F respectively. This was not an issue when we dealt with a clustering set of only one voice. In fact, when considering each of the 4 voices separately, our resulting clusterings compare well with the segmentation reference. In general, for each constructed motivic space on the 4-voice composition, some of Repp's clusters are inevitably regrouped in the same cluster in the motivic spaces, especially when weakening the 70%-ratio restriction function. This makes our clustering results coarser than, but not contradictory to, Repp's segmentation. Finally, for the motivic space on the soprano voice with rigid or elastic type (with P = Tr and dt = REdt) and generalized weight function⁹, the so-called 'salient' ascending motif '6 - 1' (see Figure 3) is prominent both locally and globally. Further systematic investigations on local and global weight functions on this composition are ongoing.

⁸ Our Java implementation computes multi-voice clusterings, but their visualization in the OM-Melos Clustering Tool is still in progress.
⁹ We used the weight function defined by squaring the content (to put more weight on larger motives as opposed to short ones) and having a factor of 0.8 for the cardinality difference between motives.

References

Agon, C., Assayag, G.: Object-Oriented Programming in OpenMusic. In: Mazzola, G., et al. (eds.) Topos of Music, pp. 967–990. Birkhäuser, Basel (2002)
Agon, C., Assayag, G.: Programmation Visuelle et Editeurs Musicaux pour la Composition Assistée par Ordinateur. In: IHM 2002. ACM Computer, Poitiers (2002)
Buteau, C.: Reciprocity between Presence and Content Functions on a Motivic Composition Space. Tatra Mt. Math. Publ. 23, 17–45 (2001)
Buteau, C.: A Topological Model of Motivic Structure and Analysis of Music: Theory and Operationalization. Ph.D. Thesis, Universität Zürich, Zürich (2003)
Buteau, C.: Motivic Spaces of Scores through RUBATO's MeloTopRUBETTE. In: Lluis-Puebla, E., Mazzola, G., Noll, T. (eds.) Perspectives in Mathematical and Computational Music Theory, pp. 330–342. Verlag epOs-Music, Osnabrück (2004)
Buteau, C.: Melodic Clustering Within Topological Spaces of Schumann's Träumerei. In: Proceedings of ICMC, New Orleans, pp. 104–110 (2006)
Buteau, C., Vipperman, J.: Representations of Motivic Spaces of a Score in OpenMusic. Journal of Mathematics and Music 2(2) (2008)
Cahill, M., Ó Maidín, D.: Melodic Similarity Algorithms - Using Similarity Ratings for Development and Early Evaluation. In: Proceedings of ISMIR 2005, London, pp. 450–453 (2005)
Cambouropoulos, E., Tsougras, C.: Influence of Musical Similarity on Melodic Segmentation: Representations and Algorithms. In: Proceedings of the International Conference on Sound and Music Computing (SMC), Paris, France (2004)
Cambouropoulos, E., Widmer, G.: Melodic Clustering: Motivic Analysis of Schumann's Träumerei. In: Proceedings of JIM, France (2000)
Mazzola, G., et al.: The Topos of Music. Birkhäuser, Basel (2002)
Mazzola, G., Zahorka, O.: The RUBATO Performance Workstation on NEXTSTEP. In: Proceedings of ICMC 1994, San Francisco (1994)
Repp, B.: Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's Träumerei. Journal of Acoustical Society of America 92(5), 2546–2568 (1992)
Réti, R.: The Thematic Process in Music. Greenwood Press, Connecticut (1951)
Stix, V.: Finding All Maximal Cliques in Dynamic Graphs. Computational Optimization and Applications 27, 173–186 (2004)
Topological Features of the Two-Voice Inventions

Kamil Adiloğlu and Klaus Obermayer
Berlin University of Technology
Abstract. The similarity neighbourhood model is a mathematical model making use of statistical, semiotical and computational approaches to perform melodic analysis of given music pieces. This paper is dedicated to the investigation of topological features and conditions in connection with the model on the one hand and concrete analyses on the other. Therefore, checking the topological features of the model as well as the analysis results is a good practice not only for theoretical but also for practical reasons. The topological features of the similarity neighbourhood model are investigated from a theoretical viewpoint, in order to figure out under which conditions the collection of the results yielded by the model define a topology. These topological features are then tested practically on the two-voice inventions. These investigations and tests have shown that the similarity neighbourhood model defines a topology not for all cases, but depending on the analysed musical piece.
1 Introduction

The similarity neighbourhood model has been designed to extract the melodic structure of a given musical piece. The model is based on the similarities between melodies of equal length. However, on the basis of the detected similarities, the model further identifies sub- and super-segment relationships within the given piece. The similarity neighbourhood model is inspired by topology. For this reason, a topological terminology is used in the exposition of the model to explain some musical and/or music-theoretical relationships. However, we did not mathematically investigate the family of neighbourhoods as a whole and therefore missed the potential of topological methodology. This applies to the entire model on a theoretical level as well as to the interpretation of individual analytical results.

In the field of mathematical music theory, there are several research studies in which melodic similarities in music pieces are investigated by using topology in a stricter theoretical sense. Buteau (2001, 2003) defined a topological model in which the ε-neighbourhoods are defined for each motif based on their degree of similarity. The inheritance property guarantees that the similarity of two motives is passed on to their sub-motives. This ensures that the set of all ε-neighbourhoods forms a base for a topology. Adiloglu et al. (2006a) defined a correlation-based model to identify similarities between melodies. This approach also differs from Buteau's approach in another respect: only sequences of consecutive notes are considered as melodies. Based on the similarities, the neighbourhood sets are defined for each melody. These neighbourhood sets only contain melodies of equal length.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 67–77, 2009. © Springer-Verlag Berlin Heidelberg 2009
Mazzola (2002) and Buteau (2001, 2003) quantified the presence of a melody m within a longer melody m′ in terms of an intensity number which measures the cardinality of those sub-motives of m′ which are similar to m. The total presence of m is a weighted sum of all these intensity numbers. The content of a melody is defined in a reciprocal manner. Adiloglu and Obermayer (2005) redefined presence and content in terms of sets of melodies rather than in terms of numbers. The presence neighbourhood set of a given melody is the set of melodies that contain the given melody. The content neighbourhood set of a given melody consists of the sub-segments of the given melody.

Ruwet (1987) claims that the main criterion which governs the process of segmentation is repetition. Therefore he suggests starting the segmentation of a given piece with the longest repeated passages. He extends this idea into a quite flexible segmentation algorithm for musical pieces. Adiloglu and Obermayer (2006c) made use of his method to perform a reduction of the analysis results of their model.

In this paper, we present the topological investigation of the similarity neighbourhood model (for the theoretical details see Adiloglu and Obermayer 2005, 2006b, 2006c, and Adiloglu et al. 2006a). The topological features of the model are presented on a practical test scenario on the Two-Voice Inventions of J.S. Bach.
2 The Similarity Neighbourhood Model

In the similarity neighbourhood model, we use only the chromatic pitch height values of the notes and ignore their durations and the inter-onset intervals between neighbouring notes, i.e. we ignore the parameters that constitute rhythm and other parameters that are related to the articulation of the piece. Temporal and articulation information is simply reduced to the sequential order of the notes in time. Hence, we define a melody m of length n as a sequence of n integers $(t_1, t_2, \ldots, t_n) \in \mathbb{Z}^n$, whose coordinates $t_i$ denote chromatic pitches. A monophonic piece M in itself is regarded as a sequence of integers $M = (\tau_1, \ldots, \tau_N)$.

To simplify the situation in a polyphonic piece we disregard the vertical contrapuntal incidences, and consider the piece from a syntagmatic point of view merely as a list of voices $(M_k)_{k=1,\ldots,\nu}$, where ν is the number of voices within the piece M. The basic information of a polyphonic analysis is then a ν × ν upper-triangular matrix of comparative analyses for every pair $(M_k, M_l)$ of voices, from which further global information can be extracted. The voices are supposed to be disjoint, i.e. the total number of pitch occurrences of a polyphonic piece is supposed to be the sum $N = \sum_{k=1}^{\nu} N_k$, where $N_k$ is the length of the voice $M_k$. In analogy to the monophonic case, we designate the $N_k$ pitch coordinates of the voice $M_k$ by using the same symbols as follows: $M_k = (\tau_{k,1}, \ldots, \tau_{k,N_k})$.

We need to distinguish between abstract sub-melodies of a given melody M and concrete occurrences of such sub-melodies within M, which shall be called sub-segments. A sub-melody of M of length n is a sequence $(t_1, \ldots, t_n) \in \mathbb{Z}^n$ such that there exists an index $i \le N_k - n$ within the k-th voice of the piece M with $(t_1, t_2, \ldots, t_n) = (\tau_{k,i}, \tau_{k,i+1}, \ldots, \tau_{k,i+n-1})$.
The sub-segment $M^n_{k,i} = (i, k, (\tau_{k,i}, \ldots, \tau_{k,i+n-1}))$ is the concrete occurrence of the sub-melody $(\tau_{k,i}, \ldots, \tau_{k,i+n-1})$, which starts at index i within the voice k. Thus, it is modeled as a 3-tuple, which consists of the index i, the voice index k and the sub-melody of length n, which starts at index i within the voice k. In order to denote only the sub-melody of length n, we write $M^n_{k,(i)}$ (putting the subscript i in parentheses), i.e. $M^n_{k,i} = (i, k, M^n_{k,(i)})$. We use the same terms and notation for the identification of abstract sub-melodies $(s_1, \ldots, s_{n'})$ which start at index j relative to any given melody $m = (t_1, \ldots, t_n)$ such that $j + n' - 1 \le n$ (e.g. $m = M^n_{k,i}$ being a sub-segment of M), and of concrete occurrences $m^{n'}_j = (j, (t_j, \ldots, t_{j+n'-1}))$ of such sub-melodies as sub-segments of m. Similarly, $m^{n'}_{(j)}$ denotes the sub-melody, whereas $m^{n'}_j$ is the concrete occurrence of this sub-melody within the given melody m, such that $m^{n'}_j = (j, m^{n'}_{(j)})$. Note that the voice index k is not necessary for a sub-segment of a given melodic segment m, since the voice index of m is already set.

In order to have a transformation-invariant representation, the shape of the melodic segment $m = M^n_{k,i}$ is calculated based on the chromatic distances between the consecutive pitch coordinates of the melody $M^n_{k,(i)}$. The shape is defined as $\mu(M^n_{k,(i)}) = (t_2 - t_1, t_3 - t_2, \ldots, t_n - t_{n-1})$. So, the shape of a given melody is the sequence of intervals between the consecutive pitches. We use the correlation coefficient $d : \mathbb{R}^{n-1} \times \mathbb{R}^{n-1} \to [-1, 1]$ for calculating the absolute value $|d(\mu(m_1), \mu(m_2))|$ of the correlation between the shapes $\mu(m_1)$ and $\mu(m_2)$ of two melodies $m_1$ and $m_2$ of the same length n, in order to decide upon their similarity with the help of some threshold. Similarity includes, as a special case of maximal similarity, the musical transformations of chromatic pitch transposition and inversion, because $\mu(m + t) = \mu(m)$ and $d(x, -x) = -1$. Note that the retrograde of a melody cannot be identified in this way. Those segments which are similar to a given melodic segment m are stored in the similarity neighbourhood of the given melodic segment.

Definition 1. The similarity neighbourhood $U^n_R(m, M)$ of a given melodic segment $m = M^n_{k,i}$ within the voice k of a given piece M is defined as:

$U^n_R(m, M) = \{ M^n_{l,j} : |d(\mu(M^n_{k,(i)}), \mu(M^n_{l,(j)}))| > R \}$, (1)

where
$R = 2^{(1/n)^{c_1}} - 1$. (2)
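The shape, the correlation-based comparison and the similarity neighbourhood can be sketched in Python as follows. This is only an illustrative sketch: voices are assumed to be lists of chromatic pitch integers, indices are 0-based (the text counts from 1), the threshold is passed in as a plain number rather than computed, and the treatment of constant shapes (zero variance) is an assumption, since the text does not say how that case is handled.

```python
import math


def shape(melody):
    """Interval sequence (t2 - t1, ..., tn - t(n-1)) of a pitch sequence."""
    return tuple(b - a for a, b in zip(melody, melody[1:]))


def correlation(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None if either sequence has zero variance (an assumption of
    this sketch; the correlation is undefined in that case).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / norm


def similarity_neighbourhood(voices, k, i, n, threshold):
    """Return all (voice, start) pairs of length-n segments whose shape
    correlates, in absolute value, above `threshold` with the segment
    starting at index i of voice k."""
    reference = shape(voices[k][i:i + n])
    result = set()
    for l, voice in enumerate(voices):
        for j in range(len(voice) - n + 1):
            c = correlation(reference, shape(voice[j:j + n]))
            if c is not None and abs(c) > threshold:
                result.add((l, j))
    return result
```

Because only intervals are compared and the absolute value of the correlation is used, a transposed or inverted copy of a segment ends up in its neighbourhood, in line with the remark on transposition and inversion above.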
The distance is calculated between the given melody and all other equal-length melodies within the given piece. The similarity neighbourhood set of the melodic segment m thus contains equal-length melodic segments $m' = M^n_{l,j}$ similar to m. The members m′ of a similarity neighbourhood set of a melodic segment m are said to be first-order similar with respect to R. Likewise, two melodic segments $m_1$ and $m_2$ are said to be second-order similar if the intersection of their similarity neighbourhood sets is non-empty and contains neither $m_1$ nor $m_2$. A re-iteration of this principle leads to an equivalence relation, whose equivalence class $E^n_c(M)$ shall be called the connectivity component with the index number c, which contains all equal-length melodic segments that are first- or higher-order similar to each other. Every connectivity component can also be indexed by its corresponding representative melody m*, which is defined to be the melodic segment with the similarity neighbourhood set of largest cardinality.

The connectivity component obtained by this process contains all melodic segments of the same length which are related to each other in the following sense: for any two different melodic segments $m_{a_1}$ and $m_{a_j}$ of a connectivity component, there exists a chain of first-order similarity connections through melodic segments $m_{a_1}, m_{a_2}, m_{a_3}, \ldots, m_{a_j}$. In fact, the similarity neighbourhood sets of two similar melodic segments of the same length often contain many common segments, and the connectivity components $E^n_c(M)$ of melodies of length n are the fixed points of the iterative procedure of unification. Constructing the connectivity components simplifies the control of the results, and indicates the second, third and higher order similarities of melodies in a better way.
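One way to read the construction above is as finding the connected components of the graph whose edges are first-order similarities. The following Python sketch makes that reading concrete. It assumes the similarity neighbourhoods are already available as a dictionary from segment identifiers to sets of identifiers and treats similarity as symmetric; the function names and the tie-breaking rule for the representative are hypothetical.

```python
def connectivity_components(neighbourhoods):
    """Group segments that are linked by chains of first-order similarity.

    `neighbourhoods` maps each segment to the set of segments in its
    similarity neighbourhood (assumed precomputed).
    """
    # Build a symmetric adjacency structure from the neighbourhood sets.
    adjacency = {segment: set() for segment in neighbourhoods}
    for segment, members in neighbourhoods.items():
        for other in members:
            if other != segment:
                adjacency[segment].add(other)
                adjacency.setdefault(other, set()).add(segment)

    components, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            segment = stack.pop()
            if segment in component:
                continue
            component.add(segment)
            stack.extend(adjacency[segment] - component)
        seen |= component
        components.append(component)
    return components


def representative(component, neighbourhoods):
    """Segment with the largest neighbourhood; ties are broken by sort
    order, which is an assumption of this sketch."""
    return max(sorted(component), key=lambda s: len(neighbourhoods.get(s, ())))
```

In this reading, two second-order similar segments land in the same component because they share at least one common neighbour.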
3 Inheritance Property

Intuitively, the inheritance property says that similar melodies have similar sub-melodies. The inheritance property defined by Mazzola and Buteau addresses a one-directional relationship between motives and sub-motives: the similarity of two motives implies the similarity of their corresponding sub-motives. However, the inheritance property which we define below in terms of two conditions considers a bidirectional relationship, namely from segments to sub-segments as well as from sub-segments to segments:

Definition 2. Suppose we are given a melodic segment $m = M^n_{k,i}$ of length n and a sub-segment $m' = m^{n'}_j$ of length n′ (with $j + n' - 1 \le n$).
1. Further suppose we are given a melodic segment $\tilde{m} = M^n_{l,i'}$ such that $\tilde{m} \in U^n_R(m, M)$, i.e. which is similar to m. In association with $\tilde{m}$ we consider its sub-segment $\tilde{m}' = \tilde{m}^{n'}_j$. We say that the similarity between the segments m and $\tilde{m}$ is inherited by their sub-segments m′ and $\tilde{m}'$ if these sub-segments are similar as well, i.e. if $\tilde{m}' \in U^{n'}_R(m', M)$.
2. Further suppose we are given a melodic segment $\tilde{m}' = M^{n'}_{l,i'+j-1}$ such that $\tilde{m}' \in U^{n'}_R(m', M)$, i.e. which is similar to m′. In association with $\tilde{m}'$ we consider the melodic segment $\tilde{m} = M^n_{l,i'}$ within the ambient melody. If $\tilde{m}$ exists, $\tilde{m}$ contains $\tilde{m}' = \tilde{m}^{n'}_j$ at the same relative location as m contains m′. We say that the similarity between the sub-segments m′ and $\tilde{m}'$ is inherited by the segments m and $\tilde{m}$ if these segments are similar as well, i.e. if $\tilde{m} \in U^n_R(m, M)$.

The first part of this definition requires that the similarity between two melodic segments implies that their corresponding sub-segments are similar as well. The second part requires the converse, which is not as intuitive as the first part. It says that if there are two similar sub-segments, their corresponding super-segments are also similar, if they exist within M. These conditions are not fulfilled in general. But if they are fulfilled, they help to remove redundant information from the analytical results. An instance of such a bidirectional relationship is shown in Figure 1.

Fig. 1. Bidirectional Inheritance Property
4 Redundant Melodies

Ruwet (1987) presents a method for paradigmatic partitioning starting with the identification of the longest repeated segments within a given piece. In the following steps, Ruwet identifies shorter segments as well as sub-segments of the previously identified longer segments, which partition the given piece further. That is to say, the shorter segments or the sub-segments of the longer segments help to decrease the amount of unpartitioned material in the given piece by defining new partitions in those areas. The following strategy employs these ideas of Ruwet in a more general situation, where the segments are not strictly partitioning. From the similarity point of view, all of the sub-melodies of two similar melodies appear wherever these similar melodies appear. Therefore, pursuing Ruwet's ideas, segments which contribute to the partitioning of the given piece should be distinguished from the ones which only appear redundantly within these partitioning segments. Hence, a melody is called redundant if there is a longer melody containing the given melody, up to similarity, wherever the given melody appears. Due to its exhaustive nature, the similarity neighbourhood model identifies the redundant melodic segments as well. However, these melodic segments can be removed from the set of results.

Definition 3. By weak reduction we mean the following reduction procedure, which is applied to the entire collection of neighbourhoods in order to yield an analogous family of neighbourhoods. Given two melodic segments $m = M^n_{k,i}$ and $m' = M^{n'}_{k,j}$ of length n and n′ respectively, where $m' = m^{n'}_j$ (with $j + n' - 1 \le n$), the melody m′ is removed from the results if $card(U^{n'}_R(m', M)) \le card(U^n_R(m, M))$.

According to Definition 3, melodic segments are removed if their number of occurrences is less than or equal to the number of occurrences of the melodic segments containing them. This definition assumes that the shorter melodic segment only appears within the longer melodic segment containing it. However, this does not have to be true in all cases.
From a music-theoretical point of view, melodies appearing not only within their super-segments but also independently within a given musical piece are important for this particular musical piece. These kinds of melodies should not be removed. On the contrary, they should be kept in the results and investigated further. Therefore, before reducing a melody, the first part of the inheritance property should be checked as well. For this reason, we define the strong reduction method, which uses the inheritance property as a criterion for reduction.

Definition 4. By strong reduction we mean the following reduction procedure, which is applied to the entire collection of neighbourhoods in order to yield an analogous family of neighbourhoods. Given two melodic segments $m = M^n_{k,i}$ and $m' = M^{n'}_{k,j}$ of length n and n′ respectively, where $m' = m^{n'}_j$ (with $j + n' - 1 \le n$), the melody m′ is removed from the results if $card(U^{n'}_R(m', M)) \le card(U^n_R(m, M))$, and if for all melodic segments $\tilde{m}' \in U^{n'}_R(m', M)$ there exists a melodic segment $\tilde{m} \in U^n_R(m, M)$ such that $\tilde{m}' = \tilde{m}^{n'}_j$.

The employed condition guarantees that only those melodies are removed from the analysis which appear exclusively within similar ambient melodies at analogous positions. Therefore, we call this reduction method strong reduction.
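The two reduction criteria of Definitions 3 and 4 can be compared in a short Python sketch. It assumes that segments are encoded as (voice, start, length) tuples with 0-based start indices and that the similarity neighbourhoods are given as a dictionary of sets; both the encoding and the function names are hypothetical.

```python
def weak_reduction_removes(sub, sup, neighbourhoods):
    """Weak criterion: the sub-segment has no more similar occurrences
    than the segment that contains it."""
    return len(neighbourhoods[sub]) <= len(neighbourhoods[sup])


def strong_reduction_removes(sub, sup, neighbourhoods):
    """Strong criterion: the weak criterion holds, and every segment
    similar to `sub` lies inside a segment similar to `sup` at the same
    relative position."""
    if not weak_reduction_removes(sub, sup, neighbourhoods):
        return False
    offset = sub[1] - sup[1]  # relative start of sub within sup
    hosts = neighbourhoods[sup]
    return all(
        (voice, start - offset, sup[2]) in hosts
        for voice, start, _length in neighbourhoods[sub]
    )
```

A sub-segment that also occurs on its own, outside any segment similar to its host, passes the weak test but fails the strong one, so strong reduction keeps it in the results.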
5 Finding Subsequences

A music-theoretical melodic analysis explains how the melodic material is introduced and used throughout the given piece. Therefore the segment relationships between melodies should be identified as well. Adiloglu and Obermayer (2005) defined presence and content of melodies to investigate these relationships. The inheritance property is utilised as a condition to define these sets. The presence of a given melody is the appearance of the melody within other ambient melodies of the same piece and within melodies similar to those ambient melodies. The content of a given melody consists of the sub-segments of the given melody. These neighbourhood sets are called weak if they do not satisfy the inheritance property. For the strong neighbourhood sets, however, the inheritance property is enforced. Here we define the presence neighbourhood sets only for the connectivity components, in order to simplify the interpretation of the presence of two similar melodic segments within their super-segments.

Definition 5. The presence n′-neighbourhood for the connectivity component $E^n_c(M)$ (consisting of melodies of length n) is defined to be the union of those connectivity components $E^{n'}_d(M)$ (each consisting of melodies of length n′) containing a melodic segment m′ such that a melodic segment m in the connectivity component $E^n_c(M)$ is a sub-segment of m′. The presence neighbourhood for the connectivity component $E^n_c(M)$ is the union of all presence n′-neighbourhood sets where $n' \in\, ]length(m), length(M)]$:

$P^{n'}_{Eq}(E^n_c(M)) = \bigcup_{m' \in E^{n'}_d(M),\; m \in E^n_c(M)} \{ E^{n'}_d(M) \mid m = m'^{\,n}_i \}$,

$P_{Eq}(E^n_c(M)) = \bigcup_{n'} P^{n'}_{Eq}(E^n_c(M))$.
The presence n′-neighbourhood set for a connectivity component may consist of several connectivity components, due to the fact that two melodies containing the same melody can belong to separate connectivity components. In this case, all of these connectivity components are included in the presence n′-neighbourhood set of the given connectivity component. However, the following result helps to make inferences about the construction of the presence neighbourhood sets of the connectivity components.

Corollary 1. Suppose that there are two melodic segments $m = M^n_{k_1,i+j}$ and $m' = M^n_{k_2,i'+j}$ in $E^n_c(M)$ such that they are contained by the super-segments $M^{n'}_{k_1,i}$ and $M^{n'}_{k_2,i'}$ of length n′ at analogous positions i + j and i′ + j respectively. These two super-segments $M^{n'}_{k_1,i}$ and $M^{n'}_{k_2,i'}$ belong to the same connectivity component $E^{n'}_d(M)$ if the second part of the inheritance property holds.

The similarity relation between two melodic segments does not necessarily imply a similarity between their super-segments. The inheritance property is necessary in order for this statement to be true. Hence, Corollary 1 mathematically formulates the necessity of the inheritance property to imply a similarity relation between the super-segments of two similar melodic segments. In order to ease the interpretation of the relationships between the sub-segments of two similar melodic segments, we define the content neighbourhood sets for the connectivity components as well.

Definition 6. The content n′-neighbourhood for the connectivity component $E^n_c(M)$ of melodies of length n is defined to be the union of the connectivity components $E^{n'}_d(M)$ of melodies of length n′ (with n′ < n), each of which contains at least one melodic segment m′ such that m′ is a sub-segment of a melodic segment m in the connectivity component $E^n_c(M)$.
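The content neighbourhood for the connectivity component $E_c^n(M)$ is the union of the content $n'$-neighbourhood sets for $n' \in [\text{minimum melody length}, \mathrm{length}(m^*)[$:
$$\mathrm{CEq}_{n'}(E_c^n(M)) = \bigcup_{\substack{m' \in E_d^{n'}(M) \\ m \in E_c^n(M)}} \{\, E_d^{n'}(M) \mid m' = m^{n'}_i \,\},$$
$$\mathrm{CEq}(E_c^n(M)) = \bigcup_{n'} \mathrm{CEq}_{n'}(E_c^n(M)).$$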
For the same reason as in the presence neighbourhood sets, the content n'-neighbourhood sets of a given connectivity component can contain more than one connectivity component. The following corollary explains the role of the inheritance property in the construction of these sets.
K. Adiloğlu and K. Obermayer
Corollary 2. Suppose that there are two melodic segments $M^{n}_{k_1,i}$ and $M^{n}_{k_2,i'}$ of length $n$ which belong to the same connectivity component $E_c^n(M)$, such that they contain the sub-segments $M^{n'}_{k_1,i+j}$ and $M^{n'}_{k_2,i'+j}$ of length $n'$ at analogous positions $i + j$ and $i' + j$ respectively. These two sub-segments $M^{n'}_{k_1,i+j}$ and $M^{n'}_{k_2,i'+j}$ belong to the same connectivity component $E_d^{n'}(M)$ if the first part of the inheritance property holds.
During the construction of the connectivity components, all of the first- and higher-order similar melodies are collected in the same connectivity component. However, the inheritance property has no influence on the construction of the connectivity components, nor on the relationships between the connectivity components. Hence it is not possible to enforce the inheritance property in this case. It remains true, however, that if the second part of the inheritance property holds for two similar melodies, their super-melodies should be contained in the same connectivity component.
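The grouping of first- and higher-order similar melodies into one connectivity component can be illustrated with a small union-find sketch. This is only an illustration under stated assumptions: the function names and the toy similarity test `close_enough` (summed absolute pitch difference within a threshold) are hypothetical and are not the distance measure used in the model.

```python
from collections import defaultdict


def connectivity_components(segments, is_similar):
    """Group equal-length segments into components of a similarity relation.

    Two segments share a component if a chain of pairwise similar segments
    links them (first- and higher-order similarity).
    """
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            if is_similar(segments[a], segments[b]):
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for index, segment in enumerate(segments):
        groups[find(index)].append(segment)
    return sorted(groups.values())


def close_enough(x, y, threshold=1):
    # Hypothetical stand-in for a similarity test between equal-length melodies.
    return sum(abs(p - q) for p, q in zip(x, y)) <= threshold


segments = [(60, 62, 64), (60, 62, 65), (60, 63, 65), (72, 74, 76)]
components = connectivity_components(segments, close_enough)
```

In this toy input the first and third segments are not directly similar, but both are similar to the second one, so all three land in the same component, while the fourth segment forms a singleton.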
6 Melodic Topologies
In the similarity neighbourhood model, we do not aim at defining a topological space in the first place. However, investigating the topological features of this model adds valuable new information to the analytical results, and it helps to understand the commonalities and differences between strict topological approaches and ours. Three aspects of the model have been investigated in order to find out whether they define topologies (Adiloglu and Obermayer 2006b). The connectivity components define a topological base, because they are pairwise disjoint. What deserves attention is the intersection behaviour of the presence as well as of the content neighbourhood sets for the connectivity components.
Theorem 1. The collection of presence neighbourhood sets for the connectivity components of a given musical piece $M$ defines a base for a topology $\tau$ on the set $\{m \mid m \in M\}$ if and only if the inheritance property is satisfied.
Proof. It is enough to show that, given the connectivity components $E_c^n(M)$ and $E_d^{n'}(M)$ and a melodic segment $m' \in E_d^{n'}(M)$ such that $m' \in \mathrm{PEq}(E_c^n(M))$, the following is true: $\mathrm{PEq}(E_d^{n'}(M)) \subset \mathrm{PEq}(E_c^n(M))$. If the inheritance property holds, there exists a melodic segment $m$ such that $m = m'^{\,n}_i$ and, due to Corollary 2, $m \in E_c^n(M)$. Suppose that there exists a melodic segment $m'' \in E_u^{n''}(M)$ such that $m'' \in \mathrm{PEq}(E_d^{n'}(M))$. This means that there exists a melodic segment $m'$ such that $m' = m''^{\,n'}_j$ and, due to Corollary 2, $m' \in E_d^{n'}(M)$. In the same way, due to Corollary 2, there should exist another melodic segment $o = m''^{\,n}_i$ such that $o \in E_c^n(M)$. So
$$\mathrm{PEq}_{n''}(E_c^n(M)) \supset E_u^{n''}(M) \subset \mathrm{PEq}_{n''}(E_d^{n'}(M)).$$
This statement is true for the content neighbourhood sets for connectivity components as well. However, in order to prove it for the content neighbourhood case, Corollary 1 together with the inheritance property is used.
6.1 Melodic Topologies on the Syntagms
The experiments have shown that the reduction process decreases the number of results quite efficiently. This step makes the obtained results clearer by eliminating the irrelevant repetitions. However, considered from the topological point of view, the reduction process does not make the results obey the inheritance property. For both of the reduction methods, the remaining similarity neighbourhood sets contain at least one melodic segment whose super-segment is not contained in a similarity neighbourhood set like the other super-segments. Otherwise the similarity neighbourhood of the sub-segments would be reduced. The following expresses this statement in a mathematical way: $\exists m$ such that $m \in U_R^n(m, M)$ and, for all $U_R^{n'}(m', M)$ with $n' > n$, there is no $m'' \in U_R^{n'}(m', M)$ for which $m \subset m''$ holds. Hence, the similarity neighbourhood set $U_R^n(m, M)$ will not be reduced. As a result, neither the weak nor the strong presence neighbourhood set $\mathrm{Pres}_R^n(m, M)$ of the similarity neighbourhood set $U_R^n(m, M)$ will contain a melody $m'$ which is a super-segment of the melody $m$. Because of this, it cannot be guaranteed that the presence neighbourhood set $\mathrm{PEq}(E_c^n(M))$ for the connectivity component of $m \in E_c^n(M)$ contains a melody $m'$ which is a super-segment of the given melody $m$. The melodic segment $m$, which protected the corresponding similarity neighbourhood set from reduction, causes this set to violate the inheritance property in relation to other similarity neighbourhood sets. In the previous section we considered the cases where the inheritance property does not hold, so we will not repeat them here. As a consequence, the reduction process does not reshape the results in such a way that the inheritance property holds after the reduction has been applied. Therefore it is not possible to prove theoretically that the inheritance property is satisfied for the similarity neighbourhood sets or for the connectivity components of the reduced results. Hence the theorems proven in the previous section are also valid for the reduced results concerning the melodic topologies. Even though it is not possible to prove theoretically that the reduced results define a topological base, the topological investigations can differ experimentally between the reduced and the non-reduced case. In the following section, we investigate the Two-Voice Inventions from a topological viewpoint.
6.2 Investigation of the Inventions
The analyses of the Two-Voice Inventions have been tested for their topological features before and after the reduction process. In each case, the collection of the similarity neighbourhood sets defines a topological base. However, the collections of the presence and content neighbourhood sets for the connectivity components do not always define a topological base. This depends on the particular invention.
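Whether a finite collection of sets forms a base for a topology can be tested directly. The following is a minimal sketch, assuming the standard base conditions (the sets cover the universe, and every point in the intersection of two base sets lies in a base set contained in that intersection); the function name and the toy data are hypothetical and do not reproduce the test procedure applied to the inventions.

```python
from itertools import combinations


def is_topological_base(collection, universe):
    """Check the two base conditions for a finite collection of sets."""
    sets = [frozenset(s) for s in collection]
    # Condition 1: the sets must cover the whole universe.
    if frozenset().union(*sets) != frozenset(universe):
        return False
    # Condition 2: every point of an intersection needs a base set
    # that contains it and lies inside that intersection.
    for first, second in combinations(sets, 2):
        overlap = first & second
        for point in overlap:
            if not any(point in third and third <= overlap for third in sets):
                return False
    return True


universe = {"a", "b", "c"}
disjoint = [{"a", "b"}, {"c"}]            # pairwise disjoint, like the components
overlapping = [{"a", "b"}, {"b", "c"}]    # the overlap {"b"} holds no base set
repaired = overlapping + [{"b"}]          # adding {"b"} restores the condition
```

Pairwise disjoint sets pass the second condition trivially, which is why the connectivity components always qualify, whereas overlapping neighbourhood sets only qualify if their intersections are themselves covered by sets of the collection.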
Table 1. Topological Investigation of the Two-Voice Inventions
Invention   Prototypes Pres   Prototypes Cont   Syntagms Pres   Syntagms Cont
Inv 01      19–58             5–7               58              58
Inv 02      42–108            5–7               108             108
Inv 03      21–38             5–7               38              38
Inv 04      36–61             5–7               61              61
Inv 05      48–95             5–7               95              95
Inv 06      19–45             5–7               45              45
Inv 07      15–37             5–7               37              37
Inv 08      12–97             5–7               97              97
Inv 09      30–51             5–7               51              51
Inv 10      25–40             5–7               40              40
Inv 11      26–73             5–7               73              73
Inv 12      32–57             5–7               57              57
Inv 13      18–37             5–7               37              37
Inv 14      51–73             5–7               73              73
Inv 15      17–37             5–7               37              37
Table 1 summarises the topological investigation of the whole corpus. Where a range is given, the larger of the two numbers is the length of the longest melodies within the connectivity components; for Invention 1, for example, these melodies are 58 notes long. Longer melodies are singletons. The smaller number means that from this length on, the connectivity components would define a topology. However, the connectivity components of the shorter melodies break the rules for defining a topological base. A single number means that the whole collection defines a topology. Table 1 shows that the connectivity components constructed without performing the reduction, which are shown in the "Prototypes" columns, do not define topologies. The larger values in the "Prototypes" columns indicate the maximum length of the melodies identified within the piece. The smaller ones indicate the shortest melodies for which the complete set of melodies up to the longest melodies defines a topological base. Shorter melodies exist, however, which do not satisfy the topological requirements. On the other hand, the connectivity components constructed after the reduction ("Syntagms" columns) define topological bases not only for the presence neighbourhood sets but also for the content neighbourhood sets.
7 Conclusion
The similarity neighbourhood model is a simple but effective model for performing paradigmatic melodic analysis of pieces. The aim of the model is to help identify the melodic structure of a given piece. These results can be used as input for further analysis, such as syntagmatic analysis, by considering the interaction of the similarity relations between melodies with their environment, or harmonic analysis, by studying the solidarity of the melodic variation with harmonic progression. Since the distance measure can measure only the similarities between melodies of equal length, the relationships between melodies of different length cannot be measured in the same way. Therefore the presence and content neighbourhood sets were defined to explain the sub-segment relationships of melodies. From the music-theoretical viewpoint, the relations between the melodies of different length are explained in order to understand how the melodic material is introduced and used throughout the musical piece. The development of the similarity neighbourhood model does not aim to define a topology. Buteau (2001), Buteau (2003), and Mazzola (2002) actually make use of the inheritance property in order to construct topological bases. Such a base can be defined by exhausting the limits of inheritance. Some of the distance measures tested on these models obey the inheritance property, and this can be proven mathematically without considering the concrete musical piece analysed. Hence the generation of a topological base is guaranteed for these distance measures. For the similarity neighbourhood model, the collection of the connectivity components defines a topological base simply because the intersection of any two connectivity components is always empty. This topology, however, does not yield any valuable information for the analysis process. On the other hand, the presence as well as the content neighbourhood sets for connectivity components of the similarity neighbourhood model can define a topological base, depending on the given musical piece, as long as the inheritance property is satisfied by these sets. The tests have shown that the collection of the reduced results satisfies the inheritance property for the whole corpus of Two-Voice Inventions. The reduction process inspired by the ideas of Ruwet produces music-theoretically relevant results (Adiloglu and Obermayer 2006c). From a mathematical point of view, results obtained after the reduction process have a stable structure.
Even though a mathematical proof that the inheritance property is satisfied is not possible, the practical results are promising in the sense that the reduction yields music-theoretically relevant results and gives the results a mathematical structure.
References
Adiloglu, K., Obermayer, K.: Finding Subsequences of Melodies in Musical Pieces. In: Proceedings of ICMC. Pompeu Fabra University, Barcelona (2005)
Adiloglu, K., Noll, T., Obermayer, K.: A Paradigmatic Approach to Extract the Melodic Structure of a Musical Piece. Journal of New Music Research 35(3), 221–236 (2006)
Adiloglu, K., Obermayer, K.: Melodic Topologies. In: Proceedings of ICMC. Tulane University, New Orleans (2006)
Adiloglu, K., Obermayer, K.: A Reduction Method for the Paradigmatic Melodic Analysis. In: Mathematics and Computation in Music, Berlin (2002) (in print)
Buteau, C.: Reciprocity between Presence and Content Functions on a Motivic Composition Space. Tatra Mt. Math. Publications 23, 17–45 (2001)
Buteau, C.: A Topological Model of Motivic Structure and Analysis of Music: Theory and Operationalization. PhD Thesis. University Zurich, Zurich (2003)
Mazzola, G.: The Topos of Music. Birkhäuser, Basel (2002)
Ruwet, N.: Methods of Analysis in Musicology. Music Analysis 6, 11–36 (1987)
Comparing Computational Approaches to Rhythmic and Melodic Similarity in Folksong Research
Anja Volk¹, Jörg Garbers¹, Peter van Kranenburg¹, Frans Wiering¹, Louis Grijp², and Remco C. Veltkamp¹
¹ Department of Information and Computing Sciences, Utrecht University
² Meertens Institute Amsterdam
Abstract. In this paper we compare computational approaches to rhythmic and melodic similarity in order to find relevant features characterizing similarity in a large collection of Dutch folksongs. Similarity rankings based on Transportation Distances are compared to an approach of rhythmic similarity based on Inner Metric Analysis proposed in this paper. The comparison between the two models demonstrates the important impact of rhythmic organization on melodic similarity.
1 Introduction
Computational approaches to melodic similarity such as those proposed by Ahlbäck (2004), Müllensiefen (2004) and Typke (2007) contribute to the study of melodies in the different areas of music cognition, ethnomusicology and music information retrieval. In this paper we study rhythmic similarity in the context of melodic similarity as a first step within the interdisciplinary enterprise of the WITCHCRAFT project (Utrecht University and Meertens Institute). The project makes use of and contributes to methods in these three areas in order to develop a content-based retrieval system for a large collection of Dutch folksongs. The retrieval system will give access to the collection Onder de groene linde, hosted by the Meertens Institute, to both the general public and musical scholars. For the latter it is of special interest to be able to classify, identify and trace melodic variants with the help of the retrieval system to be designed. The similarity between different variants of a folksong melody is based on a variety of musical dimensions, such as rhythm, contour or cadence notes. According to cognitive studies, metric and rhythmic structures play a central role in the perception of melodic similarity. For instance, in Ahlbäck 2004 an exact repetition of a pitch sequence was not recognized if it was not congruent with the fundamental metrical structure. In the immediate recall of a simple melody studied in Sloboda 1985, metrical structure was the most accurately remembered structural feature. In this paper we focus on rhythmic similarity by comparing similarity rankings based on Inner Metric Analysis (IMA) to Transportation Distances (see Typke 2007 and Bosma et al. 2006).
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 78–87, 2009. © Springer-Verlag Berlin Heidelberg 2009
Transportation Distances have been successfully applied to the measurement of melodic similarity of, for instance, RISM incipits or karaoke pieces (see Typke 2007). By excluding the pitch factor we apply the Transportation Distances in this paper to rhythm only in order to study the impact of rhythm on melodic similarity. Inner Metric Analysis has been successfully applied to the study of metric structures of musical pieces in the context of music analysis (Fleischer 2003), music cognition (Volk 2003) and classification (Chew et al. 2005). We therefore propose in this paper an approach to the measurement of rhythmic-metric similarity based on IMA and compare the results to those of the Transportation Distances.
2 Two Computational Approaches to Rhythmic Similarity
2.1 Transportation Distances
Transportation distances consider melodies as weighted point sets. A similarity (or distance) measure between two melodies is defined on the basis of weight flows between these point sets that have to be minimized. As a metaphor, the point set of one melody is regarded as heaps of sand and the point set of the second melody as holes in the ground. Transportation distances compute the minimum amount of work needed to fill the holes with sand. We compare two instances of these distances, namely the Earth Mover's Distance (EMD) and the Proportional Transportation Distance (PTD). Both distance measures and their application to melodies are described in detail in Typke 2007. In the application to melodies, every note is a point in the Euclidean space with the two coordinates pitch and onset time; the duration of the note determines its weight. In this article we apply these distances to rhythms instead of melodies, hence the coordinates of the points are determined by the onset time only. Figure 1 gives an example of two short rhythms with the minimal flow of weights according to the EMD. The arrows indicate which amount of weight from the first rhythm is transported to which note in the second rhythm. For instance, the first rhythm is described by the point set r1 = {(0.0, 1.0), (1.0, 0.5), (1.5, 0.25), (1.75, 0.25), (2.0, 0.5), (2.5, 0.5), (3.0, 1.0)}. The first coordinate of each point is the onset time of the note, the second is its weight (which equals its duration). The difference between EMD and PTD becomes evident in the similarity comparison between pieces of different total weight, which in this case is the total length. The EMD realizes partial matching and hence ignores the extra notes in the longer piece. Within the PTD approach the total weight of each piece is normalized in order to prevent partial matching, hence the existence of extra notes in the second melody that cannot be matched is effectively penalized.
2.2 Inner Metric Analysis
Inner Metric Analysis (see Mazzola 2002, Fleischer 2003) describes the inner metric structure of a piece of music generated by the actual notes inside the bars, as opposed to the outer metric structure associated with a given abstract grid such as the bar lines. The model assigns a metric weight to each note of the piece. Figure 2 gives an example for the song OGL 19914¹ (from the collection Onder de groene linde) belonging to the melody group Deze morgen. The notes of the first phrase are shown in the top example of figure 4. For each note a line depicts the metric weight such that the higher the line, the higher the corresponding weight. The background gives the bar lines for orientation. The metric weight profile corresponds to the typical accent hierarchy of a 6/8 bar.
Fig. 1. Example of minimal weight flow between two rhythms. The point size depicts the weight.
Fig. 2. Metric weight of OGL 19914, melody group Deze morgen in 6/8
The details of the model have been described in Fleischer 2003 and Chew et al. 2005. The general idea is to search for all pulses (chains of equally spaced events) of a given piece and then to assign a metric weight to each note. The pulses are chains of equally spaced onsets of the notes of the piece, called local meters. Let On denote the set of all onsets of notes in a given piece. We consider every subset m ⊂ On of equally spaced onsets as a local meter if it contains at least three onsets and is not a subset of any other subset of equally spaced onsets. Let k denote the number of onsets a local meter consists of minus 1. Hence k counts the number of repetitions of the period (the distance between consecutive onsets of the local meter) within the local meter. The metric weight of an onset is then calculated as the weighted sum of the lengths k of all local meters m_k that coincide at this onset.
¹ OGL is the abbreviation of Onder de groene linde.
Let $M(\ell)$ be the set of all local meters of the piece of length at least $\ell$. The general metric weight of an onset $o \in On$ is as follows:
$$W_{\ell,p}(o) = \sum_{\{m \in M(\ell)\,:\,o \in m_k\}} k^p.$$
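The metric weight can be sketched in a few lines of code. This is a minimal illustration, assuming that onsets are given as integer positions on a common grid; the function names are hypothetical, and the brute-force search over all periods is chosen for readability and is not taken from the cited implementations.

```python
def local_meters(onsets, min_length=2):
    """Return all local meters: maximal chains of equally spaced onsets
    with at least `min_length` period repetitions that are not contained
    in another such chain. Onsets are assumed to be integers."""
    onset_set = set(onsets)
    if not onset_set:
        return []
    ordered = sorted(onset_set)
    span = ordered[-1] - ordered[0]
    chains = []
    for start in ordered:
        for period in range(1, span + 1):
            if start - period in onset_set:
                continue  # the chain with this period begins earlier
            chain = [start]
            while chain[-1] + period in onset_set:
                chain.append(chain[-1] + period)
            if len(chain) - 1 >= min_length:
                chains.append(frozenset(chain))
    # Drop chains that are proper subsets of another chain.
    return [c for c in chains if not any(c < other for other in chains)]


def metric_weights(onsets, min_length=2, p=2):
    """Sum k**p over all local meters (k = number of onsets minus 1)
    that contain the given onset."""
    meters = local_meters(onsets, min_length)
    return {onset: sum((len(m) - 1) ** p for m in meters if onset in m)
            for onset in sorted(set(onsets))}


# Toy rhythm: local meters {0,2,4,6,8} (k=4), {2,3,4} (k=2) and {0,3,6} (k=2).
weights = metric_weights([0, 2, 3, 4, 6, 8], min_length=2, p=3)
```

The chain {0, 4, 8} is also equally spaced, but it lies inside {0, 2, 4, 6, 8} and is therefore discarded, which is the role of the subset condition in the definition of a local meter.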
In all examples of this paper we have set the parameter $\ell = 2$, hence we consider all local meters that exist in the piece. In order to obtain stable layers in the metric weights of the folksongs we have chosen $p = 3$.
2.3 Defining Similarity Based on Inner Metric Analysis
Metric weights of short fragments of musical pieces have been used in Chew et al. 2005 to classify dance rhythms of the same meter and tempo using the Pearson correlation coefficient. In this article we modify this approach to measure the similarity between the rhythms of two complete melodies in terms of the metric structure implied by these rhythms. The similarity measurement is hence carried out on the analytical information given by the metric weights. Since the metric weight is defined only on note onsets, in a first step we define, for each of the two pieces, the metric weight of all silence events as zero and hence obtain the metric grid weight. The silence events are inserted along the finest grid of the piece, which is determined by the shortest existing interval between two consecutive onsets of the piece. Thus we obtain a weight for all events e of the piece along the finest onset grid. We want to compare the consecutive weights within cells of equal total duration (for instance 4 quarter notes in length) of the two pieces. Therefore, in cases where the finest onset grids of the two pieces differ, we adapt the grids of the pieces to a common finer grid by adding events e with the weight zero along the finer grid.
In the second step, the metric grid weight is split into consecutive segments that cover an area of equal duration in the piece. These segments contain the weights to be compared with the Pearson correlation coefficient; we therefore call them correlation windows. The first correlation window of each piece starts with the first full bar, hence the weights of an upbeat are disregarded. For all examples of this article we have set the size of the correlation window to one bar. Figure 3 shows an example of two metric grid weights with the first 3 correlation windows.
Fig. 3. The first three correlation windows of two metric weights to be compared
For the computation of the similarity measure both grid weights are completely covered with disjoint correlation windows. Let $w_i$, $i = 1, \ldots, n$, denote the consecutive correlation windows of the first piece and $v_j$, $j = 1, \ldots, m$, those of the second piece. Let $c_k$, $k = 1, \ldots, \min(n, m)$, denote the correlation coefficient between the grid weights that are covered by the windows $w_k$ and $v_k$. Then we define the similarity $IMA_{c,s}$ on the subsets of the two pieces from the beginning until the end of the shorter piece as the mean of all correlation coefficients:
$$IMA_{c,s} = \frac{1}{\min(n, m)} \sum_{k=1}^{\min(n,m)} c_k$$
The partial similarity $IMA_{c,s}$ disregards all extra notes at the end of the longer piece. However, in many contexts it might be important to add a penalty for these extra notes that have no counterpart in the shorter piece. Therefore we define the correlation coefficient between the additional correlation windows of the longer piece and the empty correlation windows of the shorter piece as zero: $c_k = 0$ for $k = \min(n, m) + 1, \ldots, \max(n, m)$. Hence we obtain the similarity measure $IMA_{c,e}$ taking into consideration the entire pieces:
$$IMA_{c,e} = \frac{1}{\max(n, m)} \sum_{k=1}^{\max(n,m)} c_k$$
We will use the latter measure for the application to rhythmic similarity ranking in section 3.
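The two similarity measures can be sketched as follows. This is only a sketch under stated assumptions: both inputs are metric grid weights on a common grid that already start at the first full bar, the window size is given in grid steps, incomplete trailing windows are dropped, and a window with constant weights is given the correlation 0 because the Pearson coefficient is undefined there. None of these conventions, nor the function names, are specified in the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant windows count as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_windows(grid_weights, size):
    """Split grid weights into disjoint windows; incomplete tails are dropped."""
    return [grid_weights[i:i + size]
            for i in range(0, len(grid_weights) - size + 1, size)]


def ima_similarity(weights_a, weights_b, size, entire=True):
    """Mean window correlation: over max(n, m) windows if `entire` is true
    (unmatched windows contribute 0), otherwise over min(n, m) windows."""
    windows_a = correlation_windows(weights_a, size)
    windows_b = correlation_windows(weights_b, size)
    coefficients = [pearson(w, v) for w, v in zip(windows_a, windows_b)]
    counts = (len(windows_a), len(windows_b))
    denominator = max(counts) if entire else min(counts)
    return sum(coefficients) / denominator if denominator else 0.0


long_piece = [4, 0, 1, 0, 4, 0, 1, 0]   # two windows of four grid steps
short_piece = [4, 0, 1, 0]              # one window, identical to the first
partial = ima_similarity(long_piece, short_piece, size=4, entire=False)
whole = ima_similarity(long_piece, short_piece, size=4, entire=True)
```

In this toy case the partial measure reports a perfect match, while the measure over the entire pieces is halved by the unmatched second window of the longer piece.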
3 Evaluation of the Rhythmic Similarity Approaches
In this section we compare the similarity measurements based on IMA, EMD and PTD in a first and simple approach to rhythmic similarity of melodies. The application of these measurements is simple in so far as it does not contain a segmentation procedure and the search for similar segments that are shifted in time. Since the pieces contain musically meaningful segments (phrases) we applied IMA, EMD and PTD to both single phrases and complete pieces. The evaluation of the similarity measurements is based on melody groups of the collection Onder de groene linde (OGL) of related songs. The melodies belonging to one group are being considered as musically similar. The current test corpus of digitized melodies contains 141 songs which are segmented into 567 phrases in total. One melody (or melody phrase) of such a group is selected as the query and the similarity measure to all other melodies (or melody phrases) in the test corpus is calculated and ordered (the ordered list starts with the most similar melody).
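The query-and-rank procedure described above can be sketched as follows. The identifiers, the toy corpus and the Jaccard overlap of onset sets used as the similarity function are hypothetical stand-ins for illustration; they are not the IMA, EMD or PTD measures and not data from Onder de groene linde.

```python
def rank_corpus(query_id, corpus, similarity):
    """Return the ids of all other items, most similar to the query first."""
    query = corpus[query_id]
    others = [item_id for item_id in corpus if item_id != query_id]
    return sorted(others,
                  key=lambda item_id: similarity(query, corpus[item_id]),
                  reverse=True)


def group_member_ranks(ranking, group_of, query_id):
    """Map every item of the query's group to its 1-based rank in the list."""
    return {item_id: rank
            for rank, item_id in enumerate(ranking, start=1)
            if group_of[item_id] == group_of[query_id]}


def onset_overlap(a, b):
    # Hypothetical similarity: Jaccard overlap of the two onset sets.
    return len(set(a) & set(b)) / len(set(a) | set(b))


corpus = {
    "a1": (0, 2, 4, 6),
    "a2": (0, 2, 4, 7),
    "b1": (0, 3, 6, 9),
    "a3": (0, 2, 4, 6, 8),
}
group_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}

ranking = rank_corpus("a1", corpus, onset_overlap)
ranks = group_member_ranks(ranking, group_of, "a1")
```

Items of other groups that appear above a group member in `ranking` are the candidates to inspect as possible false positives.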
A good similarity measurement should therefore list the members of the group the query belongs to among the top hits of the list, if the members are more similar to each other than to members of other melody groups. The melody groups in our test corpus, which have been constructed by musicologists, fulfill this condition to a certain extent. However, sometimes a very similar song was assigned to a different group for other than musical reasons, for instance because of the text. A typical comparison of such ranking lists includes the number of melodies that should have been found within the top hits of the list (because they are group members) but get a very low rank ("false negatives"), and the number of melodies that end up high in the ranking list but do not belong to the same melody group as the query ("false positives"). Since our melody groups have not been tested to always contain the most similar melodies, in our comparison we will not only count the false positives but also check whether they are nevertheless musically similar. In the following section 3.1 we discuss one example in detail using the melody group Deze morgen; in section 3.2 we summarize very briefly further results of the comparison.
3.1 A Detailed Comparison on the Melody Group Deze Morgen
The melody group called Deze morgen contains 12 melodies which are very similar to each other. However, two songs have one phrase less than the others. First we want to compare the results of the ranking lists for a single phrase, in the second step we will use the entire piece. As the query for the single phrase we used the first phrase of OGL 19914 (the top melody from figure 4). For the evaluation of the ranking list we focus on the ranks that have been assigned to the other first phrases of melodies in this group, since they are all rhythmically very similar to the query. The ranking list according to IMA contains among the first 19 elements 11 members of the group and misses among the top 20 hits only one phrase at rank 29 (see figure 5). All false positives with a better rank than 29 are musically very similar to the query (for instance, most of them are second phrases from melodies of the same group). Figure 4 lists the best hits from the ranking list according to IMA with the exclusion of melodies that duplicate the rhythmic structure of melodies that are displayed. Hence the displayed 9 melodies stem from the best 19 matches. PTD ranks 10 group members among the first 22 matches. Thus it misses the first phrase of OGL 37511 in figure 5 which is placed on rank 68 and a very similar phrase to the latter one is placed on rank 73. The first phrase of OGL 37511 was ranked lower than all other members of the group (rank 29) by IMA as well, indicating that this rhythm is somewhat less similar to the query. However, the low rank of 68 according to PTD is very drastic. For instance, figure 6 gives three examples of phrases that are assigned a higher similarity to the query according to PTD. These are rhythmically less similar to the query than the missed phrase from figure 5.
[Score: nine first phrases in 6/8 on the text "Ik ben van de ze mor gen vroeg op ge staan" and variants]
Fig. 4. Excerpt from the top hits from the list according to IMA (melodies of same rhythm excluded, the listed rhythms cover the first 19 matches)
[Score: first phrase of OGL 37511]
Fig. 5. IMA assigns OGL 37511 rank 29
One of the reasons for the low rank of OGL 37511 is the presence of two long notes near the end of the query (the notes of the syllables "op" and "staan" in the top melody of figure 4). Neither of them has a counterpart in OGL 37511, and therefore the weight of each is distributed over 5 different notes that are located much earlier in the piece. In contrast to this, the first phrase shown in figure 6 contains many notes that are located in roughly the same area as the long end notes of the query. Here, too, the weight is distributed among 4 to 6 notes, but this weight has to be transported only locally and not to notes far apart as in OGL 37511. This results in a much higher similarity ranking. The false positives within the first 24 matches of the PTD list are all rhythmically similar except rank 11, which is shown in figure 6. On the other hand, the ranking list according to IMA contains up to 29 similar elements in the beginning, hence the last elements are missed by PTD.
[Score: three melodic phrases]
Fig. 6. PTD assigns rank 11 (top melody), rank 34 (middle melody) and rank 50 (bottom melody)
[Score: short phrase in 6/8 on the text "Een juf fer tje fijn"]
Fig. 7. Melodic phrase at rank 3 according to EMD
[Score: phrase in 6/8 on the text "Ik ben van de ze mor re gen vroeg op ge staan"]
Fig. 8. Melodic phrase at rank 79 according to EMD
EMD ranks 10 group members among the first 24 matches and misses two phrases at the ranks 58 and 79. The false positive on rank 3 (see figure 7) demonstrates the partial matching of the EMD: since the 5 notes of that very short phrase can be matched with the first 5 notes in the query, this phrase gets a rather high similarity measurement. Among the false positives within the first 24 matches there are in total 4 examples of such shorter melodies that are rhythmically not very similar to the query. The main reason for the low rank 79 of the first phrase of OGL 25904 (see figure 8) is the existence of many shorter melody phrases in the test corpus. These phrases match a part of the query with a lower total weight than the weight of this phrase, leading to a higher similarity value. In summary, all three methods miss only very few group members by assigning them a low rank within the list. While the list ordered according to IMA contains 29 rhythmically similar melodies at the top and in this range also covers all first phrases from the group Deze morgen, both PTD and EMD miss 2 phrases. For the comparison of the entire melodies using OGL 19914 as the query we will give only a very short overview. The ranking list according to IMA contains 1 false positive within the first 11 hits and misses 2 melodies that have one phrase fewer than all other group members (ranks 27 and 57). The ranking list according to PTD contains 7 false positives within the first 17 hits at the beginning of the list. The two melodies missed at the beginning (ranks 45 and 107) are again the songs that have one phrase fewer. The ranking list according to EMD contains 9 false positives within the first 17 hits; three melodies have a much lower rank, two of which are the same as those missed by IMA and PTD. Hence the melody group Deze morgen is an example that demonstrates how much the rhythmic structure alone determines similarity. The comparison of the three models reveals the best results for the IMA, while the EMD has the most false positives due to its partial matching.
3.2 Summary of Further Results
A problem in the application of the PTD occurred in the comparison of last phrases. For instance, in the ranking list for the last phrases of the melody group called Heer Halewijn A, IMA and EMD find 9 similar phrases among the first 10 and 11 hits, respectively. In contrast to this, PTD ranks only 2 of them at the top of the list; the others are ranked lower than rank 31. The reason for this is a difference in the duration of the last note (due to different transcription strategies for the recorded melodies). If the last note in all the examples is adjusted to the same duration, PTD lists 8 of the phrases among the top 11 melodies. Similar effects are observed in the ranking lists for the entire melodies. In most of the examples IMA achieves the best result. However, if the query is rhythmically only very little differentiated (such as a quasi-continuous eighth-note chain), then the results of both Transportation Distances are more convincing. In general, rhythmic similarity seems to be an important component of the similarity of melodies in the current test corpus of melodies from Onder de groene linde.
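The sensitivity of the PTD to the duration of the last note can be illustrated with a small sketch for onset-only rhythms. It relies on a known property of transportation on a line: when both point sets are normalised to total weight 1, the minimal transport cost equals the area between the two cumulative weight curves. This shortcut and the function name are assumptions made for illustration; they do not describe how the measure is computed in the cited implementations.

```python
def ptd_onsets(rhythm_a, rhythm_b):
    """Proportional transportation distance between two rhythms given as
    lists of (onset, weight) pairs, with onset time as the only coordinate."""
    total_a = sum(weight for _, weight in rhythm_a)
    total_b = sum(weight for _, weight in rhythm_b)
    # Signed, normalised weight at each onset: positive for a, negative for b.
    events = sorted([(onset, weight / total_a) for onset, weight in rhythm_a] +
                    [(onset, -weight / total_b) for onset, weight in rhythm_b])
    distance, surplus, previous = 0.0, 0.0, events[0][0]
    for onset, mass in events:
        # The unmatched surplus has to be carried from `previous` to `onset`.
        distance += abs(surplus) * (onset - previous)
        surplus += mass
        previous = onset
    return distance


# Rhythm r1 from section 2.1, and the same onsets with a longer last note.
r1 = [(0.0, 1.0), (1.0, 0.5), (1.5, 0.25), (1.75, 0.25),
      (2.0, 0.5), (2.5, 0.5), (3.0, 1.0)]
longer_last_note = r1[:-1] + [(3.0, 2.0)]

same = ptd_onsets(r1, r1)
changed = ptd_onsets(r1, longer_last_note)
```

Although both rhythms have identical onsets, the normalisation shifts a share of every note's weight once the last note is lengthened, so the distance becomes positive.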
4
Conclusion
The aim of the comparison of the computational approaches to rhythmic similarity in this paper is a first test of how far the different methods are suited for finding rhythmically similar melodies. For the application of the PTD, a solution concerning the length of the last note of a phrase has to be found. For the application of the EMD, it might be necessary to filter out hits that are much shorter than the query if one is not interested in partial matching. Using the metric weights obtained by IMA as weights in the Transportation Distances, instead of the durations, could be a promising merge of the two models. Applying the Transportation Distances to pitches only while ignoring the rhythm information, and comparing the outcome to the results obtained in this paper, is a further step towards investigating the importance of rhythmic similarity in the context of melodic similarity.
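The length filter suggested above for the EMD can be sketched in a few lines. This is only an illustration: the function name, the representation of a hit as a (name, total duration) pair and the threshold of 0.8 are assumptions made here, not part of the methods compared in this paper.

```python
def filter_short_hits(query_duration, ranked_hits, min_ratio=0.8):
    """Drop hits much shorter than the query, keeping the ranking order.

    ranked_hits: list of (name, total_duration) pairs, best match first.
    min_ratio:   hypothetical threshold; a hit is kept if its duration is
                 at least min_ratio times the duration of the query.
    """
    return [(name, duration) for name, duration in ranked_hits
            if duration >= min_ratio * query_duration]


# Invented example data: a query of 8 beats and three ranked hits.
hits = [("phrase_a", 8.0), ("phrase_b", 3.0), ("phrase_c", 7.0)]
print(filter_short_hits(8.0, hits))
# [('phrase_a', 8.0), ('phrase_c', 7.0)]
```

A filter of this kind trades partial matching for fewer false positives, so the threshold would have to be chosen per application.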
References
Ahlbäck, S.: Melody beyond notes. PhD thesis. Göteborgs Universitet (2004)
Bosma, M., Veltkamp, R.C., Wiering, F.: Muugle: A framework for the comparison of Music Information Retrieval methods. In: Proceedings of the ICMPC 2006, pp. 1297–1303 (2006)
Chew, E., Volk, A., Lee, C.-Y.: Dance Music Classification Using Inner Metric Analysis. In: Proceedings of the 9th INFORMS Computer Society Conference, pp. 355–370. Kluwer (2005)
Fleischer (Volk), A.: Die analytische Interpretation. Schritte zur Erschließung eines Forschungsfeldes am Beispiel der Metrik. dissertation.de - Verlag im Internet GmbH, Berlin (2003)
Mazzola, G.: The Topos of Music. Birkhäuser, Basel (2002)
Müllensiefen, D.: Variabilität und Konstanz von Melodien in der Erinnerung. PhD Thesis, Hamburg (2004)
van Dijk, M.B.G., Kuijer, H.J., Dekker, A.J. (eds.): Onder de groene linde. Verhalende liederen uit de mondelinge overlevering. Uitgeverij Uniepers, Amsterdam (1987–1991)
Sloboda, J.A., Parker, D.H.H.: Immediate recall of melodies. In: Hower, P., Cross, I., West, R. (eds.) Musical structure and cognition, pp. 143–167. Academic Press, London (1985)
Typke, R.: Music Retrieval Based on Melodic Similarity. PhD thesis, Utrecht University (2007)
Volk, A.: The Empirical Evaluation of a Mathematical Model for Inner Metric Analysis. In: Proceedings of the 5th Triennial ESCOM Conference, Hanover (2003)
Wiering, F., Veltkamp, R.C., Typke, R.: Transportation Distances in Music Notation Retrieval. Computing in Musicology 13, 113–128 (2004)
Automatic Modulation Finding Using Convex Sets of Notes
Aline Honingh
Music Informatics Research Group, Department of Computing, City University, London
Abstract. Key finding algorithms, designed to determine the local key of segments in a piece of music, usually have difficulties at the locations where modulations occur. A specifically designed program to indicate modulations in a piece of music is presented in this paper. It was previously shown that the major and minor diatonic scale, as well as the diatonic chords, form convex sets when represented in the Euler lattice (Honingh and Bod 2005). Therefore, a non-convex set within a piece of music may indicate that this specific set is not part of a diatonic scale, which could indicate a modulation in the music. A program has been developed that finds modulations in a piece of music by localizing non-convex sets. The program was tested on the first five preludes and fugues in a major key from the first book of Bach’s Well-tempered Clavier. It has been shown that the algorithm works best for modulations that involve many chromatic notes.
1
Introduction
When a piece of music is said to be in a specific key, we usually mean that the piece starts and ends in this key. It sometimes happens that the piece is entirely in the same key; often, however, other keys occur at several points in the music. A modulation is the act or process of changing from one key to another. In the research on key finding (see for example Krumhansl 1990; Temperley 2001; Longuet-Higgins and Steedman 1971; Chew 2002, 2006), the modulations usually form the most difficult part of the analysis. Vos and van Geenen (1996) developed a key-finding model which, when tested on the 48 fugues of Bach’s Well-Tempered Clavier, detected only two of the six modulations that were analyzed by Keller (1976). Furthermore, it also found modulations in 10 other cases in which there was no modulation according to Keller (1976). Temperley (2001) tested his model on the same corpus and found only two of the modulations correctly. Therefore, a specially designed program to indicate the modulations in a piece of music would be a helpful tool to implement in several key finding models.
2
Probability of Convex Sets in Music
It has been observed that the major and minor diatonic scales as well as the diatonic chords form so-called convex sets if they are represented in the Euler lattice¹ (Honingh and Bod 2005). This means that the regions or shapes that are described by these scales in the Euler lattice do not have holes or inlets, see figure 1.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 88–96, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Representation of the Euler lattice with the major and minor diatonic scale indicated by a region in lines and dashed lines respectively
Let us briefly go into this notion of convexity. Each note can be found at more than one location in the Euler lattice² (for example, we can find two instances of the note D in fig. 1), which means that each chord can be represented by several configurations of notes in the Euler lattice (see fig. 2). A set of notes defined by its note names (like C, E, G) is said to be convex if (at least) one of its configurations in the Euler lattice constitutes a shape without any holes or inlets (a more formal definition and detailed explanation is given in Honingh 2006). An example is given in figure 2.
The finding that the diatonic scales and chords form convex sets might suggest that non-convex subsets of the diatonic scales are not so common. If this is indeed the case, a non-convex set within a piece of music may indicate that this specific set is not part of a diatonic scale, which could in turn indicate a modulation in the music. To verify the correctness of this reasoning, we need to investigate the convexity of all possible subsets of the diatonic scales. Hence, we will address the question “what is the chance for a set of n randomly chosen notes from a piece of music to be convex?”. Assuming a certain piece is in one and the same key, this means calculating the chance that a subset of n notes from a scale is convex. We calculate, for each possible n-note set that is a subset of the major diatonic scale, whether it is convex or not. This results in a percentage of convex sets. A Matlab program was written for this purpose. The results are displayed in table 1. The values 1 and 7 are left out because the convexity of one note does not mean anything, and there is only one configuration for 7 notes within one scale, which is the whole scale and which is necessarily convex. The Euler lattice
¹ The ‘Euler lattice’ and minor variants of it are known under various names such as ‘Tonnetz’, ‘Oettingen lattice’, and ‘harmonic network’.
² The different positions of a note in the Euler lattice are connected to their frequency ratios. See Honingh (2006) for more information.
Fig. 2. Possible configurations of the triad C, E, G. The set C, E, G is said to be a convex set since the first configuration in this figure constitutes a shape without any holes or inlets.
Table 1. Percentage of n-note sets that are convex if chosen from a major scale
number of notes in the set    percentage convex
2                             100 %
3                             94.29 %
4                             94.29 %
5                             100 %
6                             100 %
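The first two rows of Table 1 can be reproduced with a short Python sketch (the original program was written in Matlab and is not shown in this paper). The sketch rests on several assumptions that are not taken from the text: convexity is read as "every lattice point inside the convex hull of a configuration belongs to the configuration", the lattice axes are fifths and major thirds, a note name is identified by its position on the line of fifths, and the 9 × 9 lattice is centred on C. All function names are hypothetical.

```python
from itertools import combinations, product

# Assumption: line-of-fifths index of each note of the C major scale (C = 0).
C_MAJOR = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}
RADIUS = 4  # coordinates run from -4 to 4, giving a 9 x 9 lattice


def positions(index):
    """Lattice points (fifths, major thirds) that carry the given note name.

    A major third equals four steps on the line of fifths, so the point
    (f, t) carries the note with index f + 4 * t.
    """
    return [(index - 4 * thirds, thirds)
            for thirds in range(-RADIUS, RADIUS + 1)
            if abs(index - 4 * thirds) <= RADIUS]


def cross(o, a, b):
    """Twice the signed area of the triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_hull(p, points):
    """True if p lies in the convex hull of points, boundary included."""
    for a, b in combinations(points, 2):  # p on a segment between two points
        if (cross(a, b, p) == 0
                and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1])):
            return True
    for a, b, c in combinations(points, 3):  # p inside a proper triangle
        if cross(a, b, c) == 0:
            continue  # collinear triple, already covered by the segment test
        signs = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
        if not (min(signs) < 0 < max(signs)):
            return True
    return False


def configuration_is_convex(points):
    """No lattice point outside the configuration lies in its convex hull."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            if (x, y) not in points and in_hull((x, y), points):
                return False
    return True


def is_convex(names, scale=C_MAJOR):
    """A set of note names is convex if at least one configuration is."""
    options = [positions(scale[name]) for name in names]
    return any(configuration_is_convex(cfg) for cfg in product(*options))


def convex_count(n, scale=C_MAJOR):
    subsets = list(combinations(scale, n))
    return sum(is_convex(s) for s in subsets), len(subsets)


for n in (2, 3):
    convex, total = convex_count(n)
    print(f"n={n}: {convex} of {total} convex ({100 * convex / total:.2f} %)")
# n=2: 21 of 21 convex (100.00 %)
# n=3: 33 of 35 convex (94.29 %)
```

Under this reading, the two non-convex three-note subsets of the C major scale are {F, G, B} and {F, A, B}. The sketch is not claimed to reproduce the remaining rows, which may depend on details of the formal definition given in Honingh (2006).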
is an infinitely large 2-dimensional lattice; however, for reasons of computing, we consider here a 9 × 9 lattice. This is big enough to contain all sets that we want to consider, and it also contains enough configurations of a set to calculate whether it is a convex set or not. Given a set of note names, the computer program computes every configuration in the 9 × 9 lattice. In the 9 × 9 plane, every note name has 2 or 3 possible positions. Therefore, if a set consists of n notes, the number of possible configurations lies between 2^n and 3^n. One could argue that the notes from a piece of music in one key do not only come from one scale, even if the piece of music is in one and the same key. Often more notes appear in a piece of music than only the notes from the scale. For example, in the first fugue from the Well-tempered Clavier of Bach, which is written in C major, the notes that appear throughout the piece are the notes from the major scale in C plus the additional notes F♯, B♭, C♯ and G♯. The idea that the key contains more notes than the scale of the tonic has been formalized
Table 2. Percentage of n-note sets that are convex if chosen from the set of notes representing the C major scale with an additional F♯
number of notes in the set    percentage convex
2                             100 %
3                             92.86 %
4                             88.57 %
5                             92.86 %
6                             100 %
7                             100 %
Table 3. Percentage of n-note sets that are convex if chosen from the set of notes representing the C natural minor scale with additional D♭, A, E, B, F♯
number of notes in the set    percentage convex
2                             100 %
3                             80.91 %
4                             59.80 %
5                             52.02 %
6                             49.03 %
7                             51.01 %
8                             58.79 %
9                             71.82 %
10                            89.39 %
11                            100 %
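Because every n-note subset of a key content of N notes is counted once, each percentage in Tables 1 to 3 should correspond to a whole number of convex sets out of C(N, n) subsets. The following sketch recovers these counts; the function name is an assumption, and the counts themselves are derived here from the rounded percentages rather than reported in the tables.

```python
from math import comb

# Percentages copied from Tables 1-3, keyed by the number of notes n.
TABLES = {
    "Table 1": (7, {2: 100.0, 3: 94.29, 4: 94.29, 5: 100.0, 6: 100.0}),
    "Table 2": (8, {2: 100.0, 3: 92.86, 4: 88.57, 5: 92.86, 6: 100.0,
                    7: 100.0}),
    "Table 3": (12, {2: 100.0, 3: 80.91, 4: 59.80, 5: 52.02, 6: 49.03,
                     7: 51.01, 8: 58.79, 9: 71.82, 10: 89.39, 11: 100.0}),
}


def implied_counts(total_notes, percentages):
    """Map n to (convex subsets, all subsets) implied by the percentage."""
    counts = {}
    for n, percent in percentages.items():
        subsets = comb(total_notes, n)
        convex = round(percent / 100 * subsets)
        # The rounded table value must be recoverable from the integer count.
        assert abs(100 * convex / subsets - percent) < 0.006
        counts[n] = (convex, subsets)
    return counts


for name, (total_notes, percentages) in TABLES.items():
    print(name, implied_counts(total_notes, percentages))
```

For example, the minimum of 49.03 % in Table 3 corresponds to 453 convex sets among the 924 six-note subsets of the twelve-note key content.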
by, among others, Van de Craats (1989). He claims that in a major key, the augmented fourth is often used and should therefore be included in the scale. This means that in C major, the scale would contain the notes (given in a sequence of fifths): F, C, G, D, A, E, B, F♯. A piece of music in C minor can contain the notes (given in a sequence of fifths): D♭, A♭, E♭, B♭, F, C, G, D, A, E, B, F♯, according to Van de Craats. In accordance with the latter claim, Longuet-Higgins (1987) states that “a note is regarded as belonging to a given key if its sharpness³ relative to the tonic lies in the range -5 to +6 inclusive”. Results by other researchers (Youngblood 1985; Knopoff and Hutchinson 1983; Krumhansl and Kessler 1982) are in agreement with Longuet-Higgins’ and Van de Craats’ suggestions. These ‘scales’ of 8 and 12 notes respectively can be used as new key-contents for our Matlab program, to calculate the percentages of sets that are convex. The results can be found in tables 2 and 3. The bigger the total set of notes to choose from, the higher the percentages of non-convex subsets. Accordingly, in table 3 the percentages of convex sets decrease to a minimum of 49.03% at n = 6, meaning that there is a reasonable chance of finding a 6-note set that is non-convex in a piece of music written in a minor key. From both tables 2 and 3 we see that the highest
³ Sharpness is understood here as the position of a pitch name on the line of fifths.
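The notion of sharpness in the footnote above can be made concrete with a small sketch that places note names on the line of fifths and lists the notes whose sharpness relative to a tonic lies in a given range. The function names and the ASCII spelling of accidentals ("#" for sharp, "b" for flat) are assumptions made for this illustration.

```python
LETTERS = "FCGDAEB"  # natural notes in fifths order; C gets index 0, F index -1


def name_to_index(name):
    """Position of a note name on the line of fifths (C = 0, G = 1, F = -1)."""
    letter, accidentals = name[0], name[1:]
    index = LETTERS.index(letter) - 1
    return index + 7 * accidentals.count("#") - 7 * accidentals.count("b")


def index_to_name(index):
    """Inverse of name_to_index."""
    turns, position = divmod(index + 1, 7)
    accidentals = "#" * turns if turns >= 0 else "b" * -turns
    return LETTERS[position] + accidentals


def key_content(tonic, lowest=-5, highest=6):
    """Notes whose sharpness relative to the tonic lies in [lowest, highest]."""
    base = name_to_index(tonic)
    return [index_to_name(base + s) for s in range(lowest, highest + 1)]


print(key_content("C"))
# ['Db', 'Ab', 'Eb', 'Bb', 'F', 'C', 'G', 'D', 'A', 'E', 'B', 'F#']
print(key_content("C", lowest=-1))
# ['F', 'C', 'G', 'D', 'A', 'E', 'B', 'F#']
```

With the default range the sketch yields a twelve-note key content for C, and with the range -1 to +6 it yields the eight-note major content with the augmented fourth.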
percentages of convex sets appear for the smallest and biggest possible sets in the key. This suggests that the smallest and the biggest non-convex sets are the best indicators of modulations. From the above results we learn that if we randomly choose a set of notes from one key, there is a high chance for the set to be convex. Therefore, we hypothesize that, if we analyze a piece of music by dividing it into sets of n notes, most of the sets are convex. It is thus more unusual for a set in a piece of music to be non-convex than convex. And because we have seen that sets from one key tend to be convex, a non-convex set within a piece could point to a change of key or modulation.
2.1
Finding Modulations by Means of Convexity
A Matlab program was written that finds modulations in a piece of music by localizing non-convex sets. The more sets that are not convex around a certain location, the stronger the indication of a change of key. To be able to judge all n-tone sets on convexity, we introduce a sliding window of width n moving over the piece. We start with a window of width 2, after which we enlarge it to 3, etc. We stop at a width of 7 notes, since non-convex n-tone sets with n > 7 rarely occur for a major key. Furthermore, for n > 7, the computation becomes very intensive since all possible configurations (a number between 2^n and 3^n) have to be checked. For each non-convex set a vertical bar is plotted at the position of the notes in the piece that it affects. The height of the bar represents the number of notes in the set. For each n, a sliding window moves over the piece, resulting in a histogram. The histograms belonging to n = 2 to 7 are plotted in the same figure such that the result is one histogram presenting all non-convex sets in a piece of music. If a set of notes contains several identical notes, the set is reduced to the set that contains each note only once. For example, the set of notes {D, E, D, F} is reduced to the set {D, E, F}. If such a set turns out to be non-convex, it is indicated in the histogram with a bar whose height corresponds to the number of notes in the reduced set. The music that we tested the model on is from the Well-tempered Clavier of Bach. Data files containing the notes and other information from all preludes and fugues in the first book of J. S. Bach’s Well-tempered Clavier (BWV 846–869) were made available by Meredith (2003). We used these files as input for our program. The only input used by our model is the note names, so no rhythm, meter, note length, key information etc. was involved. As an example we consider the third prelude from book I of the Well-tempered Clavier.
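The sliding-window procedure described above can be sketched as follows. The convexity test is passed in as a function, since its details are not given here; the stand-in predicate in the example, the function name and the use of note positions instead of bar numbers are assumptions made for illustration only.

```python
def non_convex_windows(notes, is_convex, widths=range(2, 8)):
    """Return (first_note, last_note, set_size) for every non-convex window.

    notes:     sequence of note names in the order they occur in the piece.
    is_convex: predicate on a list of distinct note names (supplied by the
               caller; hypothetical here).
    widths:    window widths to slide over the piece (2 to 7 by default).
    """
    found = []
    for width in widths:
        for start in range(len(notes) - width + 1):
            window = notes[start:start + width]
            reduced = list(dict.fromkeys(window))  # {D, E, D, F} -> {D, E, F}
            if len(reduced) >= 2 and not is_convex(reduced):
                found.append((start, start + width - 1, len(reduced)))
    return found


def toy_is_convex(names):
    """Stand-in predicate for the example: only {F, G, B} counts as non-convex."""
    return set(names) != {"F", "G", "B"}


melody = ["C", "E", "G", "F", "G", "B", "G", "C"]
print(non_convex_windows(melody, toy_is_convex, widths=[3, 4]))
# [(3, 5, 3), (2, 5, 3), (3, 6, 3)]
```

Each returned triple corresponds to one bar of the histogram: the first two entries give the notes that the set affects, and the third gives the height of the bar, which is the size of the reduced set.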
The bars in figure 3 show the position of the non-convex sets in the piece. The x-axis represents the bar numbers of the piece of music; the prelude consists of 104 bars. The values on the y-axis indicate the number of notes in a set. Looking at figure 3 we see three regions in the music in which a lot of non-convex sets appear. We will now see how these regions relate to the structure of the piece. In Bruhn (1993), an analysis of the third prelude can be found. The analysis states that from bar 31 to 35 there is a modulation from A♯ minor to D♯ minor,
Fig. 3. Histogram of non-convex sets in the third prelude from the Well-tempered Clavier (plot title: Bach WTC, BWV848a). On the x-axis (measure no.), the bar numbers of the piece are indicated; the y-axis indicates the number of notes in the non-convex set.
from bar 35 to 39 a modulation from D♯ minor to G♯ major, from bar 39 to 43 a modulation from G♯ major to C♯ major and from bar 43 to 47 a modulation from C♯ major to F♯ major. One can see from figure 3 that this region of modulations in bars 31 to 46 is precisely indicated by the first cluster of bars. Looking at the second cluster of bars (bars 63 to 72) in figure 3, one can see that this pattern is repeated a bit later in bars 87 to 96. These two regions correspond to two (similar) passages in G♯ having a pedal on the tonic. There are no modulations involved, but the notes of the seventh chords are melodically laid out in such a way that, in forming sets, the fifth is often omitted and therefore some sets are non-convex. The last region of bars in figure 3 is from bar 97 to 102. This region represents a melodic line in which a lot of chromatic notes are involved. One cannot become aware of one specific key until the last two bars, where the piece again resolves in C♯ major. In the regions in between the marked parts (white space in fig. 3) no modulations are present. In those regions the music is in a certain key, which can vary over time, i.e. there can be (sudden) key changes from one bar to another. This method of looking at non-convex sets is therefore only suitable for longer modulation processes. We have learned that sets consisting of 6 and 7 notes give a stronger indication of a change of key than sets consisting of fewer notes. Therefore regions 1 (bars 31 to 46) and 4 (bars 97 to 102) give stronger indications of a modulation than regions 2 (bars 63 to 72) and 3 (bars 87 to 96), which is in accordance with the analysis of the piece. Thus, this third prelude serves as an indication that the modulation finding program works well. Unfortunately, the method did not work well for all pieces. For pieces in a minor key it was not sufficient to calculate the non-convex sets up to 7 notes, since, according to table 3, we can learn most about these pieces if we look at
10 or 11 note sets. Since the number of possible configurations of a set of n notes is a number between 2^n and 3^n, the analysis of configurations in a minor key requires too much computational time.
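To give an impression of this growth, the following sketch prints the lower and upper bounds on the number of configurations that have to be checked for a single set of n notes. The function name is an assumption; the bounds follow from each note name having 2 or 3 positions in the 9 × 9 lattice.

```python
def configuration_bounds(n):
    """Lower and upper bound on the configurations of an n-note set."""
    return 2 ** n, 3 ** n


for n in (2, 7, 10, 11):
    low, high = configuration_bounds(n)
    print(f"n={n}: between {low} and {high} configurations")
# n=2: between 4 and 9 configurations
# n=7: between 128 and 2187 configurations
# n=10: between 1024 and 59049 configurations
# n=11: between 2048 and 177147 configurations
```

The upper bound for an 11-note set is thus 81 times the upper bound for a 7-note set, and this cost is incurred for every window position in the piece.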
3
Results
It is difficult to test the performance of the algorithm for a number of reasons. On the side of the histograms, it is difficult to decide which instances of plotted bars to count as an indication of a modulation. We would have to decide on a threshold regarding both the number of notes in the non-convex set and the number of non-convex sets. To give an example, an instance of a non-convex set of 4 notes would not indicate a modulation, but perhaps a non-convex set of 7 notes would; and one instance of a non-convex set would not be a strong indication of a modulation, but three instances would be. On the side of the music, the difficulty lies in the definition of a modulation. Many types of modulations exist and our algorithm works better for some than for others. For example, a modulation can be made by using a common chord of two (closely related) keys, in which case the algorithm would have difficulty indicating the modulation. The algorithm performs best on modulations that involve some chromaticism in between one key and the other, such as a sequential modulation. Since we do want to be able to make a general judgment on the performance of the algorithm, we decide on the following. We count only the instances that include non-convex sets of 6 or 7 notes in the histogram as indications of modulations. Furthermore, we count every peak in the histogram as one modulation if the peaks are widely separated, that is, more than three bars apart. More than one peak within three bars is therefore merged into one instance of a modulation. We checked the results using the analysis of Bruhn (1993). This work gives a thorough analysis of the Well-tempered Clavier; however, it was sometimes difficult to interpret the true points of modulation from it. Therefore, we have chosen to count a modulation only when either Bruhn uses the word ‘modulation’, or a clear key change is pointed out, or a region of chromaticism is pointed out.
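The counting rule described above can be sketched as follows. The input format (a list of (bar number, set size) pairs), the function name and the choice to measure the gap from the last bar already merged into a region are assumptions of this sketch; the example data are invented and are not the output of the program.

```python
def count_modulations(peaks, min_size=6, max_gap=3):
    """Merge qualifying peaks into regions; each region is one modulation.

    peaks:    list of (bar_number, set_size) for plotted non-convex sets.
    min_size: only sets of at least this many notes are counted (6 or 7).
    max_gap:  peaks at most this many bars apart are merged.
    """
    bars = sorted({bar for bar, size in peaks if size >= min_size})
    regions = []
    for bar in bars:
        if regions and bar - regions[-1][1] <= max_gap:
            regions[-1][1] = bar          # extend the current region
        else:
            regions.append([bar, bar])    # start a new region
    return [tuple(region) for region in regions]


# Invented example: (bar number, number of notes in the non-convex set).
example = [(31, 6), (33, 7), (40, 4), (46, 6), (97, 7), (99, 6)]
print(count_modulations(example))
# [(31, 33), (46, 46), (97, 99)]
```

In this example the 4-note set in bar 40 is ignored, the peaks in bars 31 and 33 merge into one indication, and three modulations are counted in total.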
Note that we thereby also count passages of chromaticism that occur within one key and therefore do not indicate a real modulation. The algorithm was tested on the first five preludes and fugues in major keys from the first book of Bach’s Well-tempered Clavier. The results are given in table 4. The number of correctly indicated modulations represents the total number of modulations noted by Bruhn (1993) that are correctly identified by the algorithm.
Table 4. Results of modulation searching process on the five first preludes and fugues in major keys from book I of the Well-tempered Clavier
no. of correctly indicated modulations    no. of false positives    no. of false negatives
11                                        7                         12
The number of false positives represents the number of modulations marked by the program which are not modulations according to Bruhn (1993). The number of false negatives represents the instances that are modulations according to Bruhn (1993) which have however not been marked as modulations by the program. The false positives appear due to regions in the music where many non-diatonic notes are present, such as complicated extended cadences, pedal notes, and chromatic ornamentation. The false negatives appear mostly due to modulations to closely related keys.
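For readers who prefer the usual retrieval measures, the counts in table 4 can be converted into precision and recall. These figures are not reported in the paper; they are derived here from the three counts only, and the function name is an assumption.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall and F1 from raw counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from table 4: 11 correct, 7 false positives, 12 false negatives.
precision, recall, f1 = precision_recall_f1(11, 7, 12)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# precision=0.611 recall=0.478 F1=0.537
```

Read this way, roughly six out of ten indicated modulations agree with Bruhn (1993), and slightly fewer than half of the modulations noted there are found.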
4
Conclusions
We have seen that studying non-convex sets can give a rough analysis of the modulations in a piece. Pieces in a major key are easier to analyze than pieces in a minor key, since in the latter some ‘background noise’ of non-convex sets is present. This analysis method uses very little information about the music (only the note names under octave equivalence), which indicates that the method can still be improved. Furthermore, it can perhaps be integrated into other modulation finding theories to optimize the results. Since the method can also be used to visualize repeated (and closely related) passages in a piece of music (of which an example was given in fig. 3), the method can possibly contribute to a structural analysis of music.
Acknowledgments
A substantial part of this research was carried out at the University of Amsterdam in the context of the NWO project ‘Towards a unifying model for linguistic, musical and visual processing’. The author wants to thank Rens Bod, Henk Barendregt, Elaine Chew and Timour Klouche for helpful comments and suggestions.
References
Bruhn, S.: J.S. Bach’s Well-Tempered Clavier: In-depth Analysis and Interpretation. Mainer International Ltd., Hong Kong (1993); Transcription for the Web published (2002-2003)
Chew, E.: The spiral array: An algorithm for determining key boundaries. In: Anagnostopoulou, C., Ferrand, M., Smaill, A. (eds.) ICMAI 2002. LNCS, vol. 2445, pp. 18–31. Springer, Heidelberg (2002)
Chew, E.: Slicing it all ways: Mathematical models for tonal induction, approximation and segmentation using the spiral array. INFORMS Journal on Computing 18(3) (2006)
Honingh, A.K.: The Origin and Well-Formedness of Tonal Pitch Structures. Ph.D. thesis, University of Amsterdam, The Netherlands (2006)
Honingh, A.K., Bod, R.: Convexity and the well-formedness of musical objects. Journal of New Music Research 34(3), 293–303 (2005)
Keller, H.: The Well-Tempered Clavier by Johann Sebastian Bach. Norton. Translated by Leigh Gerdine, London (1976)
Knopoff, L., Hutchinson, W.: Entropy as a measure of style: The influence of sample length. Journal of Music Theory 27, 75–97 (1983)
Krumhansl, C.L.: Cognitive Foundations of Musical Pitch. Oxford Psychology Series, vol. 17. Oxford University Press, Oxford (1990)
Krumhansl, C.L., Kessler, E.J.: Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review 89, 334–386 (1982)
Longuet-Higgins, H.C.: The perception of melodies. Mental Processes: Studies in Cognitive Science, pp. 105–129. British Psychological Society/MIT Press, London (1987/1976); published earlier as Longuet-Higgins (1976)
Longuet-Higgins, H.C., Steedman, M.: On interpreting Bach. In: Longuet-Higgins, H.C. (ed.) Mental Processes: Studies in Cognitive Science, pp. 82–104. British Psychological Society/MIT Press, London (1987/1971); published earlier as Longuet-Higgins and Steedman (1971)
Meredith, D.: Pitch spelling algorithms. In: Proceedings of the Fifth Triennial ESCOM Conference, pp. 204–207. Hanover University of Music and Drama, Germany (2003)
Temperley, D.: The Cognition of Basic Musical Structures. MIT Press, Cambridge (2001)
Van de Craats, J.: De fis van Euler: Een nieuwe visie op de muziek van Schubert, Beethoven, Mozart en Bach. Aramith Uitgevers, Bloemendaal (1989)
Vos, P.G., van Geenen, E.W.: A parallel-processing key-finding model. Music Perception 14(2), 185–224 (1996)
Youngblood, J.E.: Style as information. Journal of Music Theory 2, 24–35 (1985)
On Pitch and Chord Stability in Folk Song Variation Retrieval
Jörg Garbers¹, Anja Volk¹, Peter van Kranenburg¹, Frans Wiering¹, Louis P. Grijp², and Remco C. Veltkamp¹
¹ Department of Information and Computing Sciences, Utrecht University
² Meertens Institute Amsterdam
Abstract. In this paper we develop methods for computer aided folk song variation research. We examine notions and examples of stability for pitches and implied chords for a group of melodic variants. To do this we employ metrical accent levels, simple alignment techniques and visualization techniques. We explore how one can use insight into stability of a known set of variants to query for additional variants.
1
Introduction
The goal of the WITCHCRAFT project (What Is Topical in Cultural Heritage: Content-based Retrieval Among Folksong Tunes) is to develop a content-based retrieval system for a large number of folk song melodies stored as audio and notation. Its purpose is on the one hand to aid folk song researchers in tracing and classifying variants of folk songs and on the other hand to allow the general public to search for melodies with a simple Query by Humming or Keyboard interface. Representing melodies and melodic queries as weighted point sets in the onset-pitch domain, as done in the Muugle system [1], proved to perform well in combination with a couple of pre- and post-processing methods in the general public query task [10]. In the initial part of our project we tested Muugle’s fitness on a test corpus of 141 symbolically encoded Dutch folk songs for the purpose of the folk song research task: variant classification. Although the results were quite promising, it became clear that an extended system, which uses more information from the user query, from the data and from additional feature extractors, would enable researchers to retrieve and classify folk songs in more informed ways. The present paper is about this topic.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 97–106, 2009. © Springer-Verlag Berlin Heidelberg 2009
Overview
We assume that classification and retrieval of melodic variants can benefit from the investigation of stable features across melodies which are known to be related to each other. Knowing what typically remains stable from one variation of a melody to another allows folk song researchers to decide whether any given melody belongs to a variant group or not. To support this kind of classification with a search engine, we must find ways to formulate queries that specify what should be matched and how strictly. For convenience we prefer to automatically derive such queries from a set of melodies that are known to belong to the same variant group. (See [5].) Note that the examples and figures included in this article are only given to exemplify our computer aided methods. All figures (except for the first) were automatically generated from Humdrum-**kern sources with the help of the Humdrum toolkit, the Guido note-viewer, Rubato and additional scripts that are to be executed once. [7, 6, 8] In section 2 we develop modifications to our present computational framework in order to allow searching for variants in a database when a group of variant melodies that exemplify the stability and variability of a melody class is given as a query. In section 3 we examine the stability of pitch across variants and how to use this information for querying, and in section 4 we do the same for chords that can be implied from the variants.
2
Modifications of the Retrieval System
The unmodified Muugle system compares a melody query, given as a sequence of events in the onset-pitch-duration domain, with melodies from a database and comes up with a ranked list of close matches. It does so (in principle) by computing the so-called “Earth Mover’s Distance” (EMD) between the query melody and every database melody, represented as weighted point sets. (See [11] in this volume.) By interpreting the note durations as weights, Muugle ensures that it always matches similar amounts of musical duration. Besides this, the EMD requires the definition of a “ground distance”, which in the unmodified Muugle system is realized in terms of the Euclidean distance in the onset-pitch domain, with pitch measured in semitones and onset in seconds. A scaling factor in the onset dimension is used to balance the influence of pitch and time in the computation of the ground distance. In preparation for the following sections we need a generalization of our initial ‘melody matches melody’ approach. In the generalization we want to match a single pitch q at some onset in a melody either a) against a set P of alternative pitches for that same onset or b) against a distribution P of such alternative pitches. The idea is to use event sequences consisting of such alternatives as queries in order to find matching melodies in the database. To formulate these kinds of queries with respect to the EMD, we simply have to redefine the pitch distance component of the ground distance for pitch sets and for pitch distributions:
a) Let P be a set of (alternative) pitches and q a fixed pitch. Then the minimum pitch distance between q and P is the minimum of the distances between q and any of the pitches in P.
b) Let P be a distribution of pitches and q a fixed pitch. Then the average pitch distance between q and P is the weighted average of the distances between q and the pitches in P.
Technically, we leave the pre- and post-processing features of the Muugle system intact, as they allow us to compute and combine partial matches and gain transposition and tempo invariance. Several effects of these modifications on the retrieval performance are studied in [3].
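The two redefined pitch distances, and their use inside a Euclidean ground distance, can be sketched as follows. This is not Muugle code: the function names, the representation of a distribution as a dictionary from pitch to weight, and the example values are assumptions, with pitches given in semitones (MIDI numbers) and onsets in seconds.

```python
from math import hypot


def min_pitch_distance(q, pitch_set):
    """Variant a): distance from q to the closest alternative pitch."""
    return min(abs(q - p) for p in pitch_set)


def average_pitch_distance(q, distribution):
    """Variant b): weighted average distance from q to the pitches in P.

    distribution maps each pitch to a non-negative weight; the weights
    need not sum to 1 because they are normalised here.
    """
    total = sum(distribution.values())
    return sum(w * abs(q - p) for p, w in distribution.items()) / total


def ground_distance(onset_q, onset_p, pitch_distance, onset_scale=1.0):
    """Euclidean ground distance with a scaling factor on the onset axis."""
    return hypot(onset_scale * (onset_q - onset_p), pitch_distance)


c_major = {60, 64, 67}                            # C, E, G
print(min_pitch_distance(65, c_major))            # F against C major -> 1
print(average_pitch_distance(60, {60: 3, 62: 1})) # -> 0.5
print(ground_distance(0.0, 3.0, 4.0))             # -> 5.0
```

A larger onset_scale makes temporal displacement more expensive relative to pitch mismatch, which is the trade-off discussed for Figure 1.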
Fig. 1. Two EMD matching options for F and C with the refined pitch distance a)
Figure 1 illustrates the effects of the refined ground distance (a). There are four pitch sets in the upper sequence consisting of one pitch each (a melody) and two pitch sets in the lower sequence (a chord sequence). All note durations must flow (in terms of the EMD; see [9] for the formal definition) from the melody to the chords. The G clearly flows to the C major chord and the A to the F major chord. The rest depends on the onset scaling factor: if it is large, then F selects a ‘close mismatch’ both in time and pitch and matches with the remaining duration of the C major chord, and C matches perfectly. If it is low, then F matches with the F of the F major chord and the C satisfies the remaining duration of the C major chord.
3
Pitch Stability
In this section we develop methods that help to investigate the pitch variability of a given group of melodic variants.
3.1
Metrical Levels
Metrical symbols such as time signatures (4/4, 6/8) and barlines are used in common music notation to encode metrical accent structures on the time axis and can be used to infer note accents. As a working hypothesis we assume that metrically more accented notes are more stable across folk song variations than less accented notes. We expect smaller amounts of pitch variation on accented onsets in comparison to less accented onsets.
In order to test this hypothesis we visually explore our data by using the Humdrum metpos command to mark the notes of each folk song in a variant group with their position within a metric hierarchy (levels: 1=bar, 2=half-bar, 3=eighth, 4=sixteenth). Then we align the songs in each set by dropping upbeats and unmatched verses. For each metrical level we extract all notes above that level and produce views to compare the projection behaviors on the different levels. When looking for the characteristics of an aligned variant group, we start methodically with very abstract views and proceed to detailed views, if necessary. Figures 2-4 show some automatically derived views for the manually aligned variant group ‘Frankrijk B1’ of Onder de groene linde [2].
3.2
Evaluation of Pitch Stability
Figure 3 gives us a quick view of the pitch material used per bar at the different metrical levels. By definition we get fewer or equally many pitches at higher metrical levels. But it is interesting to see that there are quite different ranges, both in pitch number and ambitus: while the variation in pitch in bar 3 is reflected on all metrical levels, the variation vanishes on higher metrical levels in bar 7. This might lead us to different matching strategies for different segments (e.g. contour search vs. chord search) when searching the database for additional members of the variant class of which this variant group is a subset. Figure 4 provides us with slightly more detailed snapshots across all variants taken at different metrically motivated grid positions. The note stability increases from the sixteenth level up to the half-bar level but not up to the bar level. In bar 6, second beat, we have even more stability than on the first beat. To investigate further where the remaining instability comes from, we look at figure 2 and check the onset positions where the bottom staff contains more than one note. By looking at the other staffs we check how many variants are responsible for each pitch. In some cases all variants agree except for one outlier (often the first line, e.g. in bar 4); in other cases we find corresponding subgroups within the variants (e.g. last beat in bar 4). Such subgrouping can be interpreted as local pitch alternatives or might lead to the insight that the group actually consists of different coherent subgroups, if the pitches are more often stable within such subgroups.
3.3 Query Formulation
Assume that there are still unclassified melodies in the database and some partial variant groups are already established. To present the user good additional candidates for a given variant group, we can proceed as follows: We first either manually or automatically align the given melodies. Then we compute the pitch sets or pitch distributions for each onset at every metrical level. We construct a query with all alternative pitches or pitch distributions for every onset. We use this for searching in the Muugle database, as described in section 2.
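As a rough illustration of the second and third steps, the following Python sketch collects, for a set of already aligned variants, the pitch distribution at every onset that survives the reduction to a chosen metrical level. The note representation (onset in beats, MIDI pitch, metrical level with 1=bar as the strongest level) and the function name are assumptions made for this example; they are not taken from Humdrum or Muugle.

```python
from collections import Counter, defaultdict


def pitch_distributions(variants, level):
    """Relative pitch frequencies per onset, keeping only notes whose
    metrical level number is <= `level` (1=bar, 2=half-bar, ...)."""
    counts = defaultdict(Counter)
    for melody in variants:
        for onset, pitch, note_level in melody:
            if note_level <= level:
                counts[onset][pitch] += 1
    return {
        onset: {pitch: n / sum(c.values()) for pitch, n in c.items()}
        for onset, c in sorted(counts.items())
    }


# Three hypothetical aligned variants: (onset, MIDI pitch, metrical level).
variants = [
    [(0.0, 67, 1), (0.5, 69, 3), (2.0, 71, 2)],
    [(0.0, 67, 1), (0.5, 66, 3), (2.0, 72, 2)],
    [(0.0, 67, 1), (2.0, 71, 2)],
]

# Query at the half-bar level: one pitch distribution per retained onset.
query = pitch_distributions(variants, level=2)
```

Each entry of `query` lists the alternative pitches at one onset together with their share among the variants, which is the kind of per-onset pitch distribution the query described above is built from.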
On Pitch and Chord Stability in Folk Song Variation Retrieval
Before actually querying the database, one might also want to refine the query (i.e. the pitch distributions) by hand, to get closer to the melodic model that one believes the variants stem from. For this refinement one can use harmonic information (see Section 4). For example, matching the first and last bar of any (new) melody candidate against the G major chord seems to be a good generalization (see figure 4).
4 Implied Chord Stability
In this section we develop methods that help to investigate the harmonic variability of a given group of melodic variants and that help to automatically find good candidates for new members of such a group.
4.1 Harmonization
Not all melodies follow harmonic building principles or have implied harmonizations. However, many melodies do allow genre-specific harmonizations or already follow associated harmonic constraints. This allows even less trained singers to sing an additional voice, in folk songs typically a third or fourth apart. While melody proceeds at beat level or faster, harmony typically changes more slowly at bar or half-bar level. Melodies that strongly suggest specific harmonizations often contain chord notes as long notes and/or contain them on metrically strong beat positions, and have non-chord tones such as passing tones on metrically weaker beat positions. (Suspensions are an exception to this rule.) When locating actual notes within hypothesized triads, we enter the domain of interpretation and ambiguity. We interpret the given tones in the light of a harmonic model to aid the understanding of the music or to generate accompaniments. In our evaluation we follow the approach described in [4] because we have a harmonic analyzer (HarmoRubette) available as a tool within Rubato, which produces the desired harmonic information for the best harmonic path of a given sequence of pitch sets. We can use this information later to generate prototypical chords that best represent the harmonic information. In a successful analysis those chords would be much the same as those a musician would use in an accompaniment of that melody.
4.2 Evaluation of Implied Chord Stability
We have tried different harmonic analytic models, i.e. different music-theoretical parameterizations of the HarmoRubette. But since the models and their analytic results were still far from optimal, we do not go into their details, but show the preliminary results for the different metrical levels. The HarmoRubette generates for each onset a function symbol and a key, such as S(G), the subdominant in G. When running the automatic analysis on the melodies at different metrical levels, we find visually irritating results. For some variants the results are completely in G major, for some in C major. That makes T(C) and S(G) look quite dissimilar and symbol sequences difficult to compare. An option to cope with this ambiguity is to constrain the analysis to a single key (e.g. G major). Otherwise one would need to invest more knowledge about the harmonic structures behind the melodies and reevaluate them. We have not followed such an approach yet. Two other options are to listen to the represented chords or to compare chord roots instead of the function symbols (see figure 5 for illustration). We get more symbol stability but, not surprisingly, still different interpretations for ‘A’ pitches in G and in C. Even now, the resulting set of chords and, in this case, the extended set of involved chord notes can be used in the role of note distributions in the extended distance measure of section 2. However, this will only sort out harmonically distant melodies and will not result in a fine ranking, because it allows very many melodic alternatives.
4.3 Contextualization
If a melody is naively (mis)interpreted as a sequence of one-note chords, the HarmoRubette naturally yields strange results. It chooses fluctuating tonalities, as the harmony sequences are so heavily under-specified. Analytic ambiguity is inherent to this naive harmonic analysis. But it is nevertheless an interesting point of departure. What we prefer is to analyze more constraining chord sequences. In the following we present several ideas about where these more constraining additional notes may come from in the environment. First, the additional notes can come from notes that belong to lower metrical levels of the time span that an accentuated pitch represents. Figure 2 shows many examples where, out of three 8th notes, two can often be considered chord tones. In practice the harmonic analyzer can be left alone to figure out which notes make sense as chord notes in the larger context. We simply have to feed it all notes at once. Another option is to try to derive a common chord scheme from the whole set of variations. We tested this by running the analysis on the chords of figure 4 and found the consistent key G major with some very short deviations to ‘ii’ and ‘vi’ on lower metrical levels (see figure 5). From this we might conclude that, to test whether a new candidate melody belongs to the variation group, we just have to build the common chord set and see whether the new notes ‘disturb’ the analysis. The HarmoRubette also comes up with note weights that express the conformance of the notes with the analyzed harmonic loci. However, this might not be possible in general for melodies that have not been manually prepared and that can differ in slightly shifted onsets. In such a case the note distribution matching strategy within the EMD seems to be more promising.
5 Excerpts from the Variation Group ‘Frankrijk B1’
Fig. 2. The melodies at metrical levels 3 (eighths) and above and chords resulting from projecting all notes. (Bars 3-5)
Fig. 3. A collected view of all pitches per bar from figure 4. The staffs refer to reductions to different metrical grids.
Fig. 4. Four views of all notes of all variations. Each staff shows the notes that fall on the grid of a particular metrical level.
Fig. 5. Automatic root analysis of the sequence of alternative pitches at metrical level 2 (half-bar). The left column shows the pitches, the right columns show their shared functional analysis and root chords.
6 Summary
For a group of folk song variations we have looked into the note stability and the stability of the ‘best harmonic symbol sequence’ on the onset, tactus and bar levels. To this end we developed a set of tools and views that allow us to get a quick impression of the stability of features for a set of variants at different metrical levels. We found them quite useful for visually examining the pitch stability and found our hypothesis confirmed that melody tones at strong positions are more stable among variants. We have also presented the idea of using this information in a refined transportation distance measure that can match pitch distributions with pitches.
In the follow-up paper [3] we show that this leads to better retrieval performance, and in [5] we elaborate on making automatic rather than manual alignments. We will further study these methods within the WITCHCRAFT project to improve our general public search engine.
Acknowledgements This work was kindly supported by the Netherlands Organization for Scientific Research within the WITCHCRAFT project NWO 640-003-501, which is part of the program Continuous Access to Cultural Heritage. Further we want to thank the developers of the Humdrum and Rubato toolkits and the encoders of the Dutch folk songs, who made this investigation possible.
References
[1] Bosma, M., Veltkamp, R.C., Wiering, F.: Muugle: A framework for the comparison of music information retrieval methods. In: Proceedings of the ICMPC 2006, pp. 1297–1303 (2006)
[2] van Dijk, M.B.G., Kuijer, H.J., Dekker, A.J. (eds.): Onder de groene linde. Verhalende liederen uit de mondelinge overlevering. Uitgeverij Uniepers, Amsterdam (1987-1991)
[3] Garbers, J., van Kranenburg, P., Volk, A., Wiering, F., Grijp, L., Veltkamp, R.C.: Using pitch stability among a group of aligned query melodies to retrieve unidentified variant melodies. In: Dixon, S., Bainbridge, D., Typke, R. (eds.) Proceedings of the Eighth International Conference on Music Information Retrieval, pp. 451–456. Austrian Computer Society (2007)
[4] Garbers, J., Noll, T.: New perspectives of the harmorubette. In: Lluis-Puebla, E., Mazzola, G., Noll, T. (eds.) Perspectives in Mathematical and Computer-Aided Music Theory, Verlag epOs-Music, Osnabrück (2003)
[5] Garbers, J., Wiering, F.: Towards structural alignment of folk songs. In: Bello, J.P., Chew, E. (eds.) Proceedings of the Ninth International Conference on Music Information Retrieval (2008)
[6] Hoos, H.H., et al.: Guido, http://guidolib.sourceforge.net/
[7] Huron, D., et al.: Humdrum, http://music-cog.ohio-state.edu/Humdrum/
[8] Mazzola, G., Zahorka, O., Garbers, J.: Rubato, http://www.rubato.org
[9] Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision 40(2), 99–121 (2000)
[10] Typke, R.: Music Retrieval Based on Melodic Similarity. PhD thesis, Utrecht University (2007)
[11] Volk, A., Garbers, J., van Kranenburg, P., Wiering, F., Grijp, L., Veltkamp, R.C.: Music retrieval based on melodic similarity. In: Noll, T., Klouche, T. (eds.) Mathematics and Computation in Music: Proceedings of the MCM 2007 (2008)
Bayesian Model Selection for Harmonic Labelling
Christophe Rhodes, David Lewis, and Daniel Müllensiefen
Department of Computing, Goldsmiths, University of London, SE14 6NW, United Kingdom
[email protected]
Abstract. We present a simple model based on Dirichlet distributions for pitch-class proportions within chords, motivated by the task of generating ‘lead sheets’ (sequences of chord labels) from symbolic musical data. Using this chord model, we demonstrate the use of Bayesian Model Selection to choose an appropriate span of musical time for labelling at all points in time throughout a song. We show how to infer parameters for our models from labelled ground-truth data, use these parameters to elicit details of the ground truth labelling procedure itself, and examine the performance of our system on a test corpus (giving 75% correct windowing decisions from optimal parameters). The performance characteristics of our system suggest that pitch class proportions alone do not capture all the information used in generating the ground-truth labels. We demonstrate that additional features can be seamlessly incorporated into our framework, and suggest particular features which would be likely to improve performance of our system for this task.
1 Introduction
This paper introduces a straightforward model for labelling chords based on pitch-class proportions within windows, and uses this model not only to generate chord labels given a symbolic representation of a musical work, but also to infer the relevant level of temporal granularity for which a single label is justified. The generation of these chord labels was initially motivated by the desire to perform automated musical analysis over a large database of high-quality MIDI transcriptions of musical performances, as part of a larger study investigating musical memory. While the MIDI transcriptions are of high fidelity with respect to the performances they represent, they do not include any analytic annotations, such as song segmentation, principal melody indications, or significant rhythmic or harmonic motifs; all of these must be generated if desired, but it is not practical to do so manually over the collection of some 14,000 pop song transcriptions.
Corresponding author.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 107–116, 2009. © Springer-Verlag Berlin Heidelberg 2009
A time sequence of chord labels, as a compact representation of the harmony of the musical work, can not only be used as the basis for the detection of larger-scale harmonic features (such as cadences, clichés and formulae), but can also inform a structural segmentation of the music, since harmony is an indicator of structure in many popular music styles. Such segmentation is a necessary first step for other feature extraction tools; it is, for example, a prerequisite for the melody similarity algorithms presented in Müllensiefen and Frieler (2004).
A second use for these chord labels is the automatic generation of lead sheets. A lead sheet is a document “displaying the basic information necessary for performance and interpretation of a piece of popular music” (Tagg 2003b). The lead sheet usually gives the melody, lyrics and a sequence of short chord labels, usually aligned with the melody, allowing musicians to accompany the singer or main melody instrument without having a part written out for them.
An advantage of the model we present in this paper is that the overall framework is independent of the type of harmony scheme that it is used with: for example, it can be adapted to generate labels based on tertial or quartal harmonic classification (Tagg 2003a). Furthermore, a similar model selection stage can be used to choose which harmonic classification is most appropriate for a given work, a decision which can be guided by information not present in the observed musical data (such as a genre label) by incorporating that information into a prior probability model.
The rest of this paper is organized as follows: after a discussion of previous related work in section 2, we present our model for the dependency of pitch-class content on the prevailing chord, forming the core of our simple model, and discuss its use in window size selection in section 3. We discuss implementation of parameter inference and empirical results in section 4, and draw conclusions and suggest further work in section 5.
2 Previous Work
Most previous work on chord label assignment from symbolic data is implemented without an explicit model for chords: instead, preference rules, template matching and neural network approaches have been considered (Temperley 2001, Chapter 6 and references therein); an alternative approach involving knowledge representation and forward-chaining inference has also been applied to certain styles of music (Pachet 1991; Scholz et al. 2005). One attempt to use probabilistic reasoning to assign chord labels uses a Hidden Markov Model approach with unsupervised learning (Raphael and Stoddard 2004) of chord models; however, the authors note that they do not provide for a way of making decisions about the appropriate granularity for labelling: i.e. how to choose the time-windows for which to compute a chord label.
There has been substantial work in the symbolic domain on the related task of keyfinding. For instance, Krumhansl (1990, Chapter 4) presents a decision procedure based on Pearson correlation values of observed pitch-class profiles with profiles generated from probe-tone experiments. Another class of algorithms used for keyfinding relies on a geometric representation of keys and tones, attempting to capture the perceived distances between keys by embedding them in a suitable space (Chuan and Chew 2005). The profile-based model has been refined (Temperley 2001, Chapter 7) by making several modifications: altering details of the chord prototype profiles; dividing the piece into shorter segments; adjusting the pitch-class observation vector to indicate merely presence or absence of that pitch class within a segment, rather than the proportion of the segment’s sounding tones, and thus avoiding any attempt at weighting pitches based on their salience; and imposing a change penalty for changing key label between successive segments.
There are existing explicit models for keys and pitch-class profiles: one such (Temperley 2004) is defined such that for each key, the presence or absence of an individual pitch class is a Bernoulli distribution (so that the pitch-class profile is the product of twelve independent Bernoulli distributions); in this model, there are also transition probabilities between successive chords. This model was further refined in Temperley (2007) by considering not just pitch classes but the interval between successive notes. These models are based on the notion of a fixed-size ‘segment’, which has two effects: first, the key models are not easily generalized to windows of different sizes, as the occurrence of a particular scale degree (i.e. pitch relative to a reference key) is not likely to be independent in successive segments; second, unless the segment length is close to the right level of granularity, a post-processing stage will be necessary to smooth over fragmented labels.
There has been more work towards chord recognition in the audio domain, where the usual paradigm is to model the chord labels as the hidden states in a Hidden Markov Model generating the audio as observation vectors (Bello and Pickens 2005; Sheh and Ellis 2003).
One problem in training these models is in the lack of ground truth, of music for which valid chord labels are known (by ‘valid’ here, we mean sufficient for the purposes for which automated chord labelling is intended, though of course these may vary between users); approaches have been made to generate ground truth automatically (Lee and Slaney 2006), but such automatic ground truth generation depends on a reliable method of generating labels from the symbolic data or from something that can be mapped trivially onto it; without such a reliable method, hand-annotated ground truth must be generated, as for example in Harte et al. (2005).
One feature of the method presented in this paper in contrast to most existing harmony or key identification techniques is that it has an explicit musically-motivated yet flexible model for observable content (i.e. pitch-class distributions) at its core, rather than performing some ad-hoc matching to empirical prototypes. This flexibility confers two modelling advantages: first, the parameters of the model can be interpreted as a reflection of musical knowledge (and adjusted, if necessary, in a principled way); second, if evidence for additional factors influencing chord labels surfaces, in general or perhaps for a specific style of music under consideration, these additional factors can be incorporated into the model framework without disruption.
3 Model
The repertoire of chords that we represent is triad-based (though an extension to include other bases is possible with some care over the dimensionality of the relevant spaces); motivated by their prevalence in western popular music, we aim to distinguish between major, minor, augmented, diminished and suspended (sus4 and sus9) triads with any of the twelve pitch classes as the root, and we will infer probability distributions over these chord labels given the musical content of a window. Of the six, it should be noted that augmented and diminished chords are much rarer in popular music, and that suspended chords, despite their names, are frequently treated in popular music as stable and not as needing to resolve, and so require categories of their own, e.g. in soul or country music where they form independent sonorities; see Tagg (2003a). We introduce the Dirichlet distribution on which our chord model is based, give our explicit model for the dependence of pitch-class proportions on the chord, and then explain how we can use this to perform selection of window size in a Bayesian manner.
3.1 Dirichlet Distributions
The Dirichlet distribution is a model for proportions of entities within a whole. Its density function is
\[ p(x|\alpha) = \frac{1}{B(\alpha)} \prod_i x_i^{\alpha_i - 1} \tag{1} \]
with support on the simplex $\sum_i x_i = 1$. The normalizing constant $B(\alpha)$ is defined as
\[ B(\alpha) = \frac{\prod_i \Gamma(\alpha_i)}{\Gamma\left(\sum_i \alpha_i\right)} \tag{2} \]
where $\Gamma$ is the gamma function $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$. Note that for each individual component of the whole, represented by an individual random variable $x_i$, the corresponding $\alpha_i$ controls the dependence of the density (1) for small values of this component: if $\alpha_i > 1$, the probability density tends towards zero in the limit $x_i \to 0$; if $\alpha_i < 1$, the density increases without limit as $x_i \to 0$.
3.2 The Chord Model
Our introductory chord model is triad-based, in that for each chord we consider the tones making up the triad separately from the other, non-triadic tones. The proportion of a region made up of triad tones is modelled as a Beta distribution (a Dirichlet distribution with only two variables), and the triad tone proportion is then further divided into a Dirichlet distribution over the three tones in the triad. Denoting the proportion of non-triadic tones as $\bar{t}$, and that of triadic tones as $t$, where the latter is made up of root $r$, middle $m$ and upper $u$, we can write our chord model for tone proportions given a chord label $c$ as
\[ p(r, m, u, t, \bar{t} \,|\, c) = p(t, \bar{t} \,|\, c)\, p(r, m, u \,|\, t, \bar{t}, c) \tag{3} \]
with support on the simplexes $t + \bar{t} = 1$, $r + m + u = 1$; each of the terms on the right-hand side is a Dirichlet distribution. We simplify the second term on the right-hand side by asserting that the division of the harmonic tones is independent of the amount of harmonic tones in a chord, so that $p(r, m, u \,|\, t, \bar{t}, c) = p(r, m, u \,|\, c)$. In principle, each chord model has two sets of independent Dirichlet parameters $\alpha$; in practice we will consider many chords to be fundamentally similar, effectively tying those parameters. This simple chord model does not allow for certain common harmonic labels, such as seventh chords or open fifths (as these are not triadic); we leave this extension for further work. Additionally, there is a possible confusion even in the absence of noise between the suspended chords, as the tones present in a sus4 chord are the same as those in a sus9 chord four scale degrees higher.
3.3 Bayesian Model Selection
We start with a set of possible models for explaining some data, where each individual model is in general parameterized by multiple parameters. Given this set of distinct models, and some observed data, we can make Bayesian decisions between models in an analogous fashion to selecting a particular set of parameters for a specific model; in general, we can generate probability distributions over models (given data) in a similar way to the straightforward Bayesian way of generating probability distributions over the parameter values of a given model. For a full exposition of Bayesian Model Selection, see e.g. MacKay (2003, Chapter 28).
In the context of our problem, of chord labelling and window size selection, we choose a metrical region of a structural size: in our investigation for popular music, we choose this region to be one bar, the basic metrical unit in that style. The different models for explaining the musical content of that bar, from which we will aim to select the best, are different divisions of that bar into independently-labelled sections. For example, one possible division of the bar is that there is no segmentation at all; it is all one piece, with one chord label for the whole bar. Another possible division is that the bar is made up of two halves, with a chord label for each half bar. These divisions of the bar play the rôle of distinct models, each of which has Dirichlet parameters for each independently-labelled section of the bar. In our experiment described in section 4, the corpus under consideration only contains works in common time, with four quarter beats in each bar, and we consider all eight possible divisions of the bar that do not subdivide the quarter beat (i.e. 1+1+1+1, 1+1+2, 1+2+1, 2+1+1, 2+2, 1+3, 3+1, 4).
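The eight divisions are exactly the ordered ways of writing four as a sum of positive integers. The short Python sketch below enumerates them; the function name `bar_divisions` is a hypothetical helper introduced for illustration only.

```python
def bar_divisions(beats=4):
    """Every ordered split of `beats` quarter beats into contiguous sections,
    each section lasting a whole number of beats."""
    if beats == 0:
        return [()]
    divisions = []
    for first in range(1, beats + 1):
        # Fix the length of the first section, then divide the remainder.
        for rest in bar_divisions(beats - first):
            divisions.append((first,) + rest)
    return divisions


for division in bar_divisions(4):
    print("+".join(str(length) for length in division))
```

For a bar of n beats this yields 2^(n-1) divisions, so the number of candidate models grows quickly if the beat is subdivided further.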
The Bayesian Model Selection framework naturally incorporates the Occam factors in a quantitative manner: if there is evidence for two different chord labels, then the whole-bar model will not be a good fit to the data; if there is no evidence for two distinct chord labels, then there are many more different poor fits for a more fine-grained model than for the whole-bar model. To be more precise, we can write the inference over models $M$ given observed data $D$ as
\[ p(M|D) = \frac{p(D|M)\,p(M)}{p(D)} \tag{4} \]
where
\[ p(D|M) = \sum_c p(D|c, M)\,p(c|M) \tag{5} \]
is the normalizing constant for the inference over chord labels $c$ for a given model $M$. Note that there is an effective marginalization over chord labels for each model: when considering the evidence for a particular model, we add together contributions from all of its possible parameters, not simply the most likely. We can use the resulting probability distribution (4) to select the most appropriate window size for labelling. The flexibility of this approach is evident in equation (5): the chord models $p(D|c, M)$ can differ in parameter values or even in their identity between window sizes, and the prior probabilities for their generation $p(c|M)$ can also be different for different models of the bar $M$.
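The pieces of this section can be combined in a short Python sketch that evaluates equations (1) to (5) for one bar. Several choices here are assumptions made for illustration rather than details of the system described in this paper: only major and minor triads are considered, the priors p(c|M) and p(M) are uniform, sections are treated as independent so that the sum over labels factorizes per section, each beat is summarized by twelve pitch-class weights, and a small constant `eps` is added to keep every proportion strictly positive. The Dirichlet parameters are the Maj/Min values from Table 1.

```python
import math

# Triad shapes as semitone offsets (root, middle, upper); an illustrative subset.
TRIADS = {"maj": (0, 4, 7), "min": (0, 3, 7)}
# Table 1 values: (alpha_t, alpha_tbar), (alpha_r, alpha_m, alpha_u).
PARAMS = {
    "bar": ((6.28, 1.45), (3.91, 1.62, 2.50)),
    "sub-bar": ((3.26, 0.72), (4.04, 2.66, 2.29)),
}
# The eight divisions of a four-beat bar listed in section 3.3.
DIVISIONS = [(1, 1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 2), (1, 3), (3, 1), (4,)]


def log_dirichlet(x, alpha):
    """Log of the Dirichlet density, equations (1) and (2)."""
    log_b = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    return sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x)) - log_b


def log_chord_likelihood(weights, root, quality, params, eps=1e-3):
    """log p(r, m, u, t, tbar | c), equation (3); eps is an assumed smoothing term."""
    alpha_tt, alpha_rmu = params
    triad = [(root + offset) % 12 for offset in TRIADS[quality]]
    rmu = [weights[pc] + eps for pc in triad]
    triad_mass = sum(rmu)
    other_mass = sum(weights) - sum(weights[pc] for pc in triad) + eps
    total = triad_mass + other_mass
    tt = (triad_mass / total, other_mass / total)
    return log_dirichlet(tt, alpha_tt) + log_dirichlet([v / triad_mass for v in rmu], alpha_rmu)


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_evidence(weights, params):
    """log of equation (5) for one section, with a uniform prior over labels."""
    chords = [(root, quality) for root in range(12) for quality in TRIADS]
    logs = [log_chord_likelihood(weights, r, q, params) for r, q in chords]
    return log_sum_exp(logs) - math.log(len(chords))


def division_posterior(beats):
    """p(M | D), equation (4), over DIVISIONS for four per-beat weight vectors."""
    scores = []
    for division in DIVISIONS:
        params = PARAMS["bar" if division == (4,) else "sub-bar"]
        score, start = 0.0, 0
        for length in division:
            window = beats[start:start + length]
            section = [sum(beat[pc] for beat in window) for pc in range(12)]
            score += log_evidence(section, params)
            start += length
        scores.append(score)  # a uniform p(M) cancels in the normalization
    norm = log_sum_exp(scores)
    return {d: math.exp(s - norm) for d, s in zip(DIVISIONS, scores)}


def triad_beat(root, quality):
    """A hypothetical beat containing only the three tones of one triad."""
    beat = [0.0] * 12
    for offset in TRIADS[quality]:
        beat[(root + offset) % 12] = 1.0
    return beat


bar = [triad_beat(0, "maj")] * 2 + [triad_beat(7, "maj")] * 2
posterior = division_posterior(bar)
```

Because parameters below 1 give densities that grow without limit near zero, the resulting posterior is sensitive to the assumed value of `eps`; the sketch is meant to show the structure of the computation, not to reproduce the reported results.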
4 Experiment
4.1 Parameter Estimation
In order to test our chord model (see equation 3), we must choose values for the $\alpha$ parameters of the Dirichlet distributions. We summarize the maximum-likelihood approach (from a labelled ‘training set’) below, noting also the form of the prior for the parameters in the conjugate family for the Dirichlet distribution; in addition, we performed a large search over the parameter space for the training set, attempting to maximize performance of our model at the labelling task with a binary loss function. We can rewrite the Dirichlet density function (1) as $e^{-\sum_i [(1-\alpha_i) \log x_i] - \log B(\alpha)}$, demonstrating that it is in the exponential family, and that $\sum_i \log x_i$ is a sufficient statistic for this distribution; additionally, there is a conjugate prior for the parameters of the form
\[ \pi(\alpha|A^0, B^0) \propto e^{-\sum_i [(1-\alpha_i) A^0_i] - B^0 \log B(\alpha)} \tag{6} \]
with support $\alpha_i \in \mathbb{R}^+_0$. Given $N$ observations $x^{(k)}$, the posterior density is given by $p(\alpha|x^{(k)}) \propto p(x^{(k)}|\alpha)\,\pi(\alpha)$, which is
\[ e^{-\sum_i (1-\alpha_i)\left(A^0_i + \sum_{k=1}^{N} \log x^{(k)}_i\right) - (B^0 + N) \log B(\alpha)}; \tag{7} \]
that is, of the same form as the prior in equation (6), but with the hyperparameters $A^0$ and $B^0$ replaced by $A = A^0 + \sum_k \log x^{(k)}$ (with the logarithm operating componentwise) and $B = B^0 + N$. The likelihood is of the form of equation (7), with $A^0$ and $B^0$ set to 0. The maximum likelihood estimate for parameters is then obtained by equating the first derivatives of the log likelihood to zero; from equation (2), we see that
\[ \frac{\partial \log B(\alpha)}{\partial \alpha_i} = \frac{\partial}{\partial \alpha_i}\left[\sum_k \log \Gamma(\alpha_k) - \log \Gamma\left(\sum_k \alpha_k\right)\right] = \Psi(\alpha_i) - \Psi\left(\sum_k \alpha_k\right), \tag{8} \]
where $\Psi$ is the digamma function; therefore,
\[ \frac{\partial \log L}{\partial \alpha_i} = A_i - B\,\frac{\partial \log B(\alpha)}{\partial \alpha_i} = A_i - B\left[\Psi(\alpha_i) - \Psi\left(\sum_k \alpha_k\right)\right], \tag{9} \]
giving $\Psi\left(\sum_k \alpha_k\right) = \Psi(\alpha_i) - A_i/B$ for the maximum point, which we solve numerically for $\alpha_i$ using the bounds discussed in Minka (2003). In addition, performing a quadratic (Gaussian) approximation around the maximum, we can obtain estimates for the error bars on the maximum likelihood estimate from $\left.\frac{\partial^2 \log L}{\partial \alpha_i^2}\right|_{\max} = -\sigma_{\alpha_i}^{-2}$, giving
\[ \sigma_{\alpha_i} = \left[B\left(\Psi'(\alpha_i) - \Psi'\left(\sum_k \alpha_k\right)\right)\right]^{-1/2}; \tag{10} \]
for the purpose of the confidence interval estimates in this paper, we disregard covariance terms arising from $\frac{\partial^2 \log L}{\partial \alpha_i\,\partial \alpha_j}$.
We defer detailed discussion of a suitable form of the prior on these chord parameters to future work. We have derived an approximate noninformative prior (Jaynes 2003, Chapter 12) within the conjugate family, but its use is inappropriate in this setting, where we can bring considerable musical experience to bear (and indeed the maximum a posteriori estimates generated using this noninformative prior give inferior performance to the maximum likelihood estimates in our experiment).
4.2 Results
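The maximum-likelihood estimates reported in this section satisfy the stationarity condition derived in section 4.1. As a minimal sketch of how such estimates can be computed, the Python code below iterates Ψ(α_i) ← Ψ(Σ_k α_k) + A_i/B, a fixed-point scheme from Minka's note, and then evaluates the error bars of equation (10). The text does not say which of Minka's schemes was used, so this choice is an assumption, as are the function names and the hand-written digamma and trigamma approximations (included only to keep the example free of external libraries).

```python
import math
import random


def digamma(x):
    """Psi(x) for x > 0: upward recurrence followed by an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """Psi'(x) for x > 0, using the same recurrence-plus-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + 1.0 / x + 0.5 * inv2 + (inv2 / x) * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def inverse_digamma(y, iterations=5):
    """Solve digamma(x) = y by Newton's method from Minka's starting point."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(iterations):
        x -= (digamma(x) - y) / trigamma(x)
    return x


def fit_dirichlet(samples, tol=1e-10, max_iter=1000):
    """Maximum-likelihood alpha and 1-sigma error bars for rows of positive proportions."""
    n, dim = len(samples), len(samples[0])
    # A_i / B in the notation of section 4.1: the mean of log x_i.
    mean_log = [sum(math.log(row[i]) for row in samples) / n for i in range(dim)]
    alpha = [1.0] * dim
    for _ in range(max_iter):
        total = digamma(sum(alpha))
        updated = [inverse_digamma(total + m) for m in mean_log]
        converged = max(abs(a - b) for a, b in zip(alpha, updated)) < tol
        alpha = updated
        if converged:
            break
    # Equation (10) with B = n, ignoring covariance terms.
    total_curvature = trigamma(sum(alpha))
    sigma = [(n * (trigamma(a) - total_curvature)) ** -0.5 for a in alpha]
    return alpha, sigma


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
synthetic = [sample_dirichlet((4.0, 2.0, 3.0), rng) for _ in range(2000)]
alpha_hat, sigma_hat = fit_dirichlet(synthetic)
```

The synthetic data stand in for the observed (r, m, u) proportions of labelled windows; with real data, any proportion that is exactly zero would have to be handled before taking logarithms.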
Our corpus of MIDI transcriptions is made up of files each with thousands of MIDI events, with typically over five instruments playing at any given time; each bar typically contains several dozen notes. We selected 16 songs in broadly contrasting styles, and ground-truth chord labels for those transcriptions of performances were generated by a human expert, informed by chord labels as assigned by song books to original audio recordings. We then divided our corpus of 640 labelled bars into “training” and “testing” sets of 233 and 407 bars respectively. Based on an initial inspection of the training set, we performed maximum likelihood parameter estimation for the chord models for three different sets of labels: major or minor chord labels for an entire bar; major or minor labels for windows shorter than a bar; and all other labels. From the inferred parameters for major and minor chords at different window sizes in table 1, there was clear evidence for qualitatively different label generation at sub-bar window sizes from the behaviour of labelling whole bars: the sub-bar window sizes have high probability density for small non-triadic tones, while whole-bar windows have a vanishing probability density near a zero proportion of non-triadic tones (from the different qualitative behaviour of distributions with Dirichlet parameters below and above 1.0: 0.72 and 1.45 in our case). We interpret this as showing that the ground-truth labels were generated
114
C. Rhodes, D. Lewis, and D. M¨ ullensiefen
such that a sub-bar window is only labelled with a distinct chord if there is strong evidence for such a chord – i.e. only small quantities of non-triadic tones. If no sub-bar window is clearly indicated, then a closest-match chord label is applied to the whole bar, explaining the only slight preference for chord notes in the whole-bar distribution. There was insufficient ground-truth data to investigate this issue over the other families of chords (indeed, there was only one example of an augmented chord in the training data set).

Table 1. Maximum likelihood estimates and 1σ error bars for Dirichlet distributions, based on labelled ground truth

Chord, win         α_{t t̄}                         α_{rmu}
Maj/Min, bar       {6.28, 1.45} ± {0.49, 0.099}    {3.91, 1.62, 2.50} ± {0.23, 0.11, 0.15}
Maj/Min, sub-bar   {3.26, 0.72} ± {0.32, 0.054}    {4.04, 2.66, 2.29} ± {0.21, 0.15, 0.13}
other              {5.83, 1.04} ± {0.82, 0.12}     {4.08, 2.35, 1.49} ± {0.38, 0.23, 0.16}
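The qualitative difference between Dirichlet parameters below and above 1.0 can be made concrete with a small sketch. It assumes that the two entries of α_{t t̄} refer to triadic and non-triadic tones in that order (a reading suggested by the values 0.72 and 1.45 quoted in the text, not stated explicitly), and it uses the fact that a two-component Dirichlet distribution is a Beta distribution. The function names are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

# (alpha_triadic, alpha_non_triadic) as read from Table 1; the ordering is an assumption.
ALPHA_BAR = (6.28, 1.45)
ALPHA_SUB_BAR = (3.26, 0.72)

def non_triadic_density(x, alpha):
    """Density of a proportion x of non-triadic tones under a two-component Dirichlet."""
    alpha_triadic, alpha_non_triadic = alpha
    return beta_pdf(x, alpha_non_triadic, alpha_triadic)

for x in (0.001, 0.01, 0.1):
    print(x,
          round(non_triadic_density(x, ALPHA_BAR), 3),
          round(non_triadic_density(x, ALPHA_SUB_BAR), 3))
```

As x approaches zero the whole-bar density falls towards zero (exponent 1.45 − 1 > 0), while the sub-bar density grows without bound (exponent 0.72 − 1 < 0).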
Using the maximum likelihood estimates of Table 1, we performed inference over window sizes and chord labels on the testing set, obtaining 53% correct windows and 75% correct labels given the window. Additionally, we performed a large (but by no means exhaustive) search over the parameter space on the training data, and obtained parameter values which performed better on the testing set than these maximum likelihood estimates, giving 75% of windows and 76% of chords correct. It should be noted that the training and testing sets are quite similar in character, being individual bars drawn from the same pieces; it would be difficult to justify claims of independence between the sets. Validation on an independent test set (i.e. music excerpts drawn from different pieces) is currently being undertaken. We interpret these trends as suggesting that the model for chords based simply on tone proportions is insufficiently detailed to capture enough of the process by which ground-truth labels are assigned. The fact that the maximum likelihood estimates perform noticeably worse than a set of parameters found by search on the training data indicates that there is structure in the data not captured by the model; we conjecture that including a model for the chord label conditioned on the functional bass note in a window would significantly improve performance. Another musically motivated refinement would be to include an awareness of context, for instance by including transition probabilities between successive chord labels (in addition to the implicit ones from the musical surface).
This corresponds to removing the assumption that the labels are conditionally independent given the musical observations: an assumption that is reasonable as a first approximation, but in actuality there will be short-term dependence between labels as, for instance, common chord transitions (such as IV-V-I) might be favoured over alternatives in cases where the observations are ambiguous; similarly, enharmonic decisions will be consistent over a region rather than having an independent choice made at the generation of each label.
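A toy sketch may help to show what relaxing conditional independence would change. It is not the model of this paper nor Temperley's algorithm: it is a generic Viterbi decoder, and every probability below is invented purely for illustration. The middle bar is ambiguous between I and V on the observations alone; a transition table that favours IV-V-I tips the decision.

```python
import math

def viterbi(log_obs, log_trans, log_init):
    """Most probable label sequence given per-bar log-likelihoods and log transitions."""
    labels = list(log_init)
    best = {lab: log_init[lab] + log_obs[0][lab] for lab in labels}
    back = []
    for obs in log_obs[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            # Best predecessor for the current label.
            prev = max(labels, key=lambda p: best[p] + log_trans[(p, cur)])
            new_best[cur] = best[prev] + log_trans[(prev, cur)] + obs[cur]
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def logs(probs):
    return {key: math.log(value) for key, value in probs.items()}

# Hypothetical per-bar likelihoods P(observations | label); bar 2 is ambiguous.
obs = [logs({"I": 0.10, "IV": 0.80, "V": 0.10}),
       logs({"I": 0.42, "IV": 0.20, "V": 0.38}),
       logs({"I": 0.80, "IV": 0.10, "V": 0.10})]
# Hypothetical transition probabilities P(current | previous).
trans = logs({("I", "I"): 0.4, ("I", "IV"): 0.3, ("I", "V"): 0.3,
              ("IV", "I"): 0.3, ("IV", "IV"): 0.2, ("IV", "V"): 0.5,
              ("V", "I"): 0.6, ("V", "IV"): 0.1, ("V", "V"): 0.3})
init = logs({"I": 1 / 3, "IV": 1 / 3, "V": 1 / 3})

independent = [max(o, key=o.get) for o in obs]  # labels chosen bar by bar
joint = viterbi(obs, trans, init)               # labels chosen as a sequence
print(independent, joint)
```

With these made-up numbers the bar-by-bar choice labels the middle bar I, whereas the sequence-level choice labels it V.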
The performance of our approach, without any of the above refinements, is at least comparable to techniques which do relax the assumption of conditional independence between labels; for example, the algorithm of Temperley (2001), which infers chord labels over the entire sequence (using dynamic programming to perform this inference efficiently), achieves a comparable level of accuracy (around 77%) on those pieces from our dataset for which it correctly computes the metrical structure.
5 Conclusions
We have presented a simple description of the dependence of chord labels and pitch-class profile, with an explicit statistical model at its core; this statistical model can be used not only to infer chord labels given musical data, but also to infer the appropriate granularity for those labels. Our empirical results demonstrate that adequate performance can be achieved, while suggesting that refinements to the statistical description could yield significant improvements. The model presented ignores all context apart from the bar-long window in question, and operates only on pitch-class profile data; incorporation of such extra information can simply be achieved by extending the statistical model. Similarly, we can incorporate available metadata into our model, for instance by defining a genre-specific chord label prior; and we can change the repertoire of chords under consideration without alteration of the framework, simply by replacing one component of the observation model.
Acknowledgments C.R. is supported by EPSRC grant GR/S84750/01; D.L. and D.M. by EPSRC grant EP/D038855/1.
References
Bello, J.P., Pickens, J.: A Robust Mid-Level Representation for Harmonic Content in Musical Signals. In: Proc. ISMIR, pp. 304–311 (2005)
Chuan, C.-H., Chew, E.: Polyphonic Audio Key Finding Using the Spiral Array CEG Algorithm. In: Proc. ICME, pp. 21–24 (2005)
Harte, C., Sandler, M., Abdallah, S., Gómez, E.: Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations. In: Proc. ISMIR, pp. 66–71 (2005)
Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press, Cambridge (2003)
Krumhansl, C.L.: Cognitive Foundations of Musical Pitch. Oxford University Press, Oxford (1990)
Lee, K., Slaney, M.: Automatic Chord Recognition from Audio Using an HMM with Supervised Learning. In: Proc. ISMIR (2006)
MacKay, D.J.C.: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge (2003)
Minka, T.: Estimating a Dirichlet Distribution (2003), http://research.microsoft.com/~minka/papers/dirichlet/
Müllensiefen, D., Frieler, K.: Cognitive Adequacy in the Measurement of Melodic Similarity: Algorithmic vs. Human Judgments. Computing in Musicology 13, 147–176 (2004)
Pachet, F.: A meta-level architecture applied to the analysis of Jazz chord sequences. In: Proc. ICMC (1991)
Raphael, C., Stoddard, J.: Functional Harmonic Analysis Using Probabilistic Models. Computer Music Journal 28(3), 45–52 (2004)
Scholz, R., Dantas, V., Ramalho, G.: Funchal: a System for Automatic Functional Harmonic Analysis. In: Proc. SBCM (2005)
Sheh, A., Ellis, D.P.W.: Chord Segmentation and Recognition using EM-trained Hidden Markov Models. In: Proc. ISMIR, pp. 185–191 (2003)
Tagg, P.: Harmony entry. In: Shepherd, J., Horn, D., Laing, D. (eds.) Continuum Encyclopedia of Popular Music of the World. Continuum, New York (2003a)
Tagg, P.: Lead sheet entry. In: Shepherd, J., Horn, D., Laing, D. (eds.) Continuum Encyclopedia of Popular Music of the World. Continuum, New York (2003b)
Temperley, D.: The Cognition of Basic Musical Structures. MIT Press, Cambridge (2001)
Temperley, D.: Bayesian Models of Musical Structure and Cognition. Musicae Scientiae 8, 175–205 (2004)
Temperley, D.: Music and Probability. MIT Press, Cambridge (2007)
The Flow of Harmony as a Dynamical System
Peter Giesl
Department of Mathematics, University of Sussex, Mantell Building, Falmer, BN1 9RF, UK
Abstract. When analysing the evolution of harmony within a composition, one can distinguish between two parts: on the one hand a dynamical system and on the other hand the composer. The dynamical system summarises the rules of harmony, whereas the composer intervenes at certain points, e.g. by choosing new initial values. This viewpoint is helpful for the musical analysis of a composition and will be exemplified by analysing the first movement of Beethoven’s first symphony.
1 Dynamical Systems Applied to Harmony
In this section we recall definitions and concepts from dynamical systems and apply them to harmony. By harmony we mean chords, which we often denote by their functional class according to Riemann’s theory of functional harmony (Riemann 1893). A discrete dynamical system $(X, S_t)$ consists of a metric space $X$, called the phase space, which in our case is the space of all harmonies or chords. The flow operator $S_t : X \to X$ maps the initial harmony $x$ to the harmony $S_t x$ after the time $t$. The time is assumed to be discrete, i.e. we measure the time in steps. Moreover, $S_0 = \mathrm{id}$ is the identity operator and $S_t$ is a semi-group, i.e. $S_{t+s} = S_t \circ S_s$ for all $t, s \ge 0$. The metric on the space of all harmonies or chords can be determined by the degree of relationship between two chords. All chords in a close functional relation to the tonic, e.g. dominant or subdominant, have a smaller distance to it than parallel chords or chords with no direct functional relation¹. Each discrete dynamical system is given by the iteration of a map $g : X \to X$; $g = S_1$ maps a chord to its successor. The sequence of chords $x_0, x_1, \dots$ is thus given by the iteration $x_{n+1} = g(x_n)$. For example, starting with the dominant D, the next chord is the tonic T, where the harmony then stays, i.e. $g(\mathrm{D}) = \mathrm{T}$, $g(\mathrm{T}) = \mathrm{T}$ or, in a different notation, D → T, T → T. This results in $S_t \mathrm{D} = \mathrm{T}$ for all $t \ge 1$ and $S_t \mathrm{T} = \mathrm{T}$ for all $t \ge 0$. The map $g$ and the initial chord $x_0$ determine the sequence of chords, the flow of harmony. This is certainly a simplified viewpoint, and one could extend the class of dynamical systems under consideration by allowing a multi-valued map $g$, possibly with certain probabilities. Then a harmony $x$ could be followed by any harmony of the set $g(x) = \{y_1, \dots, y_k\}$. Another possibility would be a time-dependent map $g(x_n, n)$ or a map including the history, i.e. $g(x_n, x_{n-1}, \dots, x_{n-k})$.

¹ Note that some difficulties occur when considering a discrete phase space with a metric in the definitions of stability. However, we will not go into further detail since we are only interested in the general concept.
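The example D → T, T → T can be sketched in a few lines. The dictionary encoding of g and the function name `flow` are illustrative choices, not part of the paper; the loop at the end checks the semi-group property on this two-chord phase space.

```python
def flow(g, x, t):
    """S_t x: apply the successor map g to the chord x, t times."""
    for _ in range(t):
        x = g[x]
    return x

# Successor map from the text: g(D) = T, g(T) = T.
g = {"D": "T", "T": "T"}

print([flow(g, "D", t) for t in range(4)])  # flow of harmony starting from D

# Semi-group property S_{t+s} = S_t o S_s, checked for small t and s.
for t in range(4):
    for s in range(4):
        for x in g:
            assert flow(g, x, t + s) == flow(g, flow(g, x, s), t)
```

A multi-valued or history-dependent g, as mentioned in the text, would need a different encoding (for instance a dictionary of sets, or a function of the last k chords).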
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 117–123, 2009. © Springer-Verlag Berlin Heidelberg 2009
We, however, restrict ourselves mostly to the simplest version described above. The dynamical system is thus the frame, the underlying system of rules, which can be altered by the composer, cf. Section 3. Hence, the composer can alter the function $g$ or, in the multi-valued setting, he can choose one of the harmonies in the set $g(x)$. In the following we will transfer more concepts and definitions from dynamical systems to harmony and interpret them. The rules of each voice in counterpoint, cf. Section 2, will help to explain these concepts. A fixed point $x_0$ is a fixed point of the map $g$, i.e. $g(x_0) = x_0$, and thus defines a constant solution. An example is the tonic T. Furthermore, one is interested in the behaviour near a fixed point. An asymptotically stable fixed point (in the following called stable) is a fixed point such that all solutions with initial chords in a neighbourhood² of the fixed point finally tend to the fixed point while staying near it. The set of all initial chords tending to the fixed point is called the basin of attraction of the fixed point. The tonic T is a stable fixed point and the subdominant is part of its basin of attraction since S → D → T. Hence, all chords with a functional relation to the tonic belong to the basin of attraction. There can be several stable fixed points, and each has its basin of attraction, which can be regarded as its region of influence. The basins of attraction are disjoint and are often separated by an unstable (fixed) point. A fixed point is called unstable if in each nontrivial neighbourhood of the fixed point there are chords such that the corresponding solutions tend away from the fixed point. The change from one stable fixed point to another is a modulation. Let us consider two fixed points T and T’ where T is A major and T’ is E♭ major, cf. Figure 1. An unstable point could be the diminished seventh chord f − g♯ − b − d.
This is an unstable situation which is at the boundary of both basins of attraction. Depending on the next chord, it can proceed to either of the basins of attraction. One could imagine a ball on the top of a hill with a valley on each side: a light wind could make the ball roll in one direction or the other, and it will come to a halt in one of the two valleys. Hence, the unstable chord f − g♯ − b − d could tend to A major or, understood as f − a♭ − c♭ − d, to E♭ major. Coming back to the image of the ball, we define a Lyapunov function for a stable fixed point. A Lyapunov function is a function $V : X \to \mathbb{R}$ which decreases along solutions, i.e. $V(x_{n+1}) \le V(x_n)$, and which has a strict minimum at the fixed point. We generalize the concept slightly and only require the first property, i.e. that $V$ decreases along solutions. Then it can have several minima at different fixed points and it has a maximum at an unstable fixed point. In dissipative physical systems the energy is an example of a Lyapunov function. In our case the harmonic energy would be such a generalized Lyapunov function, since it decreases along solutions: a tense chord with high harmonic energy seeks resolution and leads to a chord which has less energy, until a minimum is reached. The harmonic energy is like gravitation, which moves the ball downwards. If we start with a Lyapunov function, we can define a dynamical system in the following way: the multi-valued map $g$ maps $x$ to the set $g(x) := \{y \in X \mid V(y) \le V(x)\}$ of all harmonies whose harmonic energy is no higher than that of $x$. This leaves some freedom for the sequence of chords, since we have several choices. As an example let us define the following Lyapunov function: $V(\mathrm{T}) = 0$, $V(\mathrm{D}) = 1$, $V(\mathrm{S}) = 2$, $V(\mathrm{Sp}) = 3$. Then the Lyapunov function decreases along all of the following sequences: Sp → D → T, D → T, Sp → S → D → T.³

[Figure: the chords A, f − g♯ − b − d and E♭ with their basins of attraction; below them, a sketch of the Lyapunov function V]
Fig. 1. Above: The figure shows two stable fixed points (A in white and E♭ in black) with their respective basin of attraction (dashed and black line). The basins of attraction are separated by an unstable point (diminished seventh in grey). Below: The Lyapunov function V is sketched for the three chords: the two stable fixed points are minima of V while the unstable point is a maximum.

² Here we assume the neighbourhood to be nontrivial, i.e. it should not only consist of the fixed point.
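The example Lyapunov function V(T) = 0, V(D) = 1, V(S) = 2, V(Sp) = 3 and the multi-valued map it induces can be written down directly. The function names are hypothetical; the values of V and the three sequences are the ones given in the text.

```python
# Example Lyapunov function from the text.
V = {"T": 0, "D": 1, "S": 2, "Sp": 3}

def successors(x):
    """Multi-valued map g(x) = {y in X | V(y) <= V(x)}."""
    return {y for y in V if V[y] <= V[x]}

def is_admissible(sequence):
    """True if V does not increase from any chord to the next."""
    return all(V[b] <= V[a] for a, b in zip(sequence, sequence[1:]))

print(sorted(successors("S"), key=V.get))
for seq in (["Sp", "D", "T"], ["D", "T"], ["Sp", "S", "D", "T"]):
    print(seq, is_admissible(seq))
```

Because the definition uses ≤, every chord is among its own successors, so staying on a chord is always admissible; a step such as T → S is not.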
³ Note the following equivalences of the diatonic functions: T=I, Sp=ii, S=IV and D=V.

2 Dynamical Systems Applied to Counterpoint
The intention of this section is twofold: on the one hand we want to show that mechanisms similar to those on the global level of harmony are valid on the local level of voices and counterpoint. On the other hand these local mechanisms can explain some of the global features of the preceding section. On this local level the phase space $X = Y^n$ is given by the pitch space $Y$ for each of the $n$ voices. The map $g = (g_1, \dots, g_n) : Y^n \to Y^n$ determines the next note of each voice. The definition of $g$ depends on the energy of all vectors in $Y^n$ – some have a high harmonic energy (dissonant) and some a low one (consonant), and the map $g$ seeks to decrease this energy. Let us illustrate these ideas with two voices ($n = 2$): an interval f − e¹ would relax to f − d¹, i.e. g(f, e¹) = (f, d¹). The interval f − d¹ has a lower energy, but it could still relax to e − e¹, i.e. g(f, d¹) = (e, e¹), cf. Figure 2. The function $g$ for more than two voices is partly a composition of the corresponding functions of two voices; however, other aspects must also be taken into account. If we consider again the example of the diminished seventh chord, then again f − d¹ leads to e − e¹, or g♯ − b leads to a − a. Thus, a choice has to be made as to which pair of voices forms the clausula⁴ and which voices have to follow. This choice explains the instability of the chord: the decision on the role of each voice determines the future development⁵. Another example is the deceptive cadence⁶: here, some voices follow the rules of the dynamical system, and only the bass is set differently. On a microlocal level, the same concept applies to a single voice, defining a dynamical system on $X = Y$. Here, the function $g : Y \to Y$ depends on the potential energy given by the musical scale or, in Gregorian melody, by other laws⁷: in the latter case a Finalis is a stable fixed point, while a Ténor would be an unstable fixed point as small changes will lead the melody away from it.

[Figure: two-voice progression with upper voice e, e, d, e over lower voice g, f, f, e; the harmonic energy V is plotted below]
Fig. 2. A phrygian clausula together with the harmonic energy. The highest energy appears when the lower voice is moved to f, resulting in the dissonant interval f − e¹. From then on the energy decreases until a resolution is obtained.
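The two-voice example can be sketched as a map on pairs of notes. Only the two transitions g(f, e¹) = (f, d¹) and g(f, d¹) = (e, e¹) come from the text. Everything else here is an assumption made for illustration: the MIDI pitch numbers (with c¹ taken as 60), the numerical energy values per interval, and the convention that a pair without a listed successor is treated as a resting point.

```python
# MIDI pitch numbers, assuming c1 (middle C) = 60.
PITCH = {"e": 52, "f": 53, "d1": 62, "e1": 64}

# Successor map on (lower voice, upper voice); the two entries are from the text.
G = {("f", "e1"): ("f", "d1"),
     ("f", "d1"): ("e", "e1")}

# Hypothetical energies per interval size in semitones:
# major seventh (11) > major sixth (9) > octave (12).
ENERGY = {11: 2, 9: 1, 12: 0}

def energy(pair):
    lower, upper = pair
    return ENERGY[PITCH[upper] - PITCH[lower]]

def relax(pair):
    """Follow g until a pair without a listed successor is reached."""
    path = [pair]
    while path[-1] in G:
        path.append(G[path[-1]])
    return path

path = relax(("f", "e1"))
print(path)
print([energy(p) for p in path])
```

Along the resulting path the assumed energy falls at each step, which is the behaviour the text ascribes to g.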
⁴ A clausula in this context consists of two voices, one of which goes a tone upwards (Discant Clausula) while the other at the same time goes a tone downwards (Tenor Clausula), to end in an octave, cf. (Schwind and Polth 1996). Usually one of the steps is a semitone (minor second) and the other a whole tone (major second); if the Discant Clausula has the semitone, then the clausula is called authentic, otherwise phrygian, cf. (Jans 1992).
⁵ In fact, there are even more possibilities and combinations, cf. (Giesl 1999). These clausulae can even lead to a sequence of chords by changing the roles of each voice, cf. (Giesl 2001).
⁶ A deceptive cadence is D followed by any chord which is not the tonic, often D→Tp or V→vi.
⁷ The melodic theory of Christoph Hohlfeld, for example, could provide an appropriate function g, cf. (Giesl 2002).

3 The Composer
In summary, the dynamical system is responsible for the general rules, the resolution of tensions and the automatism within the harmonic flow, whereas the composer can introduce a new starting point, change the system or push it into an ambiguous situation.
A dynamical system cannot explain the flow of harmony or the counterpoint in all details; on the contrary, an external influence, the composer, is needed to take decisions and to start with an initial situation. The composer can reset the system at any time by starting with a new chord or situation; he acts as a control on the dynamical system. He can act against the gravitation and put the ball on a hill, or push it in some direction. The composer mostly acts only at particular, singular times and seldom over a longer period. As an example consider the cadence: S → D → T is the sequence determined by the dynamical system. Then, since T is a fixed point, the system would stay there. Now the composer sets a new initial chord, e.g. the subdominant S, and the rest follows by the rules of the dynamical system. However, combinations which are used very often can become new rules. Therefore, we will not regard or mention the subdominant after the tonic as an action of the composer. In fact, all harmonies that stay within the basin of attraction of a stable fixed point (cadence harmony) are governed by the dynamical system. Only new initial chords outside the basin of attraction or a new stable fixed point are important and notable events for the analysis. As an example we will provide a short harmonic analysis of the first movement of Beethoven’s first symphony by highlighting the decisions of the composer in the appendix.
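The division of labour between system and composer suggests a simple analytic procedure: given a successor map g, mark every position at which a chord is not the one g prescribes. The sketch below is an illustration under assumptions, not the author's method. The map encodes only the cadence S → D → T with T fixed, the function name and the `conventional` parameter are hypothetical, and the first chord is always counted as a decision because the composer has to supply an initial situation.

```python
# Single-valued successor map for the cadence S -> D -> T, with T a fixed point.
G = {"S": "D", "D": "T", "T": "T"}

def composer_decisions(chords, g, conventional=frozenset()):
    """Positions at which a chord is set by the composer rather than by g.

    `conventional` holds transitions that are used so often that they are
    treated as rules even though g does not produce them.
    """
    decisions = [0] if chords else []
    for n in range(1, len(chords)):
        step = (chords[n - 1], chords[n])
        if g.get(step[0]) != step[1] and step not in conventional:
            decisions.append(n)
    return decisions

progression = ["S", "D", "T", "S", "D", "T"]
print(composer_decisions(progression, G))
print(composer_decisions(progression, G, conventional={("T", "S")}))
print(composer_decisions(["D", "Tp"], G))  # deceptive cadence D -> Tp
```

Treating T → S as conventional removes the second decision, which mirrors the remark that the subdominant after the tonic is not regarded as an action of the composer.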
4 Summary
We have described the flow of harmony as a dynamical system on different macroscopic and microscopic levels. The dynamical system summarises the rules, which depend on the cultural and historical situation. The flow seeks to decrease the harmonic energy and enhances the resolution of tensions. The composer is not restricted to the dynamical system: he chooses a new initial chord which then evolves following the rules of the dynamical system. He leads the system to a point with high musical energy, then the dynamical system takes over. This viewpoint is helpful for the musical analysis of compositions as well as for improvisation: one distinguishes between the internal rules (dynamical system) and the external control (individual decisions). Finding the dynamical system for a whole epoch summarises its theory on harmony. Identifying the important harmonic decisions in a specific composition is a more appropriate analysis of the harmonic flow than just listing all harmonies. This analysis, which is based on the harmonic energy as a parameter which is accessible for the audience, thus highlights the important points which make the composition unique and special.
References
Giesl, P.: Von Stimmführungsvorgängen zur Harmonik. Eine Anwendung der Clausellehre auf Wagners Tristan und Isolde. Die Musikforschung 52, 403–436 (1999)
Giesl, P.: Von Stimmführungsvorgängen zu Kleinterzzirkeln. Eine Deutung der Teufelsmühle durch die Clausellehre. Die Musikforschung 54, 378–399 (2001)
Giesl, P.: Zur melodischen Verwendung des Zweiten Modus in Messiaens Subtilité des Corps Glorieux. In: Edler, A., Meine, S. (eds.) Musik, Wissenschaft und ihre Vermittlung; Bericht über die Internationale Musikwissenschaftliche Tagung Hannover 2001, pp. 259–264. Wißner, Augsburg (2002)
Jans, M.: Modale Harmonik. Beobachtungen und Fragen zur Logik der Klangverbindungen im 16. und frühen 17. Jahrhundert. Basler Jahrbuch für Historische Musikpraxis 16, 185 (1992)
Riemann, H.: Vereinfachte Harmonielehre oder die Lehre von den tonalen Funktionen der Harmonie, London (1893)
Schwind, E., Polth, M.: Article Klausel und Kadenz, 2nd edn. MGG 5, pp. 256–282. Bärenreiter, Kassel (1996)
A Harmonic Analysis of Beethoven’s 1st Symphony, 1st Movement

Exposition
1⁸ Starting with a dominant seventh chord, the resolution follows.
2 Another dominant seventh chord is set, another resolution follows...
3 ...and yet another one leading to the basin of attraction of the stable fixed point C which, however, the audience only realises after some time.
9 g, leading to a deceptive cadence.
18 A7, leading to d minor.
24 f 56, leading back⁹ to C.
41 A, leading to d minor.
42 B, leading to e minor and followed by a sequence...
44→45 the unstable situation is clarified to G.
59 Within the basin of attraction of G its dominant D is stabilised by c,
60 leads back with c to G.
64 Within the basin of attraction of G its subdominant C is stabilised by G7. Note that a is resolved to g in 65/66, and then g → a in 66/67, leading back to G.
77 G – now g minor – is no longer a stable fixed point. We are in a sequence of falling fifths, a free fall without any gravitation. B major seems to be stabilised in 79-81...
82 ...but the action D7 moves us into the basin of attraction of G – note the phrygian and authentic clausulae¹⁰ to d in 84.
94–98 For the sequence of diminished seventh chords with resolutions, cf. the following figure:
[Figure: voice-leading diagram of the diminished seventh chords and their resolutions]

⁸ The numbers denote the bars.
⁹ This is the phrygian version of the Tenor and Discant Clausula (a♭, f → g) in contrast to the authentic one corresponding to a dominant-tonic relation (a, f♯ → g), cf. footnote 4 and (Giesl 1999).
¹⁰ See footnote 4 for explanation.
93 The diminished seventh chord is an unstable situation, coming from g → a/f, the resolution is e = d/f → e.
95 The diminished seventh chord is an unstable situation, coming from e → d/f, the resolution is g/b → a.
97 The diminished seventh chord is an unstable situation, coming from a → g/b, the resolution is c/e → d and leads back to the basin of attraction of G.
107 Adding the seventh to G, we are led back to the basin of attraction of C.

Development
We again have a sequence of falling fifths in 112-124 starting with the newly introduced A7 and again in 124-132, but this time each chord is more stabilised and the decision to move on is taken just before the next harmony. More decisions are made in 149, 153, 157, and 159, finally leading to a minor. The melodic move from e to f (175 to 176) leads us back to C.

Recapitulation
The same decisions as in the Exposition are taken. In the new part 191-200 an accelerated sequence of decisions causes an unstable situation.

Coda
262 C7 leads to F in 265.
266 A is introduced, leading to d minor in 269.
270 G leads back to the basin of attraction of C.
274 and 276 g leads to a deceptive cadence which is finally resolved in 279.
From 279 up to the end we stay at the stable fixed point C.
Tonal Implications of Harmonic and Melodic Tn-Types
Richard Parncutt
University of Graz
Music composed of tones (in the psychoacoustical sense of sounds that have pitch) can never be completely atonal (Reti 1958). Consider any quasi-random selection of tones from the chromatic scale, played either simultaneously or successively. Most such sets generate associations with musically familiar pitch-time patterns and corresponding tonal stability relationships (Auhagen 1994). A pattern of pitch can imply a tonal centre simply because it reminds us of a tonal passage: it has tonal implications that depend on the intervals among the pitch classes (pcs) in the set.¹ The only clear exceptions to this rule are trivial: the null set (cardinality = 0)² and the entire chromatic aggregate (cardinality = 12). Since every interval, sonority and melodic fragment has tonal implications, even the so-called “atonal” music of composers such as Ferneyhough, Ligeti and Nono is full of fleeting tonal references: at any given moment during a performance, some pitches are more likely than other pitches to function as psychological points of reference. In the following, I will use the terms “tonal” and “atonal” in this broad, psychological sense. A number of terms have been coined in an attempt to map out the diverse terrain that separates (major-minor or harmonic) tonality from (extreme) atonality, including pantonal, extratonal, atonical, neotonal and polytonal. Tonality is multidimensional in the sense that there are many different ways of bridging that gap that manifest as different styles (such as impressionism, bebop and minimalism). The present analysis attempts to map this complexity onto a single dimension. Instead of dividing music into “tonal” and “atonal”, I conceive of degrees of tonality or atonality and imagine the possibility of evaluating the degree of tonality of a passage of music as a real number between, say, 0 for completely atonal and 1 for completely tonal.
That number should generally be higher for music that has clearer or longer-lasting tonal anchors. The “atonal” repertoire avoids tonal references by favoring pc-sets with relatively weak tonal implications. A well-known exception is Berg’s violin concerto, a work that is usually regarded as 12-tone but can barely be regarded as “atonal”, because the row at the beginning of the first movement begins with a minor triad. It follows from this exceptional example that, as a rule, “atonal” composers deliberately avoid consonant intervals between successive notes and prefer tonally weak or ambiguous pc-sets. From a logical viewpoint, they may find “atonal” pc-sets in two main ways: either by borrowing them - consciously or unconsciously - from the existing “atonal” literature, or by discovering them by aurally guided exploration and trial and error, exploring the various possibilities creatively and listening carefully. Since the “atonal” repertoire presumably includes all possible pc-sets, it is no longer possible to find “new” ones, suggesting that these two strategies cannot be separated. How might a composer best find pc-sets corresponding to a given desired degree of tonality or atonality? Composers in atonal idioms (including serial approaches) have intuitively favored pc-sets that avoid perfect intervals (fifths, fourths) and favor tritones and semitones. This paper presents a new method by which composers can systematically and quickly seek and find pc-sets of any specified cardinality and strength of tonal implication. This aim is appropriate given the large number of possible pc-sets from which a composer can choose and the long history of constructive interaction between composition and music theory. My approach is intended to shed light on three areas simultaneously: perception, analysis and compositional practice. The tonal implications of a sounding musical fragment depend not only on the underlying pc-set but also, of course, on its musical realization. The realization of a pc-set has several aspects: properties of individual tones (duration, loudness, timbre, temporal envelope); whether melodic (successive) or harmonic (simultaneous); if melodic, the order of the tones (especially important in 12-tone music) and which tones are repeated; if harmonic, voicing (octave register of each tone, spacing between the tones, doubling in different octaves) and onset synchronicity. A tone is more likely to be perceived as a tonal center if it is repeated (or doubled), has a longer duration, or is simply louder than other tones (Oram and Cuddy 1995; Parncutt 1988; 1997).

¹ The term “pitch” in “pitch-class set” is misleading, because a pc-set is primarily a configuration of intervals. Each set is defined by the number of times each interval class (of which there are six: 1, 2, 3, 4, 5 or 6 semitones) occurs in the set. This set of 6 numbers is called the interval vector (Forte 1973). For example, a major or minor triad contains no semitone, no tone, one minor third, one major third, one fourth (fifth) and no tritone, so its interval vector is [001110].
² The cardinality of a pc-set is simply the number of members in the set.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 124–139, 2009. © Springer-Verlag Berlin Heidelberg 2009
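The interval vector described in footnote 1 is easy to compute: count, for every pair of pitch classes in the set, the interval class (1 to 6 semitones) between them. The sketch below is a straightforward implementation of that definition; the function name is an illustrative choice.

```python
from itertools import combinations

def interval_vector(pcs):
    """Number of occurrences of each interval class 1..6 in a pc-set."""
    vector = [0] * 6
    distinct = sorted({p % 12 for p in pcs})
    for a, b in combinations(distinct, 2):
        # Interval class: the smaller of the two distances around the circle.
        interval_class = min((b - a) % 12, (a - b) % 12)
        vector[interval_class - 1] += 1
    return vector

print(interval_vector([0, 4, 7]))  # major triad
print(interval_vector([0, 3, 7]))  # minor triad
print(interval_vector([0, 1, 2]))  # chromatic cluster
```

Both triads give the vector written [001110] in the footnote, which illustrates why the interval vector alone cannot distinguish a set from its inversion.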
Here, I assume that it is possible to separate effects associated with the intervals within a set from effects of the set’s specific realization, and focus only on the former. While this assumption may not be entirely valid, it is a good first approximation and a useful starting point. In the following analysis of the tonal implications of pc-sets, I make use of Forte’s convenient and well-known method of enumerating all possible pc-sets within given constraints. While Forte’s method is often referred to as pc-set theory, in the present approach it is no more than a systematic classification system or taxonomy, because the taxonomy itself does not generate predictions that can be empirically tested – unlike the perceptual theory with which it is combined in this paper. While the music-analytical application of Forte’s taxonomy is usually confined to “atonal” music, there is no reason why it should not be applied to any music composed within the confines of the 12-tone chromatic scale.

Tn-types of cardinality 3
Rahn (1980) broke down Forte’s pc-sets into types. One such type is the transpositional type or Tn-type. A Tn-type is a class of pc-sets that are equivalent under transposition but not under inversion. The subscript n refers to the size of a transposition in semitones, and Tn-type refers to all 12 possible transpositions of a given collection of pcs. The mathematical jargon sounds complicated, but the concept is fundamentally simple. The major and minor triads are both examples of Tn-types. A major triad is a set of three pcs: a root and two further tones, 4 and 7 semitones above the root – 047 for short. The intervallic inversion of the major triad is the minor triad 037, and both
belong to the same pc-set, whose prime form (Forte 1973) is 037. Because 037 is the 11th in Forte’s list of pc-sets of cardinality 3, it is also referred to as 3-11. When the two Tn-types corresponding to this pc-set are separated, the minor triad is labelled 3-11A and the major 3-11B.

Table 1. All Tn-types of cardinality 3 (after Rahn 1980)

set name     3-1  3-2  3-3  3-4  3-5  3-6  3-7  3-8  3-9  3-10  3-11  3-12
prime form   012  013  014  015  016  024  025  026  027  036   037   048
inversion         023  034  045  056       035  046             047
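The count of 19 Tn-types of cardinality 3 can be checked by brute force. The sketch below enumerates all three-element subsets of the 12 pitch classes and reduces each to a representative of its transposition class. The representative is chosen by smallest span and then smallest numerical order; this is an assumption that reproduces the entries of Table 1 for trichords, and tie-breaking conventions for larger sets differ between authors. The function names are hypothetical.

```python
from itertools import combinations

def tn_type(pcs):
    """Representative of the transposition class of a pc-set (starts on 0)."""
    distinct = sorted({p % 12 for p in pcs})
    # Transpose so that each member in turn becomes 0.
    candidates = [tuple(sorted((p - root) % 12 for p in distinct))
                  for root in distinct]
    # Prefer the most compact form, then the numerically smallest.
    return min(candidates, key=lambda c: (c[-1], c))

def inversion(pcs):
    """Tn-type of the intervallic inversion of a pc-set."""
    return tn_type((-p) % 12 for p in pcs)

tn_types = sorted({tn_type(c) for c in combinations(range(12), 3)})
set_classes = sorted({min(t, inversion(t)) for t in tn_types})
symmetrical = [t for t in tn_types if inversion(t) == t]

print(len(tn_types), len(set_classes))
print(symmetrical)
```

Five of the twelve prime forms are their own inversions, and the remaining seven each split into two Tn-types, which gives 5 + 2 × 7 = 19.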
For the purpose of argument, let us begin by enumerating all possible Tn-types of cardinality 3. There are 19 of them, and they are presented in Table 1. In the table, “set name” corresponds to “name” in Appendix 1 of Forte (1973); the number before the dash is the cardinality, and the number after the dash is the set’s position in a list of all possible sets of that cardinality. The prime form “012” corresponds to C-C#-D in all chromatic transpositions, “013” to C-C#-D#, and so on. Some of the pc-sets (prime forms) in Table 1 are symmetrical and some are not. For example, 012 is symmetrical, but 013 is asymmetrical. An asymmetrical set may be broken down into two Tn-types, which are labelled A and B: e.g. 013 is labelled 3-2A, and 023 is labelled 3-2B. The tonal implications of the 19 Tn-types of cardinality 3 vary markedly. At one extreme, the major and minor triads have strong tonal implications; the major is the more strongly tonal, since its root is perceived more clearly (Parncutt 1988). At the other extreme, 012 has almost no tonal implications – by which I mean that the tones sound about equally important and no other (virtual) tones are strongly implied. (Even this is not quite true: when 012 is presented harmonically, 0 and 2 are more audible than 1, due to masking.) All other Tn-types of cardinality 3 have tonal implications with various degrees of strength. For example, 023 may be heard as either the 1st, 2nd and 3rd degrees of a minor scale or the 6th, 7th and 8th degrees of a major scale, suggesting that either the 0 or the 3 in 023 may be heard as a point of reference.
The major-third (4-semitone) interval embedded within 014 suggests that its reference pitch is 0, regardless of whether the pattern is heard as Neapolitan, Arabic or Flamenco; in Terhardt’s approach, both pitch-salience patterns and cultural associations are learned, but since pitch-salience patterns are ultimately based on universal aspects of pitch perception in speech, they are expected to vary less than cultural associations across listeners and musical contexts. Given the wide range of tonal implications within Tn-types of cardinality 3 (and any other cardinality for that matter), it is surprising that many pc-set theorists tacitly consider all pc-sets a priori to be equivalent or value-free, as if they had no tonal implications – or as if tonal implications did not exist. Can the tonal implications that we learn from music simply disappear (which is psychologically implausible), or are they arbitrary (which is psychoacoustically and ethnomusicologically implausible)? It may be possible to make tonal implications disappear in a magical, ideal world of mathematics located in a far-off galaxy and inhabited by aliens, but in real music
Tonal Implications of Harmonic and Melodic Tn-Types
heard by real human beings, pc-sets will always have tonal implications. Moreover, the appeal of so-called atonal music may be due not to an absence of tonal implications, but to their multiplicity, fluctuation and intangibility. The tonal implications of a Tn-type may be understood and quantified by first evaluating the perceptual salience of each chromatic scale degree in the context of that Tn-type. By “salience” I mean the (subjective) importance of something for a listener, or the (objective) probability that a listener will notice or become consciously aware of something - in this case, a tone at a given chromatic scale degree. The perceptual profile of a Tn-type is a set of 12 values, one for each of the 12 chromatic scale degrees. Each value reflects the perceptual salience of that scale degree in the context of (i.e. during or following presentation of) that Tn-type. In the following, I will distinguish between two kinds of perceptual profile, harmonic and tonal, and present separate algorithms for calculating these profiles that are based on contrasting empirical data and perceptual-cognitive3 theory.

The harmonic profile

The harmonic profile of a Tn-type is a vector of twelve values, each of which is an estimate of the perceptual salience of a pc. In Parncutt (1988; 1989), I assumed the salience of pitches in chords to be proportional to the probability that a pitch will function as the root when the tones are sounded simultaneously (i.e., as a sonority). I assumed that when a given Tn-type is heard repeatedly in different voicings and contexts, the probability increases that a certain pitch will be heard as a reference – a long process that involves learning, history and culture (Parncutt 2005). I then developed a simple algorithm for pc-salience within harmonically presented Tn-types that was based on the virtual pitch algorithm of Terhardt et al. (1982) and the chord-root model of Terhardt (1982). 
The model was tested by presenting chords of octave-complex tones (OCTs, Shepard tones) followed by individual OCTs and asking listeners how well the single OCT fits with the chord (Parncutt 1993). In that experiment, and many other experiments reported for example by Krumhansl (1990), OCTs are used to operationalize the music-theoretical concept of a pc (which is equivalent to the music-psychological concept of chroma). Terhardt assumed that the root of a chord is a virtual pitch. By that, he meant that the root corresponds to the fundamental of an approximately harmonic series of audible pure-tone components (partials). Those components, which are a subset of all the chord’s audible4 partials, generally include harmonics of different chord tones. There are usually several possible candidates for the root of a chord; “the” root may be the one corresponding to the most salient virtual pitch, but may also depend on the music with which a listener is familiar, and thus indirectly on the history of musical

3 There is no clear boundary between “perceptual” and “cognitive”. Terhardt’s theory tends to be regarded as perceptual or psychoacoustical, but it is also cognitive in the sense that it involves information processing (or better: his algorithm to predict the pitch salience profile of a complex sound involves information processing). Krumhansl’s approach is explicitly cognitive, but it is based on empirical data obtained from perceptual or psychoacoustical experiments.
4 By “audible” I mean present in the running spectral analysis of the sound which is performed physiologically by the basilar membrane and transmitted to the brain along the auditory nerve. The initial masking stage of Terhardt’s algorithm predicts what is “audible” in this sense and what is not, and assigns spectral pitches to all audible partials.
R. Parncutt
syntax and implicitly learned conventions of music theory. Whichever way you look at it, the root is assumed to be learned and enters culture when listeners are repeatedly exposed to consistent patterns of pitch relationships within musical sonorities.

Table 2. Root-support intervals (after Parncutt (1988))

root-support interval (diatonic notation)    P1, P8…    P5, P12…    M3, M10…    m7, m14…    M2, M9…
size in semitones                            0          7           4           10          2
root-support weight                          10         5           3           2           1

Abbreviations: P=perfect, M=major, m=minor
According to Terhardt (1982), the virtual pitch at the root of a chord is generated by the chord’s tones, and the intervals octave/unison, perfect fifth, major third, minor seventh and major second/ninth determine the root. I call these intervals root supports (see Table 2; Parncutt 1988). They are octave generalizations of the intervals between spectral and virtual pitches in typical harmonic complex tones such as voiced speech sounds (i.e., between harmonic overtones and the fundamental). The chord-root model includes free parameters called root-support weights. These are quantitative estimates of the influence of each root-support interval on the salience of the virtual pitch at the lower tone of the interval, and hence on the perceived root of a chord. The weights used in the present calculations are presented in Table 2 and Figure 1 (b, c). They are assumed to depend on the position of the corresponding element in the harmonic series: intervals that occur early in the series are assumed in Terhardt’s approach to be more familiar to the ear and therefore to play a more important role in the determination of virtual pitches and chord roots. The values in Table 2 have been tested by studying the predictions of the model and comparing them with both music-theoretic intuition and various published sources of empirical data. The predictions of the chord-root algorithm (including an additional masking procedure) were tested experimentally in Parncutt (1993) for a limited set of chords of octave-complex tones; when Krumhansl and Kessler (1982) asked how well octave-complex probe tones follow single chords (rather than short progressions), they obtained essentially the same results (that is, the correlation coefficients between the two sets of profiles are highly significant). 
The predictions of the chord-root algorithm may also be considered to apply to a typical or average voicing5 of a given Tn-type when it is realized as regular musical tones (harmonic complex tones). Note the absence of the minor third from the root-support intervals presented in Table 2 and Figure 1 (b, c). In this approach, the m3 is not assumed to have any direct influence on the root. First, it is not found in the lower reaches of the harmonic series between an element of the series and the fundamental. Second, the root of the minor
5
The idea of a “typical or average voicing” could be quantified by documenting all voicings of a given Tn-type in a given musical repertoire using software such as David Huron’s Humdrum.
[Figure 1, three panels. (a) The template assumed by Parncutt (1989), in which weights are set to the reciprocal of harmonic number; axes: interval (semitones) and weight (1/n). (b) The octave-generalized template used in the present calculations (similar to that of Parncutt (1988), but without the m3 interval); axes: interval class (semitones) and weight. (c) A circular representation of the same template; the numbers are intervals above the root in semitones.]
Fig. 1. The harmonic series template for calculating virtual pitch salience
triad can be explained solely in terms of the P5 interval between the fifth and root of the chord. Figure 2 (below) shows how the theory correctly predicts the root of the minor triad without explicitly including the minor third as a root-support interval. Incidentally, the omission of the minor-third interval from the root supports does not contradict the relatively high salience of the third degree of the minor scale in the K-K profiles (see Figure 4 below). On the contrary: this model can explain why Krumhansl found the third degree of the minor scale to be more salient than the fifth: the strong minor third is also present in the pc-salience profile of the minor triad (Parncutt, in preparation).

Table 3. Matrix used to calculate the harmonic profiles of Tn-types

10   0   2   0   0   5   0   0   3   0   1   0
 0  10   0   2   0   0   5   0   0   3   0   1
 1   0  10   0   2   0   0   5   0   0   3   0
 0   1   0  10   0   2   0   0   5   0   0   3
 3   0   1   0  10   0   2   0   0   5   0   0
 0   3   0   1   0  10   0   2   0   0   5   0
 0   0   3   0   1   0  10   0   2   0   0   5
 5   0   0   3   0   1   0  10   0   2   0   0
 0   5   0   0   3   0   1   0  10   0   2   0
 0   0   5   0   0   3   0   1   0  10   0   2
 2   0   0   5   0   0   3   0   1   0  10   0
 0   2   0   0   5   0   0   3   0   1   0  10

The harmonic profile of a Tn-type is calculated by a simple pattern-matching routine in which the octave-generalized template illustrated in Figure 1 (b) is compared with the pcs of the set, in all 12 transpositions around the pc cycle. One way to represent this routine is by matrix multiplication. The first column of the matrix in Table 3 corresponds to the template in Figure 1 (b). Successive columns are generated by rotating the template down, one element at a time. The Tn-type is expressed as a vector (1 row and 12 columns) of 12 numbers corresponding to 12 pcs (0 to 11), with the value 1 for each pc that is present and 0 for pcs that are absent. For example, the major triad (047) is denoted (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0). This vector is then multiplied by the matrix in Table 3. The result after matrix multiplication is the Tn-type’s harmonic profile. The calculated harmonic profile of the major triad is (18, 0, 3, 3, 10, 6, 2, 10, 3, 7, 1, 0). According to this profile, the most salient pitches evoked by a C major triad are C (pc 0, predicted salience = 18), followed by E and G (pcs 4 and 7, salience = 10 in each case). Tones F (pc 5, salience = 6) and A (pc 9, salience = 7) are predicted to be strongly implied although they are not among the chord’s notes. Similar results are obtained for the minor triad; one striking difference is that the difference in salience between the root and the third is smaller for the minor triad, which can explain why the minor triad is tonally more ambiguous and - in that sense - less consonant than the
[Figure 2, four circular panels: major triad 047, minor triad 037, augmented triad 048, diminished triad 036; each panel is labelled with notes and pitches around the pc cycle (0 to 11).]
Fig. 2. Calculated harmonic profiles of four common triads
major. That can in turn explain why minor triads and tonalities are less prevalent and less stable than major triads and tonalities (Eberlein 1994).6 The implied pitches at the 4th and 6th scale degrees above the root (M6 for the major triad, m6 for the minor) can explain why chord progressions in which roots fall through fifth or third intervals are more prevalent in tonal music than progressions in the other direction (Eberlein 1994): the pitches that are implied by the first triad (the 4th and 6th) are realized as tones in the second (root and 3rd; Parncutt 2005). The predictions of the model for major, minor, diminished and augmented triads are shown in Figure 2. Corresponding experimental data are presented in Figure 3. In Parncutt (1993), 27 listeners (mainly musicians) rated how well a probe tone went with a preceding chord. Both chords and probe tones were constructed from octave-complex (Shepard) tones. Trials were shuffled and rotated randomly around the chroma cycle. Filled diamonds in Figure 3 are mean experimental ratings; bars are 95% confidence intervals about those means.
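The pattern-matching routine behind Table 3 can be sketched in a few lines. This is an illustrative reading rather than the author's own implementation: the names `ROOT_SUPPORT`, `harmonic_profile` and `normalise` are hypothetical, the weights are taken from Table 2, and the code relies on the observation that entry (i, j) of the Table 3 matrix equals the weight of the interval (i - j) mod 12.

```python
# Root-support weights from Table 2, keyed by interval above the root (semitones).
ROOT_SUPPORT = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}

def harmonic_profile(pcs):
    """Raw harmonic profile: match the template at each of the 12 candidate roots."""
    return [sum(ROOT_SUPPORT.get((p - root) % 12, 0) for p in pcs)
            for root in range(12)]

def normalise(profile, mean=10):
    """Scale a profile to the given mean and round to whole numbers."""
    scale = mean * len(profile) / sum(profile)
    return [round(v * scale) for v in profile]

major = harmonic_profile([0, 4, 7])
print(major)             # [18, 0, 3, 3, 10, 6, 2, 10, 3, 7, 1, 0]
print(normalise(major))  # [34, 0, 6, 6, 19, 11, 4, 19, 6, 13, 2, 0]

diminished = harmonic_profile([0, 3, 6])
# Candidate roots sharing the maximum salience.
print([pc for pc, v in enumerate(diminished) if v == max(diminished)])  # [0, 3, 6, 8]
```

The first printed vector reproduces the profile quoted above for the major triad, and the second matches the normalized values (mean 10) listed for 3-11B in the appendix. For the diminished triad the sketch returns four equally weighted candidates, one of which (pc 8) is not a chord tone.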
6
Eberlein calculated the frequency of occurrence of different sonorities including major and minor triads, and presented the results in an appendix. He consistently found more major than minor triads in the music of the 18th and 19th centuries - even in a sample that included equal numbers of pieces in major and minor tonalities. The reason is evidently that the dominant triad tends to be major in both modes.
[Figure 3, four panels: major triad 047, minor triad 037, major-minor seventh 047Q, half-diminished seventh 036Q. Vertical axis: probe tone’s perceived goodness of fit; horizontal axis: interval class relative to conventional root (semitones).]
Fig. 3. Experimental data on the salience of pitches evoked by common musical sonorities composed of octave-complex tones (after Parncutt 1993). The diamonds denote the mean responses of 27 listeners; the error bars are the 95% confidence intervals. In each trial, listeners heard a chord of octave-complex (Shepard) tones followed by a single such tone. They were asked to rate how well the tone went with the chord on a scale from 0 (very badly) to 3 (very well). In the chord labels, the letter Q means 10.
The tonal profile

The tonal profile of a Tn-type is similar to the harmonic profile, but it is calculated in a quite different way. Each value is an estimate of the probability that a chromatic scale degree will be perceived as the tonic when the Tn-type is realized melodically (successively) in random order(s) and register(s). The calculation involves the major and minor key profiles of Krumhansl and Kessler (K-K) (1982), which are reproduced in Figure 4. They comprise 24 values that may be regarded as measures of the stability of chromatic scale steps in the context of major and minor keys (12 for each). In the following, I will not consider Krumhansl’s well-known explanation of the psychological distances between musical keys based on correlation coefficients between key profiles, nor will I develop the mathematical procedures based on the K-K profiles proposed by Temperley (e.g. 2007). Instead, I propose a new algorithm for the pc-salience profile of a Tn-type that is based on the assumption that listeners are familiar with the tonal stability relations within major and minor keys, as represented by K-K profiles. I begin by subtracting a constant (2.23) from all values in the profiles so that the minimum value becomes zero. I then estimate the probability that a given set of tones will occur in a given key by adding up the stability, according to the K-K profiles, of those tones in that key. For example, the probability that the set CEF# will
[Figure 4. Vertical axis: mean goodness-of-fit rating; horizontal axis: pitch class.]
Fig. 4. The key profiles of Krumhansl and Kessler (1982). The full line denotes the (i.e. any) minor key (or tonality), the dotted line the major key.
occur in the key of C major is estimated by adding up the stability of C, of E and of F# in the C-major key profile. The novel aspect of this procedure is as follows: I then calculate the tonal profile of the Tn-type as a weighted mean of all 24 K-K profiles (one for each major and minor key), where the weights are the probabilities calculated in the previous step (i.e. how often we expect the Tn-type in question to occur in each key). The underlying idea is that any Tn-type can be heard in any key, but with different probabilities; the tonal profile of a Tn-type is therefore a weighted mean of all 24 key profiles, where each weight is the probability that a key will be cognitively instantiated when the Tn-type is heard. Finally, I normalize that weighted-mean profile so that its mean is 10; individual values are rounded to the nearest whole number. This new algorithm is conceptually simple, but the weighted mean of all 24 keys would be very time-consuming to perform by hand. Although the algorithm is based on the culture-specific assumption that tonality is limited to the Western major and minor modes, it yields intuitively reasonable results for all Tn-types (see appendix).

Table 4. Calculated harmonic and tonal profiles for C major and minor triads

pitch class in semitones                      0    1    2    3    4    5    6    7    8    9    10   11
pitch class as letter                         C         D         E    F         G         A         B
major triad 3-11B (047), harmonic profile     34   0    6    6    19   11   4    19   6    13   2    0
major triad 3-11B (047), tonal profile        22   0    13   5    17   10   0    22   4    13   4    9
minor triad 3-11A (037), harmonic profile     29   2    4    25   0    15   0    19   15   4    2    6
minor triad 3-11A (037), tonal profile        14   7    10   12   8    11   7    14   10   8    11   8
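The tonal-profile procedure described above (shift the key profiles, weight each of the 24 keys by the summed stability of the set's tones, take the weighted mean, normalize to a mean of 10) might be sketched as follows. Several things here are assumptions rather than facts from the text: the K-K values are the figures as commonly tabulated from Krumhansl and Kessler (1982), since the text gives only the constant 2.23; the shifted profiles are used both for the weights and for the weighted mean; and the names `key_profiles` and `tonal_profile` are hypothetical. The output has not been checked against Table 4 or the appendix, so no expected values are shown.

```python
# K-K key profiles for C major and C minor (assumed values, not given in the text).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
OFFSET = 2.23  # constant subtracted so that the minimum value becomes zero

def key_profiles():
    """All 24 shifted key profiles, each indexed by absolute pitch class."""
    keys = []
    for base in (KK_MAJOR, KK_MINOR):
        shifted = [v - OFFSET for v in base]
        for tonic in range(12):
            keys.append([shifted[(pc - tonic) % 12] for pc in range(12)])
    return keys

def tonal_profile(pcs, mean=10):
    """Weighted mean of the 24 key profiles, scaled to the given mean (unrounded)."""
    keys = key_profiles()
    # Weight of each key: summed stability of the set's tones in that key.
    weights = [sum(k[p] for p in pcs) for k in keys]
    total = sum(weights)
    blended = [sum(w * k[pc] for w, k in zip(weights, keys)) / total
               for pc in range(12)]
    scale = mean * 12 / sum(blended)
    return [v * scale for v in blended]

profile = tonal_profile([0, 4, 7])
print([round(v) for v in profile])
```

Because all 12 tonics of each mode are included, transposing the input set simply rotates the resulting profile, which is the behaviour one would expect of a profile attached to a Tn-type.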
Table 4 compares the two kinds of perceptual profile, harmonic and tonal, for the C-major and minor triads according to these procedures. Both profiles have been normalized so that their mean is 10, and all entries have been rounded to the nearest whole number. The correlation coefficient between the harmonic and tonal profiles is quite high (r = 0.84 for both major and minor), although the two profiles have been calculated on the basis of quite different assumptions and using quite different procedures and numerical values. The results for the major and minor triads are also similar in the following ways. In both harmonic profiles, the most likely root is the conventional root, the third and fifth have relatively high salience, and the fourth and sixth (M6 in the major triad, m6 in the minor) are strongly implied. The approximately equal stability of root and fifth in the tonal profiles is consistent with the idea that the root of a chord does not generally (or even often) coincide with the tonic; for example, a repeated chord near the end of a classical development section is often perceived as a dominant rather than a tonic. The perceptual profile of a Tn-type of cardinality between 1 and 11 always has peaks, which means that it is always to some extent tonal: the clearer the peaks, the clearer the tonality.7 In Parncutt (1988), I developed a simple mathematical formulation of the “peakedness” of a pc-set’s perceptual profile and called it root ambiguity. It was calculated by dividing the sum of the 12 values by their maximum and taking the square root of the result; the square root came from a model developed to account for empirical data on the number of tones simultaneously perceived in a set of musical and non-musical sonorities (their multiplicity) in Parncutt (1989). 
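Both summary statistics used in this passage are easy to reproduce. The sketch below is illustrative only; the function names `ambiguity` and `pearson` are hypothetical. The ambiguity is computed from the unnormalized harmonic profile of the major triad quoted earlier in the text, and the correlation uses the minor-triad rows of Table 4 (harmonic and tonal), which are already rounded, so the result is approximate.

```python
from math import sqrt

def ambiguity(profile):
    """Square root of (sum of the 12 values divided by their maximum)."""
    return sqrt(sum(profile) / max(profile))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Major triad (047): unnormalized harmonic profile quoted in the text.
raw_major = [18, 0, 3, 3, 10, 6, 2, 10, 3, 7, 1, 0]
print(round(ambiguity(raw_major), 2))  # 1.87

# Minor triad (037): harmonic and tonal profiles as listed in Table 4.
minor_harmonic = [29, 2, 4, 25, 0, 15, 0, 19, 15, 4, 2, 6]
minor_tonal = [14, 7, 10, 12, 8, 11, 7, 14, 10, 8, 11, 8]
print(round(pearson(minor_harmonic, minor_tonal), 2))  # 0.84
```

Since the ambiguity is a ratio of sum to maximum, it is unaffected by the normalization to a mean of 10; only the rounding of individual values changes it slightly.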
According to this procedure, the calculated harmonic ambiguity of the major triad is 1.87, which makes it the least ambiguous of all 19 Tn-types of cardinality 3 and is consistent with its ability to blend (to fuse perceptually).

Perceptual profiles, consonance and prevalence

The appendix presents the calculated perceptual profiles of all Tn-types of cardinality 3.8 These data, when extended to include other cardinalities, have interesting compositional and music-analytical applications: they can help composers to find Tn-types of any given degree of tonal strength and analysts to analyze the tonal strength of Tn-types found in the repertoire. However, things are not quite that simple, because the tonal strength of a Tn-type depends not only on the intervals in the set, but also on the prevalence of a set in the tonal literature and the contexts in which it normally appears. And that depends in turn on its consonance, or lack of dissonance.9 This theory is not circular: the prevalence of a Tn-type is assumed originally to depend causally on just two factors, its (lack of) roughness and the peakedness of its tonal

7 My basic assumption is that the flatter the profile, the more ambiguous the tonal implications. I have not considered bitonality, that is, the possibility that a single profile can imply more than one root/chord or tonic/tonality. Bitonality may be regarded as an example of tonal ambiguity. I also deliberately fail to distinguish between ambiguity and multiplicity. A profile with two main peaks may cause a listener to perceive one peak or the other at different times (ambiguity), or both at once (multiplicity). That distinction is beyond the present scope.
8 Profiles for Tn-types of larger cardinality may be obtained directly from the author.
9 This idea applies regardless of how the term “consonance” is defined or understood. The rank order of consonance of common triads is presumably the same as their rank order of prevalence in tonal music: major, minor, diminished, augmented (Parncutt 2006).
profile (cf. Terhardt 1976). But the theory is complicated by the gradual historical evolution of tonal syntax (Parncutt, in preparation). Pitch patterns may be perceived as consonant because they are often heard in tonal music and are therefore familiar. A pitch pattern may also be performed and therefore heard more often because it is a subset of commonly-used scales. Since rough sonorities are generally less prevalent in tonal music, they may also have fewer or weaker tonal implications. The roughness of a Tn-type may be predicted on the basis of the average roughness of the six interval classes (cf. Huron 1994). In a first approximation, the roughest interval is the minor second, followed by the major second and tritone (Plomp and Levelt 1965). These may be combined with the interval vector of each pc-set, which shows how often each interval class occurs in the set. In the absence of a comprehensive table of such calculations, consider the interaction between roughness and the calculated ambiguity of the harmonic profiles in the appendix. The least ambiguous sets according to the appendix are 047 (major), 035 (part of a seventh chord), 027 (suspended), and 037 (minor), in that order. The reason why 037 is more prevalent in tonal music than 027 or 035 evidently involves the roughness of the major second interval within 035 and 027. The most ambiguous Tn-types of cardinality 3 are predicted to be 036, followed by 012, 013 and 023, then by 014, 034, 046 and 048. The model predicts that 036 (the diminished triad) has four root candidates of approximately equal salience, making it highly ambiguous. None of its three tones is reinforced by a root-support interval (see Table 2), so all have approximately equal salience, and a non-chord tone - 8 relative to 036, or Ab relative to CEbGb - is reinforced by all three tones, which gives it the character of a “pitch at the missing fundamental”. Why is 036 so prevalent in tonal music in spite of its tonal ambiguity? 
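The interval vector mentioned above, which counts how often each of the six interval classes occurs in a set, can be computed directly. This is a minimal sketch with a hypothetical function name; it deliberately stops short of assigning roughness values to the interval classes, since the text gives only their rank order.

```python
from itertools import combinations

def interval_vector(pcs):
    """Count occurrences of interval classes 1 to 6 among all pairs of pcs."""
    vector = [0] * 6
    for a, b in combinations(sorted(set(pcs)), 2):
        d = (b - a) % 12
        ic = min(d, 12 - d)  # fold intervals larger than a tritone
        vector[ic - 1] += 1
    return vector

print(interval_vector([0, 3, 6]))  # [0, 0, 2, 0, 0, 1]
print(interval_vector([0, 2, 7]))  # [0, 1, 0, 0, 2, 0]
print(interval_vector([0, 3, 5]))  # [0, 1, 1, 0, 1, 0]
print(interval_vector([0, 3, 7]))  # [0, 0, 1, 1, 1, 0]
```

The first two positions of each vector count minor and major seconds. Both 027 and 035 contain one major second, whereas 037 and 036 contain none, which bears on the question just posed about 036.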
First, it is relatively smooth because it contains no major or minor seconds. Second, it is a subset of the prevalent major-minor (dominant) seventh chord (4-27B or 0368), which is the least ambiguous Tn-type of cardinality 4. Third, it is a subset of the standard major and minor scale sets (Parncutt 2006). Thus, it is both relatively smooth and relatively prevalent. The other listed sets are less prevalent because they contain rough second intervals. These sets may therefore be considered suitable for composition of “atonal” music.

Conclusion

In this paper, I have sketched a new, systematic approach to the enumeration and perceptual analysis of Tn-types. I have attempted to explain the relative tonalness, consonance and prevalence of Tn-types on the basis of the pitch-salience profiles and the roughness of corresponding musical sonorities. The preliminary findings are promising and the approach shows potential for future application in music analysis and composition. This is not a new investigation in the sense that an answer is sought to a new question. Rather, I have considered the implications of existing empirical and theoretical work for music theory, analysis and composition. The novel aspects of this paper include the systematic application of the algorithm presented in Parncutt (1988) to all possible Tn-types, and consideration of the implications of that procedure for both the history of tonal-harmonic syntax and contemporary composition. Another original element is the development of a new algorithm for the pitch-salience of a Tn-type based on the K-K profiles. The models that I have presented are incomplete in that they do not account for differences in the musical realization of Tn-types. It would be possible, but beyond the present scope, to account quantitatively in the presented models for parameters such as register, loudness, doubling and repetition.

Acknowledgments. I am grateful to Helga de la Motte-Haber, Timour Klouche and an anonymous reviewer for their insightful questions, criticism and suggestions.
References

Auhagen, W.: Experimentelle Untersuchungen zur auditiven Tonalitätsbestimmung in Melodien. Kölner Beiträge zur Musikforschung, vol. 180. Bosse, Kassel (1994)
Eberlein, R.: Die Entstehung der tonalen Klangsyntax. Peter Lang, Frankfurt (1994)
Forte, A.: The structure of atonal music. Yale University Press, New Haven (1973)
Huron, D.: Interval-class content in equally tempered pitch-class sets: Common scales exhibit optimum tonal consonance. Music Perception 11, 289–305 (1994)
Krumhansl, C.L., Kessler, E.J.: Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review 89(4), 334–368 (1982)
Krumhansl, C.L.: Cognitive Foundations of Musical Pitch. Oxford University Press, New York (1990)
Oram, N., Cuddy, L.L.: Responsiveness of Western adults to pitch distributional information in melodic sequences. Psychological Research 57(2), 103–118 (1995)
Parncutt, R.: Revision of Terhardt’s psychoacoustical model of the root(s) of a musical chord. Music Perception 6, 65–94 (1988)
Parncutt, R.: Harmony: A psychoacoustical approach. Springer, Berlin (1989)
Parncutt, R.: Pitch properties of chords of octave-spaced tones. Contemporary Music Review 9, 35–50 (1993)
Parncutt, R.: A model of the perceptual root(s) of a chord accounting for voicing and prevailing tonality. In: Leman, M. (ed.) Music, gestalt, and computing - Studies in cognitive and systematic musicology, pp. 181–199. Springer, Berlin (1997)
Parncutt, R.: Perception of musical patterns: Ambiguity, emotion, culture. In: Auhagen, W., Ruf, W., Smilansky, U., Weidenmüller, H. (eds.) Music and science - The impact of music. Nova Acta Leopoldina, vol. 92(341), pp. 33–47. Deutsche Akademie der Naturforscher Leopoldina, Halle (2005)
Parncutt, R.: Peer commentary on N. D. Cook & T. X. Fujisawa, The psychophysics of harmony perception: Harmony is a three-tone phenomenon. Empirical Musicology Review 1(4) (2006), http://emusicology.org/
Parncutt, R.: Key profiles as pitch salience profiles of tonic triads (in preparation)
Plomp, R., Levelt, W.J.M.: Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America 38, 548–560 (1965)
Rahn, J.: Basic atonal theory. Schirmer, New York (1980)
Reti, R.: Tonality, atonality, pantonality: A study of some trends in twentieth century music. Greenwood Press, Westport (1958)
Temperley, D.: Music and probability. MIT Press, Cambridge (2007)
Terhardt, E.: Ein psychoakustisch begründetes Konzept der musikalischen Konsonanz. Acustica 36, 121–137 (1976)
Terhardt, E.: Die psychoakustischen Grundlagen der musikalischen Akkordgrundtöne und deren algorithmische Bestimmung. In: Dahlhaus, C., Krause, M. (eds.) Tiefenstruktur der Musik, pp. 23–50. Technical University of Berlin, Berlin (1982)
Terhardt, E., Stoll, G., Seewann, M.: Algorithm for extraction of pitch and pitch salience from complex tonal signals. Journal of the Acoustical Society of America 71, 679–688 (1982)
Appendix: Calculated Perceptual Profiles of All Tn-Types of Cardinality 3

Row 1: Tn-type labels, harmonic profile (12 values), ambiguity of harmonic profile (a)
Row 2: correlation between harmonic and tonal profiles (r), tonal profile (12 values), ambiguity of tonal profile (a)
3- 1
(012)
21 19 23
4
r = 0.72
11 10 11
9 10 11
3- 2A (013) r = 0.75
3- 2B (023) r = 0.75
3- 3A (014) r = 0.75
3- 3B (034) r = 0.75
3- 4A (015) r = 0.83
3- 4B (045) r = 0.83
3- 5A (016) r = 0.73
3- 5B (056) r = 0.73
4 10 10 10 9 11
6
6
8
2
a=2.29
9 11 10
9
a=3.26
19 21
4 23
0 13 10
0 15
6
2
8
a=2.29
11 11
8 12
8 11
9
9 12
8 11
8
a=3.11
21
2 23 19
4 13
0 10 15
0
8
6
a=2.29
12
8 11 11
8 11
8 12
9 11
8
a=3.11
2
a=2.20
9
25 19
6
4 19 10 13
0
11 11
9
9 12 10
9
9 11 11
8 10
a=3.13
25
2
6 19 19 13
4
0 15 10
2
6
a=2.20
12
9
9 11 11 10
8 11 11
9 10
a=3.13
19 25
4
13 11
9 10
25
6
14
4
6
6 11
2
a=2.05
8 14
8
9 11
9 11
7
a=2.97
6
2 19 29
4
4
6 10 11
0
a=2.05
8 10
9 11 13
7 11
9 11
8
a=2.97
10 13
19
9
2
0 29 10
19 19 10
6
6 15
4
2 10 29
8 10 10 10 12
6 10
2
12 10 10 10
2 29 19
0 10
6
9
2 11
a=2.05
8 11 10 10 10
a=3.08
4 10
a=2.05
0 11 10
8 13 10 10 10 10 11
8
a=3.08
3- 6
(024)
27
0 25
0 23 10
4 10
6 10
8
0
a=2.12
r = 0.75
12
8 12
8 12 10
8 12
8 12
8 10
a=3.11
21
6 23
2
4 29
0 13
6
0 17
0
a=2.05
14
7 12
9
8 14
7 12
8 11 11
7
a=2.95
19
8
4 21
0 32
0
4 15
0 11
6
a=1.93
14
8
9 12
7 14
7 11 11
8 12
7
a=2.95
21
0 29
0
6 10 19 10 10
0
8 10
a=2.05
11
9 12
9
9 10 11 11
8 11
9 10
a=3.14
25
0 11
0 21 10 23
0 10 10
2 10
a=2.20
9 11
8 11
a=3.14
3- 7A (025) r = 0.85
3- 7B (035) r = 0.85
3- 8A (026) r = 0.59
3- 8B (046) r = 0.59
3- 9
3-10
11 10
9
9 12
9 11 10
(027)
30
0 23
6
4 11
0 29
6
8
0
a=1.98
r = 0.90
14
6 14
8 10 11
7 15
7 11 10
8
a=2.82
(036)
19
2 10 19
2 15
a=2.51
r = 0.62
3-11A (037) r = 0.84
3-11B (047) r = 0.84
3-12
2 13 19
0 19
4
0
11 10
8 12
8 10 11 10 11
8 11 10
a=3.11
29
2
4 25
0 15
0 19 15
4
2
6
a=2.05
14
7 10 12
8 11
7 14 10
8 11
8
a=2.95
34
0
6
6 19 11
4 19
6 13
2
0
a=1.87
14
7 11
8 12 10
7 14
8 11
8 10
a=2.95
0 25 10
6
6
0
a=2.20
8 10
a=3.15
(048)
25 10
6
r = 0.74
12 10
8 10 12 10
0 25 10
8 10 12 10
Calculating Tonal Fusion by the Generalized Coincidence Function

Martin Ebeling
Peter-Cornelius-Conservatory of Music, Mainz
Abstract. Models of pitch perception in the time domain suggest that the perception of pitch is extracted from neuronal pulse series by networks for periodicity detection. A neuronal mechanism for periodicity detection in the auditory system has been found in the inferior colliculus (Langner 1983). The present paper proposes a mathematical model to compute the degree of coincidence in the periodicity detection mechanism for musical intervals represented by pulse series. The purpose of this model is to study the logical structure of coincidence and to define a measure value for the degree of coincidence. The model is purely mathematical but has a strong relation to physiological data presented by Langner. As the sensation of consonance depends mostly on pitch, frequency is the only parameter to be regarded in the model. The integration of other parameters and the adaptation to further physiological data should be easy but still lies ahead. The model is a mathematical basis for a concept of consonance based on pitch perception models in the time domain. In contrast to the concept of the sensory consonance it does not refer to the percept of roughness, which nevertheless is important for the perceived pleasantness of consonances.
1 Background

1.1 Tonal Fusion and Roughness

Carl Stumpf observed that consonant intervals show a tendency to cohere into a single sound image. He called this phenomenon Tonverschmelzung – tonal fusion. Consonant intervals show a stronger tendency to fuse than less consonant or dissonant intervals. From extensive hearing experiments Stumpf deduced a system of rules which he termed Stufen der Tonverschmelzung and illustrated it in a curve which he called System der Verschmelzungsstufen in einer Curve (Stumpf 1890/1965). The curve shows the degree of fusion for all intervals over a range of an octave. Not only do the consonant intervals, namely the prime (1:1), minor third (5:6), major third (4:5), pure fourth (3:4), pure fifth (2:3), minor sixth (5:8), major sixth (3:5) and the pure octave (1:2), given here with their frequency ratios, have a higher degree of fusion, but so do slightly mistuned intervals close to the consonant intervals. Stumpf’s concept of tonal fusion is an attempt to define consonance and dissonance psychologically. He emphasizes consciousness and focuses on the mental acts of attending to and referring to sound. Thus, his approach is essentially different from Helmholtz’s (1877) idea to explain consonance and dissonance as a result of the sensation of roughness, a theory based on the physiology of hearing. While Helmholtz deals with the perception of tone, Stumpf, by contrast, focuses on the apperception of tone. Although roughness is not discussed in this paper, it should be mentioned that the important phenomenon of roughness (Zwicker and Fastl 1999) is widely accepted as an explanation of consonance and dissonance, as it is closely connected with the psychophysical model of critical bandwidth filters on the basilar membrane (Plomp and Levelt 1965). Several methods for the calculation of roughness have been proposed (Aures 1985; Kameoka and Kuriyagawa 1969). Considerations of the underlying logic of neuronal processing in the auditory system reveal that tonal fusion is, besides roughness, a significant concept for explaining consonance. This paper is based on the descriptions of a periodicity detection mechanism in the inferior colliculus (IC) found by Langner (1983; 2007). The logic of its operation is studied mathematically by the definition and calculation of the Generalized Coincidence Function (Ebeling 2007).

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 140–155, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Stumpf’s “System der Verschmelzungsstufen in einer Curve” (Stumpf 1890/1965)
1.2 Interspike Interval Distributions, Pitch Estimates and Harmony
1.2.1 Neuronal Code and Pitch
The inner ear provides a frequency analysis mechanism which transforms incoming sound into a neuronal code (Zhang, Heinz, Bruce and Carney 2001). Due to the mechanics of the basilar membrane and the frequency selectivity of the hair cells, sound-induced pressure waves travelling through the cochlea are converted into neural impulses representing the separately resolved frequency components of the sound (Goldstein and Scrulovic 1977). In the case of a single pure tone, the travelling pressure wave maximally activates hair cells at a certain place on the basilar membrane and makes them react with an essentially periodic firing pattern. According to the volley principle of Wever (1949) this results in a running spike train in the auditory nerve with a period corresponding to the reciprocal of the frequency of the tone. Two pure tones falling into the same critical bandwidth stimulate the same group of hair cells and cannot be resolved in the auditory system. They produce a pitch percept corresponding to a frequency between the frequencies of the two pure tones, and an amplitude modulation with a frequency equal to the difference frequency. All partials of an idealized harmonic sound are integer multiples of a fundamental frequency. As a result of superposition the period of the fundamental is equal to the period of the envelope of the harmonic sound. The period of the fundamental is also encoded in the cochlea in amplitude modulations resulting from superposition of frequency components above the third harmonic. As a consequence the period of the fundamental is coded temporally in spike intervals in the auditory nerve and can be analysed by neurons in the auditory brain stem (cochlear nucleus: CN) and midbrain (inferior colliculus: IC) (Langner 2005).
1.2.2 Interspike Intervals
The time between neural spikes, called the interspike interval (ISI), can be measured either between successive discharges (first-order ISI) or between both successive and non-successive spikes (all-order ISI). Counting all ISIs in a discharge pattern leads to histograms that show the interspike interval distributions for the entire auditory nerve. Cariani and Delgutte (1996) have shown that ISI histograms (autocorrelograms) computed from all-order ISIs show high peaks for periods corresponding to the pitch. This demonstrates that the most frequent all-order interspike interval corresponds to the perceived pitch (Cariani and Delgutte 1996, 1698).
1.2.3 Coinciding Periodicity Patterns for Intervals
Tramo, Cariani, Delgutte and Braida (2001) analysed the neuronal responses to harmonic intervals. They used stimuli of isolated harmonic intervals (minor second, perfect fourth, tritone, perfect fifth) formed by complex tones. Each of the two complex tones contained the first six harmonics with equal amplitude and equal phase.
They found that in addition to the pitches of notes actually present in the interval, for consonant intervals, the fine timing of auditory nerve fiber responses contain strong representations of harmonically related pitches implied by the interval (subharmonics, e.g. Rameau’s fundamental bass; for the perceptual root of a chord, see Parncutt 1989; 1997). Moreover, all or most of the six partials can be resolved by finely tuned neurons throughout the auditory system. By contrast, dissonant intervals evoked auditory nerve fiber activity that does not contain strong representations of constituent notes or related bass notes. As in the case of dissonant intervals the two complex tones contain many partials too close together to be resolved, these partials interfere with one another, and thus cause coarse fluctuations in the firing of peripheral and central auditory neurons. This gives rise to the perception of roughness and dissonance. (Tramo, Cariani, Delgutte and Braida 2001, 92). Tramo, Cariani, Delgutte and Braida (2001) determined the ISI distributions embedded in the responses of axons throughout the auditory nerve during stimulation with musical intervals. Comparing these ISI histograms with the graphs of the computed autocorrelation functions (with primaries, consisting of six equally strong harmonics) they found the same periodicity patterns in both, the autocorrelation and the ISI distributions.
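The distinction between first-order and all-order interspike intervals from Sect. 1.2.2 can be illustrated with a small sketch. It is not taken from the cited studies: it assumes an idealized, jitter-free spike train given as integer spike times in milliseconds, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def first_order_isis(spike_times):
    """Intervals between successive spikes only."""
    times = sorted(spike_times)
    return [b - a for a, b in zip(times, times[1:])]


def all_order_isis(spike_times):
    """Intervals between every pair of spikes (successive and non-successive)."""
    times = sorted(spike_times)
    return [b - a for a, b in combinations(times, 2)]


def most_frequent_interval(spike_times):
    """Mode of the all-order ISI histogram."""
    histogram = Counter(all_order_isis(spike_times))
    return histogram.most_common(1)[0][0]


# Assumed toy data: an idealized spike train of a 200 Hz tone, one spike every 5 ms.
spikes = list(range(0, 50, 5))
```

For this toy train the all-order histogram has entries at every multiple of the period, with the period itself occurring most often, which is the behaviour the text attributes to ISI histograms of pitched stimuli.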
1.2 Autocorrelation
1.2.1 Autocorrelation versus Fourier-Analysis
Measuring and counting the all-order ISIs is an analysis in the time domain which is, from a logical point of view, equivalent to the computation of an autocorrelation function. The autocorrelation function shows peaks for all periods of a signal. As periods are distances in time, the autocorrelation function has to be regarded as an analysis in the time domain. From the investigations of Cariani and Delgutte (1996) it appears probable that a neuronal autocorrelation mechanism for the detection of the periods of running spike trains in the auditory system provides the sensation of pitch. It must be pointed out that the autocorrelation function is as powerful a tool for sound analysis as the Fourier transform. The famous Wiener–Khintchine theorem (Wiener 1930; Hartmann 2000) says that the autocorrelation function is the Fourier transform of the power spectrum (energy spectral density) (see also Papoulis 1962, 246). As a consequence, the autocorrelation analysis is equivalent to a Fourier analysis of a signal. The Fourier analysis is used for spectral analysis in the frequency domain; the autocorrelation analysis is a periodicity analysis in the time domain. The power spectrum shows all frequencies inherent in the signal but no phase shifts; the autocorrelation function shows all periods inherent in the signal, including all subharmonic periods, but also no phase shifts.
1.2.2 Hearing Theories and Autocorrelation
Neuronal spike patterns in the auditory system can mathematically be represented by pulse sequences. Forming their autocorrelation functions (Papoulis 1962, 249) provides all information about their periodicity and ISIs. Thus, the “existence of a central processor capable of analyzing these interval patterns could provide a unified explanation for many different aspects of pitch perception” (Cariani and Delgutte 1996, 1698).
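The Wiener–Khintchine relation cited in Sect. 1.2.1 can be checked numerically on a short pulse sequence. The sketch below is only an illustration under stated assumptions: it uses the discrete, circular form of the theorem (a plain DFT written with the standard library), a made-up pulse train with a period of four samples, and hypothetical function names.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * f * j / n) for j in range(n))
            for f in range(n)]


def circular_autocorrelation(x):
    """Direct time-domain autocorrelation with wrap-around lags."""
    n = len(x)
    return [sum(x[j] * x[(j + k) % n] for j in range(n)) for k in range(n)]


def autocorrelation_from_power_spectrum(x):
    """Inverse DFT of the power spectrum |X(f)|^2."""
    n = len(x)
    power = [abs(c) ** 2 for c in dft(x)]
    return [sum(power[f] * cmath.exp(2j * cmath.pi * f * k / n)
                for f in range(n)).real / n
            for k in range(n)]


# Assumed toy signal: unit pulses every 4 samples.
signal = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
direct = circular_autocorrelation(signal)
via_spectrum = autocorrelation_from_power_spectrum(signal)
```

Both routes give the same sequence up to rounding, with peaks at lag 0 and at every multiple of the period, while the phase of the original signal plays no role in either result.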
Since Licklider (1951), many auditory theories operating in the time domain have presumed an autocorrelation mechanism or a related model to detect the periodicity of the stimuli (overviews: Hartmann 2000; Cheveigné 2005). These models have been tested psychoacoustically or with computer simulations (e.g. Meddis and Hewitt 1991; Patterson and Allerhand 1995) using different stimuli. Those tests give evidence for (e.g. Yost, Patterson and Sheft 1996) and sometimes against (e.g. Kaernbach 1998) an autocorrelation mechanism in the auditory system. Few models are based on physiological data using properties of neuronal circuits in the auditory pathway (Langner 1983; Meddis and Hewitt 1991). The present paper refers to properties of Langner’s model of periodicity detection in the inferior colliculus (IC), so this model is briefly presented here.
1.3 Langner’s Neuronal Correlator
Langner (1983) measured the responses of neurons in the cochlear nucleus (CN) and inferior colliculus (IC) to amplitude modulated signals (Hartmann 2000, 399) and proposed a model that performs a correlation between signal fine structure and modulation envelope. The model of Langner is based on neuronal delay and coincidence mechanisms. Its processing elements are a trigger, an oscillator, a reducer, and a coincidence neuron. These elements have their counterparts in well-described on-type neurons (which discharge only at stimulus onset), chopper neurons, and pauser neurons in the CN, and in disc cells in the IC. The oscillator responds with short bursts of regular intrinsic oscillations to each modulation period. The integrator collects the energy of the signal, thus generating intervals precisely related to the signal fine structure. By integrating synchronized activity of many nerve fibres the reducer is able to code frequencies up to the upper limit of phase coupling. The trigger unit synchronizes the responses of oscillator and reducer cycle to the modulation. The coincidence unit is activated by simultaneous inputs from oscillator and reducer. It responds best when signal fine structure (detected by the integrator) and signal envelope (to which the oscillator is synchronized) are correlated and the envelope period matches the reducer delay. Thus each such neuronal circuit responds best to a particular periodically modulated sound (best modulation frequency, BMF) and simultaneously represents a certain frequency and a certain pitch (Langner 2005). Three different periods are crucial for coincidence detection:
$\tau_m$: the period of the envelope (Hartmann 2000, 412–426);
$\tau_c$: the period of a carrier frequency, that is, the fine structure of the sound; and
$\tau_o$: the period of intrinsic oscillation.
Langner (2005; 2007) assumes that the detection of the envelope period leads to the sensation of pitch, whereas the timbre of the sound corresponds to the fine structure of the sound, represented by $\tau_c$. The intrinsic oscillation provides a time slot of coincidence, with periods of $\tau_o = 0.8$ ms, $\tau_o = 1.2$ ms, and so on up to $\tau_o = 2.4$ ms, or generally $\tau_o = 0.8\,\text{ms} + k \cdot 0.4\,\text{ms}$ (Langner and Schreiner 1988, 1813). Mathematically described, these three periods correlate if there are small integers n, m such that the “periodicity equation” is valid (see Langner and Schreiner 1988, 1818):
\[ m \cdot \tau_m + n \cdot \tau_c + \tau_o = 0 . \tag{1} \]
The intrinsic oscillation with period $\tau_o$ contributes a fuzziness to the coincidence detection, as the intrinsic oscillation raises the coincidence neuron to an excitation level just below threshold. At the beginning of stimulation each coincidence neuron shows the response characteristic of a comb filter. After about 30 ms, however, inhibition from the onset neuron functionally converts the coincidence neuron into a bandpass filter (Voutsas, Langner, Adamy and Ochse 2005; Langner 2007). This ensures that the whole bank of coincidence circuits acts like an autocorrelator on the modulation frequencies.
2 Mathematical Model of Generalized Coincidence
2.1 Correlation Functions
Applying autocorrelation functions makes it necessary to classify functions (signals) according to their average power, which is defined by:
\[ \overline{f^2(t)} = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f^2(t)\, dt . \tag{2} \]
The proposed model of generalized coincidence makes exclusive use of functions with finite energy, which means that $\overline{f^2(t)} = 0$. Nevertheless, it can easily be extended to finite power functions, which have the property that $0 < \overline{f^2(t)} < \infty$ (Papoulis 1962). In the case of functions with finite energy, the correlation functions of two functions $f_1(t)$, $f_2(t)$ are defined by:
autocorrelation function
\[ \rho_i(\tau) = \int_{-\infty}^{\infty} f_i(t)\, f_i(t+\tau)\, dt \tag{3} \]
cross correlation functions
\[ \rho_{12}(\tau) = \int_{-\infty}^{\infty} f_1(t)\, f_2(t+\tau)\, dt \tag{4} \]
\[ \rho_{21}(\tau) = \int_{-\infty}^{\infty} f_2(t)\, f_1(t+\tau)\, dt \tag{5} \]
Substituting $t' = t + \tau$ in (4) shows that
\[ \rho_{12}(\tau) = \rho_{21}(-\tau) . \tag{6} \]
Let $S(t) = f_1(t) + f_2(t)$ be the sum of two functions $f_1(t)$, $f_2(t)$. Using definition (3) immediately leads to the sum formula of autocorrelation functions:
\[
\begin{aligned}
\rho_S(\tau) &= \int_{-\infty}^{\infty} S(t)\, S(t+\tau)\, dt = \int_{-\infty}^{\infty} [f_1(t) + f_2(t)] \cdot [f_1(t+\tau) + f_2(t+\tau)]\, dt \\
&= \int_{-\infty}^{\infty} f_1(t) f_1(t+\tau)\, dt + \int_{-\infty}^{\infty} f_2(t) f_2(t+\tau)\, dt + \int_{-\infty}^{\infty} f_1(t) f_2(t+\tau)\, dt + \int_{-\infty}^{\infty} f_2(t) f_1(t+\tau)\, dt \\
&= \rho_1(\tau) + \rho_2(\tau) + \rho_{12}(\tau) + \rho_{21}(\tau)
\end{aligned}
\tag{7}
\]
2.2 Sequence Representation of a Tone
In the auditory system, a pitch is represented by a periodic pulse train which mathematically can be represented by a sequence of equally spaced pulses (M a positive integer or $\infty$):
\[ x_\mu(t) = \sum_{m=-M}^{M} I_\mu(t - mT) \tag{8} \]
The constant T is the period of the pulse train and it is the reciprocal of the frequency corresponding to the perceived pitch. The function $I_\mu(t)$ describes the pulse form. A neuronal pulse is built up by many neuronal discharges randomly distributed around time $mT$, and coincidences occur randomly in a time window. Therefore, $I_\mu(t)$ should be a density function, that means:
i. \[ I_\mu(t) \ge 0 \ \text{for every } t ; \tag{9} \]
ii. \[ \int_{-\infty}^{\infty} I_\mu(t)\, dt = 1 . \tag{10} \]
Furthermore, the spread of all single discharges is determined by the parameter $\mu$ describing the “width” of the pulse $I_\mu(t)$. Taking $\mu$ as a real number, $I_\mu(t)$ becomes a family of functions with the generalized limit $\delta(t)$ (Papoulis 1962, 277). Thus, a third property of $I_\mu(t)$ follows:
iii. \[ \lim_{\mu \to 0} I_\mu(t) = \delta(t) \tag{11} \]
The limit
\[ \lim_{\mu \to 0} x_\mu(t) = \sum_{m=-M}^{M} \delta(t - mT) \tag{12} \]
is the idealized case of all neuronal discharges occurring exactly at time $mT$. Examples for $I_\mu(t)$ may be the Gaussian pulse, with $\mu$ determining the variance, or the rectangular pulse with the width $\mu$.
If two pulses fulfil the properties (i)-(iii), their cross correlation functions also fulfil the properties (i)-(iii). Considering definition (3), properties (i) and (ii) become obvious for the cross correlation function. Property (iii) can be proved by applying the definition of the generalized limit: If
\[ \chi_{\mu\nu}(\tau) = \int_{-\infty}^{\infty} I_\mu(t)\, J_\nu(t+\tau)\, dt \tag{13} \]
is the cross correlation function of two pulses $I_\mu(t)$ and $J_\nu(t)$, it can be shown that
\[ \lim_{\mu \to 0} \lim_{\nu \to 0} \int_{-\infty}^{\infty} \chi_{\mu\nu}(\tau)\, \varphi(\tau)\, d\tau = \varphi(0) \]
for every continuous test function $\varphi(t)$. By definition of the generalized limit this is equivalent to property (iii). As by definition the autocorrelation function is a special case of a cross correlation function, it follows that the properties (i)-(iii) are also valid for the autocorrelation functions of $I_\mu(t)$ and $J_\nu(t)$.
2.3 Sequence Representation of an Interval
In the model the sum of two simultaneously running pulse trains is the mathematical representation of neural spike trains corresponding to an interval. If $\nu_1$, $\nu_2$ are the frequencies of the two tones constituting the interval and $T_1$, $T_2$ are the corresponding periods, the frequency ratio of the interval is
\[ s = \frac{\nu_2}{\nu_1} = \frac{T_1}{T_2} \quad \text{or} \quad T_2 = s^{-1} T_1 \tag{14} \]
Let $I_\mu(t)$ and $J_\nu(t)$ be two families of pulse functions with properties (i)-(iii) as above. The two tones of the interval shall be represented by the two sequences
\[ x_\mu(t) = \sum_{m=-M}^{M} I_\mu(t - mT_1) ; \qquad x_\nu(t) = \sum_{n=-N}^{N} J_\nu(t - nT_2) = \sum_{n=-N}^{N} J_\nu(t - n s^{-1} T_1) \tag{15} \]
Their sum
\[ S(t) = x_\mu(t) + x_\nu(t) \tag{16} \]
is the mathematical representation of the interval with the frequency ratio s. Furthermore, let:
$\alpha_\mu(\tau)$ be the autocorrelation function of the pulse $I_\mu(t)$,
$\alpha_\nu(\tau)$ be the autocorrelation function of the pulse $J_\nu(t)$,
$\chi_{\mu\nu}(\tau)$, $\chi_{\nu\mu}(\tau)$ be the cross correlation functions of the pulses $I_\mu(t)$, $J_\nu(t)$.
From (6) follows:
\[ \chi_{\mu\nu}(-\tau) = \chi_{\nu\mu}(\tau) \tag{17} \]
By induction on N the equation
\[ \#\{ (n, m) : |n| \le N \wedge |m| \le N \wedge (n - m) = k \} = 2N + 1 - |k| \tag{18} \]
can be proved for integers N, k, m, n with $N \ge 0$, $|k| \le N$. Together with the definitions (3)-(5) and the sum formula (7) (that is, the linearity of integration), it follows that:
The autocorrelation function of $x_\mu(t)$ is the sequence
\[ \rho_\mu(\tau) = \sum_{n=-2M}^{2M} (2M + 1 - |n|)\, \alpha_\mu(\tau - nT_1) ; \tag{19} \]
The autocorrelation function of $x_\nu(t)$ is the sequence
\[ \rho_\nu(\tau) = \sum_{n=-2N}^{2N} (2N + 1 - |n|)\, \alpha_\nu(\tau - n s^{-1} T_1) ; \tag{20} \]
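The counting identity (18), read with absolute values ($|n| \le N$, $|m| \le N$, right-hand side $2N + 1 - |k|$), supplies the weights in (19) and (20). A brute-force check is easy to write; the sketch below uses hypothetical function names and also tests offsets up to $|k| = 2N$, the range over which the sums in (19) and (20) run.

```python
def count_pairs(N, k):
    """Number of index pairs (n, m) with |n| <= N, |m| <= N and n - m == k."""
    return sum(1
               for n in range(-N, N + 1)
               for m in range(-N, N + 1)
               if n - m == k)


def identity_holds(max_N):
    """Check count_pairs(N, k) == 2N + 1 - |k| for all N <= max_N and |k| <= 2N."""
    return all(count_pairs(N, k) == 2 * N + 1 - abs(k)
               for N in range(max_N + 1)
               for k in range(-2 * N, 2 * N + 1))
```

For example, with N = 5 there are 11 pairs at offset k = 0 and a single pair at offset k = 10, matching the triangular weighting of the autocorrelation peaks.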
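The cross correlation functions of $x_\mu(t)$ and $x_\nu(t)$ are
\[ \rho_{\mu\nu}(\tau) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \chi_{\mu\nu}\!\left(\tau - (n s^{-1} - m)\, T_1\right) \tag{21} \]
\[ \rho_{\nu\mu}(\tau) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \chi_{\nu\mu}\!\left(\tau - (m - n s^{-1})\, T_1\right) \tag{22} \]
As the parameters n and m take positive as well as negative values and as $\chi_{\mu\nu}(-\tau) = \chi_{\nu\mu}(\tau)$ from (17), both cross correlation functions are equal:
\[ \rho_{\mu\nu}(\tau) = \rho_{\nu\mu}(\tau) \tag{23} \]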
2.4 Autocorrelation Function of an Interval
Applying the sum formula (7) to $S(t) = x_\mu(t) + x_\nu(t)$, it follows from (19)-(21) that
\[
\begin{aligned}
\rho_S(\tau, s) &= \rho_\mu(\tau) + \rho_\nu(\tau) + \rho_{\mu\nu}(\tau) + \rho_{\nu\mu}(\tau) = \rho_\mu(\tau) + \rho_\nu(\tau) + 2\rho_{\mu\nu}(\tau) \\
&= \sum_{m=-2M}^{2M} (2M + 1 - |m|)\, \alpha_\mu(\tau - mT_1) + \sum_{n=-2N}^{2N} (2N + 1 - |n|)\, \alpha_\nu(\tau - n s^{-1} T_1) \\
&\quad + 2 \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \chi_{\mu\nu}\!\left(\tau - (n s^{-1} - m)\, T_1\right)
\end{aligned}
\tag{24}
\]
is the autocorrelation function of the interval with the frequency ratio s. As the autocorrelation function also depends on the frequency ratio s, it is introduced as a second variable.
2.5 Definition of the Generalized Coincidence Function
We define the generalized coincidence function as the integral
\[ \mathrm{K}(s) := \int_{0}^{D} \rho_S^2(\tau, s)\, d\tau \tag{25} \]
As we are only interested in positive periods up to a certain length, the integration is performed over the interval $[0, D]$ with $D > 0$. For each frequency ratio s, $\mathrm{K}(s)$ is a measure value of overall coincidence between the two pulse trains representing the two tones of the interval with regard to pulse forms and pulse widths. This becomes clear from the example of rectangular pulses (see Ebeling 2007).
3 Application of the Model to Rectangular Pulse Sequences
3.1 Correlation Functions of Rectangular Pulses
3.1.1 Autocorrelation Function of the Rectangular Pulse
To apply the model, the degree of coincidence shall be calculated for all frequency ratios s within an octave, that means $1 \le s \le 2$. As a pulse function fulfilling the properties (i)-(iii) we take the rectangular pulse
\[ I_\mu(t) := \begin{cases} \dfrac{1}{\mu} & \text{if } |t| < \dfrac{\mu}{2} \\[1ex] 0 & \text{otherwise} \end{cases} \tag{26} \]
Its autocorrelation function is the triangle pulse (see also Papoulis 1962, 243):
\[ \alpha_\mu(\tau) = \Delta_\mu(\tau) := \begin{cases} \dfrac{1}{\mu}\left(1 - \dfrac{|\tau|}{\mu}\right) & \text{if } |\tau| < \mu \\[1ex] 0 & \text{otherwise} \end{cases} \tag{27} \]
Fig. 2. The autocorrelation function of the rectangular pulse is the triangle pulse
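3.1.2 Cross Correlation Function of the Rectangular Pulse
Consider two real functions $f_1(t)$ and $f_2(t)$ with the Fourier transforms $F_1(\omega)$ and $F_2(\omega)$. It can be shown (see Papoulis 1962, 244) that the definition of the cross correlation function given in (4) is equivalent to
\[ \rho_{12}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F_1^{*}(\omega)\, F_2(\omega)\, e^{i\omega\tau}\, d\omega \tag{28} \]
where $F_1^{*}$ denotes the complex conjugate of $F_1$. Thus, the cross-correlation function is the Fourier transform of the cross-energy spectrum $E_{12}(\omega) := F_1^{*}(\omega) F_2(\omega)$. Recall that the Fourier transform of the rectangular pulse $I_\varepsilon(t)$ is (see Papoulis 1962, 20)
\[ I_\varepsilon(t) \leftrightarrow \frac{1}{\varepsilon} \cdot \frac{2 \sin\!\left(\frac{\varepsilon}{2}\omega\right)}{\omega} =: F_\varepsilon(\omega) \tag{29} \]
which is real, so that the conjugation has no effect here. Considering two rectangular pulses $I_\mu(t)$ and $I_\nu(t)$, the cross-energy spectrum
\[ E_{\mu\nu}(\omega) = \frac{2}{\mu\nu\omega^2} \left( \cos\!\left(\frac{\mu - \nu}{2}\,\omega\right) - \cos\!\left(\frac{\mu + \nu}{2}\,\omega\right) \right) \tag{30} \]
is obtained. As this function is even, equation (28) becomes
\[ \rho_{12}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} E_{\mu\nu}(\omega) \cos(\omega\tau)\, d\omega \tag{31} \]
This leads to integrals of the form $\int_{-\infty}^{\infty} \frac{\cos(ax)}{x^2}\, dx = -\pi |a|$. This evaluation can be shown using the calculus of residues. As a result, the cross correlation function of the two pulses $I_\mu(\tau)$, $I_\nu(\tau)$ is the function:
\[ \chi_{\mu\nu}(\tau) := \begin{cases} \dfrac{1}{\mu\nu}\left(\dfrac{\mu + \nu}{2} - |\tau|\right) & \text{if } \dfrac{|\mu - \nu|}{2} < |\tau| \le \dfrac{\mu + \nu}{2} \\[1ex] \dfrac{1}{\mu} & \text{if } |\tau| \le \dfrac{|\mu - \nu|}{2} \\[1ex] 0 & \text{otherwise} \end{cases} \tag{32} \]
Fig. 3. The cross correlation function of two rectangular pulses with widths $\mu$ and $\nu$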
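The closed form (32) can be compared with a direct numerical evaluation of the defining integral (13) for two rectangular pulses. The following sketch is an illustration only: the function names, the pulse widths and the step size are assumptions, and the closed form is implemented for the case $\mu \ge \nu$, in which the plateau height is $1/\mu$.

```python
def rect(t, width):
    """Unit-area rectangular pulse of the given width, centred at 0 (equation (26))."""
    return 1.0 / width if abs(t) < width / 2.0 else 0.0


def chi_closed(tau, mu, nu):
    """Closed form (32); assumes mu >= nu so that the plateau height is 1/mu."""
    a = abs(tau)
    if a <= (mu - nu) / 2.0:
        return 1.0 / mu
    if a <= (mu + nu) / 2.0:
        return ((mu + nu) / 2.0 - a) / (mu * nu)
    return 0.0


def chi_numeric(tau, mu, nu, step=1e-4):
    """Midpoint-rule approximation of the integral of I_mu(t) * I_nu(t + tau)."""
    samples = round(mu / step)  # I_mu vanishes outside (-mu/2, mu/2)
    total = 0.0
    for i in range(samples):
        t = -mu / 2.0 + (i + 0.5) * step
        total += rect(t, mu) * rect(t + tau, nu)
    return total * step
```

With equal widths the trapezoid degenerates into the triangle pulse (27), so the same routine also serves as a check of the autocorrelation of a single rectangular pulse.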
3.1.3 Autocorrelation Function of an Interval Represented by Rectangular Sequences
The two tones of an interval are represented by sequences of rectangular pulses with pulse widths $\mu$ and $\nu$ respectively.
\[ x_1(t) = \sum_{m=-M}^{M} I_\mu(t - mT_1) ; \qquad x_2(t) = \sum_{n=-N}^{N} I_\nu(t - nT_2) = \sum_{n=-N}^{N} I_\nu(t - n s^{-1} T_1) \tag{33} \]
Fig. 4. A periodic sequence of rectangular pulses with pulse width $\mu$ and period T
Their autocorrelation functions are sequences of triangle pulses of the widths $2\mu$ and $2\nu$ (see (20) and (27)):
Fig. 5. The autocorrelation function of a sequence of rectangular pulses of the width $\mu$ is a sequence of triangle pulses of the width $2\mu$
According to (24), the autocorrelation function of the interval $S(t) = x_\mu(t) + x_\nu(t)$ is:
\[
\begin{aligned}
\rho_S(\tau, s) &= \sum_{m=-2M}^{2M} (2M + 1 - |m|)\, \Delta_\mu(\tau - mT_1) + \sum_{n=-2N}^{2N} (2N + 1 - |n|)\, \Delta_\nu(\tau - n s^{-1} T_1) \\
&\quad + 2 \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \chi_{\mu\nu}\!\left(\tau - (n s^{-1} - m)\, T_1\right)
\end{aligned}
\tag{34}
\]
Fig. 6. The autocorrelation function $\rho_S\!\left(\tau, \tfrac{9}{8}\right)$ of the major second
3.2 Calculation of the Generalized Coincidence Function
To compute $\mathrm{K}(s) := \int_{0}^{D} \rho_S^2(\tau, s)\, d\tau$, we set the constants as follows:
• audible frequencies range from about 20 Hz to about 20,000 Hz, corresponding to periods with lengths from nearly 0 ms (0.05 ms) to 50 ms; thus set $D := 50$.
• as at most $M = \frac{D}{T_1}$ and $N = \frac{D}{s^{-1} T_1}$ pulses of $x_1(t)$ and $x_2(t)$ respectively fit into the time window of D, we set $M = \operatorname{floor}\!\left(\frac{D}{T_1}\right)$ and $N = \operatorname{floor}\!\left(\frac{sD}{T_1}\right)$;
• choose $\nu_1 := 100$ Hz as a reference frequency of the lowest tone, thus $T_1 = 10$ ms.
• for $\mu$, $\nu$ I refer to physiological data (Langner and Schreiner 1988): a time window of about $\frac{1}{6}$ of the period of the tone has been found in the periodicity detection of the IC; thus set $\mu = \frac{T_1}{12}$, $\nu = \frac{T_1}{12 s}$, so that the triangle pulse $\Delta_\mu(\tau)$ as the widest pulse fits into this time window.
Fig. 7 shows the graph of the Generalized Coincidence Function $\mathrm{K}(s)$ for all intervals within the range of an octave. The variable s indicates the frequency ratio of the interval. The lower tone has a period of T = 10 ms.
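The computation described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the integral (25) is approximated by a midpoint rule with an arbitrarily chosen step of 0.02 ms, and the double sum of (34) is restricted to the pulses actually present in the finite trains ($|m| \le M$, $|n| \le N$). All times are in milliseconds.

```python
import math


def triangle(tau, mu):
    """Triangle pulse (27): autocorrelation of a unit-area rectangular pulse."""
    a = abs(tau)
    return (1.0 - a / mu) / mu if a < mu else 0.0


def chi(tau, mu, nu):
    """Cross correlation (32) of two rectangular pulses; assumes mu >= nu."""
    a = abs(tau)
    if a <= (mu - nu) / 2.0:
        return 1.0 / mu
    if a <= (mu + nu) / 2.0:
        return ((mu + nu) / 2.0 - a) / (mu * nu)
    return 0.0


def build_terms(s, T1=10.0, D=50.0):
    """Pulse widths, weighted autocorrelation lags and cross correlation lags for ratio s."""
    M = math.floor(D / T1)
    N = math.floor(s * D / T1)
    mu, nu, T2 = T1 / 12.0, T1 / (12.0 * s), T1 / s
    auto_mu = [(2 * M + 1 - abs(m), m * T1) for m in range(-2 * M, 2 * M + 1)]
    auto_nu = [(2 * N + 1 - abs(n), n * T2) for n in range(-2 * N, 2 * N + 1)]
    # Assumption: finite double sum over the pulses of both trains.
    cross = [n * T2 - m * T1 for m in range(-M, M + 1) for n in range(-N, N + 1)]
    return mu, nu, auto_mu, auto_nu, cross


def rho_s(tau, terms):
    """Autocorrelation function (34) of the interval at lag tau."""
    mu, nu, auto_mu, auto_nu, cross = terms
    total = sum(w * triangle(tau - lag, mu) for w, lag in auto_mu)
    total += sum(w * triangle(tau - lag, nu) for w, lag in auto_nu)
    total += 2.0 * sum(chi(tau - lag, mu, nu) for lag in cross)
    return total


def coincidence(s, T1=10.0, D=50.0, step=0.02):
    """Midpoint-rule approximation of K(s), the integral of rho_S^2 over [0, D]."""
    terms = build_terms(s, T1, D)
    steps = round(D / step)
    return step * sum(rho_s((i + 0.5) * step, terms) ** 2 for i in range(steps))
```

Calling `coincidence(1.5)` and `coincidence(1.45)` compares the pure fifth with a slightly mistuned neighbour; evaluating the function on a fine grid of s values between 1 and 2 would be needed to approximate a curve of the kind shown in Figure 7.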
4 Conclusion
The similarity between Stumpf’s “System der Verschmelzungsstufen in einer Curve” (see Figure 1) and the graph of the Generalized Coincidence Function (Figure 7) is striking. The concept of generalized coincidence is in accordance with historical coincidence theories since Pythagoras (Hesse 2003) and with Carl Stumpf’s considerations on tonal fusion. For intervals close to the prime and the octave, the Generalized Coincidence Function as well as Stumpf’s curve show high degrees of tonal fusion. On the other hand, those intervals give rise to the strongest sensations of roughness (Zwicker and Fastl 1999, 257). Hence, those intervals are regarded as the strongest dissonances. Both tonal fusion and roughness determine the degree of consonance and dissonance. Thus, tonal fusion in the sense of coinciding neuronal periodicity detections and the extensively discussed phenomenon of roughness (Aures 1985; Terhardt 1974; 1976) are decisive for the sensation of consonance and dissonance.
References
Aures, W.: Ein Berechnungsverfahren der Rauhigkeit. Acustica 58, 268–281 (1985)
Aures, W.: Der sensorische Wohlklang als Funktion psychoakustischer Empfindungsgrößen. Acustica 58, 282–290 (1985)
Cariani, P.A., Delgutte, B.: Neural Correlates of the Pitch of Complex Tones. Journal of Neurophysiology 76(3), 1698–1734 (1996)
de Cheveigné, A.: Pitch perception models. In: Plack, C., Fay, R.R., Oxenham, A.J., Popper, A.N. (eds.) Pitch – Neural Coding and Perception, pp. 169–233. Springer, New York (2005)
Ebeling, M.: Verschmelzung und neuronale Autokorrelation als Grundlage einer Konsonanztheorie. Peter Lang Verlag, Frankfurt a. M. (2007)
Goldstein, J.L., Scrulovic, P.: Auditory-nerve spike intervals as an adequate basis for aural spectrum analysis. In: Evans, E.F. (ed.) Psychophysics and Physiology of Hearing, pp. 337–345. Academic, London (1977)
Hartmann, W.M.: Signal, Sound, and Sensation. Springer, New York (2000)
v. Helmholtz, H.: On the sensations of tone. Tr. Ellis, Alexander J. Dover, New York (1885); reprint 1954
Hesse, H.-P.: Musik und Emotion. Wissenschaftliche Grundlagen des Musik-Erlebens. Springer, Wien (2003)
Kaernbach, C., Demany, L.: Psychophysical evidence against the autocorrelation theory of auditory temporal processing. Journal of the Acoustical Society of America 104(4), 2298–2306 (1998)
Kameoka, A., Kuriyagawa, M.: Consonance Theory. Journal of the Acoustical Society of America 45(6), 1451–1469 (1969)
Langner, G.: Evidence for neuronal periodicity detection in the auditory system of the guinea fowl: implications for pitch analysis in the time domain. Experimental Brain Research 52, 333–355 (1983)
Langner, G.: Neuronal mechanisms underlying the perception of pitch and harmony. Annals of the New York Academy of Sciences 1060, 50–52 (2005)
Langner, G.: Temporal Processing of Periodic Signals in the Auditory System: Neuronal Representation of Pitch, Timbre, and Harmonicity. Zeitschrift für Audiologie 46(1), 8–21 (2007)
Langner, G., Schreiner, C.E.: Periodicity Coding in the Inferior Colliculus of the Cat. Journal of Neurophysiology 60(6), 1799–1822 (1988)
Licklider, J.C.R.: A Duplex Theory of Pitch Perception. Experimenta VII/4, 128–134 (1951)
Meddis, R., Hewitt, M.J.: Virtual pitch and phase sensitivity of a computer model of the auditory periphery. Journal of the Acoustical Society of America 89(6), 2866–2894 (1991)
Papoulis, A.: The Fourier Integral And Its Applications. McGraw-Hill, New York (1962)
Parncutt, R.: Harmony. A psychoacoustic approach. Springer, Berlin (1989)
Parncutt, R.: A model of the perceptual root(s) of a chord accounting for voicing and prevailing tonality. In: Leman, M. (ed.) Music, gestalt, and computing - Studies in cognitive and systematic musicology, pp. 181–199. Springer, Berlin (1997)
Patterson, R.D., Allerhand, M.H., Giguère, C.: Time-domain modelling of peripheral auditory processing: A modular architecture and a software platform. Journal of the Acoustical Society of America 98(4), 1890–1894 (1995)
Plomp, R., Levelt, W.J.D.: Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America 35, 548–560 (1965)
Rhode, W.S.: Interspike intervals as correlate of periodicity pitch in cat cochlear nucleus. Journal of the Acoustical Society of America 97(4), 2414–2429 (1995)
Stumpf, C.: Tonpsychologie. Knuf, Hilversum (1890); reprint (1965)
Terhardt, E.: On the Perception of Periodic Sound Fluctuations (Roughness). Acustica 30, 201–213 (1974)
Terhardt, E.: Ein psychoakustisch begründetes Konzept der Musikalischen Konsonanz. Acustica 36, 121–137 (1976/1977)
Tramo, M.J., Cariani, P.A., Delgutte, B., Braida, L.D.: Neurobiological Foundations for the Theory of Harmony in Western Tonal Music. In: Zatorre, R.J., Peretz, I. (eds.) The Biological Foundations of Music. Annals of the New York Academy of Sciences, vol. 930, pp. 92–116 (2001)
Voutsas, K., Langner, G., Adamy, J., Ochse, M.A.: A brain-like neural network for periodicity analysis. IEEE Trans. Syst. Man Cybern. B: Cybern. 35, 12–22 (2005)
Wever, E.G.: Theory of Hearing. Wiley, New York (1949); reprint 1965
Wiener, N.: Generalized Harmonic Analysis. Acta Mathematica 55, 117–258 (1930)
Yost, W.A., Patterson, R., Sheft, S.: A time domain description for the pitch strength of iterated rippled noise. Journal of the Acoustical Society of America 99(2), 1066–1077 (1996)
Zhang, X., Heinz, M.G., Bruce, I.C., Carney, L.H.: A phenomenological model for the responses of auditory-nerve fibers. I. Nonlinear tuning with compression and suppression. Journal of the Acoustical Society of America 109(2), 648–670 (2001)
Zwicker, E., Fastl, H.: Psychoacoustics. Springer, Berlin (1999)
Predicting Music Therapy Clients’ Type of Mental Disorder Using Computational Feature Extraction and Statistical Modelling Techniques
Geoff Luck*, Olivier Lartillot, Jaakko Erkkilä, Petri Toiviainen, and Kari Riikkilä
Department of Music, P.O. Box 35 (M), 40014, University of Jyväskylä, Finland
[email protected]
Abstract.
Background. Previous work has shown that improvisations produced by clients during clinical music therapy sessions are amenable to computational analysis. For example, it has been shown that the perception of emotion in such improvisations is related to certain musical features, such as note density, tonal clarity, and note velocity. Other work has identified relationships between an individual’s level of mental retardation and features such as amount of silence, integration of tempo with the therapist, and amount of dissonance. The present study further develops this work by attempting to predict music therapy clients’ type of mental disorder, as clinically diagnosed, from their improvisatory material.
Aim. To predict type of mental disorder from computationally-extracted musical features of music therapy improvisations.
Method. Two hundred and sixteen music therapy improvisations, obtained from seven music therapists’ regular sessions with their clients, were collected in MIDI format. A total of fifty clients contributed musical material, and these clients were divided into three groups according to their clinical diagnosis: RET (mentally retarded), DEV (developmental disorder), and NEU (neurological disorder). The improvisations were subjected to a musical feature extraction procedure in which 43 musical features were automatically extracted in the MATLAB computing environment. These features were then entered into a discriminant function analysis as predictors of type of mental disorder.
Results. The analysis produced two significant discriminant functions, and the emergent model correctly classified 80% of clients. Significant predictor variables fell into three main categories: those relating to pitch, those relating to temporal aspects, and those relating to tonal clarity and dissonance.
Conclusions. The present study suggests that an individual’s type of mental disorder can be predicted from a statistical analysis based upon the computational extraction of detailed musical features from their improvisatory material. As such, it offers further evidence for the usefulness of computational music analysis in music therapy.
1 Introduction
This study was born out of a three-year project at the Department of Music of the University of Jyväskylä, Finland, entitled Intelligent Systems in Music Therapy. The project was funded by the Academy of Finland, and brought together an international team of scholars from fields such as music therapy, psychology, computer science, and mathematics. The primary aim of the project was to combine computational analysis techniques with traditional music therapy practice to develop evidence-based methods in improvisational music therapy. Improvisation-based therapy is a widely-used form of music therapy, and is designed to improve the communication abilities of the client. It has been utilised with a wide range of clinical populations, and a variety of improvisational music therapy methods are currently in use (see Wigram 2004 for a discussion). Given that the level of communication a client is able to achieve is likely to be related to their clinical condition, it seems logical that at least some aspects of a client’s clinical condition might be revealed directly in the music they produce. Until recently, the relationship between improvised material and clients’ clinical condition has been mainly examined by subjective analysis of recordings and transcriptions of the material (e.g., Bruscia 1982), or through the use of questionnaires (e.g., DiGiammarino 1990). However, we believe that computational investigation of musical processes can offer more objective and accurate methods for the analysis of clinical improvisations.
* Corresponding author.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 156–167, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Previous Music Therapy Research

One of the advantages of using questionnaires is that they allow large samples to be investigated. For instance, DiGiammarino (1990) conducted an impressive survey-based study concerning the musical skills of 120 adult and elderly individuals with mental retardation. Participants were grouped into those with profound or severe retardation, moderate retardation, or mild retardation. DiGiammarino found that, while individuals with profound or severe mental retardation were rated as the least able in terms of instrumental skills¹ (as might be expected), it was the moderately retarded group who were rated as the most capable overall, as opposed to those with mild levels of retardation.

One possible reason for DiGiammarino’s (1990) finding, that individuals with mild mental retardation were rated as being less capable than individuals with moderate mental retardation in all but one of the listed instrument-related music skills, could lie in the fact that questionnaires concerning individuals with mental disorders, such as mental retardation, are often completed by parents or teachers. This indirect nature of data collection may lead to inaccurate data being gathered. In particular, field-specific terminology, such as that used in musicology, may lead to questions being misunderstood, and incorrectly answered, by those who complete them. Thus, a more direct method of data collection is desirable.

Anecdotal reports, such as case studies, typically utilize rather more direct subjective observations than questionnaire-based research. In anecdotal reports, the analysis methodology is rarely specified, but qualitative analysis of audio or video recordings is commonly used since it is very challenging for a music therapist to both lead a session and try to observe musical details at the same time.

¹ The original aim of DiGiammarino’s study was to assess clients’ musical skills; information about how skills may relate to clinical diagnosis was a by-product.
Anecdotal client histories have been reported in relation to clients with various diagnoses. As with DiGiammarino’s (1990) questionnaire-based study, however, none of the anecdotal studies described in the literature have had the specific aim of diagnosing a client’s condition. Any connections drawn between diagnoses and the skills demonstrated by clients are unsystematic, with relationships noted only occasionally by the authors. Nonetheless, anecdotal evidence, such as that based on qualitative analysis of audio or video recordings, is more direct, and thus perhaps more reliable, than questionnaire-based data. However, it still lacks a degree of objectivity, being frequently based upon a single therapist’s observations and conclusions. To be sure, some level of objectivity can be achieved by using multiple observers and checking inter-rater reliability. Moreover, qualitative and quantitative approaches should be seen as complementary to each other. Still, qualitative methods are generally regarded as being less objective than quantitative methods (see, for example, Kleining & Witt 2001).

The use of MIDI-based analysis has increased in recent years, both due to its wider availability, and the fact that accuracy and objectivity can be achieved with less effort. MIDI is a protocol for transmitting musical information between devices such as keyboards, synthesizers, and computers, and it describes music as a set of performance actions rather than as the sounded result. Such actions include, for instance, note-on, note-off, and volume. These actions can be stored in the MIDI File Format, which is usually the starting point of MIDI-based analyses. Crucially, MIDI data permits both detailed analysis and visualisation of a client’s performance.
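To make the idea of "performance actions" concrete, the following minimal Python sketch pairs note-on and note-off actions into notes with an onset, duration, pitch, and velocity. It is an illustration only, not the procedure used in the study: the event tuples, the `Note` record, and the function name `pair_note_events` are hypothetical, and a real analysis would first read the events from a MIDI file.

```python
from dataclasses import dataclass

@dataclass
class Note:
    onset: float      # seconds from the start of the recording
    duration: float   # seconds between note-on and note-off
    pitch: int        # MIDI pitch number (60 = middle C)
    velocity: int     # note-on velocity, a proxy for loudness

def pair_note_events(events):
    """Pair (time_s, kind, pitch, velocity) actions into Note records.

    kind is "on" or "off". A note-on with velocity 0 is treated as a
    note-off, which the MIDI protocol permits.
    """
    pending = {}   # pitch -> (onset time, velocity) of the key currently held
    notes = []
    for time_s, kind, pitch, velocity in sorted(events, key=lambda e: e[0]):
        if kind == "on" and velocity > 0:
            pending[pitch] = (time_s, velocity)
        elif pitch in pending:
            onset, held_velocity = pending.pop(pitch)
            notes.append(Note(onset, time_s - onset, pitch, held_velocity))
    return sorted(notes, key=lambda n: n.onset)

# Hypothetical input: two notes, the second released by a velocity-0 note-on.
events = [
    (0.00, "on", 60, 80),
    (0.50, "off", 60, 0),
    (0.50, "on", 64, 72),
    (1.25, "on", 64, 0),
]
notes = pair_note_events(events)
```

Once the actions are in this note-list form, features such as note density or mean pitch reduce to simple arithmetic over the list.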
An early example of the use of MIDI-based analysis is reported by Spitzer (1989), who used computer-based musical skill training in a therapy setting, employing a MIDI keyboard to record some of the clients’ performances. Miller and Orsmond (1994), meanwhile, conducted a study in which a relatively heterogeneous sample population (including autistic and developmentally disabled individuals) improvised freely on a keyboard, after which the authors carried out an aural analysis of the recorded MIDI sequences. A similar study was reported by Orsmond and Miller (1995). However, while the adoption of a potentially more objective MIDI-based approach has improved upon the indirect or subjective nature of data collected in questionnaire and case history studies, and begun to clarify relationships between diagnoses and musical functioning, little research in this area has been carried out to date. In particular, one of the key strengths of a MIDI-based approach remains relatively unexploited, namely the possibility to carry out an automated analysis of a client’s improvisation. Such an automated analysis would be based upon the computational extraction of a set of musical features from the improvised material. These features, and their relationship to certain characteristics of the client, can then be examined using a variety of statistical techniques. The MIDI-based approach allows one to easily examine the relationship between a client’s level of cognitive functioning and the musical performances they produce. More specifically, one can investigate relationships between a client’s diagnosed clinical condition and the features which characterise their improvised material. With this in mind, we turn now to the computational music analysis.
3 Computational Music Analysis

Algorithms for the computational extraction of musical features have been developed for both audio and MIDI representations of music (e.g., Downie 2003; Leman 2002).
These algorithms are usually based on methods of signal processing, music processing, machine learning, cognitive modelling, and visualization. Typical application areas of these algorithms include computational music analysis (e.g. Lartillot 2004; 2005; Cambouropoulos 2006), automatic classification (e.g. Toiviainen & Eerola 2006; Pampalk, Flexer, & Widmer 2005), organization (e.g. Rauber, Pampalk, & Merkl 2003) and transcription (e.g. Klapuri 2004) of music, as well as content-based retrieval (Lesaffre et al. 2003). Two recent studies have applied the computational approach to the analysis of music therapy improvisations. Luck et al. (2006) investigated relationships between clients’ level of mental retardation and computationally-extracted features of their improvisations. They found that improvisations produced by clients with higher levels of mental retardation (that is, with lower levels of cognitive functioning) were characterised by features such as long periods of silence, better integration of tempo with the therapist, and higher amounts of dissonance, compared to those with lower levels of mental retardation. In a related study, Luck et al. (2008) utilized computational extraction of musical features and statistical modelling to examine perception of emotion in music therapy improvisations. In the context of a three-dimensional model of emotion (e.g., Osgood, Suci & Tannenbaum 1957), listeners provided separate continuous ratings of activity, pleasantness, and strength, and a series of regression models indicated that perception of emotion was related to musical features such as note density, tonal clarity, and note velocity. There is a compelling need to develop objective methods of improvisation analysis due to the increasing dependence upon evidence-based forms of treatment.
Moreover, explicit knowledge concerning relationships between musical features and diagnostic populations would allow therapists to generate realistic expectations regarding their clients’ progress. The present study demonstrates the feasibility of this approach in predicting clients’ type of mental disorder from their improvised material.
4 Method

A total of seven qualified music therapists, five from leading Finnish institutions for the intellectually disabled² and two private practitioners, provided the researchers with 216 music therapy improvisations produced by 50 clients. The improvisations were produced in regular therapy sessions using a pair of identical 88-key weighted-action MIDI keyboards (Fatar Studiologic 880-PRO master keyboard), and recorded using Cubase MIDI sequencing software. To give some idea of the amount of data collected, the combined duration of the improvisations was 26 hours and 40 minutes, and a total of 779,803 notes were recorded. Of these, the therapists played 359,967 notes, and the clients 419,836. The average length of the improvisations was 7 min 24 sec (sd = 4 min 45 sec). The clients were divided into three groups according to their clinical diagnosis³: RET (mentally retarded), DEV (developmental disorder), and NEU (neurological disorder).

² Pääjärvi Federation of Municipalities, Rinnekoti-Foundation, Satakunta District of Services for Intellectually Disabled, and Suojarinne Federation of Municipalities.
³ Each client’s clinical condition was diagnosed by a medical doctor prior to their attendance at the first therapy session. Diagnoses were based upon the International Classification of Diseases (ICD) codes.
The number of improvisations contributed by each group was as follows: RET = 34, DEV = 87, NEU = 64.⁴ Thus, a total of 185 improvisations were selected for analysis. The analysis was carried out with algorithms developed specifically for this study in MATLAB using the MIDI Toolbox (Eerola & Toiviainen 2004). Firstly, the temporal evolution of 15 musical features was computed by moving a 6 s sliding window along the musical sequence at successive intervals of 1000 ms, and analysing the contents of each successive window. In what follows, each of the musical features is listed, and, where appropriate, explained in more detail.

A. Temporal surface features. These features were based on the MIDI note onset and offset positions, and were computed for each position of the sliding window (except feature 4).
1. Note density. Number of notes in the window divided by the length of the window.
2. Average note duration in the window.
3. Articulation. Proportion of short silences in the window. Short silences were defined as intervals not longer than two seconds during which no note was played. These short silences are not included in the silence factor, as they are generally not perceived as real silence, but rather as intermediate pauses characterising the performance style. Values close to zero indicate legato playing, while values close to one indicate staccato playing.
4. Silence factor. Proportion of long silences within the whole improvisation. Long silences were defined as time intervals longer than two seconds during which no note was played. The silence factor is given by the sum of all these silence intervals divided by the total length of the musical excerpt. Note that, unlike the other features which are time-series in nature, this is a scalar feature. It is included here because it relates to the temporal surface of the improvisations.

B. Register-related features. These features were based on the MIDI pitch values of notes, and were computed for each position of the sliding window.
5. Mean pitch.
6. Standard deviation of pitch.

C. Dynamic-related feature. This feature was based on the MIDI velocity parameter, and was computed for each position of the sliding window.
7. Mean velocity.

D. Tonality-related features. These features, based on the Krumhansl-Schmuckler key-finding algorithm (Krumhansl 1990), give a statistical assessment of the tonal dimension of the improvisations, and were computed for each position of the sliding window.
8. Tonal clarity. To calculate the value of this feature, the pitch-class distribution within the window was correlated with the 24 key profiles representing each key (12 major keys and 12 minor keys). The maximal correlation value was taken to represent tonal clarity.

⁴ Thirty-one improvisations were produced by clients who could not be categorised into one of these groups, and were thus excluded from the analysis.
9. Majorness. Calculated as Tonal clarity, but only the 12 major key profiles were considered.
10. Minorness. Calculated as Tonal clarity, but only the 12 minor key profiles were considered.

E. Dissonance-related features.
11. Sensory dissonance. Musical dissonance is partly founded on cultural knowledge and normative expectations, and is more suitable for the analysis of improvisation by expert rather than by non-expert musicians.⁵ More universal is the concept of sensory dissonance (Helmholtz 1877/1954), which is related to the presence of beating phenomena caused by frequency proximity of harmonic components. Sensory dissonance caused by a limited number of sinusoids can be easily predicted. The global sensory dissonance generated by a cluster of harmonic sounds is subsequently computed by adding the elementary dissonances between all the possible pairs of harmonics (Plomp & Levelt 1965; Kameoka & Kuriyagawa 1969). In the present study, the dissonance measure is based on the instrumental sound (MIDI default piano sound) used during all the improvisations. Since successive notes may also appear dissonant, even when not played simultaneously, we also took into consideration the beating effect between notes currently played and notes remaining in a short-term memory (fixed in our model to 1000 ms). Sensory dissonance was calculated every 1000 ms.

F. Pulse-related features. A method was developed which enabled the automatic detection of rhythmic pulsations in MIDI files. More precisely, a temporal function was first constructed by summing Gaussian kernels, that is, narrow bell curves, centred at the onset point of each note. The height of each Gaussian kernel was proportional to the duration of the respective note; the standard deviation (i.e., the width of the bell curve) was set to 50 ms (see Toiviainen & Snyder 2003). Subsequently, the obtained function was subjected to autocorrelation using temporal lags between 250 ms and 1500 ms, corresponding to commonly presented estimates for the lower and upper bounds of perceived pulse sensation (Westergaard 1975; Warren 1993). In accordance with findings in the music perception literature, the values of the autocorrelation function were weighted with a resonance curve having its maximal value at a period of 500 ms (Toiviainen 2001; see also van Noorden & Moelants 1999). The obtained function will subsequently be referred to as the pulsation function. Like all the other musical parameters, the pulsation function was computed for each successive position of the sliding window. The analysis of a complete improvisation results in a two-dimensional diagram called a pulsation diagram. The x-axis indicates the temporal progression of the improvisation, and the y-axis indicates the tempos of the pulsations. Tempos are expressed as pulse periods, which correspond to the inverse of tempo. On the y-axis, the periods range from 250 ms (corresponding to a tempo of 240 beats per minute (bpm)) to 1500 ms (corresponding to a tempo of 40 bpm). From the pulsation diagrams, two musical features were deduced:
12. Individual pulse clarity. The evolution of client’s and therapist’s pulse clarity is obtained by collecting the maximal values of each successive column in the respective pulsation diagram.
13. Individual tempo. The evolution of client’s and therapist’s tempo is obtained by collecting the tempo values associated with the maximum values of each successive column in the respective pulsation diagram.

Based on the features described thus far, we derived 24 variables that were based on the clients’ improvisations only: for features 1 – 3, and 5 – 13, we calculated mean and variance for the client. In addition, feature 4 (silence factor) was extracted for the client, giving 25 client-only variables in total.

⁵ Musical dissonance is a complex notion related to the quality and stability of sounds and of musical discourse. In music theory, dissonance is generally related to culturally conditioned senses of tension related to musical structures considered as unstable and leading to the expectation of their resolution. This vision of dissonance is hence restricted to the study of improvisations complying with the related musical tradition, and would therefore not be applicable to the non-expert performances that are the objects of our study.
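The windowed computation described in this section can be sketched in a few lines of Python. This is a hedged illustration rather than the study's MATLAB code: it covers only note density and the tonality-related features, it assumes that the pitch-class distribution is weighted by note duration and that population variance is used (the text specifies neither), and the function names and example notes are hypothetical. The key-profile values are those of Krumhansl (1990).

```python
from statistics import mean, pvariance

# Key profiles for C major and C minor (Krumhansl 1990).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def best_fit(distribution, profile):
    """Highest correlation over the 12 transpositions of one key profile."""
    return max(
        pearson(distribution, [profile[(pc - tonic) % 12] for pc in range(12)])
        for tonic in range(12)
    )

def window_features(notes, width):
    """notes: (onset_s, duration_s, midi_pitch) tuples inside one window."""
    distribution = [0.0] * 12
    for _, duration, pitch in notes:
        distribution[pitch % 12] += duration   # duration weighting: an assumption
    majorness = best_fit(distribution, MAJOR)
    minorness = best_fit(distribution, MINOR)
    return {
        "note_density": len(notes) / width,
        "tonal_clarity": max(majorness, minorness),   # best of all 24 keys
        "majorness": majorness,
        "minorness": minorness,
    }

def sliding_windows(notes, width=6.0, hop=1.0):
    """Yield the notes whose onsets fall inside each successive window."""
    total = max(onset + duration for onset, duration, _ in notes)
    count = max(1, int((total - width) // hop) + 1)
    for i in range(count):
        start = i * hop
        yield [n for n in notes if start <= n[0] < start + width]

# Hypothetical input: a C major scale played twice, one note every 0.5 s.
scale = [60, 62, 64, 65, 67, 69, 71, 72]
notes = [(0.5 * i, 0.5, scale[i % 8]) for i in range(16)]

series = [window_features(w, 6.0) for w in sliding_windows(notes)]
density = [f["note_density"] for f in series]
client_variables = {
    "note_density_mean": mean(density),
    "note_density_variance": pvariance(density),   # population variance: an assumption
}
```

Each time-series feature is reduced to a mean and a variance in the same way, which is how twelve such features yield 24 client-only variables.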
5 Quantifying the Client-Therapist Interaction

In order to quantify the client-therapist interaction we calculated a number of variables that were based upon the musical features derived from both the clients’ and the therapists’ playing. Firstly, for features 5 and 7 we calculated the average difference between client and therapist. The decision to calculate only these two differences, and not differences between client and therapist for other features, was based on clinical work. Music therapists often tend to take ‘bass-line position’, i.e., to play in the lower register, in order to give the role of soloist (the higher register) to the client. If the client has the capacity to understand and employ such roles, this should be seen as a difference in average mean pitch. Average mean velocity is another variable where the nature of basic expression and communication can be seen. In the conscious role of a soloist, one probably tends to play louder than the accompanist, while the accompanist tends to play softer, thus giving room for the soloist. If the client has the capacity to understand and employ such roles, this should be seen as a difference in average mean velocity.

In order to assess the common pulsation developed in synchrony by both players, a new diagram called a synchronised pulsation diagram was produced by multiplying each individual player's values at respective points of their related pulsation diagrams. Two features were derived from the synchronised pulsation diagram:
14. Common pulse clarity. Similarly to individual pulse clarity, the evolution of common pulse clarity is given by the maximal pulsation values in the synchronised pulsation diagram.
15. Common tempo. Similarly to individual tempo, the evolution of common tempo is given by the tempos related to the maximal pulsation values in the synchronised pulsation diagram.
Another important dimension of musical expression that is of particular interest in music therapy is the degree of communication between the therapist and the client. In particular, when communication takes place, players imitate one another or jointly elaborate the same gestures. The musical dialog may therefore be assessed by observing the degree of local similarity between the temporal evolutions of both improvisations, along the different features presented in the previous section. To quantify the degree of client-therapist interaction on different musical dimensions, we derived the following 18 musical variables: For features 5 and 7, we calculated the average difference between client and therapist. For features 1 – 3, and 5 – 13, we calculated the integration between the client and the therapist. For features 14 and 15 we calculated the mean and variance. In summary, 43 variables (25 client-only and 18 client-therapist) were thus extracted, and it was these variables that were used in subsequent statistical analyses. Table 1 summarizes these features and variables.

Table 1. Computed features and variables (shown by asterisks). Of the variables, 25 are client-only, and 18 relate to the client-therapist interaction.

                            Client only        Client and therapist
Feature                     Mean   Variance    Integration   Average difference   Common mean   Common variance
1. Note density             *      *           *
2. Average note duration    *      *           *
3. Articulation             *      *           *
4. Silence factor           * (scalar)
5. Mean pitch               *      *           *             *
6. SD of pitch              *      *           *
7. Mean velocity            *      *           *             *
8. Tonal clarity            *      *           *
9. Majorness                *      *           *
10. Minorness               *      *           *
11. Sensory dissonance      *      *           *
12. Pulse clarity           *      *           *
13. Tempo                   *      *           *
14. Common pulse clarity                                                          *             *
15. Common tempo                                                                  *             *
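As a quick check on the counts quoted above, the tally below reproduces them from the feature numbering alone. It is a worked piece of arithmetic in Python; the variable names are ours, not the authors'.

```python
# Features treated as time series for the client: 1-3 and 5-13 (feature 4 is scalar).
time_series_features = [1, 2, 3] + list(range(5, 14))

# Client-only variables: mean and variance of each time series, plus the silence factor.
client_only = 2 * len(time_series_features) + 1

# Interaction variables: average difference for features 5 and 7, integration for each
# time-series feature, and mean and variance of the two common features (14 and 15).
interaction = 2 + len(time_series_features) + 2 * 2

total = client_only + interaction
```

The tally gives 25 client-only variables, 18 interaction variables, and 43 in total, matching the figures reported in the text.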
6 Results

The 43 variables were entered stepwise into a discriminant function analysis as predictors of type of mental disorder. Prior probabilities were based on group sizes. The analysis produced two significant discriminant functions, and, after cross-validation, exactly 80% of clients were correctly classified by the model. Figure 1 shows the two functions plotted against each other.
[Figure 1: scatter plot of Function 1 (x-axis, -4 to 4) against Function 2 (y-axis, -4 to 3), with points labelled by diagnostic group (NEU, DEV, RET) and the group centroids marked.]
Fig. 1. Function 1 plotted against Function 2
Function 1 was comprised of the following variables: client’s average articulation, integration of articulation between client and therapist, client’s average mean pitch, client’s average dissonance, client’s average standard deviation of pitch, and integration of pulse clarity between client and therapist. Function 2, meanwhile, was comprised of: client’s average pulse clarity, client’s amount of silence, variation in client’s standard deviation of pitch, and integration of tonal clarity between client and therapist. It can be seen that function 1 discriminates fairly equally between all three groups, while function 2 discriminates mainly between DEV/NEU and RET. Overall, the significant variables can be classified as being either non-contextual pitch-related, contextual pitch-related, or temporal in nature.
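To put the classification rate in context, the prior probabilities implied by the group sizes can be worked out directly. The Python sketch below assumes the priors were computed from the numbers of improvisations per group reported in the Method section (34, 87, and 64); the text does not state the unit explicitly, so this is an assumption, and the variable names are ours.

```python
# Group sizes in improvisations, as reported in the Method section.
group_sizes = {"RET": 34, "DEV": 87, "NEU": 64}

total = sum(group_sizes.values())
priors = {group: n / total for group, n in group_sizes.items()}

# Accuracy of a trivial rule that always predicts the largest group.
majority_baseline = max(priors.values())
```

Under this assumption the largest group (DEV) accounts for about 47% of the material, so always guessing DEV would be correct roughly 47% of the time. The reported 80% after cross-validation is well above that baseline.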
7 Discussion

The present study demonstrates that an individual’s type of mental disorder can be predicted with a high degree of accuracy from a statistical analysis based upon the computational extraction of detailed musical features from their improvised material. As such, it offers further evidence for the usefulness of computational music analysis in music therapy. Clients’ clinical condition was clearly revealed in the characteristics of the music they produced, suggesting that a client’s communication abilities are related to their clinical condition. Thus, this study provides an effective and objective method of improvisation analysis to help satisfy the increasing need for evidence-based forms of treatment. Moreover, the explicit relationships between musical features and diagnostic populations identified using this method will allow therapists to generate realistic expectations regarding their clients’ progress. In sum, this study demonstrates the feasibility of this approach in predicting clients’ type of mental disorder from their improvised material.

The techniques described in this study, as well as those described in our previous work (Luck et al. 2006; Luck et al. 2008), have three main advantages over traditional methods of analysis. First, they offer an objective assessment of a client’s therapy session; in essence, the subjective (human) element, and its associated shortcomings, is removed. Second, the computational approach offers both speed and precision. This is particularly useful for long therapy processes, such as those lasting several weeks, months, or even years, which produce an enormous amount of data for a therapist to digest and consolidate. Third, our approach lightens the cognitive load of the therapist, enabling multiple aspects of the therapy process to be analysed simultaneously. We do not, of course, suggest that this type of analysis should be used in place of a clinical diagnosis.
Nor do we suggest that therapists should rely solely on the computational approach. Instead, we envisage computational methods being used alongside traditional diagnostic and analytical methods to offer a more complete overall approach. To this end, we are developing a music therapy toolbox (MTTB) for MATLAB in conjunction with a stand-alone application for use by music therapists, researchers, and practitioners. The toolbox is dedicated to the analysis of solo and duo improvisations recorded in MIDI format, each improvisation being stored on a separate track. A screenshot of the stand-alone application is shown in figure 2. A graphical user interface displays the different analyses computed by the algorithms in a condensed visual form, and offers a list of commands organized along menus displayed at the top of the window. Once an improvisation is loaded, a traditional piano-roll representation of the chosen MIDI file, for both improvisers, is displayed at the upper part of the main window. The ‘Improvisation’ menu also offers the possibility of playing the MIDI sequence through an external MIDI sequencer. In addition, the piano-roll representation can be zoomed in and out, and the start and end point of the extract to be analysed can be specified in two ways: either manually by moving a slider, or more precisely by entering an exact instant time in seconds. The analysis is performed on a selection of musical features that can be freely specified by the user. The results of the analysis are displayed in the window as a series of curves temporally aligned to the piano-roll representation of the MIDI sequence. For each musical feature, the client- and therapist-related components are
displayed in two distinct colours, green and white respectively. Synchronicity, by contrast, relates to both players in the same time frame, and is represented by a single black curve. The analysis curves can also be zoomed in and out. A multicoloured imitation diagram can be superimposed on the curves, as seen for the density, mean pitch, mean velocity and pulse clarity features in figure 2. Warm-coloured linear masses in the diagram indicate strong local imitations. Vertically centred lines indicate synchronous imitation between both players. When the line is at the upper side of the diagram, the client imitates the therapist after a specific delay, displayed by the vertical axis, in seconds. Similarly, when the line is at the lower side of the diagram, the therapist imitates the client. Finally, the length of the line indicates the duration of the imitation. This is just a short summary of the features of the MTTB and the standalone application. Both the toolbox and standalone application will initially be released for Windows OS, and later for Mac OS. It is hoped that this application will be adopted by music therapists, researchers, and practitioners in Finland and beyond.

Fig. 2. Screenshot of the MTTB standalone music therapy application
References

Bruscia, K.E.: Music in the assessment and treatment of echolalia. Music Therapy 2, 25–41 (1982)
Cambouropoulos, E.: Musical Parallelism and Melodic Segmentation: A Computational Approach. Music Perception 23, 249–268 (2006)
DiGiammarino, M.: Functional music skills of persons with mental retardation. Journal of Music Therapy 27, 209–220 (1990)
Downie, J.S.: Music information retrieval. Annual Review of Information Science and Technology 37, 295–340 (2003)
Eerola, T., Toiviainen, P.: MIDI Toolbox: MATLAB tools for music research. University of Jyväskylä, Kopijyvä, Jyväskylä, Finland (2004)
Klapuri, A.: Automatic music transcription as we know it today. Journal of New Music Research 33, 269–282 (2004)
Kleining, G., Witt, H.: Discovery as basic methodology of qualitative and quantitative research. Forum: Qualitative Social Research 2(1) (February 2001), http://qualitativeresearch.net/fqs-texte/1-01/1-01kleiningwitt-e.htm (accessed April 24, 2008)
Lartillot, O.: A Musical Pattern Discovery System Founded on a Modelling of Listening Strategies. Computer Music Journal 28, 53–67 (2004)
Lartillot, O.: Multi-dimensional motivic pattern extraction founded on adaptive redundancy filtering. Journal of New Music Research 34, 375–393 (2005)
Leman, M.: Musical audio mining. In: Rotterdam, J.M. (ed.) Dealing with the Data Flood: Mining data, text and multimedia. STT Netherlands Study Centre for Technology Trends (2002)
Lesaffre, M.K.T., Martens, G., Moelants, D., Leman, M., De Baets, B., De Meyer, H., Martens, J.-P.: The MAMI Query-By-Voice Experiment: Collecting and annotating vocal queries for music information retrieval. In: 4th International Conference on Music Information Retrieval (ISMIR 2003), Baltimore, Maryland, USA and Library of Congress, Washington, DC, USA, October 26-30 (2003)
Luck, G., Toiviainen, P., Erkkilä, J., Lartillot, O., Riikkilä, K., Mäkelä, A., Pyhäluoto, K., Raine, H., Varklia, L., Värri, J.: Modelling the relationships between emotional responses to, and musical content of, music therapy improvisations. Psychology of Music 36, 25–46 (2008)
Luck, G., Riikkilä, K., Lartillot, O., Erkkilä, J., Toiviainen, P., Mäkelä, A., Pyhäluoto, K., Raine, H., Varklia, L., Värri, J.: Exploring relationships between level of mental retardation and features of music therapy improvisations: a computational approach. Nordic Journal of Music Therapy 15, 30–48 (2006)
Miller, L.K., Orsmond, G.I.: Assessing structure in the musical explorations of children with disabilities. Journal of Music Therapy 31, 248–265 (1994)
Orsmond, G.I., Miller, L.K.: Correlates of musical improvisation in children with disabilities. Journal of Music Therapy 32, 152–166 (1995)
Osgood, C.E., Suci, G.J., Tannenbaum, P.H.: The Measurement of Meaning. University of Illinois Press, Champaign (1957)
Pampalk, E., Flexer, A., Widmer, G.: Improvements of Audio-Based Music Similarity and Genre Classification. In: 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK, September 11-15 (2005)
Rauber, A., Pampalk, E., Merkl, D.: The SOM-enhanced JukeBox: Organization and visualization of music collections based on perceptual models. Journal of New Music Research 32, 193–210 (2003)
Spitzer, S.: Computers and music therapy: An integrated approach. Four case studies. Music Therapy Perspectives 7, 51–54 (1989)
Toiviainen, P., Eerola, T.: Autocorrelation in meter induction: The role of accent structure. Journal of the Acoustical Society of America 119, 1164–1170 (2006)
Wigram, T.: Improvisation. Methods and techniques for music therapy clinicians, educators and students. Jessica Kingsley, London (2004)
Nonlinear Dynamics, the Missing Fundamental, and Harmony

Julyan H.E. Cartwright¹, Diego L. González², and Oreste Piro³

¹ Instituto Andaluz de Ciencias de la Tierra, CSIC-UGR, Granada, Spain
² Istituto per la Microelettronica e i Microsistemi, CNR, Bologna, Italy
³ Instituto de Física Interdisciplinar y Sistemas Complejos, CSIC-UIB, Palma de Mallorca, Spain
Abstract. We review the historical and current theories of musical pitch perception, and their relationship to the intriguing phenomenon of residue pitch. We discuss the nonlinear dynamics of forced oscillators, and the role played by the Fibonacci numbers and the golden mean in the organization of frequency locking in oscillators. We show how a model of the perception of musical pitch may be constructed from the dynamics of oscillators with three interacting frequencies. We then present a mathematical construction, based on the golden mean, that generates meaningful musical scales of different numbers of notes. We demonstrate that these numbers coincide with the number of notes that an equal-tempered scale must have in order to optimize its approximation to the currently used harmonic musical intervals. Scales with particular harmonic properties and with more notes than the twelve-note scale now used in Western music can be generated. These scales may be rooted in objective phenomena taking place in the nonlinearities of our perceptual and nervous systems. We conclude with a discussion of how residue pitch perception may be the basis of musical harmony.
1 Pitch Perception

From the beginning of acoustics research great efforts have been devoted to the elucidation of the mechanisms by means of which our auditory system can, with astonishing performance, analyse and discriminate between complex sounds. In particular, pitch perception has been a subject of great interest, not least due to the key role played by pitch in music. The first attempts to explain the pitch of complex sounds on a physical basis were made as early as the middle of the nineteenth century, just after Fourier methods were developed. The original approach, put forward by Ohm (1843), considered pitch as a consequence of the ability of the auditory system to perform Fourier analysis on acoustical signals. In this view, a physical Fourier component of frequency ω0 is needed in the incoming stimulus in order to have a sensation of pitch matching that of a pure sinusoidal wave of the same frequency. However, this approach quickly runs into difficulties. Contemporaneously with the work of Ohm, Seebeck (1843) showed that if the fundamental frequency (or even the first few higher harmonics) is removed from the spectrum of a periodic sound signal, the perceived pitch remains unchanged and matches the pitch of a sinusoidal sound with the frequency of the missing fundamental. As some facts about the perception of the missing fundamental can be described quite naturally in terms of the stimulus periodicity, Seebeck proposed a periodicity detection theory for the pitch perception of complex sounds. In this way was born an historical controversy between spectral and periodicity theories that, in our opinion, has to date not been completely resolved.

von Helmholtz (1863) reinforced Ohm’s view, asserting that the ear acts as a rough Fourier analyser, and launched the hypothesis that this analysis is performed in the basilar membrane of the cochlea. Moreover, to explain the pitch of the missing fundamental, he proposed that a physical component at this frequency can be generated by the nonlinearities of the ear as a difference combination tone. However, with the aid of electronic equipment just then becoming available, Schouten (1962) performed a very well-conceived experiment which demonstrated that the missing fundamental is not a difference combination tone. He elaborated a theory of pitch based on the periodicity properties of the unresolved or residue components of the stimulus. Subsequently, von Békésy (1960) demonstrated experimentally that the hypothesis of von Helmholtz is essentially correct, that is, the basilar membrane effects a rough Fourier analysis of the incoming stimulus. Later, central processor theories for pitch perception arose (Goldstein 1973; Wightman 1973; Terhardt 1974), while the present situation is represented by, on one hand, the spectral model of Cohen et al. (1995), and on the other, the temporal model of Meddis and Hewitt (1991), the heir to the periodicity throne.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 168–188, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Residue Behaviour
Suppose that a periodic signal is presented to the ear. The pitch of the signal can be quantitatively well described by the frequency of the fundamental, say ω0; see Fig. 1(a). The number of harmonics and their relative amplitudes give the sound its timbral characteristics (the typical examples are musical sounds). Now suppose that the fundamental and perhaps some of the first few higher harmonics are removed (this series of partials, not necessarily multiples of a fundamental, is termed a complex sound, or complex). Although the timbral sensation changes, the pitch of the complex remains unchanged and equal to the missing fundamental (Fig. 1(b)). This is termed residue perception. The first explanation of this phenomenon associated the residue with a difference combination tone. A difference combination tone arises from a nonlinear interaction of two pure tones and has a frequency given by the difference between the frequencies of the pure tones. For the case of a harmonic complex sound it is clear that the difference combination tone between two successive partials has the frequency of the missing fundamental, i.e., (n + 1)ω0 − nω0 = ω0. But if we now shift all the harmonics by the same amount Δω (Fig. 1(c)), the difference combination tone remains unchanged and the same should be true of the residue. This is the experiment undertaken by Schouten (1962), with a negative result: he found that the perceived pitch also shifts, showing a linear dependence on Δω. This phenomenon is known as the first pitch-shift effect and has been accurately measured in many independent experiments. In Fig. 2 we show a graph of the pitch shift for three different measurements. We see that the slopes of the lines decrease with increasing frequency of the central harmonic of the complex. A first attempt to model qualitatively the behaviour of the pitch shift shows that the slope depends roughly on the inverse of the harmonic number of the central partial of the complex, say k. However, the change in slope is slightly but consistently larger than this (but smaller if we replace k by (k + 1)). This behaviour is known as the second pitch-shift effect. Finally, an enlargement of the spacing between partials while keeping the central frequency fixed produces a decrease in the residue pitch. As this anomalous behaviour seems to be correlated with the second pitch-shift effect, it is usually included within it (see Schouten 1962).

J.H.E. Cartwright, D.L. González, and O. Piro

Fig. 1. (a): Fourier spectrum of a harmonic complex sound with pitch P determined by its fundamental ω0. (b): Fourier spectrum of a harmonic complex sound. The partials are successive multiples (k, k + 1, . . .) of the missing fundamental ω0 that determines the pitch of the complex. (c): Fourier spectrum of an anharmonic complex sound. The partials are obtained by a uniform shift Δω from the harmonic situation. Although the difference combination tones between successive partials remain unchanged and equal to the missing fundamental, the perceived pitch shifts by a quantity ΔP that depends linearly on Δω.
Fig. 2. Pitch as a function of the central frequency n = k + 1 for a three-component complex tone (k, k + 1, k + 2). Circles, triangles and dots represent data for three different subjects (Schouten 1962). The perceived pitch shifts linearly with the detuning. The dashed lines represent the first pitch-shift effect. Their slopes decrease as 1/n (see text).
3 Nonlinear Dynamics of Forced Oscillators
The basic idea in the following is to perform a qualitative modelling of the auditory system, identifying experimental data with structurally stable behaviour of forced nonlinear oscillators. We can consider the ear as a nonlinear black box and the stimulus as a superposition of a variable number n of purely sinusoidal waveforms. With this aim in mind we briefly review the fundamental dynamical features exhibited by a generic nonlinear oscillator in the case n = 1, and subsequently we show how these results can be extended to the case n = 2.
3.1 n = 1
3.1.1 Synchronization
A periodically forced nonlinear oscillator can exhibit an extremely rich variety of responses. The simplest are periodic responses, known also as synchronized or phase-locked responses. In increasing order of complexity we can encounter two-frequency quasiperiodic and chaotic responses. We restrict our analysis to a presentation of some fundamental ideas about synchronization and quasiperiodicity. The interested reader can find additional details in (Hao 1989; Jackson 1989; Jackson 1990) and references therein; a good review of the application of nonlinear dynamics to acoustics is (Lauterborn and Parlitz 1988). Synchronization is detected as a rational ratio between the frequency of the external periodic force and the intrinsic frequency of the oscillator, or between two intrinsic frequencies in higher-dimensional autonomous systems. The first description of synchronization is due to Huygens (1665), who observed that the pendulums of two clocks fixed on the same mounting swung synchronously after a time. In this case the two frequencies are equal and we say that we have a 1/1 synchronized response (1/1 being the frequency ratio). More complicated cases arise for an arbitrary rational frequency ratio. A beautiful example in nature is the 3/2 ratio between the orbital and rotational periods of the planet Mercury.
A typical forced oscillator, for example the forced van der Pol oscillator (van der Pol and van der Mark 1927; Parlitz and Lauterborn 1987), shows an infinity of these phase-locked solutions depending on the values of the frequency and amplitude of the external force. For a constant value of the amplitude, the effective frequency ratio varies in a complicated manner with the external frequency, describing a nondifferentiable function, a fractal known as the devil’s staircase (see Fig. 3). Every plateau in the graph corresponds to a particular phase-locked solution. The relative widths of the plateaux are locally organized in a hierarchical manner according to a number-theoretical property of the rationals that characterize them. Two rationals p/q and r/s are said to be adjacents if
$$|qr - ps| = 1. \tag{1}$$
Between adjacents we can define a Farey sum operation as follows:
$$\frac{p}{q} \oplus \frac{r}{s} = \frac{p+r}{q+s}. \tag{2}$$
The rational obtained is called the mediant of the two adjacents. The mediant gives the local hierarchy of the widths of the plateaux, determining the plateau with the greatest width between two plateaux characterized by adjacent rationals. Repeatedly performing the mediant operation on a pair of adjacent rational numbers, we obtain a Farey tree. The Farey tree provides a qualitative local ordering of two-frequency resonances, and gives rise to a structure of plateaux at all rationals known as the devil’s staircase. The devil’s staircase in turn is the skeleton for the layout of the resonances in parameter space as Arnold tongues (González and Piro 1983; Aronson et al. 1983; González and Piro 1985; Cvitanović et al. 1985; Hao 1989; Arrowsmith et al. 1993). Starting from 0/1 and 1/1, the Farey tree contains 1/2, 2/3, 3/5, 5/8, 8/13, . . ., i.e., ratios of successive Fibonacci numbers, a sequence that converges on the number with continued-fraction representation (1, 1, 1, . . .): 0.618 . . ., the reciprocal of the golden mean.
Fig. 3. Devil’s staircase for the forced van der Pol oscillator (from Parlitz and Lauterborn 1987). Each plateau corresponds to a rational phase-locked solution, the denominator being the period of the response measured in periods of the external force.
3.1.2 Quasiperiodicity
A quasiperiodic response can be expressed as a sum of periodic functions. The arguments are the linear combinations of a basic set of frequencies
$$\sum_{i=1}^{s} p_i \omega_i, \tag{3}$$
s being the minimum integer for which the equation
$$\sum_{i=1}^{s} p_i \omega_i = 0 \tag{4}$$
has no integer solutions other than the trivial one.
For n = 1 we can have quasiperiodic responses of order s = 2. Difference combination tones, such as (ω2 − ω1) for example, are included in this class. However, as we mentioned above, difference combination tones are not adequate for describing the residue. Moreover, two-frequency quasiperiodic responses are structurally unstable in the sense that small perturbations of the system destroy them. Consequently, having in mind the search for structurally stable responses able to reproduce residue behaviour, we increase the dimensionality of the system to n = 2, that is, a nonlinear oscillator forced with two independent external periodic forces.
3.2 n = 2
3.2.1 Synchronization
When the external frequency ratio is irrational, periodic solutions cannot exist. When this ratio is rational they are destroyed by small perturbations of the external forces and consequently are not useful for a description of robust behaviour of the auditory system.
3.2.2 Three-Frequency Resonances
In the case n = 2 we can have three-frequency quasiperiodic responses. However, an important result in the theory of dynamical systems, the theorem of Newhouse, Ruelle, and Takens (Newhouse and Takens 1978), asserts that three-frequency quasiperiodic responses are structurally unstable, and thus of no interest for our purposes. Another possibility (the last if we exclude chaotic solutions) is three-frequency resonant responses. These cannot be expressed as a linear superposition for s = 2, but are not truly three-frequency quasiperiodic in the sense that (4) has nontrivial integer solutions for s = 3. In previous work we have shown that these responses are structurally stable and are hierarchically organized in a similar fashion to phase-locked responses for n = 1. We extended the number-theoretical approach briefly described in Sect. 3.1 to the case of three-frequency resonances.
In the following we review succinctly the main results, the details being beyond the scope of this article (see Cartwright et al. 1999b). If ω1 and ω2 are the two external frequencies and their frequency ratio can be approximated by a rational p/q, we can define a generalized Farey sum operation between two fractions of real numbers, say ωi/ri and ωj/rj, modifying the adjacency condition of (1). We say that the two fractions are adjacents if they satisfy
$$|\omega_i r_j - \omega_j r_i| = |\omega_1 q - \omega_2 p|. \tag{5}$$
Now we can define the generalized mediant between these adjacents as
$$\frac{\omega_i}{r_i} \oplus \frac{\omega_j}{r_j} = \frac{\omega_i + \omega_j}{r_i + r_j}. \tag{6}$$
Starting with ω2/q and ω1/p and recursively applying (6), we can construct a hierarchical structure of three-frequency resonances. The first steps of this structure, together with experimental results obtained with an electronic oscillator (see Calvo et al. 2000), are shown in Fig. 4.
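The recursive use of (6) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to produce Fig. 4: the forcing frequencies 2140 and 3600 and the approximant p/q = 3/5 are hypothetical values chosen only to make the arithmetic visible, and the names `generalized_mediant`, `adjacency_gap` and `refine` are invented for this example. Each fraction ω/r is stored as a pair (ω, r) so that numerators and denominators can be added separately.

```python
from fractions import Fraction


def generalized_mediant(left, right):
    """Generalized Farey sum (6) of the pairs (omega_i, r_i) and (omega_j, r_j)."""
    (w_i, r_i), (w_j, r_j) = left, right
    return (w_i + w_j, r_i + r_j)


def adjacency_gap(left, right):
    """Left-hand side of the adjacency condition (5)."""
    (w_i, r_i), (w_j, r_j) = left, right
    return abs(w_i * r_j - w_j * r_i)


def refine(level):
    """Insert the generalized mediant between every pair of neighbours."""
    refined = [level[0]]
    for left, right in zip(level, level[1:]):
        refined += [generalized_mediant(left, right), right]
    return refined


# Hypothetical external frequencies (arbitrary units) with omega1/omega2 close to p/q.
omega1, omega2 = 2140, 3600
p, q = 3, 5
reference_gap = abs(omega1 * q - omega2 * p)   # right-hand side of (5)

level = [(omega2, q), (omega1, p)]             # starting fractions omega2/q and omega1/p
for _ in range(2):
    level = refine(level)

# Every pair of neighbours in the structure still satisfies condition (5).
for left, right in zip(level, level[1:]):
    assert adjacency_gap(left, right) == reference_gap

third_frequencies = [Fraction(w, r) for w, r in level]
print([round(float(f), 2) for f in third_frequencies])
```

The check inside the loop reflects a general property of (6): the generalized mediant is adjacent, in the sense of (5), to both of the fractions from which it is formed, which is what allows the operation to be applied recursively.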
Fig. 4. On the left we have plotted the third frequency (fi = ωi/2π) of a three-frequency resonance versus a control parameter (the DC offset of one of the external forces). The three-frequency resonances are obtained as structurally stable responses of a nonlinear electronic oscillator driven by two independent external periodic forces. On the right we see the frequency values predicted by the generalized Farey sum operation. The external frequencies are fixed at 2100 Hz and 3600 Hz.
4 A Nonlinear Theory for the Residue
We consider now the problem of the residue as a multiperiodic forcing of a generic nonlinear system that represents the auditory system. The simplest case for the stimulus is a complex sound consisting of only two partials, say k and k + 1, lying in the vicinity of successive multiples of some missing fundamental ω0. Now we search for structurally stable solutions that could be associated with the residue. As we have seen, periodic solutions are structurally unstable to perturbations of the external stimulus. Two-frequency quasiperiodic solutions (difference combination tones) are unable to reproduce residue behaviour. Three-frequency quasiperiodic solutions are structurally unstable to perturbation of the system’s parameters (by the Newhouse–Ruelle–Takens theorem). There remain only two possibilities, three-frequency resonant solutions and chaotic ones. Bearing in mind the results of Sect. 3.2, we propose that the residue is associated with the third frequency in a three-frequency resonance formed by a frequency generated in the auditory system itself (in the vicinity of the missing fundamental ω0) and two external frequencies (in the vicinity of kω0 and (k + 1)ω0, respectively). The vicinity of the external frequencies to successive multiples of some missing fundamental ensures that k/(k + 1) is a good rational approximation to their frequency ratio. Consequently, from the results of Sect. 3.2, ω2/(k + 1) and ω1/k are adjacents. With the aid of (6) we obtain the value of the third frequency in the three-frequency resonance of greatest width between them:
$$\frac{\omega_1}{k} \oplus \frac{\omega_2}{k+1} = \frac{\omega_1 + \omega_2}{2k+1}. \tag{7}$$
Since the external frequencies can be written with equal detuning as
$$\omega_1 = k\omega_0 + \Delta\omega, \qquad \omega_2 = (k+1)\omega_0 + \Delta\omega, \tag{8}$$
the shift of the third frequency with respect to the missing fundamental is
$$\Delta P = \frac{2\Delta\omega}{2k+1}. \tag{9}$$
This equation gives a linear dependence of the shift on the detuning Δω, in accordance with the first pitch-shift effect (see Fig. 2). The predicted slope is 1/(k + 1/2), which lies between 1/k and 1/(k + 1). In Fig. 5 we have superimposed the behaviour of the corresponding three-frequency resonances on the data of Fig. 2. The agreement is very good, explaining the first aspect of the second pitch-shift effect (Sect. 2). The second aspect can be interpreted as follows: the term 2Δω in (9) arises from two equal contributions Δω obtained by means of a uniform shift in the two forcing frequencies. If now, keeping ω2 fixed, we increase its distance from ω1 (that is, we enlarge the spacing between successive partials), the first contribution remains constant and equal to Δω while the second diminishes, producing a decrease in the third frequency of the resonance and thus in the residue (see (9)).
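A short numerical check makes the contrast with the difference combination tone concrete. The Python sketch below is illustrative only: the fundamental of 200 (arbitrary frequency units), the harmonic number k = 7 and the detunings are hypothetical values, and the function names are invented here. It evaluates (7) for partials detuned as in (8) and compares the result with (9).

```python
def shifted_partials(omega0, k, delta):
    """Equation (8): two successive partials with equal detuning delta."""
    return k * omega0 + delta, (k + 1) * omega0 + delta


def residue(omega1, omega2, k):
    """Equation (7): third frequency of the three-frequency resonance."""
    return (omega1 + omega2) / (2 * k + 1)


def predicted_shift(delta, k):
    """Equation (9): shift of the residue relative to the missing fundamental."""
    return 2 * delta / (2 * k + 1)


omega0, k = 200.0, 7          # hypothetical missing fundamental and harmonic number
for delta in (0.0, 15.0, 30.0):
    omega1, omega2 = shifted_partials(omega0, k, delta)
    pitch = residue(omega1, omega2, k)
    difference_tone = omega2 - omega1          # stays at omega0 for every detuning
    assert abs((pitch - omega0) - predicted_shift(delta, k)) < 1e-9
    print(delta, difference_tone, pitch)

# Second aspect: keeping omega2 fixed and lowering omega1 lowers the residue.
omega1, omega2 = shifted_partials(omega0, k, 30.0)
assert residue(omega1 - 10.0, omega2, k) < residue(omega1, omega2, k)
```

The difference tone is the same for every detuning, whereas the value given by (7) moves linearly with Δω at slope 2/(2k + 1) = 1/(k + 1/2).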
Fig. 5. Plot of the predicted pitch shift effect (9) on the data of Fig. 2
The residue is not merely an acoustical curiosity. Its importance is shown in our ability to listen to music in a small transistor radio with negligible response to low frequencies. In fact, residue and music perception seem to be profoundly correlated. Moreover, the residue seems to play an important role in speech intelligibility. Hearing aids which furnish fundamental frequency information produce better scores in profoundly hearing impaired subjects than amplification (Faulkner et al. 1991). It is clear that an improvement in the knowledge about the basic mechanisms involved in pitch perception may allow a similar improvement in hearing aids through the implementation of analogue compensation processing (some kind of intelligent amplification) of the acoustical signals.
We have shown that three-frequency resonances, which are generated as structurally stable responses of forced nonlinear oscillators, can describe residue behaviour, that is, the pitch of complex sounds. We have also alluded to the fundamental role in the theory of nonlinear dynamical systems played by the golden mean. Next we explore how we can generate meaningful musical scales of different numbers of notes based on the golden mean.
5 The Golden Mean in Art and Science
From antiquity humanity has sought through scientific enquiry a rational explanation of nature. As artworks were considered an imitation of nature, the same purpose has pervaded the history of the arts. The Pythagoreans were the first to put into mathematical terms the rules for aesthetics, borrowing them from music (Allott 1994). Later there arose the concepts of eurhythmy or commodulation: the application of rhythmical movements or harmonious proportions in a piece of music; a painting; a sculpture; a building; a dance. Throughout the Middle Ages, mathematical ideas of proportion lived side by side with the body of artistic activity, but during the Renaissance, the natural sciences and mathematics began a process of separation from the arts, both in theory and in practice (James 1993). One of the reasons for the divorce was that all efforts failed to give a rational basis to the role played by numerical proportions in the aesthetics of an artwork. This lack of scientific rationale caused a rejection of works on numerical proportion in aesthetics by the scientific community, which began to consider writings in this area esoteric and unscientific. The divergence between arts and sciences grew wider in the twentieth century, with the end of the last movements retaining the ancient mathematical roots of art: neoclassicism and cubism. From this point on, the tendency of artists has been to consider that the mathematical design of an artwork implies an unacceptable constraint on creativity. If, in the future, the gulf between arts and sciences is to be reduced, this may come about through being able to understand in an objective fashion the phenomena that take place in our perceptual and nervous systems when we look at a painting (Zeki 1999), or listen to music. Some of these phenomena may be rooted in the fundamental role in the theory of nonlinear dynamical systems played by a particular number: the golden mean.
There exist many scientific, technical, and even esoteric writings about the use of the golden section, Φ = (1 + √5)/2 = 1.618 . . ., and its companion φ = 1/Φ = Φ − 1 = 0.618 . . . in art (Ghyka 1977; Huntley 1970). There also exists a similar tradition regarding its role in science and technology (Schroeder 1990; Schroeder 1992). The number and some of its numerical properties were certainly known to the Greeks (Herz-Fischler 1998), and it was possibly the key to the Pythagorean discovery of irrational numbers through its geometrical application to the pentagram. Kepler described Φ as one of the ‘jewels of geometry’, while Leonardo da Vinci illustrated the book about Φ by Luca Pacioli, with whom da Vinci studied mathematics, which Pacioli entitled De Divina Proportione (Pacioli 2001). The name of golden mean, golden section, or golden number, however, may have first been ascribed to it in the 19th century (Markowsky 1992). The golden ratio Φ is found frequently in nature (in phyllotaxis, sea shells, seed heads, etc.); to what do we owe this ubiquity? The ancient Greeks argued for a
mathematical description of the world, and held that numbers (the subject of the branch of mathematics now known as number theory) describe all things in the universe. They developed a theory of proportions as an explanation for our aesthetic perception of the universe and as a guide for the work of artists. A proportion is the equality of at least two ratios: r = a/b = c/d. This is termed a discrete proportion because the four elements are distinct. If two elements of the proportion coincide, the proportion becomes continuous. For example, if b = c, the proportion reads r = a/b = b/d, which has the solution b = √(ad), r = √(a/d), when b is known as the geometric mean of a and d. We can further simplify the proportion by making one element dependent on the other two. Given d = a + b, so the ratio of the smaller part a to the larger part b is the same as the ratio of the larger part b to the whole a + b, we obtain only two possibilities for r: φ = (√5 − 1)/2 = 0.618 . . ., and −Φ = −(1 + √5)/2 = −1.618 . . .; this is the geometric definition of the golden section. In art, the appropriate links between proportions of the parts and the whole give the artwork the quality of eurhythmy. Eurhythmy is currently more generally associated with arts that work in the time dimension, such as music or dance, but in antiquity it was used equally for the arts working with the spatial dimensions, such as painting, sculpture or architecture. Many artists have attempted to develop a parallelism between figurative and non-figurative arts; the writing of da Vinci on music and painting is famous. We can find such projects in modern painting also. Gino Severini, for example, tried to put musical rules into visual terms, while Paul Klee held, as did Goethe, that colour may be managed through a general theory of composition in the same way that sound is managed through the framework of musical theory: a sort of synthesis like that obtained in the works of Bach or Mozart.
Less clear, however, is the contrary: the translation of visual aesthetics to the musical world (Huntley 1970; Lendvai 1966).

Table 1. Names and frequency ratios of the currently accepted harmonic intervals in Western music in descending order of consonance

Interval      Type       Frequency ratio
Unison        Invariant  1/1
Octave        Invariant  1/2
Fifth         Invariant  2/3
Fourth        Mixed      3/4
Major Sixth   Variable   3/5
Major Third   Variable   4/5
Minor Third   Variable   5/6
Minor Sixth   Variable   5/8
Western science was born with the Pythagoreans, who developed the first mathematical model of a physical problem. This starting point also coincides with the start of rational studies of music, because the Pythagoreans developed a musical theory: that of harmonic musical intervals. Legend tells how Pythagoras entered a smithy and heard the noise of hammers of different masses working a great piece of incandescent iron. Some of the hammers striking simultaneously produced harmonious sounds. This motivated Pythagoras to study musical harmony with different tuneable instruments. In this way he identified at least the principal harmonic musical intervals: the unison, the octave, the perfect fifth and the fourth. His principal observation was that some simple numerical relationships defined these intervals (see Table 1 for the list of harmonic intervals currently accepted in Western music). Of course these numbers depend on the
physical variables chosen to represent the sounds, but in time it emerged that the fundamental magnitude related to harmony is frequency. Fortunately, many numerological approaches maintain their validity because they work with the lengths of strings, since ratios obtained with these lengths are just the inverse of frequency ratios (a string fixed at both ends oscillates at a fundamental frequency inversely proportional to its length). Pythagorean ratios were quickly utilized for the construction of a musical theory. This musical theory was based fundamentally on the construction of a musical scale: the Pythagorean musical scale.
6 The Need for Musical Scales
As a first approximation we can say that any frequency can be assigned a pitch, that is, a comparative sensation that allows us to say that a sound is higher or deeper than another. However, because there is a continuum of frequencies in any finite interval, there is an infinity of possible pitches. We should point out a couple of caveats: first, pitch can be ascribed directly to frequency only for pure tones (sounds that contain only one frequency in their spectrum), and for a definite intensity; second, the ear does not have an infinite resolving power, and thus two pure tones sufficiently close in their frequencies are judged to be of the same pitch. However, the resolving power is sufficiently high to be considered a continuum for the frequency values of the notes in any practical musical scale. For example, a semitone is given by a distance of 100 cents in the equal-tempered scale of twelve notes (there being 1200 cents in an octave), but the ear can distinguish a substantially smaller interval: the just noticeable difference limen is as little as three to four cents at 1000 Hz. What then is the need for musical scales? A practical demonstration cannot give us the complete answer but can convince us of the practical necessity of a discretization of the octave into notes. If we take a known melody and replace the interval between notes by a continuous glissando, the melody loses all its musical attractiveness and can become unidentifiable, despite the existence of the fixed frequency clues of the limits of the original intervals. This problem has long been recognized in practical terms and also, by the Pythagorean school at least, in theoretical terms. The Pythagorean scale can be obtained by successive applications, ascending or descending from a tonic, of the interval of the perfect fifth. The notes obtained in this way must be replaced by their octave equivalents in order to have all the notes in the same octave.
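The stacking of fifths followed by reduction to a single octave can be sketched as follows. This Python fragment is an illustration, not a historical reconstruction: the names `fold_into_octave` and `pythagorean_scale` are hypothetical, and, as an assumption of convenience, ratios are written relative to a tonic of 1 so that the fifth is 3/2 and the octave runs from 1 to 2 (the reciprocal of the convention used in Table 1).

```python
from fractions import Fraction


def fold_into_octave(ratio: Fraction) -> Fraction:
    """Replace a frequency ratio by its octave equivalent in [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def pythagorean_scale(fifths_up: int, fifths_down: int) -> list:
    """Notes reached by stacking perfect fifths above and below the tonic."""
    fifth = Fraction(3, 2)
    notes = {fold_into_octave(fifth ** n)
             for n in range(-fifths_down, fifths_up + 1)}
    return sorted(notes)


# Five fifths up and one fifth down give a seven-note scale.
scale = pythagorean_scale(fifths_up=5, fifths_down=1)
print([str(note) for note in scale])
# ['1', '9/8', '81/64', '4/3', '3/2', '27/16', '243/128']
```

Exact rational arithmetic is used so that each note remains a ratio of powers of 2 and 3, which is what the construction by fifths and octaves implies.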
The Pythagorean process, however, has a problem because it never ends: an integer number of fifths never coincides with any integer number of octaves; in number-theoretical terms, the problem is that the Diophantine equation 2^x = 3^y has no solutions in nonzero integers. The essence of the Pythagorean scale is the preservation of harmonic intervals, mainly the fifth and the octave. From Pythagoras up to the present day, many musical scales have been developed that try to accommodate the desire for harmonic intervals with the reality that they do not fit within the octave, the most important being the equal-tempered scale of twelve notes. Equal-tempered scales are defined by irrational numbers, and do not exactly preserve any of the harmonic intervals of Table 1 except for the octave, but, for some particular numbers of notes, they approximate them.
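Both points can be illustrated numerically. The Python sketch below first computes the mismatch between twelve fifths and seven octaves, and then scores an equal-tempered scale by how closely its steps approximate the harmonic intervals. The scoring function is an assumption made for this example (the root-mean-square deviation, in cents, between each interval and the nearest scale step); it is not claimed to be the measure used elsewhere in this chapter. Interval ratios are taken as the reciprocals of the Table 1 entries, and the names `cents` and `rms_error_cents` are hypothetical.

```python
import math
from fractions import Fraction

# Twelve fifths overshoot seven octaves: 3**12 is odd, 2**19 is even, so no equality.
comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
print(comma, float(comma))        # 531441/524288, slightly greater than 1

# Reciprocals of the Table 1 ratios, omitting unison and octave (always exact).
JUST_INTERVALS = [Fraction(3, 2), Fraction(4, 3), Fraction(5, 3),
                  Fraction(5, 4), Fraction(6, 5), Fraction(8, 5)]


def cents(ratio) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


def rms_error_cents(notes_per_octave: int) -> float:
    """Assumed score: RMS distance from each interval to the nearest scale step."""
    step = 1200 / notes_per_octave
    squared = []
    for interval in JUST_INTERVALS:
        size = cents(interval)
        nearest = round(size / step) * step
        squared.append((nearest - size) ** 2)
    return math.sqrt(sum(squared) / len(squared))


for n in (7, 12, 15, 19):
    print(n, round(rms_error_cents(n), 2))

best_small = min(range(2, 19), key=rms_error_cents)
print(best_small)                 # 12 under this particular score
```

Under this assumed score, no division of the octave into fewer than nineteen equal parts does better than twelve, and nineteen improves on twelve; other reasonable definitions of the error could rank the scales differently.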
7 The Golden Scales
The construction of a musical scale is then a problem involving approximating irrational numbers by rationals. The mathematical technique to obtain the best such approximations is well known, and consists of writing the irrational number as a continued fraction (Hardy and Wright 1975). The golden mean φ has the continued-fraction expansion
$$\phi = \frac{\sqrt{5}-1}{2} = \cfrac{1}{1+\cfrac{1}{1+\cfrac{1}{1+\cdots}}},$$
and the best rational approximations to φ are given by the convergents of this infinite continued fraction, arrived at by cutting it off at different levels in the expansion: 1/1, 1/2, 2/3, 3/5, 5/8, 8/13, and so on; the convergents of the golden mean are ratios of successive Fibonacci numbers.
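Cutting off the continued fraction at successive levels can be carried out directly. The following Python sketch is illustrative; the function name `truncated` is hypothetical. It evaluates the expansion from the innermost level outwards using exact fractions and also checks the standard bound |φ − p/q| < 1/q² satisfied by continued-fraction convergents.

```python
from fractions import Fraction


def truncated(depth: int) -> Fraction:
    """Value of 1/(1 + 1/(1 + ...)) cut off after `depth` levels."""
    value = Fraction(1)                # depth 1 gives 1/1
    for _ in range(depth - 1):
        value = 1 / (1 + value)        # wrap one more level around the fraction
    return value


phi = (5 ** 0.5 - 1) / 2
convergents = [truncated(depth) for depth in range(1, 7)]
print([str(c) for c in convergents])   # ['1/1', '1/2', '2/3', '3/5', '5/8', '8/13']

for c in convergents:
    # Convergents p/q approximate phi to better than 1/q**2.
    assert abs(phi - float(c)) < 1 / c.denominator ** 2
```

Each new level adds the previous numerator and denominator to form the next denominator, which is why successive Fibonacci numbers appear.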
Fig. 6. The octave interval, defined by the notes of frequency 1/2, the tonic, and 1/1, its superior octave, is divided by its geometric mean √(1/2) as shown. The interval is defined by the first two convergents of the golden number, 1/1 and 1/2, to which we have added the next convergent, 2/3. However, this breaks the symmetry of the scale. There exists another solution which consists of the permutation of the short and long intervals defined by 2/3, i.e. 3/4. This solution can be viewed as that symmetric to 2/3 through the symmetry axis √(1/2). Symmetry is meant here in the Greek sense, that is, as an equality of ratios, i.e. (2/3)/√(1/2) = √(1/2)/(3/4). If we take logarithms of all quantities the symmetry becomes the usual sort and the geometric mean, √(1/2), can be viewed as a mirror.
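The mirror symmetry described in the caption amounts to mapping a note x in an interval (a, b) to ab/x, since x divided by the geometric mean then equals the geometric mean divided by the image. The Python sketch below is a simplified illustration: it reflects the convergents 2/3, 3/5 and 5/8 in the octave mirror only and does not model any nested mirrors inside sub-intervals; the names `mirror_image`, `LOW` and `HIGH` are hypothetical.

```python
import math
from fractions import Fraction


def mirror_image(x: Fraction, low: Fraction, high: Fraction) -> Fraction:
    """Image of x in the geometric-mean mirror of the interval (low, high)."""
    return low * high / x


LOW, HIGH = Fraction(1, 2), Fraction(1)      # tonic and its superior octave

notes = {LOW, HIGH}
for convergent in (Fraction(2, 3), Fraction(3, 5), Fraction(5, 8)):
    notes.update({convergent, mirror_image(convergent, LOW, HIGH)})

scale = sorted(notes)
intervals = [upper / lower for lower, upper in zip(scale, scale[1:])]
print([str(n) for n in scale])       # 1/2, 3/5, 5/8, 2/3, 3/4, 4/5, 5/6, 1
print([str(i) for i in intervals])   # reads the same forwards and backwards

# On a logarithmic axis the mirror is an ordinary reflection about log sqrt(1/2).
g = math.sqrt(0.5)
x, image = Fraction(2, 3), mirror_image(Fraction(2, 3), LOW, HIGH)
assert math.isclose(math.log(float(x)) - math.log(g),
                    math.log(g) - math.log(float(image)))
```

The eight ratios obtained are those listed in Table 1, and the sequence of intervals between neighbouring notes is palindromic.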
Most musical scales are discretizations of the octave. The octave interval is such that the sensations produced by two notes separated by an octave are very similar, and harmonious when sounded simultaneously. This is independent of cultural roots or specific musical training, and is a shared characteristic that seems to be linked to human physiology. As the octave is an interval defined by the first and second convergents 1/1 and 1/2 of the golden number, we can attempt to construct a scale by continuing the series, adding the succeeding convergent of the golden mean 2/3. The choice of a note x in the octave interval (1/2, 1) satisfies the minimal condition to have a proportion: we have three elements (1/2, x, 1) that define two ratios, a = 1/(2x) and b = x. However, the introduction of this rational number, 2/3, breaks the symmetry of the interval because there are now two ratios defined, a = (1/2)/(2/3) = 3/4 and b = (2/3)/1 = 2/3. This is to say that there is a hidden solution that corresponds to the permutation of the intervals. If we equate the two ratios, a = b, this gives for x the geometric mean of 1/2
and 1: x = √(1/2). For the geometric mean the two ratios are equal; for the rational 2/3 there is one interval greater than the other and the permutation corresponds to the exchange of these. If we include this hidden solution, 3/4, we reestablish the symmetry as if a mirror were placed at the geometric mean (see Fig. 6). This palindromic character for a musical scale was first proposed by Newton in his notebooks written between 1664 and 1666. Newton pursued this idea further and presented in Opticks (Newton 1952) the visible optical spectrum divided into ratios corresponding to those of a musical scale, with the divisions in the form of a palindrome. At this point we can generalize our procedure. For this it is sufficient to notice that the first note included, the new rational approximant to the golden mean, creates a new interval, (1/2, 2/3), which, as before, can be divided by a geometric mean which, in turn, can be approximated by a rational that corresponds to the succeeding approximant of the golden mean. This choice breaks the symmetry, which can be reestablished through the image of this approximant in the geometric mean mirror of this interval, √(1/3), and further, its image in the previous mirror √(1/2). At the next level, which includes the convergent 3/5, this construction gives us a pentatonic golden scale: C (1/2), D (3/5), F (2/3), G (3/4), A (5/6), and C (1/1). Now we need only a rule for proceeding in the subdivision of the interval: the maximum and minimum values for the intervals between successive notes. We can see in Fig. 7a, which shows the procedure performed until the third level (including the convergent 5/8), that all the intervals except one fall in a band determined by the ratios between convergents of the golden number. The greatest interval is that including the first geometric mean √(1/2). If we seek to subdivide this interval further we find that there is no rational solution that preserves the palindromy inside the band.
Thus the unique possible choice is the irrational geometric mean itself. This is curious, because we are forced to choose a note that is essentially different to the others, having an irrational interval. The scale at which we have arrived consists of twelve notes; the same number of notes as has the equal-tempered scale now in use (see Fig. 7b). Moreover, the golden scale construction has generated all the harmonic intervals currently accepted by Western music (Table 1). Because of the equal numbers of notes, we can give the same names to the golden scale notes as their equal-tempered counterparts and compare their dispersion; see Table 2. It is intriguing that the irrational note corresponds to the interval C to F♯, which has long been a problem in musical theory because of its ambiguity: being difficult to define as consonant or dissonant. Because of this it has been named the 'diabolus in musica'; in our construction it is certainly an irrational devil!
In Fig. 8 we have calculated the mean quadratic dispersion as a function of the number of notes for an arbitrary equal-tempered scale. This is an indication of how well the harmonic intervals listed in Table 1 are simultaneously approximated by a given scale. We find a marked minimum at twelve notes, and in order to better this the number of notes must rise to nineteen. Contrary to what one might naively expect, simply raising the number of notes or, equivalently, diminishing the interval between adjacent notes, does not automatically achieve a better approximation to the harmonic intervals. As a consequence, the number of notes of an equal-tempered scale must be determined by this condition and cannot be arbitrarily chosen. In Fig. 8 we can see that the function also has a significant minimum for thirty-four notes, and if we continue the construction
Nonlinear Dynamics, the Missing Fundamental, and Harmony
Fig. 7. (a) The golden scale construction developed until the fifth convergent c_5 = 5/8 (upper panel), and the intervals between adjacent notes (lower panel). We can see that the intervals are distributed in a band. If we take as a rule that the intervals cannot be greater than the quotient of convergents c_{n−2}/c_{n+1}, in this case c_3/c_6 = (2/3)/(8/13) = 13/12 = 1.08, or less than that of the convergents c_n/c_{n−1}, here c_5/c_4 = (5/8)/(3/5) = 25/24 = 1.04, we find that the anomalous interval 1.13, between 2/3 and 3/4, must be subdivided once. However, there is no solution to this problem in rational numbers, because the inclusion of a rational number and its image generates at least one interval less than 1.04. The only possibility is thus to include the irrational axis 1/√2 itself. (b) The result of including the irrational axis. We can see that all the intervals are now within the previously defined band. As the number of notes coincides with the number of notes of the usual equal-tempered scale of twelve notes, we have given the same names to the notes of this golden scale.
Table 2. Comparison of the notes of the twelve-note equal-tempered scale with those of the golden scale with the same number of notes

Note          Equal-Tempered   Twelve-Note          Difference   Difference
              Scale (Hz)       Golden Scale (Hz)    (%)          (cents)
C8   Do8      4186.00          4186.00               0.00          0.0
C♯   Do♯      4434.92          4465.07               0.68        -11.7
D    Re       4698.64          4651.11              -1.01         17.6
D♯   Re♯      4978.03          5023.20               0.90        -15.6
E    Mi       5274.04          5232.50              -0.78         13.7
F    Fa       5587.65          5581.33              -0.11          2.0
F♯   Fa♯      5919.90          5919.90               0.00          0.0
G    Sol      6270.96          6279.00               0.12         -2.0
G♯   Sol♯     6644.87          6697.60               0.79        -13.7
A    La       7040.00          6976.67              -0.89         15.6
A♯   La♯      7458.62          7534.80               1.02        -17.6
B    Si       7902.13          7848.75              -0.68         11.7
C9   Do9      8372.00          8372.00               0.00          0.0
of the golden scale one step further we find a scale with thirty-four notes (Table 3). Our golden scale construction, then, provides scales with optimal numbers of notes to best preserve the harmonic intervals.
In axiomatic terms, the construction of the order n scale from the order n − 1 scale can be summarized thus: first, include the next convergent of the golden section, c_n. Construct the geometric mean of the interval (c_{n−1}, c_n) and its reflections in the previous geometric mean mirrors. Include all the possible reflections of the convergents obtained up to this point in the geometric mean mirrors, following the rule that an interval may not be greater than c_{n−2}/c_{n+1}, nor less than c_n/c_{n−1} (these ratios are to be inverted depending on whether n is odd or even, so that they are always greater than one). If an interval remains too large after including all possible rationals, then it must be subdivided by the irrational geometric mean until the rule is satisfied. The completed scale should be palindromic.
There is very little that is arbitrary in the construction of these scales: everything is determined by just one number, the golden section. The original notes are convergents of the golden section, the admissible intervals are quotients of convergents of the golden section, and the symmetry axes are geometric means between neighbouring convergents of the golden section. With the exception of the first in the series, the pentatonic golden scale, the golden scales are not just, with all intervals rational, but neither are they equal-tempered, with all intervals irrational. As they include both rational and irrational intervals, we may term them mixed scales.
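The interval rule can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes the convergents are the Fibonacci ratios c_n = F_n/F_{n+1} (which reproduces c_3 = 2/3 through c_6 = 8/13 as quoted in the text) and takes the twelve-note scale values from Table 3. The function names `convergent` and `interval_band` are hypothetical.

```python
from fractions import Fraction
from math import isclose, sqrt


def convergent(n):
    """Return c_n = F_n / F_(n+1): c_1 = 1/1, c_2 = 1/2, c_3 = 2/3, ..."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return Fraction(a, b)


def interval_band(n):
    """Return (smallest, largest) admissible step at order n, both > 1."""
    largest = convergent(n - 2) / convergent(n + 1)
    smallest = convergent(n) / convergent(n - 1)
    # Invert ratios that fall below one (this depends on the parity of n).
    largest = largest if largest > 1 else 1 / largest
    smallest = smallest if smallest > 1 else 1 / smallest
    return smallest, largest


# Twelve-note golden scale, values as listed in Table 3 (octave from 1/2 to 1/1).
SCALE = [1/2, 8/15, 5/9, 3/5, 5/8, 2/3, 1/sqrt(2),
         3/4, 4/5, 5/6, 9/10, 15/16, 1.0]

smallest, largest = interval_band(5)          # 25/24 and 13/12
steps = [hi / lo for lo, hi in zip(SCALE, SCALE[1:])]
tol = 1e-9                                    # guard against float rounding
in_band = all(float(smallest) - tol <= s <= float(largest) + tol for s in steps)
palindromic = all(isclose(s, r) for s, r in zip(steps, reversed(steps)))
print(smallest, largest, in_band, palindromic)
```

Exact fractions are used for the band so that the bounds 25/24 and 13/12 come out without rounding; only the irrational note 1/√2 forces the scale itself into floating point.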
8 Playing and Transposing with Golden Scales in Equal Temperament
As with any other non-equal-tempered scale, the golden scales cause problems for transposition. The golden scale of twelve notes is interesting for compositional purposes since it has the same number of notes as the usual equal-tempered one, while some of the notes deviate appreciably from the corresponding equal-tempered ones. An interesting way to use this scale while maintaining the possibility of transposition is to approximate it by a subset of an equal-tempered scale of a greater number of notes. As the next step in the golden scale construction gives a thirty-four-note scale, and because this scale contains all the intervals of the twelve-note one, we can approximate the notes of the latter with the notes of an equal-tempered scale of thirty-four steps. As we can see in Fig. 8, this choice is a better approximation to the harmonic intervals, in the sense of having a smaller mean quadratic dispersion. In Table 4 we show the notes of the thirty-four-note equal-tempered scale that approximate the corresponding notes in the twelve-note golden scale, and the differences between them expressed in cents. The maximum deviation is of the order of six cents, very near to the just noticeable difference limen. Thus, the thirty-four-note equal-tempered scale can be used as a very good approximation to the golden one, with the benefit that in the equal-tempered scale a musical composition can be transposed without difficulty. Moreover, we can change tonality within microtonal intervals, by going to a non-twelve interval available in the thirty-four-note scale. This is a general principle: we can play an order n − 1 golden scale as a subset of the order n one and, for sufficiently high n, the order n scale can be approximated by an equal-tempered one with the same number of notes.
Thus, the order n − 1 scale can be played with transposition in this latter scale with the additional possibility of microtonal change of tonality. The example above demonstrates that, for practical applications, it is not necessary to raise further the number of notes;
Table 3. The golden scale construction carried out up to the sixth convergent of the golden mean, c_6 = 8/13, gives us a thirty-four-note golden scale. This contains within it the whole twelve-note golden scale; the additional notes are reflections of convergents of the golden section, plus irrational notes from the inclusion of mirrors at the geometric means of the intervals. It is hence a mixed scale with both rational and irrational intervals. From a musical viewpoint, this allows one to play with consonance, dissonance, and tonality.

Note   Interval          Associated Note of    Thirty-Four-Note
                         Twelve-Note Scale     Golden Scale (Hz)
0      1/2               C8   Do8              4186.00
1      20/39                                   4293.33
2      25/48                                   4360.42
3      8/15              C♯   Do♯              4465.07
4      13/24                                   4534.83
5      5/9               D    Re               4651.11
6      √5/(3·∜3)                               4741.47
7      1/√3                                    4833.58
8      ∜3/√5                                   4927.48
9      3/5               D♯   Re♯              5023.20
10     8/13                                    5152.00
11     5/8               E    Mi               5232.50
12     16/25                                   5358.08
13     13/20                                   5441.80
14     2/3               F    Fa               5581.33
15     √2·∛3/3                                 5691.98
16     1/∛3                                    5804.82
17     1/√2              F♯   Fa♯              5919.90
18     ∛3/2                                    6037.26
19     3/(2·√2·∛3)                             6156.94
20     3/4               G    Sol              6279.00
21     10/13                                   6440.00
22     25/32                                   6540.63
23     4/5               G♯   Sol♯             6697.60
24     13/16                                   6802.25
25     5/6               A    La               6976.67
26     √5/(2·∜3)                               7112.20
27     √3/2                                    7250.36
28     3·∜3/(2·√5)                             7391.21
29     9/10              A♯   La♯              7534.80
30     12/13                                   7728.00
31     15/16             B    Si               7848.75
32     24/25                                   8037.12
33     39/40                                   8162.70
34     1/1               C9   Do9              8372.00
the thirty-four-note scale (n = 6) is at the threshold of our sensorial pitch sensitivity for a just generation of all the harmonic intervals that we have considered relevant to the construction of a useful musical scale.
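The approximation of the twelve-note golden scale by thirty-four equal steps can be sketched as follows. This is an illustration only: the ratios are the twelve-note golden scale values of Table 3 taken relative to the lower C, the deviation is defined here as golden note minus equal-tempered step (an assumed sign convention), and the names `cents` and `nearest_step` are hypothetical.

```python
from math import log2, sqrt

# Twelve-note golden scale relative to the lower C (Table 3 values divided by 1/2).
GOLDEN_12 = [1, 16/15, 10/9, 6/5, 5/4, 4/3, sqrt(2),
             3/2, 8/5, 5/3, 9/5, 15/8, 2]


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * log2(ratio)


def nearest_step(ratio, divisions):
    """Closest step of an equal division of the octave, plus the deviation in cents."""
    step = round(cents(ratio) * divisions / 1200)
    deviation = cents(ratio) - step * 1200 / divisions
    return step, deviation


approx = [nearest_step(r, 34) for r in GOLDEN_12]
steps = [s for s, _ in approx]
worst = max(abs(d) for _, d in approx)

for (step, deviation), ratio in zip(approx, GOLDEN_12):
    print(f"{ratio:.4f} -> step {step:2d}, deviation {deviation:+.2f} cents")
```

The steps found this way are those listed in Table 4, and the largest deviation stays just under six cents, in line with the statement in the text.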
Fig. 8. Mean quadratic dispersion σ as a function of the number of notes in an equal-tempered scale. For each harmonic interval, the difference between the interval and the note of the equal-tempered scale that best approximates it is squared and multiplied by the relative weight of the interval; σ is the sum of these terms over all the intervals. The weights of the intervals are set such that the fifth weighs more than the fourth, which weighs more than the major third and major sixth, which weigh more than the minor third and minor sixth. σ is then an indication of the degree to which a given equal-tempered scale approximates all the harmonic intervals of Table 1. There is a marked minimum for the usual twelve-note scale, which coincides with the number of notes of the golden scale (the fifth convergent of the golden number). To obtain a better value, the number of notes must rise to nineteen. The two following minima are at thirty-one and thirty-four notes, and the latter value coincides with the number of notes of the golden scale developed until the sixth convergent of the golden number, 8/13.
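A rough sketch of such a dispersion measure is given below. Several things here are assumptions rather than the authors' definition: the six just intervals are taken to be those named in the caption, the error is measured in cents, and the numerical weights are invented for illustration, respecting only the ordering stated in the caption.

```python
from math import log2

# (ratio, weight): the weights are hypothetical; only their ordering follows the caption.
INTERVALS = {
    "fifth": (3/2, 4.0),
    "fourth": (4/3, 3.0),
    "major third": (5/4, 2.0),
    "major sixth": (5/3, 2.0),
    "minor third": (6/5, 1.0),
    "minor sixth": (8/5, 1.0),
}


def dispersion(divisions):
    """Weighted sum of squared errors (in cents) of the best equal-tempered steps."""
    step = 1200 / divisions
    total = 0.0
    for ratio, weight in INTERVALS.values():
        target = 1200 * log2(ratio)
        error = target - round(target / step) * step
        total += weight * error ** 2
    return total


for n in (11, 12, 13, 19):
    print(n, round(dispersion(n), 1))
```

With these assumed weights, twelve divisions score better than any other count from five to eighteen, and nineteen divisions score better still, which matches the qualitative behaviour described in the caption. The positions of later minima depend on the actual weights and are not claimed here.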
9 Can Our Senses Be Viewed as Generic Nonlinear Systems?
We have shown that we can construct meaningful musical scales based solely on number-theoretical properties of the continued fraction development of the golden number and its convergents. But is the role of the golden number in musical aesthetics a coincidence? The development of dynamical systems theory is changing our view about nonlinear phenomena in nature. We have mentioned the cultural hypothesis which considers that the role of the golden number in aesthetics is due to the ubiquity of this number in natural phenomena. It is now clear that in many cases this role in natural phenomena is due to underlying dynamical mechanisms (Ball 1999). Number theory in general, and certain numbers such as the golden number in particular, play important parts in the dynamics of nonlinear systems (González and Piro 1983; Cartwright et al. 1999b). To give just one example, patterns seen in phyllotaxis and in the generation of Fibonacci spirals have been reproduced in a dynamics experiment on the organization of ferrofluid drops in a silicone oil (Douady and Couder 1992).
Musical scales are constructed around musical intervals, which may be consonant or dissonant (Table 1). Here we must be careful to distinguish the concept of musical consonance from that of psychoacoustical consonance (Plomp and Levelt 1965); psychoacoustical consonance makes use of the idea of roughness, but many observations about the consonance of musical intervals cannot be explained on those grounds. The first to put forward an explanation for musical consonance was Rameau (Rameau 1722). In his theory of harmony, Rameau assumed that musical chords conveyed information about a fundamental sound: a bass note representing the tonal meaning of the chord. Related ideas are Riemann's aural subharmonics (Riemann 1903), and those that have their origins in Tartini's third tone (Tartini 1754). More recently, Terhardt gave fresh impetus to the theory of fundamental bass, proposing that the psychoacoustical phenomenon of virtual or residue pitch may be ascribed to it (Terhardt 1974). However, Terhardt's ideas lack a clear connection between the physical parameters of the sound and the virtual pitch response. Except for von Helmholtz's ideas on virtual pitch (von Helmholtz 1863), which make use of combination tones, other theories of the phenomenon show the same lack of physical significance.
Recently we proposed a new theory of residue perception, based on nonlinear dynamics (Cartwright et al. 1999a; Cartwright et al. 2001). Following the line of reasoning of Terhardt, this becomes ipso facto a physical explanation for musical consonance. Our theory is based on a type of dynamical attractor termed a three-frequency resonance. These resonances are hierarchically organized following rules borrowed from number theory and confirmed through simulation and experiment (Cartwright et al. 1999b). In this hierarchical ordering, a central part is played by the generalization of a number-theoretical operation known as the Farey sum, which also plays a central role in the organization of synchronized responses in periodically forced oscillators (González and Piro 1983). There, the Farey sum leads to a privileged role for the golden section.

Table 4. The notes of the thirty-four-note equal-tempered scale that approximate the corresponding notes in the twelve-note golden scale, and the differences between them expressed in cents

Twelve-Note      Thirty-Four-Note         Difference
Golden Scale     Equal-Tempered Scale     (cents)
C    Do           0                        0
C♯   Do♯          3                        5
D    Re           5                        6
D♯   Re♯          9                       -2
E    Mi          11                       -2
F    Fa          14                        4
F♯   Fa♯         17                        0
G    Sol         20                       -4
G♯   Sol♯        23                        2
A    La          25                        2
A♯   La♯         29                       -6
B    Si          31                       -5
C    Do          34                        0
We propose that following our theory of residue perception, musical consonance may be explained in physical terms. The auditory system is a very complex and highly nonlinear dynamical system, so we expect that universal dynamical attractors may convey a perceptual and functional meaning in neural processing. Universal dynamical attractors of interest for pitch perception, that is three-frequency resonances, are organized by means of a number-theoretical operation, the generalized Farey sum, which implies a privileged role for the golden section in their hierarchical organization. The
part the golden section plays in the hierarchical organization of musical intervals, outlined in this paper, may then be a consequence of the dynamical ordering pointed out above at the level of neural processing in the auditory system. A final hypothesis can be proposed: the tonal meaning and the relative consonance of a musical chord may be described by the stability of a dynamical attractor which represents the residue pitch. This idea is quantitatively testable, because this stability can be measured through different dynamical indicators. Our theory for the pitch perception of complex sounds by the human auditory system demonstrates that the auditory system’s response to musical sounds is compatible with the universal response of a nonlinear dynamical system to such stimuli. Because neuronal networks are very complex dynamical systems, this is not such an unexpected result. It may be on this basis that the presence of the golden number in musical aesthetics can be explained: harmonic intervals are another manifestation of the universal nonlinear behaviour associated with pitch perception. The same phenomena may occur at the level of the visual system, because object identification appears to correspond physiologically to synchronization of neuron populations to a given frequency (Engel et al. 1992). The presence of different elements in an image might be then detected through different neuronal groups that synchronize to different frequencies. And this returns us to the premise of our theory: the nonlinear interaction of two or more frequencies produces resonances that are hierarchically arranged in a manner described by the golden mean. Thus, the more recent results take us nearer to the more ancient theories, to the Pythagorean dogma that all the universe is described by numbers and rhythms (Allott 1994) — in modern terms number theory and dynamics — and that nature is from all points of view similar to itself — in modern terms universality. 
We may conclude with the words of the Gothic architect Jean Vignot on the continuation of the work on Milan cathedral in 1392: “Ars sine scientia nihil est”.
References
Allott, R.: The Pythagorean perspective: The arts and sociobiology. J. Social and Evolutionary Syst. 17, 71–90 (1994)
Aronson, D.G., Chory, M.A., Hall, G.R., McGehee, R.P.: Bifurcations from an invariant circle for two-parameter families of maps of the plane: A computer-assisted study. Commun. Math. Phys. 83, 303–354
Arrowsmith, D.K., Cartwright, J.H.E., Lansbury, A.N., Place, C.M.: The Bogdanov map: Bifurcations, mode locking, and chaos in a dissipative system. Int. J. Bifurcation and Chaos 3, 803–842 (1993)
Ball, P.: The Self-Made Tapestry: Pattern Formation in Nature. Oxford University Press, Oxford (1999)
von Békésy, G.: Experiments in Hearing. McGraw-Hill, New York (1960)
Calvo, O., Cartwright, J.H.E., González, D.L., Piro, O., Sportolari, F.: Three-frequency resonances in coupled phase-locked loops. IEEE Trans. Circuits and Systems I 47, 491–497 (2000)
Cartwright, J.H.E., González, D.L., Piro, O.: Nonlinear dynamics of the perceived pitch of complex sounds. Phys. Rev. Lett. 82, 5389–5392 (1999a)
Cartwright, J.H.E., González, D.L., Piro, O.: Universality in three-frequency resonances. Phys. Rev. E 59, 2902–2906 (1999b)
Cartwright, J.H.E., González, D.L., Piro, O.: Pitch perception: A dynamical-systems perspective. Proc. Natl. Acad. Sci. USA 98, 4855–4859 (2001)
Cohen, M.A., Grossberg, S., Wyse, L.L.: A spectral network model of pitch perception. J. Acoust. Soc. Am. 98, 862–878 (1995)
Cvitanović, P., Shraiman, B., Söderberg, B.: Scaling laws for mode lockings in circle maps. Physica Scripta 32, 263–270 (1985)
Douady, S., Couder, Y.: Phyllotaxis as a physical self-organized growth process. Phys. Rev. Lett. 68, 2098–2101 (1992)
Engel, A.K., König, P., Kreiter, A.K., Schillen, T.B., Singer, W.: Temporal coding in the visual cortex: New vistas on integration in the nervous system. Trends in Neurosciences 15, 218–226 (1992)
Faulkner, A., Ball, V., Rosen, S., Moore, B.C.J., Fourcin, A.: Speech pattern hearing aids for the profoundly hearing impaired: Speech perception and auditory abilities. J. Acoust. Soc. Am. 91, 2136–2155 (1991)
Ghyka, M.C.: The Geometry of Art and Life. Dover (1977)
Goldstein, J.L.: An optimum processor theory for the central formation of the pitch of complex tones. J. Acoust. Soc. Am. 54, 1496–1516 (1973)
González, D.L., Piro, O.: Symmetric kicked self oscillators: Iterated maps, strange attractors and the symmetry of the phase locking Farey hierarchy. Phys. Rev. Lett. 55, 17–20 (1985)
González, D.L., Piro, O.: Chaos in a nonlinear driven oscillator with exact solution. Phys. Rev. Lett. 50, 870–872 (1983)
Hao, B.L.: Elementary Symbolic Dynamics and Chaos in Dissipative Systems. World Scientific, Singapore (1989)
Hardy, G.H., Wright, E.M.: An Introduction to the Theory of Numbers, 4th edn. Oxford University Press, Oxford (1975)
von Helmholtz, H.L.F.: Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Braunschweig (1863)
Herz-Fischler, R.: A Mathematical History of the Golden Number. Dover (1998)
Huntley, H.E.: The Divine Proportion: A Study In Mathematical Beauty. Dover (1970)
Huygens, C.: Observation a faire sur le dernier article de precedent journal, où il est parlé de la concordance de deux pendules suspenduës à trois ou quatre pieds l'une de l'autre. J. des Scavans (12 (23 March)) (1665); Huygens' notebook is reprinted in (Huygens 1888–1950)
Huygens, C.: Œuvres Complètes de Christiaan Huygens, vol. 17, p. 185. Societé Hollandaise des Sciences (1888–1950)
Huygens, C.: Extrait d'une lettre escrite de La Haye, le 26 fevrier 1665. J. des Scavans (11 (16 March)) (1665); See the correction published in the following issue (Huygens 1665)
Jackson, E.A.: Perspectives of Nonlinear Dynamics, vol. 1. Cambridge University Press, Cambridge (1989)
Jackson, E.A.: Perspectives of Nonlinear Dynamics, vol. 2. Cambridge University Press, Cambridge (1990)
James, J.: The Music of the Spheres. Grove Press (1993)
Lauterborn, W., Parlitz, U.: Methods of chaos physics and their application to acoustics. J. Acoust. Soc. Am. 84, 1975–1993 (1988)
Lendvai, E.: Duality and synthesis in the music of Béla Bartók. In: Kepes, G. (ed.) Module, Proportion, Symmetry, Rhythm. George Braziller (1966)
Markowsky, G.: Misconceptions about the golden ratio. Coll. Math. J. 23, 2–19 (1992)
Meddis, R., Hewitt, M.J.: Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. J. Acoust. Soc. Am. 89, 2866–2882 (1991)
Newhouse, S.E., Ruelle, D., Takens, F.: Occurrence of strange axiom A attractors near quasiperiodic flows on T^m, m ≥ 3. Commun. Math. Phys. 64, 35–40 (1978)
Newton, I.: Opticks. Dover (1952)
Ohm, G.S.: Über die definition des tones, nebst daran geknüpfter theorie der sirene und ähnlicher tonbildender vorrichtungen. Ann. Phys. Chem. 59, 513–565 (1843)
Pacioli, L.: Divine Proportion. Arabis Books (2001)
Parlitz, U., Lauterborn, W.: Period-doubling cascades and devil's staircases of the driven van der Pol oscillator. Phys. Rev. A 36, 1428–1434 (1987)
Plomp, R., Levelt, W.J.M.: Tonal consonance and critical bandwidth. J. Acoust. Soc. Am. 38, 548–560 (1965)
van der Pol, B., van der Mark, J.: Frequency demultiplication. Nature 120, 363–364 (1927)
Rameau, J.-P.: Traité de l'harmonie, Paris (1722)
Riemann, H.: Catechism of Orchestration, London (1903)
Schroeder, M.R.: Number Theory in Science and Communication. Springer, Heidelberg (1990)
Schroeder, M.R.: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman, New York (1992)
Seebeck, A.: Über die definition des tones. Ann. Phys. Chem. 60, 449–481 (1843)
Schouten, J.F., Ritsma, R.J., Cardozo, B.L.: Pitch of the residue. J. Acoust. Soc. Am. 34, 1418–1424 (1962)
Tartini, G.: Trattato di Musica. Padua (1754)
Terhardt, E.: Pitch, consonance, and harmony. J. Acoust. Soc. Am. 55, 1061–1069 (1974)
Wightman, F.L.: The pattern transformation model of pitch. J. Acoust. Soc. Am. 54, 407–416 (1973)
Zeki, S.: Inner Vision. Oxford University Press, Oxford (1999)
Dynamic Excitation Impulse Modification as a Foundation of a Synthesis and Analysis System for Wind Instrument Sounds
Michael Oehler (1) and Christoph Reuter (2)
(1) University for Music and Drama Hanover, Germany
(2) Musicological Department, University of Cologne, Germany
Abstract. The Variophon is a wind synthesizer that was developed at the Musicological Institute of the University of Cologne in the 1970s and 1980s and was at that time based on a completely new synthesis principle: the pulse forming process. The central idea of that principle is that every wind instrument sound can basically be traced back to its excitation pulses, which independently of the fundamental always act upon the same principles. In a recent project, supported by the German Research Foundation (DFG), the synthesis method of excitation impulse modification has been transferred to a digital platform. The aim of the software-based modelling is twofold: creating an experimental system for analyzing and synthesizing (wind) instrument sounds, and building a synthesizer that would be an alternative to comparable wind instrument synthesis applications. On the one hand, this sound synthesis technique accounts for the place where the sound is generated; on the other hand, only a single breath controller is required to produce all the sound nuances that are possible on a real instrument. First, the analogue circuits of the different instrument modules of the Variophon will be mapped onto a digital representation by means of the analogue circuit simulation software LTSpice. After the original algorithms have been analysed, the Digital Variophon will be rebuilt in the modular Reaktor environment by Native Instruments (NI). Finally, the experimental system will be validated by means of a prototypical perception experiment.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 189–197, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Introduction
The pulse forming principle as a synthesis method for wind instrument sounds was developed in the 1970s at the Musicological Institute at the University of Cologne, Germany, by Jobst Fricke (Fricke 1975). Pulse forming can be seen as an alternative to the better-known physical modelling, one in which the emphasis is put on the excitation function. The central idea of the principle is that every wind instrument sound can basically be traced back to its excitation pulses, which independently of the fundamental tone always act upon the same principles, and in which Karl Erich Schumann's Principles of Timbre are reflected (Schumann 1929, 15-18, 98, 100). In 1975 Fricke (1975, 407) discovered the principles of generating wind instrument-like spectra with typical stable formant areas and spectral gaps evoked by the excitation
M. Oehler and C. Reuter
pulses of double-reeds or lips. Voigt (1975, 51, 54) and Schmitz subsequently constructed a wind instrument synthesizer (Realton Variophon), that was entirely based on the pulse forming principle.1
2 Cyclical Spectra
Constant opening or closing times of the reeds and lips are the basic condition for stable formant areas, independent of the pulse frequency. That means that, while the fundamental frequency may vary, the pulse widths have to remain the same. Besides the pulse width, there are several other factors influencing the resulting cyclical spectra (Fricke 1989, 115f.; Reuter 1995, 75-84). Rather small changes of the pulse width or the pulse form may cause a modification of the relevant spectrum and thus an audible change of timbre.
Fig. 1. (a) Constant pulse width (τ/T = 1/10) and the resulting spectral envelope. (b) Constant pulse width (τ/T = 1/5) and the resulting spectral envelope.
1 Parts of this paper were presented in May 2007 at the first International Conference on Mathematics and Computation in Music (MCM 2007), Berlin, under the title New Directions in Wind Instrument Synthesis – the digital pulse forming.
Dynamic Excitation Impulse Modification
A comparison of the excitation pulses in the time domain with the corresponding spectra in Fig. 1a and Fig. 1b shows that the ratio of the pulse width τ to the cycle T determines the spectral gaps in the frequency domain. For a pulse width τ and a cycle T, the spectral gaps are found at the partials n·(T/τ) with n ∈ N. This corresponds to Karl Erich Schumann's principle of formant areas: independent of the pitch and without the necessity of any bandpass filter, the positions of the spectral minima and maxima remain constant because of the constant width of the excitation pulses (only at low frequencies can a low-pass filter effect be found, caused by the sound-hole or bell, which in many cases is too small for the radiation of the lowest frequencies).
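The gap rule for rectangular pulses can be illustrated with a short sketch. It assumes an ideal rectangular pulse train of unit height, for which the standard Fourier-series amplitude of partial k is 2·|sin(πk·d)|/(πk) with duty cycle d = pulse width / period; the function names are hypothetical and nothing here is taken from the Variophon circuitry.

```python
from math import pi, sin


def partial_amplitude(k, duty):
    """Amplitude of partial k for a unit-height rectangular pulse train."""
    return abs(2 * sin(pi * k * duty) / (pi * k))


def spectral_gaps(duty, partials=40, tol=1e-9):
    """Partials whose amplitude vanishes (up to floating-point tolerance)."""
    return [k for k in range(1, partials + 1)
            if partial_amplitude(k, duty) < tol]


def first_gap_frequency(fundamental_hz, pulse_width_s):
    """Frequency in Hz of the first spectral gap for a fixed pulse width."""
    duty = pulse_width_s * fundamental_hz
    return spectral_gaps(duty)[0] * fundamental_hz


print(spectral_gaps(1 / 10))              # gaps at multiples of 10
print(spectral_gaps(1 / 5))               # gaps at multiples of 5
print(first_gap_frequency(100.0, 0.001))  # 1 ms pulse at 100 Hz
print(first_gap_frequency(200.0, 0.001))  # same pulse at 200 Hz
```

The last two calls show the point made in the text: with the pulse width held at 1 ms, the first gap stays at the same absolute frequency when the fundamental doubles, because the gap moves from partial 10 to partial 5.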
Fig. 2. Constant triangle pulse width (τ/T = 1/10) and the resulting spectral envelope (after Voigt 1975, 53)
In Fig. 2 the pulse forming principle is schematically illustrated using an isosceles triangle pulse. For a pulse width τ and a cycle T, the spectral gaps are found at the partials 2n·(T/τ) with n ∈ N. If T/τ is not an integer, the following functions describe the spectral distribution of energy for square (1) and isosceles triangle pulses (2).
(1)
(2)
|C_k| is the amplitude of partial k, F is the pulse area, and k_0 is the ordinal number at which the partial's amplitude reaches a minimum for the first time (see Wagner 1947, 70; Voigt 1975, 52). This demonstrates the dependence of the spectral gaps on the exact pulse form. For triangle pulse chains, Auhagen supplied evidence that it is not the ratio of pulse width to cycle duration described above that is responsible for the spectral gaps, but the partition ratio of the rising and falling edges, as shown in Fig. 3 (Auhagen 1987, 710).
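For comparison, the isosceles triangle case can be sketched with the standard Fourier-series result for a unit-height triangular pulse of base width τ, whose partial amplitudes are d·sinc²(k·d/2) with d = τ/T and sinc(x) = sin(πx)/(πx). This is a textbook form under those assumptions and may differ in normalisation from equations (1) and (2); the function names are hypothetical.

```python
from math import pi, sin


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else sin(pi * x) / (pi * x)


def square_amplitude(k, duty):
    """Partial k of a unit-height rectangular pulse train."""
    return abs(2 * duty * sinc(k * duty))


def triangle_amplitude(k, duty):
    """Partial k of a unit-height isosceles triangular pulse train."""
    return duty * sinc(k * duty / 2) ** 2


def gaps(amplitude, duty, partials=40, tol=1e-9):
    """Partials at which the given amplitude function vanishes."""
    return [k for k in range(1, partials + 1) if amplitude(k, duty) < tol]


duty = 1 / 10
print(gaps(square_amplitude, duty))    # every 10th partial
print(gaps(triangle_amplitude, duty))  # every 20th partial

# Strength of partial 15 relative to partial 1 for both pulse shapes.
square_ratio = square_amplitude(15, duty) / square_amplitude(1, duty)
triangle_ratio = triangle_amplitude(15, duty) / triangle_amplitude(1, duty)
print(square_ratio, triangle_ratio)
```

The triangle gaps fall at twice the partial numbers of the square-pulse gaps, and the triangle's upper partials are weaker relative to its first partial, since its envelope falls off with the square of the partial number rather than linearly.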
Fig. 3. Triangle pulses (after Auhagen 1987)
In this way, triangle pulses with different widths can produce spectra with the same spectral gaps, provided that the duration of one edge is kept constant as a common divisor. If t1 ≠ t2 − t1, the gaps are unequally distributed, as shown in Fig. 4 (see Auhagen 1987, 710).
Fig. 4. Triangle pulse (t2 = (1/2.7) T and t2 − t1 = (1/6.2) T) and the resulting spectral envelope (after Auhagen 1987, 711)
Playing a tone ff on a double reed instrument makes the rising and falling edges of the pulses, which represent the transitions from the opening to the closing movement of the reeds and vice versa, less smooth than they are when the tone is played pp. As can be seen in Fig. 5, the pulse of a tone played pp not only has a more rounded shape than that of a tone played ff, but its higher partials are also less prominent than in the spectrum of the ff tone.
In other words: the more square the shape of the pulses, the stronger the amplitudes of the higher partials. The shorter the width of the pulses, the higher the partials at which the spectral gaps and formant areas are found. This corresponds to the principle of shifting and skipping formant areas found by Karl Erich Schumann.
Fig. 5. Constant pulse width (τ/T = 1/10) played pp and the resulting spectral envelope
3 Synthesis and Analysis Framework
The electronics technician Jürgen Schmitz saw the potential of these pulse forming principles and began to develop a wind synthesizer based on electronic pulse chains with constant widths modulated by a breath controller. Using that technology, he created the first prototype in 1975, the Martinetta, followed by the Variophon, which was manufactured by the Realton Company in Euskirchen since 1987. The Variophon was the first and only synthesizer based on this natural pulse forming process (see Enders 1985). To keep the idea and the electronic circuit secret, he encapsulated the circuit boards of each instrument module in epoxy resin. Unfortunately, discrepancies between the shareholders and the resignation of Schmitz caused the ruin of the Realton Company and also brought scientific work in this field to a standstill (exceptions: the experiments of Wolfgang Auhagen (1987) and Johannes Blens (1993)). The synthesis algorithms were locked in the epoxy-encapsulated black boxes. How did it work? What was the electronic circuit that forms the core of the whole synthesis principle and has remained hidden, unresolved, beneath this layer until now? In order to construct a synthesis and analysis framework for wind instrument sounds it was necessary to decode the algorithms of each single instrument module.
3.1 The Digital Variophon
In a recent project, supported by the DFG, the Variophon algorithms were transferred onto a digital platform (see Oehler & Reuter 2006). This was the starting point for improving the sound production process based on the pulse forming principle, since a software-based version makes it possible to bypass some restrictions resulting from the limited technical feasibility at that time, for example to synthesize the excitation
impulses of original instruments by means of cosinusoidal or polygonal impulses, where the rising and falling edges of the impulses can be adjusted freely. An important step in the rebuilding process was made when diagrams of the circuit boards, together with some unencapsulated instrument modules, were found and analysed. It then became possible to measure the shape of the originally used pulses, as well as the sound transformation as a function of frequency and dynamics, at every single point of the circuit.
Fig. 6. Variophon with supplies (1979)
To be able to modify any parameter of the circuit in order to analyse the influence of its different units, it was necessary to map the analogue circuit onto a digital representation by means of analogue circuit simulation software. To this end, the switching regulator design program SwitcherCAD III by Linear Technology and the circuit simulation engine LTSpice were used.

3.2 Formalisation

In the next step, the general framework of the pulse forming core on the circuit boards of the different instrument modules was first analysed and then formalised within the circuit simulation engine. The formalisation is necessary to transfer the non-realtime model created in LTSpice to a realtime playable model in NI Reaktor. In the following, the formalisation process is described using the bassoon instrument module as an example. The pulse widths of the square pulses were measured at the timer device on the circuit board (NE555) for every pitch from A1 (55 Hz) to E4 (329.6 Hz). The dynamic value, which is usually determined by the manually blown breath controller, was controlled by an adjustable power supply unit (100 mV steps from 1000 mV [pp] to 3200 mV [ff]). As the first analyses revealed, the pulse width changes gradually with the dynamic value, but different pulse widths could be found in only three pitch registers: a low register from A1 to C#3, a mid register from D3 to G#3 and a high register from A3 to E4. The pulse height for every pitch and dynamic value was constant at around 4.5 V (see Fig. 7). The exact structure of the functions determines the timbre gradient and therefore the specific character of the instrument sound. The oboe, for example, has only two registers with completely different pulse widths for the corresponding dynamic values.
Fig. 7. Pulse widths and heights of the Variophon bassoon module
3.3 The Pulse Width Function

By applying a polynomial regression to the measured values of each register, the following set of functions (3), (4) and (5) is obtained for the low register ($f(x)_{low}$), the mid register ($f(x)_{mid}$) and the high register ($f(x)_{high}$), for $1 \le x \le 23$ with $x$ being the dynamic value:

$$f(x)_{low} = -0.00009x^3 + 0.0053x^2 - 0.1292x + 2.457 \qquad (3)$$

$$f(x)_{mid} = -0.00001x^3 + 0.0011x^2 - 0.0419x + 1.4832 \qquad (4)$$

$$f(x)_{high} = -0.000004x^3 + 0.0004x^2 - 0.0213x + 1.083 \qquad (5)$$
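The three register functions (3), (4) and (5) can be evaluated with a few lines of Python. This is only an illustrative sketch, not the NI Reaktor implementation described in the text. The names `PULSE_WIDTH_COEFFS`, `pulse_width` and `dynamic_value_from_mv` are hypothetical. The mapping from supply voltage to the dynamic value x (1000 mV to x = 1, 3200 mV to x = 23) is an assumption derived from the 100 mV steps mentioned in Section 3.2, and the result is presumably a pulse width in ms, the unit used for i in equation (6).

```python
# Cubic regression coefficients (a3, a2, a1, a0) taken from equations (3)-(5)
# for the Variophon bassoon module.
PULSE_WIDTH_COEFFS = {
    "low": (-0.00009, 0.0053, -0.1292, 2.457),    # register A1 to C#3
    "mid": (-0.00001, 0.0011, -0.0419, 1.4832),   # register D3 to G#3
    "high": (-0.000004, 0.0004, -0.0213, 1.083),  # register A3 to E4
}


def pulse_width(register, x):
    """Evaluate the regression polynomial of one register at dynamic value x."""
    if not 1 <= x <= 23:
        raise ValueError("the regression is only stated for 1 <= x <= 23")
    a3, a2, a1, a0 = PULSE_WIDTH_COEFFS[register]
    # Horner's scheme for a3*x^3 + a2*x^2 + a1*x + a0.
    return ((a3 * x + a2) * x + a1) * x + a0


def dynamic_value_from_mv(millivolts):
    """Hypothetical helper: assumes 1000 mV maps to x = 1 and 3200 mV to x = 23."""
    return (millivolts - 1000) / 100 + 1


if __name__ == "__main__":
    for name in PULSE_WIDTH_COEFFS:
        softest = pulse_width(name, dynamic_value_from_mv(1000))
        loudest = pulse_width(name, dynamic_value_from_mv(3200))
        print(f"{name}: pp {softest:.4f}, ff {loudest:.4f}")
```

Keeping the coefficients in a table keyed by register mirrors the fixed-register design of the original module; a continuous pitch dependency as in equation (6) would replace the lookup.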
When implementing these equations in a NI Reaktor framework, the core function of the original Variophon bassoon sound module is already usable in realtime as a digitised model. But to realize an experimental platform it is necessary to abandon the concept of fixed registers and develop a function that includes pitch as a continuous parameter. In the following equation (6) i is the pulse width in ms, c is the pitch range in cent (for example c=3100 for a pitch range from A1 to E4), y is the played pitch in cent with the lowest pitch within the defined pitch range as 0 cent and k is an exponent for defining the kind of pitch dependency (for example a logarithmic or exponential dependency).
(6)
3.4 Application of the System

By means of the developed environment it is possible to validate the pulse forming theory as a feasible explanation of the sound generating process of wind instruments, as well as to simulate several novel experiments in the field of instrument acoustics. In a prototypical perception experiment, the relevance of micromodulations and vibrato in oboe sounds was examined. On the one hand, this example demonstrated the performance of the developed analysis and synthesis framework; on the other hand, it could be shown that timbre modulation, in other words realistic pulse width and cycle duration modulation, is an important factor for the perceived naturalness of oboe sounds. An ANOVA and a Tukey HSD post hoc test² showed (p < .01) that amplitude modulation, frequency modulation and combined amplitude and frequency modulation are all perceived as significantly less natural than combined pulse width and cycle duration modulation. The complete study can be found in Oehler (2008). In an experiment using real and synthesized bassoon sounds instead of oboe sounds, similar results were observed (see Oehler & Reuter 2007).
4 Discussion

The specific functionality of the Variophon remained unexplained for a long time, but by means of the newly discovered material it is now possible to use the Variophon algorithms, based on the pulse forming theory, as the foundation of a synthesis and analysis system for wind instrument sounds. Research that ended abruptly 25 years ago can therefore be continued. Besides the possibility of synthesizing more realistic excitation impulses by means of cosinusoidal or polygonal impulses whose rising and falling edges can be adjusted freely, the described formalisation is another example of the potential that results from the transfer to the digital platform and the use of the digitally implemented pulse forming principle. Furthermore, some important features of the sound production process, such as the multiplicative interconnection between pulse forming and breath noise or the relevance of micromodulations, can now be considered. The outcome of the prototypical perception experiments supports the hypothesis that timbre modulation (evoked by the excitation function) is an important factor for the perceived naturalness of oboe vibrato sounds. The currently constructed pulse forming based synthesis and analysis framework for wind instrument sounds offers an alternative method for analysing modulation effects. Further investigations may be useful for exploring new sound synthesis algorithms as well as for other experiments in the field of timbre research.
² ANOVA: analysis of variance. Tukey HSD (honestly significant difference) post hoc test: a post-ANOVA comparison.
Acknowledgement

The project was financially supported by the German Research Foundation (DFG).
References

Auhagen, W.: Dreieckimpulsfolgen als Modell der Anregungsfunktion von Blasinstrumenten. Fortschritte der Akustik: Plenarvorträge und Kurzreferate der 13. Gemeinschaftstagung der Deutschen Arbeitsgemeinschaft für Akustik (DAGA), 23.-26.3.1987, Aachen, 709–712 (1987)
Blens, J.: Die Zuordnung der von Blasinstrumenten bekannten Impulsfolgen zu ihren Spektren mit Hilfe der Fourieranalyse. Staatsarbeit, Musikhochschule Köln (1993)
Enders, B.: Variophon. In: Enders, B. (ed.) Lexikon Musikelektronik, p. 342. Schott, Mainz (1985)
Fricke, J.P.: Formantbildende Impulsfolgen bei Blasinstrumenten. Fortschritte der Akustik: Plenarvorträge und Kurzreferate der 13. Tagung der Deutschen Arbeitsgemeinschaft für Akustik (DAGA), 8.-10.4.1975, Braunschweig, 407–411 (1975)
Fricke, J.P.: Die Impulsformung: ein Erklärungsmodell für Klangentwicklung und Klangideal bei Holzblasinstrumenten. In: Widholm, G., Nagy, M. (eds.) Das Instrumentalspiel. Beiträge zur Akustik der Musikinstrumente, Medizinische und Psychologische Aspekte des Musizierens, pp. 109–118. Doblinger, Wien (1989)
Oehler, M.: Die digitale Impulsformung als Werkzeug für die Analyse und Synthese von Blasinstrumentenklängen. Peter Lang, Frankfurt/M (2008)
Oehler, M., Reuter, C.: Virtual wind instruments based on pulse forming synthesis. Journal of the Acoustical Society of America 120(5), 3333 (2006)
Oehler, M., Reuter, C.: Vibrato experiments with bassoon sounds by means of the digital pulse forming synthesis and analysis framework. In: Proceedings of the 123rd AES Convention, New York, 5.-8.10.2007 (2007)
Reuter, C.: Der Einschwingvorgang nichtperkussiver Musikinstrumente. Peter Lang, Frankfurt/M (1995)
Schumann, E.: Physik der Klangfarben II. Habilitationsschrift, Berlin (1929), Copy of the galley proof, Leipzig (1940)
Voigt, W.: Untersuchungen zur Formantbildung in Klängen von Fagott und Dulzianen. Bosse, Regensburg (1975)
Wagner, K.W.: Einfuehrung in die Lehre von den Schwingungen und Wellen. Dieterich, Wiesbaden (1947)
Non-linear Circles and the Triple Harp: Creating a Microtonal Harp

Eleri Angharad Pound
University of Leeds
Abstract. With the increased usage of microtones by contemporary composers, research is needed into the various systems currently in use along with other potential ways of constructing musical temperaments. Considerations need to be made regarding the performance practicalities of such systems, especially when used on conventional orchestral instruments with limited means of producing unconventional intervals. The harp is an instrument which is confined to equal temperament and therefore is problematic for composers wishing to use alternative tuning systems. This article looks at the possibility of creating a microtonal harp using an older version of the instrument and describes a new tuning system proposed by Sturman (Sturman 2005) using a non-linear circle equation. The article outlines the details of this tuning system and how it fits with the triple harp. It then explains some of the practicalities involved in the process of composition using this unusual tuning in conjunction with the traditional instrument.
1 Introduction

Western classical music from around the 18th century onwards has been dominated by the use of twelve-tone equal temperament. During the past century several composers have experimented with expanding or moving away from this temperament. Often this is done by introducing quartertones, that is, doubling the number of pitches in the scale, whilst others have sought alternatives ranging from various equal temperaments to the use of older temperaments (mean-tone, for example) to more exotic (re-)inventions such as Harry Partch’s 43-division scale (Partch 1979, 133). Compositions using these scales present certain challenges in performance. Practically all of our instruments have been designed with twelve-tone equal temperament in mind, which makes the performance of alternative tuning systems, even quartertones, very difficult or in some cases impossible. Therefore, before embarking on composing with microtones, a composer must consider the practicalities of their use on conventional instruments. Various composers have approached this dilemma in different ways: by designing new instruments for the new tuning systems, as Partch and Alois Hába did, or by retuning conventional instruments such as the guitar or the violin, as in Brian Ferneyhough’s Kurze Schatten II (1990). Manuals are also emerging, created by performers, with fingering charts for intervals such as the quartertone and eighth tone, making it ever easier to compose using smaller divisions of the equal tempered scale.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 198–203, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 The Triple Harp

The modern concert pedal harp is an example of an instrument essentially confined to twelve-tone equal temperament. With seven strings to the octave, each of which can be altered a semitone at a time by pedals, it is problematic for a composer wishing to use alternative tuning systems. A solution can be found by looking to an older version of this instrument, the triple harp. This instrument was all but abandoned when the pedal harp was developed (Ellis 1991); the pedal version was considered at the time to be superior to the triple harp because of the ease with which all twelve keys could be accessed. For composition today, however, the triple harp could be considered a far more versatile instrument. Where the pedal harp has one set of strings, the triple harp has three parallel sets, with the middle set placed slightly forward of the other two, creating several ‘v’ shapes, so as to permit access to those strings (see figure 1). Conventionally these strings would be tuned such that the two outside sets play a C major scale and the middle set C# major; consequently all pitches of the chromatic scale are relatively easily accessible. This arrangement means that there are several duplicate pitches within the twenty-one strings of each octave. Although it would still not be possible to have a full set of quartertones, many other possibilities of retuning are available for a composer wishing to create a microtonal harp.

Fig. 1. Triple Harp Soundboard
3 Non-linear Tuning Systems

Rob Sturman¹ has researched the possibility of using non-linear circles to create a closed tuning system containing the octave. A perfect circle can be used to represent equal temperament; it is divided into twelve equal parts, each representing a semitone. To move from one note to the next, a rotation takes place within the circle by the same angle each time. For twelve-tone equal temperament, this angle is 1/12 of the circle, meaning that after twelve rotations the circle is complete, reaching the starting point once more. This process can be described by the following circle equation:

$$\theta \rightarrow \theta + \Omega \pmod{1} \qquad (1)$$

¹ Lecturer in Applied Mathematics, University of Leeds.
where Ω is fixed and a rational number. This linearity means that transposition between all keys is possible, although some of the intervals are not as near to the ‘perfect’ intervals as in some other tuning systems previously used. Pythagorean tuning, for example, is created by combining a series of perfect fifths; however, where the series should lead us back to the original starting pitch (or its octave), the note actually reached differs by what is known as the ‘Pythagorean comma’. Essentially each interval is slightly too large to form a perfect circle and so continues in a spiral to create new pitches (Fauvel, Flood and Wilson 2003). Sturman has researched the use of an alternative version of equation (1), the non-linear circle equation:
$$\theta \rightarrow \theta + \Omega - \frac{k}{2\pi}\sin(2\pi\theta) \pmod{1} \qquad (2)$$

It can be seen that the interval added each time, $\Omega - \frac{k}{2\pi}\sin(2\pi\theta)$, depends on the note ($\theta$) itself. Let $k = 0$ and we obtain equation (1) and hence produce a perfect circle, which can then be translated into an equal tempered scale. If $k \neq 0$ then generally the circle will be non-linear. However, for certain irrational values of $k$, equation (2) will still produce a perfect circle, a circle made up of unequal parts. Translating this into a tuning system provides us with a scale containing a limited number of pitches and containing intervals of varying sizes within the octave.
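Equations (1) and (2) can be iterated numerically. The following Python sketch is an illustration only, not Sturman's own procedure, and the function names are hypothetical. It treats θ as a fraction of the octave, so an interval in cents is 1200 times the amount added. With k = 0 it reproduces twelve equal 100-cent steps. With the parameter values quoted for the harp scale in Section 4 (Ω = 0.055130, k = 0.175160) it gives the smallest and largest steps the map can produce at all, about 32.70 and 99.61 cents, which bracket the 32.7 and 99.59 cents reported for the actual scale. The starting value of θ that yields the published scale is not stated in the text, so no attempt is made to reproduce the scale itself.

```python
import math


def interval_added(theta, omega, k):
    """Amount added to theta in one step of equation (2), as a fraction of the octave."""
    return omega - k / (2 * math.pi) * math.sin(2 * math.pi * theta)


def circle_map(theta, omega, k):
    """One step of the circle equation; k = 0 gives the linear equation (1)."""
    return (theta + interval_added(theta, omega, k)) % 1.0


def orbit(theta0, omega, k, steps):
    """Starting value followed by `steps` successive iterates of the map."""
    thetas = [theta0]
    for _ in range(steps):
        thetas.append(circle_map(thetas[-1], omega, k))
    return thetas


def to_cents(fraction_of_octave):
    """Convert a fraction of the octave to cents."""
    return 1200.0 * fraction_of_octave


if __name__ == "__main__":
    # Twelve-tone equal temperament: omega = 1/12, k = 0.
    steps = [to_cents(interval_added(t, 1 / 12, 0.0)) for t in orbit(0.0, 1 / 12, 0.0, 11)]
    print([round(s, 2) for s in steps])

    # Bounds on the step size for the parameters quoted in Section 4.
    omega, k = 0.055130, 0.175160
    print(round(to_cents(interval_added(0.25, omega, k)), 2))  # smallest possible step
    print(round(to_cents(interval_added(0.75, omega, k)), 2))  # largest possible step
```

The step is smallest where sin(2πθ) = 1 and largest where sin(2πθ) = -1, which is why θ = 0.25 and θ = 0.75 are used for the bounds.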
4 Microtonal Triple Harp

Applying this principle, a twenty-one-division scale was created for the harp, as shown in Table 1. Sturman provided five twenty-one-division scales, including twenty-one-division equal temperament; the one chosen uses a value for Ω of 0.055130 and a value for k of 0.175160. Table 1 outlines the details of the tuning system, showing the differences in the size of each interval in the scale and comparing these intervals to their counterparts in twenty-one-division equal temperament. The sizes of the intervals (see figures 2 and 3) range from 32.7 cents, approximately a sixth of a twelve-division equal tempered tone, between the centre F and left hand G strings, to 99.59 cents, not quite a semitone, between the right hand and centre C strings.

Fig. 2. New triple harp tuning

Fig. 3. Interval sizes (vertical axis: cents, 0 to 120; horizontal axis: interval, 1 to 21)
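The frequency and cents columns of Table 1 are related by the usual logarithmic conversion, with the left-hand a string at 440 Hz as reference. The short Python sketch below shows that relation; the function names are hypothetical and the reference of 440 Hz is taken from the table. Because the table values are rounded to two decimals, recomputed figures agree only to within about 0.01.

```python
import math

A_REFERENCE_HZ = 440.0  # left-hand a string in Table 1


def cents_to_hz(cents, reference_hz=A_REFERENCE_HZ):
    """Frequency lying `cents` above the reference pitch."""
    return reference_hz * 2.0 ** (cents / 1200.0)


def hz_to_cents(freq_hz, reference_hz=A_REFERENCE_HZ):
    """Distance in cents of `freq_hz` above the reference pitch."""
    return 1200.0 * math.log2(freq_hz / reference_hz)


def equal_division_cents(step, divisions=21):
    """Cents value of the given step of an equal division of the octave."""
    return 1200.0 * step / divisions


if __name__ == "__main__":
    # Right-hand a string of the new tuning: 41.04 cents above 440 Hz.
    print(round(cents_to_hz(41.04), 2))
    # First step of 21-division equal temperament.
    print(round(equal_division_cents(1), 2), round(cents_to_hz(equal_division_cents(1)), 2))
```

For instance, the table entries at 456.49 and 956.49 cents lie exactly 500 cents apart, which is one way to locate an equal tempered fourth within the new scale.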
5 Notation

Notation for this system has the potential to cause confusion. It would be simple to use a basic scordatura, noting the retuning at the beginning of the score and then proceeding to write the strings to be played. This harp is traditionally played on the right shoulder, with the left hand playing the treble line and the right hand the bass. More recently, however, it has become more common to find harpists adopting the pedal harp practice of playing on the left shoulder. Normally this would not cause a problem, but the use of a scordatura would mean that a written pitch, for example a = 440 Hz, would indicate the string on both the left and right hand sides of the harp, i.e. a = 440 Hz and a = 450.56 Hz. This would essentially result in a different composition depending on the harpist’s choice of performance practice. The solution decided upon for this project has been to use a scordatura, but with the addition of altered clefs which indicate whether the notes are to be played by the left or the right hand. The system also uses a ‘v’ or an inverted ‘v’ shape to indicate the use of a string in the centre set of strings, as opposed to using sharp and flat signs, which it was felt correlate too closely with the idea of raising or lowering a pitch by 50 cents. This method allows the performer to use a notation which is in the main very familiar to them, whilst offering a clear indication of how the music is to be played.
Table 1. New triple harp 21-division tuning (21-div B) compared to 21-division equal temperament (21-div ET)

| Note on harp | Frequency, 21-div B (Hz) | Frequency, 21-div ET (Hz) | Cents, 21-div B | Cents, 21-div ET | Distance between 21-div B intervals (cents) | Value of θ |
|---|---|---|---|---|---|---|
| a (left) | 440 | 440 | 0 | 0 | | 0 |
| a (right) | 450.56 | 454.77 | 41.04 | 57.14 | 41.04 | 0.03 |
| a (centre) | 462.78 | 470.03 | 87.37 | 114.29 | 46.33 | 0.07 |
| b (left) | 477.27 | 485.8 | 140.75 | 171.43 | 53.38 | 0.12 |
| b (right) | 494.79 | 502.1 | 203.16 | 228.57 | 62.41 | 0.17 |
| b (centre) | 516.18 | 518.95 | 276.43 | 285.71 | 73.27 | 0.23 |
| c (left) | 542.15 | 536.37 | 361.43 | 342.86 | 85 | 0.3 |
| c (right) | 572.75 | 554.37 | 456.49 | 400 | 95.06 | 0.38 |
| c (centre) | 606.67 | 572.97 | 556.08 | 457.14 | 99.59 | 0.46 |
| d (left) | 641.13 | 592.2 | 651.73 | 514.29 | 95.65 | 0.54 |
| d (right) | 673.18 | 612.07 | 736.17 | 571.43 | 84.44 | 0.61 |
| d (centre) | 701.24 | 632.61 | 806.87 | 628.57 | 70.7 | 0.67 |
| e (left) | 725.29 | 653.84 | 865.28 | 685.71 | 58.41 | 0.72 |
| e (right) | 746.1 | 675.78 | 914.24 | 742.86 | 48.96 | 0.76 |
| e (centre) | 764.53 | 698.46 | 956.49 | 800 | 42.25 | 0.8 |
| f (left) | 781.36 | 721.9 | 994.19 | 857.14 | 37.7 | 0.83 |
| f (right) | 797.23 | 746.12 | 1028.99 | 914.29 | 34.8 | 0.86 |
| f (centre) | 812.67 | 771.16 | 1062.2 | 971.43 | 33.21 | 0.89 |
| g (left) | 828.17 | 797.04 | 1094.9 | 1028.57 | 32.7 | 0.91 |
| g (right) | 844.2 | 823.78 | 1128.09 | 1085.71 | 33.19 | 0.94 |
| g (centre) | 861.28 | 851.43 | 1162.77 | 1142.86 | 34.68 | 0.97 |
| a (left) | 880 | 880 | 1200 | 1200 | 37.23 | 1 |

6 Composing for Microtonal Triple Harp

Combined with the triple harp, this system offers a wide vocabulary of sounds not easily accessible with more conventional scales and instruments. The variety of intervals between pitches can create an unfamiliar feel to the music; however, it also contains a perfect equal tempered fourth (500 cents) and other intervals which are near to other equal tempered or ‘perfect’ intervals. The instrument itself offers techniques and sounds not available on the pedal harp. Due to the layout of the strings, when they are damped any strings vibrating in the middle set will continue to do so; this can of course be used as a feature in a composition. Another interesting feature of this harp is that the tension of the strings is very low and that they are placed in relatively close proximity to one another. As a consequence, plucking certain low strings forcibly causes them to collide with the adjacent centre string. Following a period of research and experimentation with the sounds that the instrument could create, the first piece to be written using the microtonal triple harp is a trio with violoncello and percussion. The composer attempts to explore the perception of this tuning system by alternating between close clusters, thus emphasising the unusual and small intervals available in the scale, and the extremes of the range of the instrument, which obscure the perception of the tuning system more. The use of percussion provides a potential means to avoid equal temperament within the piece, combined with harmonics and other techniques on the ‘cello; the beginning of this piece certainly seems to attempt this. Further into the composition, tuned percussion is introduced, providing a more stable representation of twelve-tone equal temperament. It is, however, used in such a way as to attempt to create ambiguity in the listener’s perception of the two contrasting tuning systems. The piece is structured so that the instruments play certain techniques or gestures for a block of time determined by the Fibonacci series, and this allows for the exploration of extended techniques on all the instruments, whilst experimenting with the new tuning system and its combination with conventional equal temperament.
7 Conclusion

This tuning system is fairly easy to implement on the triple harp because, once the strings are tuned to the correct frequencies, they remain fixed throughout the performance of the piece. This is not the case with a large number of instruments and, as previously mentioned, they are often fairly restricted to equal temperament. Even string instruments such as the violin or violoncello, although more flexible in their physical ability to achieve microtonal intervals, are still restricted by the performer’s ability to hear the intervals and to place them correctly on the instrument. When composing with this system it is of course necessary to consider the way in which the intervals will interact with other instruments playing equal temperament. This study is conceived as a model for a practical and potentially fruitful exploration of microtonal composition in the future.
References

Ellis, O.G.: The Story of the Harp in Wales. University of Wales Press, Cardiff (1991)
Fauvel, J., Flood, R., Wilson, R. (eds.): Music and Mathematics: from Pythagoras to fractals. Oxford University Press, Oxford (2003)
Ferneyhough, B.: Kurze Schatten II. Peters Edition, London (1990)
Partch, H.: Genesis of a music: an account of a creative work, its roots and its fulfilments. Da Capo Press, New York (1979)
Sturman, R.: Tuning: A dynamical Systems approach. Poster presented at the Royal Institution of Great Britain Conference, March 18 (2005)
Applying Inner Metric Analysis to 20th Century Compositions

Anja Volk
Department of Information and Computing Sciences, Utrecht University
Abstract. This paper compares metric analyses of three pieces by Skrjabin, Webern and Xenakis using Inner Metric Analysis. Inner Metric Analysis assigns to each note of a piece a metric weight. The analysis is based on the detection of regular pulses created by the onsets of notes. Metrically strict pieces, such as Renaissance madrigals or ragtime pieces, often result in metric weight profiles that correspond to the accent schema of the notated bar lines. Hence these pieces are characterised as being metrically coherent since the notes generate a metrical structure that is synchronous with the abstract grid of the bar lines. Compositions of the 20th century very often do not follow such a strict metricity. The metric profiles therefore give interesting insights into the time organisation of these pieces far beyond the notated bar lines. Furthermore, we apply a processive approach to meter in order to study how the metric structure evolves over time while listening to the music.
1 Inner Metric Analysis

This paper compares metric analyses of Skrjabin’s Op. 65 No. 3, Webern’s op. 27 and Xenakis’ Keren. We use Inner Metric Analysis (IMA) in order to describe the metric-rhythmic structure of these pieces. IMA describes the inner metric structure of a piece of music generated by the actual notes inside the bars as opposed to the outer metric structure associated with a given abstract grid such as the bar lines. The method is based on metric weight profiles generated for all notes. The underlying principle is the detection of regular pulses created by the onsets of notes. The details of the model have been described in Fleischer 2003 and Chew et al. 2005. Therefore we give only a short summary of the model in the following.

The general idea is to search for all pulses (chains of equally spaced events) of a given piece and then to assign a metric weight to each note. The pulses are chains of equally spaced onsets of the notes of the piece, called local meters. Let $On$ denote the set of all onsets of notes in a given piece. We consider every subset $m \subset On$ of equally spaced onsets as a local meter if it contains at least three onsets and is not a subset of any other subset of equally spaced onsets. Let $k$ denote the number of onsets a local meter consists of minus 1. Hence $k$ counts the number of repetitions of the period (the distance between consecutive onsets of the local meter) within the local meter. The metric weight of an onset is then calculated as the weighted sum of the lengths $k$ of all local meters $m_k$ that have an incidence at this onset. Let $M(\ell)$ be the set of all local meters of the piece of length at least $\ell$. The general metric weight of an onset $o \in On$ is as follows:

$$W_{\ell,p}(o) = \sum_{\{m \in M(\ell)\,:\,o \in m_k\}} k^p \qquad (1)$$

A further refinement of the metric weight is the calculation of the spectral weight (Nestke and Noll 2001), which is based on the extension of the local meters throughout the entire piece. For a local meter $m_{s,d,k}$ with starting onset $s$, period $d$ and length $k$, the extension is denoted as $ext(m_{s,d,k}) = \{s + id, \forall i\}$. Each local meter $m_{s,d,k}$ contributes to the spectral weights of the events $t$ in its extension, $t \in ext(m_{s,d,k})$. The spectral weight is defined as:

$$SW_{\ell,p}(t) = \sum_{\{m \in M(\ell)\,:\,t \in ext(m)\}} k^p \qquad (2)$$

The spectral weight reflects the most dominant metric characteristic of a piece because it ignores local changes and is in most cases very stable throughout the entire piece. In contrast to this, the metric weight is sensitive to local changes in the metric structure of a piece. For the analyses in this paper we have chosen $\ell = p = 2$.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 204–210, 2009. © Springer-Verlag Berlin Heidelberg 2009
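The definitions of local meters, metric weight (1) and spectral weight (2) can be sketched in Python as follows. This is a brute-force illustration written for this summary, not the implementation used for the analyses in the paper. It assumes that onsets are given as integers on a common time grid (for example, counted in sixteenth notes), and the function names are hypothetical. A chain is discarded when all onsets of a chain with a finer period exist over the same span, which is how "not a subset of any other subset of equally spaced onsets" is interpreted here.

```python
def local_meters(onsets, min_length=2):
    """Return the local meters as (start, period, k) triples with k >= min_length."""
    if min_length < 2:
        raise ValueError("a local meter needs at least three onsets (k >= 2)")
    on = set(onsets)
    first, last = min(on), max(on)
    meters = []
    for d in range(1, (last - first) // min_length + 1):
        for s in sorted(on):
            if s - d in on:
                continue  # s is not the first onset of a maximal chain
            k = 0
            while s + (k + 1) * d in on:
                k += 1
            if k < min_length:
                continue
            # Discard chains that lie inside a chain with a finer period d2.
            inside_finer = any(
                d % d2 == 0 and all(t in on for t in range(s, s + k * d + 1, d2))
                for d2 in range(1, d)
            )
            if not inside_finer:
                meters.append((s, d, k))
    return meters


def metric_weight(onsets, min_length=2, p=2):
    """Equation (1): sum of k**p over all local meters containing the onset."""
    meters = local_meters(onsets, min_length)
    return {
        o: sum(k ** p for s, d, k in meters if s <= o <= s + k * d and (o - s) % d == 0)
        for o in sorted(set(onsets))
    }


def spectral_weight(onsets, grid, min_length=2, p=2):
    """Equation (2): sum of k**p over all local meters whose extension contains t."""
    meters = local_meters(onsets, min_length)
    return {t: sum(k ** p for s, d, k in meters if (t - s) % d == 0) for t in grid}


if __name__ == "__main__":
    example = [0, 2, 4, 5, 6, 8]  # toy onset pattern, not taken from the analysed pieces
    print(local_meters(example))
    print(metric_weight(example))
    print(spectral_weight(example, range(9)))
```

In the toy pattern the long chain of period 2 dominates the metric weights, while the spectral weight also assigns values to grid points without an onset, because every local meter is extended over the whole time span.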
2 Analytic Results

2.1 Skrjabin’s op. 65 No. 3
Skrjabin’s op. 65 No. 3 is notated in 2/4. However, in many parts of the piece either the left or the right hand performs a continuous movement in eighth note triplets. Therefore the layers of the spectral weight profiles in Figure 1 show a subdivision of the two main beats into three eighth notes each, with great weights on the first and second quarter notes of the bars and lower weights on the second, third, fifth and sixth eighth notes of the bars. Hence, the two main beats of the 2/4 gain the greatest metric accents. However, the spectral weight of both hands (leftmost picture in Figure 1) shows that in almost every bar the second beat gains an even greater weight than the first beat. This distinction between the layers of the second and first beats becomes more obvious in the spectral weight of the right hand part (middle picture in Figure 1). In contrast to this, in the spectral weight of the left hand part (right picture in Figure 1) the relation of the first and second beats alternates every other bar, indicating a grouping into two bars. In every second bar the first beat gains the greater weight, while in the other bars the weights tend to fall on a nearly equal level. Hence, from the global perspective of the spectral weight, the two main beats gain the greatest metric weights, but the role of the first and second main beats differs in the left and right hand parts.

Fig. 1. Excerpt from the analysis of Skrjabin’s op. 65 No. 3 showing bars 1-16: spectral weights for both hands (left picture), right hand (middle picture) and left hand (right picture). The higher the line, the greater the corresponding weight. The background gives the location of the bar lines.
Fig. 2. Excerpts from the metric weights of the entire piece for the left hand (left) and the right hand (right)
The excerpts of the metric weights of the entire piece in Figure 2 confirm this difference in the metric structures of the right and left hand parts from the local perspective. The metric weight of the left hand part clearly exhibits a grouping into two bars with greater weights on the first beat of every other bar. The right hand part distinguishes only two layers: the weights of the second beats of the bars form the highest layer, all other weights form the second layer. Hence the left hand is characterised by a metric structure that is synchronous with the typical accent schema of a 2/4, while the right hand has a strong accent on the second beat. Skrjabin’s etude hence follows the tendency often observed in analyses of piano music that the left hand part is responsible for creating a stable metric structure corresponding to the structure implied by the notated time signature, while the right hand part is granted more freedom.

2.2 Webern’s Op. 27, 2nd Movement
David Lewin argues (see Lewin 1962 and Lewin 1993) that this piece, notated in 2/4, contains structural elements that support a 3/8. The metric weight of the piece shown in Figure 3 confirms that the notes do not generate weight layers that correspond to the typical accent structure of a 2/4. On the other hand, the interpretation of the same weight as 3/8 in Figure 4 reveals, within the first part of the piece, weight layers that confirm Lewin’s observation. Every first beat of these bars gains a greater weight.

Fig. 3. Metric weights for Webern’s Op. 27, 2nd movement
Fig. 4. Metric weights for Webern’s Op. 27, 2nd movement, interpreted as 3/8
The processive approach according to Volk (2005), which models the unfolding of the metric hierarchy over time while listening to the piece, reveals another interesting aspect that is hidden in the analysis of the entire piece discussed so far. Figure 5 shows five excerpts of the processive perspective using the cumulative window approach. In this approach the metric process of unfolding over time is modelled by considering all relations of the current event to the past events. Hence for a piece consisting of $n$ onsets $o_t \in On$, $t = 1, \ldots, n$, we gain $n - 2$ analysis windows $w_t$, each containing the analysis of all onsets $o_1, o_2, \ldots, o_t$ at the time point $t$. The uppermost picture in Figure 5 shows that a periodicity in the weight layers with great metric weights on the second beat in the 3/8 bars occurs quite early along this process. This structure is later reinterpreted: the middle and bottom pictures in Figure 5 show that the new incoming notes change the metric interpretation towards a great metric weight on the first beats of all bars. This is a process of gradually reinterpreting the events of the past with great accents on the second beat. The second and third pictures from the top show a sudden drop of all weights in the last section of the weight profile. The fourth picture from the top illustrates the upgrading of the weights in this last segment according to the reinterpretation of the metric structure. The lowermost picture shows the new highest layer built upon the first onsets of all bars, while the weights in the first segment of the weight profile have been downgraded. In contrast to Skrjabin’s etude, the inner metric structure of this piano variation by Webern does not show any correspondence with the typical weight layers of the notated 2/4 bar. In the first half of the piece the weight layers correspond to a 3/8, while the second half does not exhibit any weight layers.

Fig. 5. Excerpts from the Cumulative window approach to Webern’s Op. 27

2.3 Xenakis’ Keren

Xenakis’ Keren¹ consists of segments of very different onset density. For instance, bars 23-28 and bars 35-42 contain a continuous chain of 32nd notes. Hence they form two very long local meters with correspondingly great metric weights on these notes. The other areas in the piece do not contain such long local meters and therefore have, in contrast, very low metric weights. Figure 6 shows excerpts from these different areas with extremely different weight levels. In order to avoid the influence of the local meters in bars 23-28 and bars 35-42 on the entire piece, we analyse the sections of bars 1-23 and bars 43-63 separately. The spectral weight of bars 1-23 in Figure 7 is characterized by a weak periodicity: slightly greater weights are located on every sixth sixteenth note. The metric weights of bars 1-23 make clear that the regularity in the spectral weight derives from the influence of bars 12 and following. The metric weight within bars 1-11 (Figure 8) shows hardly any periodicity, while the metric weight within bars 12-23 (Figure 9) shows a greater metric weight on every sixth sixteenth note.

¹ I would like to thank Chris Share (Sonic Arts Research Centre at Queen’s University Belfast) for encoding this piece.
Applying Inner Metric Analysis to 20th Century Compositions
Fig. 6. Excerpts from metric weight of Xenakis’ Keren (left: bars 4ff, middle: bars 24ff, right: bars 37ff)
Fig. 7. Excerpt from spectral weight of bars 1-23 of Xenakis’ Keren, interpreted as 6/16
Fig. 8. Excerpt starting at bar 1 from metric weight of bars 1-23 of Xenakis’ Keren, interpreted as 6/16
Fig. 9. Excerpt starting at bar 12 from metric weight of bars 1-23 of Xenakis’ Keren, interpreted as 6/16
On the other hand, the weight profile of the segment of bars 43-63 shows a regular pattern (see Figures 10 and 11) that is even in synchrony with the notated bar lines: the first beat in each bar gains the greatest metric weight. However, the typical high weights on the second beats of the notated 4/4 bars are missing. Hence, the bar lines in this piece might serve more as an orientation aid for the performer than as information about the metricity of the piece. Nevertheless, in the last bars we observe a surprising correspondence between the inner metric structure as expressed by the notes and the bar lines.
A. Volk
Fig. 10. Metric weight bars 43-63 of Xenakis’ Keren
Fig. 11. Spectral weight bars 43-63 of Xenakis’ Keren
2.4
Comparison of the Results
The comparison between the analytic results for the pieces by Skrjabin, Webern and Xenakis shows that Skrjabin’s etude is metrically the strictest piece. In its first segment Webern’s piece also exhibits weight layers, but they are not in synchrony with the bar lines. This raises the question of whether Webern intentionally created a conflict between the notated 2/4 and the intended 3/8 (for arguments in favour of the 2/4 see Lewin 1993). Xenakis’ piece shows the fewest weight layers, indicating that it exhibits considerable freedom from any strict metrical schema.
References
Chew, E., Volk, A., Lee, C.-Y.: Dance Music Classification Using Inner Metric Analysis. In: Proceedings of the 9th INFORMS Computer Society Conference, pp. 355–370. Kluwer (2005)
Fleischer (Volk), A.: Die analytische Interpretation. Schritte zur Erschließung eines Forschungsfeldes am Beispiel der Metrik. Ph.D. Dissertation. Berlin: dissertation.de - Verlag im Internet GmbH (2003)
Lewin, D.: A Metrical Problem in Webern’s op. 27. Music Analysis 12(3), 343–353 (1993)
Lewin, D.: A Metrical Problem in Webern’s op. 27. Journal of Music Theory 6(1), 124–132 (1962)
Mazzola, G.: The Topos of Music. Birkhäuser, Basel (2002)
Nestke, A., Noll, T.: Inner Metric Analysis. In: Haluska, J. (ed.) Music and Mathematics, pp. 91–111. Tatra Mountains Publications, Bratislava (2001)
Volk, A.: Metric Investigations in Brahms’ Symphonies. In: Mazzola, G., Noll, T., Lluis-Puebla, E. (eds.) Perspectives of Mathematical and Computer-Aided Music Theory, pp. 300–329. epOs-Music, Osnabrück (2004)
Volk, A.: Modeling a Processive Perspective on Meter in Music. In: Proceedings of the ICMC 2005, Barcelona (2005)
Tracking Features with Comparison Sets in Scriabin’s Study op. 65/3
Atte Tenkanen
Department of Musicology, University of Turku, Finland
[email protected]
Abstract. Comparison set analysis is a method with which, for instance, formal divisions of musical compositions can easily be perceived. Its applications are, to a certain extent, comparable to methods used in pattern matching. The purpose of comparison set analysis is to evaluate the prevalence of a chosen musical property through a musical piece. In comparison set analysis the basic musical units like pitches, pitch classes or durations are segmented into overlapping sets and these segments are then compared with the (pre)selected comparison set(s). The results, which are largely defined statistically, can be presented in different forms of graphs showing trends, mean points and, in the case of multiparametric comparison set analysis, connections between the parameters analysed. In this paper, I will demonstrate the method by analysing some piano pieces by Scriabin. Both pitch-class sets and rhythm sets are applied.
1
Comparison Set Analysis
One goal of traditional music analysis is to identify musical entities such as chords (‘C’, ‘G7’ or ‘6-32’, ‘6-30A’ etc.), durational patterns (trochee, dactyl etc.) or melodic motives. Analytic conclusions are then drawn on the basis of their occurrences. In comparison set analysis (CSA) the main idea is (i) to select a musical characteristic that we are interested in, say ‘whole-toneness’ or ‘dactylness’, (ii) to find a ‘prototype vector’ for the selected characteristic and (iii) to compare this vector, or ‘comparison set’, to the sets constructed from the note events in the entire composition. To illustrate by a simple analogy, if we substituted different fruits for sets, we could measure, say, the ‘lemon-ness’ of these fruits using some similarity function constructed for that purpose (see figure 1). The result of a series of comparisons may be a curve graph which gives information about the prevalence of the selected characteristic at each moment of the composition. Comparison values are scaled between 0 and 1.

The idea of comparison set analysis can be applied to any time-series type of data. In some ways, CSA is pattern matching in several stages. In this paper CSA is applied to symbolic representations of music; in particular, MIDI files are used. The first step of the analysis is to segment the elementary events of the music, i.e. pitches, pitch classes or note onsets, into overlapping sets of the same cardinality¹. Secondly, the overlapping segments are converted to prime form and, in the case of pitch-class sets, possibly labelled by Forte names. This stage enables ‘straight’ analysis based on the frequency of the ‘set classes’². I will later present an example which is premised on bare segment frequencies (cf. Fig. 4).
¹ Comparison functions work best with sets of the same cardinality.
² I use the term ‘set class’ here in a generic sense to cover any collection of sets categorized as members of equivalent classes, whether they are based on durations, pitches, textures or some other parameter of music. Generally, a set class is a collection of sets which are considered equivalent under some canonical operation. However, if not separately stated, the term set class (abbr. SC) refers here to a collection of pitch-class sets.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 211–219, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Comparison set analysis using a ‘lemon-ness function’
Fig. 2. The main parts of comparison set analysis are here encircled with the dashed line. The left path illustrates an approach similar to the one described in the chapter "The occurrences of the ‘Mystic chord’ among Scriabin’s piano pieces". The right path is associated with the approach described in the chapter "Detecting Op. 65/3 with comparison sets".
The selection of comparison sets can be based on 1) intuition, 2) the most common sets found in the piece itself, 3) certain referential sets used universally or 4) some mathematically justified method like principal component analysis (PCA). In the last case the comparison sets are principal components calculated from all segment vectors. After the selection, the calculation takes place: all segment sets are compared to the selected comparison set(s) using a similarity function. The fuzzy values (0-1) thus obtained can be used for producing graphs of different kinds. In my research methodology CSA is focused on a symbolic music description in which musical material is seen as a kind of ‘note-event mass’: the main focus is on observations concerning musical form. To demonstrate the different aspects of CSA, I will next experiment with Scriabin’s piano pieces.
2
About the Tail Segmentation and Similarity Measures Used in the Analyses
Segmentation is a critical part of the process, as it is in music analysis in general, and it is often critically discussed (Pople 1983, 151-153; Isaacson 1992, 195; Castrén 1994, 145 etc.). There are several possibilities for imbricated segmentation and similarity measures, but the results seem to be similar in every case, because the bar-based calculation smooths details away and the approach is thus, in the short run, statistical by nature. One possible way to segment pitch classes is demonstrated in a nutshell in figure 3, in which eight pitch-class sets of cardinality 4 are formed from the opening notes of Scriabin’s Study op. 65/3. The method used here is called ‘tail segmentation’ (Huovinen & Tenkanen 2008). It requires that we represent the score as a temporally ordered note list (reading all vertical simultaneities upwards). From each note on the list, we read backwards until the fourth distinct pitch class is reached. The result is a chain of imbricated tetrachords. Each of them is evaluated by a similarity function (here costotal³) with respect to a comparison set, here SC 4-8. To get a final comparison value for, say, the encircled note e, we calculate the mean of the comparison values of those sets that include the note e (0.72, 0.72, 1 and 0.79). In this case, the note e gets the mean value 0.81. After calculating the comparison values for all notes in a piece, the mean value per bar can be calculated to get an overview of how the prevalence of SC 4-8 changes during the whole piece. As a result, a curve similar to the one seen in the lower image of figure 6 can be plotted. To produce a two-parametric CSA, I developed fairly similar procedures for rhythm analysis. I will discuss them in the context of the analysis later.
³ Because the function used in rhythm comparisons is the cosine of the angle between two vectors, I have used a cosine-based function also to measure similarities between imbricated pitch-class sets. Other measures, like David Lewin’s REL, work as well. I have extended the cosθ function proposed by Rogers (1999) by using the subset-class vectors 2-10CV (Castrén 1994, 4) instead of mere ICVs (2CVs), in order to capture differences between Z-related and Tn-type SCs. I call this measure costotal. If the subset-class vectors of the two SCs sc1 and sc2 are denoted by subcv1 and subcv2, the formula of the function is
$$\mathrm{costotal}(sc_1, sc_2) = \frac{subcv_1 \cdot subcv_2}{\lVert subcv_1 \rVert \, \lVert subcv_2 \rVert}$$
where the dividend is the dot product of the subset-class vectors and the divisor is the product of the Euclidean norms of the subset-class vectors.
Fig. 3. The tail segmentation method in a nutshell. The segmentation cardinality is here 4. The first eight tetrachordal sets are marked with set-class names. The comparison function is the cosine-based costotal.
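The tail segmentation and the per-note averaging described in this section can be sketched as follows. This is only an illustrative reading of the prose, not the author's implementation: the function names, the representation of notes as a plain list of pitch classes, and the rule that a segment "includes" a note when the note lies inside the index range that was read backwards are all assumptions made here. The similarity function is passed in as a parameter, because computing costotal would require subset-class vectors that are not given in the text.

```python
from statistics import mean

def tail_segments(pcs, k=4):
    """Read backwards from every note until k distinct pitch classes are found.

    Returns a list of (start, end, pc_set) triples; notes from which fewer
    than k distinct pitch classes can be collected yield no segment.
    """
    segments = []
    for end in range(len(pcs)):
        seen = set()
        start = end
        while start >= 0:
            seen.add(pcs[start])
            if len(seen) == k:
                break
            start -= 1
        if len(seen) == k:
            segments.append((start, end, frozenset(seen)))
    return segments

def note_values(pcs, segments, similarity):
    """Mean comparison value of all segments covering each note.

    Assumption: a segment covers note i when start <= i <= end.
    Notes covered by no segment get None.
    """
    values = []
    for i in range(len(pcs)):
        covering = [similarity(s) for (a, b, s) in segments if a <= i <= b]
        values.append(mean(covering) if covering else None)
    return values

# Toy input (hypothetical pitch classes, not taken from op. 65/3).
toy = [0, 4, 7, 0, 11, 2]
segs = tail_segments(toy, k=4)

# The averaging step with the four values quoted in the text for the note e.
e_value = mean([0.72, 0.72, 1, 0.79])  # about 0.81 after rounding
```

A per-bar curve would then be obtained by averaging the note values bar by bar, which needs bar positions that this sketch does not model.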
3
The Occurrences of the ’Mystic Chord’ among Scriabin’s Piano Pieces
Scriabin’s ‘Promethean’ chord got its name from the composer’s orchestral work Prométhée - Le Poème du feu, op. 60, because of its extensive and systematic use in the piece. Prométhée was completed in 1911 but, as James M. Baker (1980, 17) points out, ‘This chord actually occurs frequently throughout the transitional works’, referring to compositions dating from 1903-09 (Baker 1980, 18). To see how the Tn-type SC 6-34A (0,2,4,6,8,9), associated with the chord, occurs in compositions by Scriabin, I collected a small sample of piano pieces from the Internet in MIDI format⁴. After segmenting the pitch classes into overlapping hexachords, the proportions of SC 6-34A in all 22 pieces⁵ were calculated. The frequencies of occurrence are seen in figure 4. Though the sample is small, the picture reveals the overall trend in the proportion of 6-34A among the pieces. On the grounds of the bare set occurrences, the Deux Poèmes op. 69/2 seems to emphasize SC 6-34A more than any other composition in this sample.
⁴ To avoid the effect of misencodings as much as possible, only those MIDI files that seemed the most accurate were selected for the calculations.
⁵ The pieces are numbered according to their composition year: 1. op. 2/1 (1887-9); 2. op. 8/4 (1894); 3. op. 16/1 (1894-5); 4. op. 16/2 (1894-5); 5. op. 16/3 (1894-5); 6. op. 11/2 (1888-96); 7. op. 11/11 (1888-96); 8. op. 11/14 (1888-96); 9. op. 13/2 (1896); 10. op. 13/4 (1896); 11. op. 25/1 (1899); 12. op. 28 (1900); 13. op. 30 (1903); 14. op. 42/1 (1903); 15. op. 42/5 (1903); 16. op. 42/6 (1903); 17. op. 49/1 (1905); 18. op. 56/4 (1907); 19. op. 59/2 (1910); 20. op. 65/3 (1912); 21. op. 69/2 (1914); 22. op. 73/2 (1914).
Fig. 4. Proportions of the ‘Mystic chord’ (SC 6-34A) in Scriabin’s piano pieces. The result graph is based on overlapping hexachords formed from the sample of 22 compositions listed in footnote 5.
4
Detecting Op. 65/3 with Comparison Sets
In his dissertation, Eric Isaacson lists several relevant questions for guiding set-theoretical analysis (Isaacson 1992, 199-200). He asks, for example, ‘Within clear formal divisions, can large scale similarity paradigms be identified?’ and ‘Do any similarity paradigms found group in such a way as to suggest a formal organization based on recurrence at some level? Do these patterns coincide with the formal patterns suggested by other musical parameters such as meter, rhythm, texture, and instrumentation?’ The two-parametric (pitch-class and rhythm-class set based) CSA can provide some answers. My hypothesis was that in Scriabin’s op. 65/3 SC 6-34A is emphasized in places where the rhythm of the right-hand melody remains steady (and the texture as a whole is repetitive in nature, as in bars 1-4 and 63-66). It is a fairly common practice, for instance in improvisation, to prolong a chord by embellishing it with a toccata-like figure. In order to test my hypothesis I segmented the pitch classes using a hexachordal tail segmentation and SCs 6-34A, 6-1, 6-32 and 8-28 as comparison sets.⁶ The hexachordal segmentation is justified, because the mean cardinality per bar is 6.16 and it remains fairly even (stand. dev. = 2.00). The results are seen in the lower image in figure 6.

To attain a similar type of graph for the rhythm analysis, I chose the comparison rhythm set from the first bar of the melody (right-hand part in bar 1, see figure 3). It is thus derived from the two steady eighth-note triplets, and in consequence the comparison rhythm-set class (rSC) used in the analysis is [1-1-1-1-1-1]⁷. The cosine distance presumes that the rhythm vectors to be compared are of equal length. Thus, I segmented all the durations between the note onsets of the melody into rhythm sets of cardinality 6 and converted them to prime-form rSCs.
Fig. 5. Two types of rhythm-set classes. In both cases a) and b) the rSC remains the same because of the selected cardinality and the properties of the prime form function.
⁶ SCs 6-1, 6-32 and 8-28 are used here only as referential sets so that the reader can see the differences between the results given by different comparison sets.
⁷ To find the prime form for the rhythm sets, I used a procedure similar to those used with pitch-class sets. In this context the rhythm-set class (rSC) equals the ordered duration-set class defined by Edward Pearsall (1997, 212). In the prime form vector, the longest durational proportions are packed to the right. If the proportional representation of the dactyl is (2,1,1), its prime form is the rhythm-set class [1-1-2]. I simply call the former representation ‘rhythm set’. The smallest duration in both representations has been set to 1 to make the rhythm proportions clear (after Pearsall 1997, 211). The cyclic permutation of the rhythm set vector $(x_1, x_2, \ldots, x_n)$ which returns the greatest value of the expression $\sum_{i=1}^{n} x_i^{\,i}$ is considered the prime form of the rhythm set. If we have three rhythm sets [2-1-1], [1-2-1] and [1-1-2], the last one is also the rSC: $(2^1 + 1^2 + 1^3) = 4 < (1^1 + 2^2 + 1^3) = 6 < (1^1 + 1^2 + 2^3) = 10$.
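The prime-form rule for rhythm sets given in footnote 7 is compact enough to sketch in code. The following is a minimal illustration under stated assumptions: the function names are hypothetical, durations are taken as integers or rationals (for instance MIDI ticks), and ties between rotations are resolved by keeping the first maximal rotation, a case the text does not discuss.

```python
from fractions import Fraction

def rhythm_set(durations):
    """Scale durations so that the smallest one equals 1 (exact arithmetic)."""
    smallest = Fraction(min(durations))
    return tuple(Fraction(d) / smallest for d in durations)

def rhythm_prime_form(rset):
    """Cyclic permutation that maximises sum(x_i ** i), with i counted from 1."""
    rset = tuple(rset)
    rotations = [rset[i:] + rset[:i] for i in range(len(rset))]
    return max(rotations,
               key=lambda r: sum(x ** i for i, x in enumerate(r, start=1)))

# The dactyl example from the footnote: (2,1,1) has prime form [1-1-2].
dactyl = rhythm_prime_form(rhythm_set((480, 240, 240)))  # 480/240 ticks are made up
```

Raising each element to the power of its position rewards rotations that place the long durations late in the vector, which is what "packed to the right" amounts to.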
Fig. 6. Two types of trend curves: CSA is here applied to pitch-class sets (lower graph) and rhythm sets (upper graph)
The cosine distance was used as a comparison measure to create the similarity graph for the rhythm analysis (see the upper image in figure 6). At the beginning of the piece (bars 1-4), all the imbricated rhythm segments are based on similar triplets and are thus of type [1-1-1-1-1-1]. As a result the comparison value is constantly 1. The curve is flat also between bars 63-66 at the beginning of the recapitulation (marked with A2). However, the similarity value is now lower than 1: the cosine between the comparison rSC [1-1-1-1-1-1] and the repeating dactyl-type rSCs [1-1-2-1-1-2] is 0.94. But now back to my hypothesis, which proved to be wrong: the steady rhythm in the melody and the prevalence of 6-34A seem to meet only momentarily in bar 61⁸. In the other parts of the composition the connection seems to be quite the opposite: when the melodic rhythm patterns are repeated, the prevalence of 6-34A is quite low.

The response of the comparison set 8-28 is fairly flat in the case of op. 65/3. However, it is well known that, among some other ‘non-tonal’ pitch-class collections⁹, the octatonic 8-28 emerged notably in Scriabin’s late works (see e.g. Callender 1998, 219). As Baker puts it, ‘in Scriabin’s case the departure from tonality was not abrupt - rather, he made a gradual transition from tonality to atonality over a period of years, beginning as early as 1903’ (Baker 1980, 1)¹⁰. To see how this transition appears among the piano pieces mentioned earlier, I segmented all the pieces into hexachordal sets, computed the comparison values using the comparison SCs 6-32, 6-34A and 8-28, and calculated the means. The result is seen in figure 7, in which 22 mean points, denoting the piano pieces, appear in a three-dimensional comparison set space. Roughly speaking, they seem to divide into four groups according to the ‘diatonic’ (6-32) and ‘Promethean’ (6-34A) dimensions. The ‘Promethean’ chord is most strongly present in the pitch-class set structure of op. 65/3 (no. 20), which constitutes a group of its own. The most ‘diatonic’ piece of all is op. 16/3 (no. 5). The overall trend matches well with the observations of Callender and Baker.
⁸ In bars 61-62 we find a certain kind of dynamic turning point. According to the graphs in figure 6, this piece can be divided into 6 sections: A1-B1-B2-C-A2-Conclusion. The turning point falls at the end of the C section before the recapitulation. From the durational point of view, the golden section also falls in bar 62. Among all the actual vertical sonorities in the piece, there is only one instance of a pentachord, SC 5-24A, which can be found in bar 61 and as a glissando chord in bar 62. Other vertical sonorities are of cardinality 4 or lower, among them several types of trichords and two types of tetrachords, 4-16A and 4-27A. All the above-mentioned SCs are subsets of 6-34A.
⁹ In addition to 8-28, Callender (1998, 219) mentions 6-35 (whole-tone), 6-34 (mystic), 6-Z49, 7-34 (acoustic) and 7-31 as the most prominent pitch-class collections in Scriabin’s later works. Among these, 6-Z49 and 7-31 are also subsets of the octatonic collection.
[Figure 7: three-dimensional scatter plot of the 22 numbered mean points; axes 6-32, 6-34A and 8-28]
Fig. 7. The 22 mean points in a costotal 6-32/6-34A/8-28 space, drawn from the works listed in footnote 5. Op. 65/3 is marked with no. 20. The tail segmentation cardinality used is 6.
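Two of the computations in this section are simple enough to sketch: the cosine comparison between rhythm-set classes, and the mean point of a piece in a comparison set space. The code below is a hedged illustration only. The function names and the dictionary layout are assumptions, and the comparison values in the usage lines are invented placeholders, not data from the 22 pieces.

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norms = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norms

def mean_point(values_by_set):
    """Average the comparison values of one piece for each comparison set.

    values_by_set maps a comparison-set label to the list of comparison
    values obtained for that set; the result is one coordinate per label.
    """
    return {label: sum(vals) / len(vals) for label, vals in values_by_set.items()}

# The rhythm comparison quoted in the text: steady triplets vs. dactyl-type rSC.
steady = [1, 1, 1, 1, 1, 1]
dactyls = [1, 1, 2, 1, 1, 2]
similarity = round(cosine(steady, dactyls), 2)  # 8 / sqrt(6 * 12), i.e. 0.94

# Placeholder values only, to show the shape of the input.
point = mean_point({"6-32": [0.5, 0.7], "6-34A": [0.8, 0.6], "8-28": [0.6, 0.6]})
```

The same cosine function applies to subset-class vectors in the costotal measure, provided those vectors are available from a set-class table.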
5
Conclusions
I have briefly presented the possibilities that comparison set analysis offers to an analyst. The occurrences of Scriabin’s ‘mystic chord’, associated with SC 6-34A, were detected using both a large-scale analysis of 22 piano pieces and an analysis within a single composition, op. 65/3. The results matched well with the observations presented by different analysts. The method is still under development and far from finished or perfect. This is, in fact, good news: CSA is open to new ideas in the form of new applications.
¹⁰ For another method to quantify this transition and an investigation of the continuum between tonality and atonality, see Parncutt (in this volume).
References
Baker, J.M.: Scriabin’s Implicit Tonality. Music Theory Spectrum 2, 1–18 (1980)
Callender, C.: Voice-Leading Parsimony in the Music of Alexander Scriabin. Journal of Music Theory 42(2), 219–233 (1998)
Castrén, M.: RECREL: A Similarity Measure for Set-Classes. Ph.D. dissertation, Sibelius Academy (1994)
Huovinen, E., Tenkanen, A.: Bird’s Eye Views of the Musical Surface - Methods for Systematic Pitch-Class Set Analysis. Music Analysis 26(1-2), 159–214 (2008)
Isaacson, E.: Similarity of Interval-class content Between Pitch-class Sets: The IcVSIM Relation and its Application. Ph.D. dissertation, Indiana University (1992)
Parncutt, R.: Tonal implications of harmonic and melodic Tn-types. In: Klouche, T., Noll, T. (eds.) MCM 2007. CCIS, vol. 37, pp. 211–219. Springer, Heidelberg (2009)
Pearsall, E.: Interpreting Music Durationally: A Set-Theory Approach to Rhythm. Perspectives of New Music 35(1), 205–230 (1997)
Pople, A.: Skryabin’s Prelude, Op. 67, No. 1: Sets and Structure. Music Analysis 2(2), 151–173 (1983)
Rogers, D.W.: A Geometric Approach to PCSet Similarity. Perspectives of New Music 37(1), 77–90 (1999)
Computer Aided Analysis of Xenakis-Keren
Kamil Adiloğlu¹ and G. Ada Tanir²
¹ Berlin University of Technology
[email protected]
² Universität der Künste Berlin
[email protected]
Abstract. The similarity neighbourhood model is a computer-aided mathematical approach to the paradigmatic analysis of the melodic content of a piece of music. It makes use of statistical, semiotic and computational approaches to perform an exhaustive search on melodies in given pieces of music. It has been designed to help music theorists perform melodic analysis. This paper presents the analysis of Keren, a solo trombone piece by Xenakis, based on the results obtained from the similarity neighbourhood model. The model identifies melodic similarities on the basis of the contour similarities of melodic segments. Keren does not contain similar melodic segments in the sense of the Baroque or Classical period of Western music. However, this paper shows that the results of the similarity neighbourhood model can be interpreted by considering where similarities exist, so that they contribute to a music-theoretical analysis of the piece.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 220–229, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Introduction
In the similarity neighbourhood model (2), the melodic segments are considered to be of equal size. Several representation methods, mainly based on relative and absolute pitch, were tested to represent melodic segments and were compared with each other. These tests revealed that a relative pitch representation scheme yields the best results, in the sense that they are closest to the results of music-theoretical analysis. In these tests, rhythmic features of melodic segments were ignored. The similarity relationships of melodies were investigated using several different similarity measures, and these similarity measures were compared with each other as well. The similarity measures basically account for standard symmetry transformations like translation and inversion. Since rhythm was not considered, symmetry transformations like diminution and augmentation, and any other rhythmisations of the same melody, are considered to be the same. Based on the results obtained from these similarity tests, the correlation coefficient was found to be the most appropriate similarity measure, because with this measure the inversion of a melodic segment can be identified as well as its translations.

Using these similarity relationships, the similarity neighbourhood model calculates the significance of the melodies by pursuing the basic idea of Ruwet (4), who stated that the longest repeated passages should be identified. Therefore the similarity neighbourhood model simply counts the number of repeats of a given melody and its close variations in the piece, and weights the number of repetitions with the length of the melodic segment. The melodies which appear more often than a threshold value are extracted for further refinement of the melodic structure of the given piece. These so-called prototype melodies are then grouped, and - for every group - the melody which is repeated most often (including repeats of its close variations) is selected as the group’s representative.

After identifying the paradigmatic elements of each piece, they are analysed using the groups of prototypes and the representative melodies found by the new method. The melodic structure of a given piece is principally considered to be the whole collection of the prototype melodies. Therefore the melodic structure is extracted and shown as a vector, each element of which corresponds to one length of melodic segments. This vector is called the prominence profile of the given piece.

Since only equal-length melodic segments and their similarities have been considered, the sub- and super-segment relationships of melodies cannot be identified by the mechanisms used for calculating the similarity relationships of equal-length melodic segments. Therefore the model was extended so that these relationships can be considered as well (1). The similarities of equal-length melodies, and the relative places where these melodic segments appear within a given piece, underlie the recognition of the sub- and/or super-segment relationships. The principle of identifying the longest repeated passages by Ruwet (4) was also utilised to identify the melodic segments (3) which do not appear to be as important as some other melodies from a music-theoretical viewpoint. These melodic segments are classified as not significant for the analysis of a given piece, even though they had been identified as prototypes during the first step of the analysis process. Hence, in the current step, they are detected to be not significant, and eliminated from the list of significant melodic segments.
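The correlation-based comparison and the length-weighted counting described in the Introduction can be illustrated with a short sketch. It is not the similarity neighbourhood model itself: the function names, the use of successive intervals as the relative pitch representation, the use of the absolute value of the correlation to catch inversions, the default threshold of 0.95 and the decision to count the reference segment among its own occurrences are all assumptions made for illustration.

```python
from math import sqrt

def intervals(pitches):
    """Relative pitch representation: successive intervals in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant sequences; treated as dissimilar here
    return cov / (sx * sy)

def similar(seg_a, seg_b, threshold=0.95):
    """Transpositions give r = 1 and inversions r = -1, so compare |r|."""
    return abs(correlation(intervals(seg_a), intervals(seg_b))) >= threshold

def weighted_count(melody, start, length, threshold=0.95):
    """Occurrences of a segment and its close variants, weighted by length."""
    ref = melody[start:start + length]
    hits = sum(similar(ref, melody[i:i + length], threshold)
               for i in range(len(melody) - length + 1))
    return hits * length

# Hypothetical MIDI pitches: a motif, its transposition and its inversion.
motif = [60, 62, 59, 64]
transposed = [p + 5 for p in motif]
inverted = [120 - p for p in motif]
```

With interval sequences, a transposed segment has exactly the same vector as the original and an inverted one has the negated vector, which is why a single correlation value can flag both transformations.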
2 Xenakis – Keren
The similarity neighbourhood model was tested on a solo trombone piece called Keren, composed in 1986 by Iannis Xenakis. The results of the similarity neighbourhood model will be presented in a music-theoretical context, combined with a music-theoretical analysis, so that the music-theoretical relevance of the results can easily be observed. Although Keren is a solo piece, there are still some polyphonic measures, where two tones are played at the same time. Moreover, it is not a tonal piece. Hence, there are no classical harmonic movements, like cadences, and no counterpoint. As a result, this piece does not have melodic segments which are repeated a certain number of times in the classical sense. This makes the analysis process different from analysing, for instance, a Baroque piece, and another approach has to be applied. Therefore, the results obtained by the similarity neighbourhood model are going to be interpreted in a slightly different way, in order to understand the piece better.

The prominence profile in Figure 1 shows the prominence values for each length of prototypes, the weighted linkage-clusters and the linkage-clusters with the maximum number of prototypes for Keren. The clusters are weighted by multiplying the number of prototypes contained within the biggest cluster by the length of the melodies contained.
Fig. 1. The Prominence Profile of Keren is shown: Missing prominence values indicate that no prototypes (hence no linkage-clusters) exist for the particular length. Parameters were c1 = 0.075, c2 = 1.0.
The longest prototype that can be identified by the model is 49 notes long. As can easily be seen in the profile, the prominence values of the longer melodic segments are remarkably lower than those of the short melodic segments. This indicates that Keren is not a melodic piece in the sense that melodies are repeated and subdivided into sub-melodies, which are themselves repeated many times within the piece. Instead of this melodic decomposition, Keren shows that the generally shorter segments play a more important role in the piece; in particular, where these melodic segments appear and where they do not appear is important for understanding the structure of the piece.

Figure 2 shows a comparison between the prominence profiles obtained before and after the reduction. Generally, the reduction process reduces the number of prototypes identified by the model. The prototypes remaining after reduction are the melodic segments that are more likely to play an important role within the piece. As one can see in this figure, the reduced profiles contain slightly fewer prototypes than before reduction. However, there is one interesting length for which the prominence value does not decrease after the reduction. These melodic segments deserve more attention. Furthermore, the weak and strong reductions differ for short melodic segments. However, as the length of the melodic segments increases, the difference disappears.

In general, the piece can be considered in five parts. The first part, the introduction (A), from the first measure until the middle of measure 23, is quite variable. There are quite a number of large intervals and prolonged notes. Furthermore, some long downward movements can be observed as well. Hence this part is not a melodic part in a classical sense. However, towards the end of this part one can see some movements of sixteenth and thirty-second notes. These movements lead the piece to the second part (B), which starts in the middle of measure 23, after the prolonged half-note chord, and lasts until the end of measure 29. This part is quite rhythmic in character. Thirty-second notes dominate this part until the end of measure 28. Therefore it can also be said that this part ends at the end of measure 28. In the second part, the intervals also become smaller, and several intervals are repeated interchangeably. Hence it is possible to interpret these interchanging interval series as melodic movements. The second part is followed by a short changeover (C), which ends in measure 36, after the second sixteenth note. After the changeover, which corresponds to the third part, the fourth part of the piece begins. There is a surprising similarity between the second part and the fourth part with respect to rhythm and melody. The fourth part ends at the end of measure 42, where the thirty-second notes end.

The fifth part starts with measure 43 and continues until the end of the piece in measure 63. However, measures 45, 46 and 61, 62, 63 depict a similar movement when the starting and end notes of the movements in these measures are considered. In measures 45 and 46, an increasing movement starts with an E and ends with a B at the beginning of measure 46. Measures 61, 62, 63 depict a decreasing movement from the B to the E, where the increasing movement in measures 45 and 46 ends and begins respectively. Hence, measure 45 can be identified as the beginning of the fifth part as well. This fifth part behaves similarly to the first part (A), although it uses different movements and makes intensive use of micro-tonality. Hence it is not possible to find interval-based similarities between these two parts. However, the prolongations and the downward and upward movements are common to both. Considering this brief summary of the piece, the formal appearance of the piece is (A-B-C-B-A). This pattern indicates that the piece was composed, in general, as a retrograde, because the form is a reflection of the first two parts (A-B) around the third part (C).

The first part of the piece is the variable part. However, this part naturally has a structure, which is based mainly on the tritone interval. In particular, the first movement, which continues until the first point d’orgue, has a tone material spanning a tritone. The span of this movement is shown in Figure 3. Furthermore, Xenakis also used the tritone as a single interval between two consecutive notes. Some examples of the use of the tritone interval within this first part are shown in Figure 4.
Fig. 2. The Comparison of the Prominence Profiles with and Without Reduction of Keren is shown: Missing prominence values indicate that no prototypes (hence no linkage-clusters) exist for the particular length. Parameters were c1 = 0.075, c2 = 1.0.
Fig. 3. The interval between the highest and the lowest note in the first movement of the piece is a tritone
In Figure 4, at the beginning of the second page (see also Figure 8), it can be seen that the consecutive notes move apart from each other for a short period. After several notes, they come closer again. This passage indicates a hidden two-voiceness within the piece, because the distance between consecutive notes makes it impossible to build a single melodic line. On the other hand, every other note together forms a melody. This movement prepares the second part of the piece, which is composed to be rhythmic. The similarity results of the model are concentrated mainly on the second and fourth parts (B), which are repeated around the third part (C). This observation indicates the relation between the second and fourth parts, which are both called B. The prominence profile without the reduction stresses melodies of length 17. However, the reduced prominence profiles indicate a length of 15 notes and absorb the other length. Therefore the first melody that I will focus on is of length 15 and is located at the beginning of the second part. This melody is shown in Figure 5. The second part of the piece mainly comprises small intervals that alternate upwards and downwards. The melody in Figure 5 depicts this behaviour very well. This melody is repeated 14 times, mainly in the second and fourth parts of the piece, as expected. Some repetitions even overlap each other, which is due to these small interval jumps. If the melody (Figure 5) is further decomposed, another representative melody can be observed. This sub-segment is simply the first part of the longer melody. In fact, the longer melody is composed of this sub-segment and its repetition. The sub-segment is shown in Figure 6. It appears not only in the second part of the piece but also in the fourth part, as in the case of the melody
Computer Aided Analysis of Xenakis-Keren
Fig. 4. Tritone intervals in the first part are shown in the rectangles
Fig. 5. Bar 23 of Keren is shown. The rectangle highlights the representative melody of length 16.
Fig. 6. Bar 23 of Keren is shown. The rectangle highlights the representative melody of length 8.
shown in Figure 5. Hence, these findings support the idea that the second and fourth parts are related to each other. Another representative melody, which appears in the fourth part, is 22 notes long. This melody is interesting in that it keeps the B 10 times and alternates downward jumps to F- and G. By means of this compositional style, the composer creates a two-voiceness in a solo piece. If the melody shown in Figure 7 is viewed horizontally, two melodies can be recognised. The first melody is the so-called B pedal, and the second melody is composed of the alternating F-’s and G’s. Composition techniques of this kind are found quite often in contrapuntal pieces. Bach in particular used this technique to obtain this latent two-voiceness effect.
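The idea of reading one line "horizontally" as two voices can be illustrated with a minimal sketch. It assumes the simplest possible case, in which the two latent voices strictly alternate note by note; the function name and the eight-note excerpt are hypothetical and are not a transcription of the passage in Figure 7.

```python
def split_alternating(notes):
    """Split a monophonic note list into two latent voices.

    Assumption: the voices alternate strictly, so the notes at even
    positions form one voice and the notes at odd positions the other.
    """
    return notes[0::2], notes[1::2]


# Hypothetical excerpt: a repeated B alternating with F- and G.
excerpt = ["B", "F-", "B", "G", "B", "F-", "B", "G"]
pedal, other = split_alternating(excerpt)

# A voice is a pedal if it consists of a single repeated pitch.
is_pedal = len(set(pedal)) == 1
print(pedal, other, is_pedal)
```

In this toy input the first voice is a pedal on B and the second voice alternates between F- and G, which is the kind of latent two-voiceness described above.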
Fig. 7. Bar 39 of Keren is shown. The rectangle highlights the representative melody of length 22.
Fig. 8. Bar 16 of Keren is shown. The rectangle highlights the representative melody of length 22.
The representative melody shown in Figure 7 appears in one other place in the piece, which is especially interesting. This place is the sixteenth measure, which is shown in Figure 8. This figure also makes it easy to see how the two melodies are created by the same composition technique. However, in the melody shown in Figure 8, a pedal note is not used. Instead, both the upper and the lower melody move in alternating minor seconds. This appearance prepares the second part of the piece. Until this melody appears, the piece develops quite irregularly from both a melodic and a rhythmic point of view. Six measures later, namely at the end of the twenty-second measure, the second part of the piece begins. Another appearance of the melody in Figure 7 is at the very beginning of the second part of the piece, which can be partly seen in Figure 5. The melody appears several times in this part of the piece, even in an overlapping fashion. The fourth part of the piece ends with a melody that depicts the hidden two-voiceness, as does the melody shown in Figure 8 and indicated in the analysis of the first part of the piece. As in the preparation of the second part, consecutive notes move apart from each other, so that every other note creates a melodic line. Therefore, it can be claimed that the fourth part ends with this hidden two-voiceness, which prepares the fifth part of the piece (see Figure 9). The fifth part is the closing part of the piece. This part is another variable part of the piece. Like the first part, it also emphasises the tritone interval. The tritone appears as a single interval and also as the spanning interval of a sequence of notes. In measures 47 and 48, the starting and ending notes of the ascending passage, which begins with a B at the beginning of measure 47 and ends with the F on the second beat, in the middle of measure 48, span a tritone (see Figure 10).
Within the whole increasing passage, there are other shorter spanning tritones, like the first four notes until the comma B-F, then until the next comma E-A, etc. There are also tritone intervals between two consecutive notes in these two measures. Some of these
Fig. 9. This figure shows the parts of the piece, where the hidden two-voiceness can be observed. These two parts are the preparation of the second part in the first part of the piece, and the end of the fourth part.
Fig. 10. The spanning tritone interval in measures 47 and 48 is shown
places where the tritone intervals appear in the fifth part can be seen in Figure 11. Considering these features, the first part and the fifth part are clearly similar to each other. However, there is a significant difference between them. In the fifth part, the composer used many micro-tonal intervals, which appear in the first part only a couple of times and not significantly. The two length values for which the prominence values of the reduced profiles do not decrease are 42 and 49. The melodies corresponding to those two length values are, respectively, at the beginning of the second part of the piece and in the middle of the fourth part. The 42-note melody contains the melody shown in Figure 6. It is at the very beginning of the second part of the piece and appears only twice in the piece, both times in the second part, and the two appearances overlap. The other melody, namely the 49-note melody, appears in the middle of the fourth part of the piece. It also appears twice, and these two appearances also overlap. These two melodies have not been reduced by the model, because longer melodies do not contain them.
Fig. 11. The appearances of the tritone interval are shown
However, because their appearances overlap, each of them can be considered part of a longer melody. Hence, the 42-note melody indicates the beginning of the second part. In the same way, the 49-note melody indicates the end of the fourth part. The similarity neighbourhood model has not been able to find melodic similarities in the first, third and fifth parts of the piece. It only shows the similarities between the melodies in the second and fourth parts. Similarities between these melodies indicate the relationship between these two parts. In other words, these similarities help to divide the piece into five parts. Another feature used to divide the piece into five parts is the tritone intervals, which appear in the first and fifth parts. These two features support the form analysis (A-B-C-B-A) of the piece. However, the same pattern of parts can be interpreted in a slightly different way. The middle of the piece, namely the (B-C-B) part, can also be considered a single part. The short changeover (C) between the two B parts can be regarded as a short subpart, and hence as belonging to the whole part B. In this way, the formal segmentation of the piece can be seen as (A-B-A). The measures corresponding to these segments are 1-22, 23-43 and 44-63 respectively, which is an almost equal distribution of the measures: each part is approximately 20 measures long. This distribution supports the three-part segmentation of the piece. In this case, we have only the (A-B-A) form for this piece.
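The arithmetic behind the "almost equal distribution" can be checked with a few lines. The sketch below only restates the measure ranges given in the text; the variable names are illustrative.

```python
# Three-part segmentation as (label, first measure, last measure),
# taken from the measure ranges quoted in the text.
segments = [("A", 1, 22), ("B", 23, 43), ("A", 44, 63)]

# Inclusive measure count of each part.
lengths = [last - first + 1 for _, first, last in segments]

# The five-part reading is symmetric around the changeover C.
five_part_form = ["A", "B", "C", "B", "A"]
is_symmetric = five_part_form == five_part_form[::-1]

print(lengths, is_symmetric)
```

The three parts come out at 22, 21 and 20 measures, which is consistent with the statement that each part is approximately 20 measures long.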
The result of the similarity neighbourhood model indicates the places where similar melodies appear. In the present paper, these places are used to segment the piece into separate parts. In summary, knowing where the similarities occur and where they do not occur helps to decide on the parts of the piece. As a consequence, although Keren is an atonal piece, the model can help to identify the form of the piece when its results are interpreted in a different way.
References
[1] Adiloglu, K., Obermayer, K.: Finding Subsequences of Melodies in Musical Pieces. In: Proceedings of ICMC, Barcelona, Spain (2005)
[2] Adiloglu, K., Noll, T., Obermayer, K.: A Paradigmatic Approach to Extract the Melodic Structure of a Musical Piece. J. of New Mus. Res., 221–236 (2006)
[3] Adiloglu, K., Obermayer, K.: A Reduction Method for the Paradigmatic Melodic Analysis. In: Mathematics and Computation in Music, Berlin, Germany (2006) (in print)
[4] Ruwet, N.: Methods of Analysis in Musicology. Music Analysis 6, 11–36 (1987)
Automated Extraction of Motivic Patterns and Application to the Analysis of Debussy’s Syrinx
Olivier Lartillot
Finnish Centre of Excellence in Interdisciplinary Music Research, University of Jyväskylä, Finland
Abstract. A methodology for automated extraction of repeated patterns in discrete time series data is presented, dedicated to the discovery of musical motives in symbolic music representations. The basic principle of the approach consists in a search for closed patterns in a multi-dimensional parametric space, comprising various features related to melodic and rhythmic aspects, which can be organized into note-based and interval-based descriptions. The pattern description is further reduced through a lossless pruning of the sequence description. This requires in particular a detailed estimation of the specificity relations between patterns. For instance, a pattern is more specific than its suffix, and a melodic-rhythmic pattern is more specific than its rhythmic component. A notion of cyclic pattern is introduced, enabling an adapted filtering of a different form of combinatorial redundancy caused by successive repetitions of patterns. The use of cyclic patterns implies a necessary chronological scanning of the musical sequence. The resulting algorithm offers compact motivic analyses of simple monodies. As an illustration of the analytic capabilities of the computational system, a complete analysis of Debussy’s Syrinx is presented.
1 General Framework
1.1 Motivic Pattern Extraction
The objective of this study is to design a computational system for automated motivic analysis. The basic principle consists of an unsupervised detection of repeated motives, i.e. identifying several short extracts or subsequences as instances, or occurrences, of the same series of descriptions, called a motivic pattern. The approach is focused here on monodic sequences: music is considered as a series of notes without superpositions. Patterns are formalised as chains of states, called pattern chains. As patterns can accept multiple alternative continuations, pattern chains may be extended by multiple branches. Hence, patterns are aggregated into one single tree, called the pattern tree, and each pattern chain is a branch of the tree. Similarly, each pattern occurrence can accept multiple alternative continuations. Hence, the set of all pattern occurrences that are initiated by one note forms a tree, called the pattern occurrence tree.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 230–239, 2009. © Springer-Verlag Berlin Heidelberg 2009
Automated Extraction of Motivic Patterns
1.2 Musical Dimensions
Patterns are detected along multiple musical dimensions,¹ which are indicated below. See (Lartillot and Toiviainen 2007) for a detailed explanation of the dimensions.
Fig. 1. Descriptions of the beginning of Syrinx. Repeated sequences of values, forming patterns, are enclosed in boxes.
The musical dimensions are grouped into two distinct categories:
− note-based descriptions (pitch, register, position and sub-position) are associated with each note individually;
− interval-based descriptions (diatonic and chromatic intervals, gross contour and duration, or more precisely inter-onset interval) are related to intervals between successive notes.
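As a rough illustration of the second category, the sketch below derives three interval-based descriptions from a list of notes. The `Note` encoding (MIDI pitch numbers, onsets in quarter notes), the function name and the four-note input are assumptions made for this example and are not taken from the system described here. Diatonic intervals are left out because they would require pitch spelling, which this simple encoding does not carry.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int    # MIDI pitch number (assumed encoding)
    onset: float  # onset time in quarter notes (assumed unit)


def interval_descriptions(notes):
    """Describe each interval between successive notes.

    Returns one dict per interval with the chromatic interval in
    semitones, the gross contour (-1, 0 or +1) and the inter-onset
    interval.
    """
    descriptions = []
    for prev, cur in zip(notes, notes[1:]):
        chromatic = cur.pitch - prev.pitch
        contour = (chromatic > 0) - (chromatic < 0)  # sign of the interval
        descriptions.append(
            {"chromatic": chromatic, "contour": contour, "ioi": cur.onset - prev.onset}
        )
    return descriptions


# Hypothetical four-note input, not a transcription of Syrinx.
notes = [Note(82, 0.0), Note(81, 0.75), Note(83, 0.875), Note(80, 1.0)]
descriptions = interval_descriptions(notes)
for d in descriptions:
    print(d)
```

A sequence of n notes yields n − 1 interval-based descriptions, whereas note-based descriptions attach to each of the n notes.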
1.3 Matching Strategy
In a fuzzy definition of pattern matching, a numerical distance is defined, and a match is made when the distance is lower than a pre-specified threshold. Yet no heuristic for precisely fixing this value has been proposed. Hence, the determination of the threshold value relies entirely on the user’s intuitive choices. Another solution consists of restricting the search more simply to exact matching along multiple musical dimensions (Conklin and Anagnostopoulou 2001). We propose a generalisation of this multiple viewpoint approach that allows some variability in the set of musical dimensions used during the construction of each musical pattern. This enables us to consider a more general type of pattern, called heterogeneous pattern,
¹ The purpose of this study is not to reduce the multidimensional data sets to lower dimensions but, on the contrary, to conduct very detailed motivic analyses. Hence, the dimensions considered in this paper do not result from any statistical pre-processing such as Principal Component Analysis.
O. Lartillot
that despite its structural complexity seems to capture an important aspect of musical structure. Interesting examples of heterogeneous patterns can be found in Debussy’s Syrinx, as shown in the next section. For instance, motive a features a complete melodic-rhythmic description of its first bar (exactly repeated in the two occurrences), and a solely rhythmic description of the remainder of the pattern (one dotted eighth note and two 32nd notes, associated with different melodic lines in each occurrence).
1.4 Analysis of Debussy’s Syrinx
Before entering into technical details related to the combinatorial explosion of the motivic pattern configurations in the multi-dimensional parametric space, we would like to exemplify the concept with an analysis of Debussy’s Syrinx generated by the algorithm.² Figure 2 shows the analysis of the beginning of the piece. The piece begins with a two-bar phrase (a) repeated twice, where the second occurrence is further developed into a new phrase (b) repeated twice as well. This overlapping of phrase repetitions is a common trend in Debussy’s composing style. Phrase a is composed of a melodic-rhythmic cell (e) that is transformed little by little, forming the variants e’, e’’ and e’’’, and leading to a new melodic cell (f) used throughout phrase b. Again, the second occurrence of phrase b ends with a new development (h) that is repeated a second time just afterwards. Then, once again, a transposition one octave down of phrase a (a’) is repeated twice and its second occurrence shows an ending (c) that is repeated a second time afterwards. Phrase c is composed of a melodic scale (i) and a short cell (g). Subsequently, a new melodic-rhythmic cell k is repeated twice, and followed by successive repetitions of another cell l and its variants l’ and l’’. The last occurrence initiates a new two-bar phrase (d), repeated twice. Phrase d could actually be divided into two symmetric halves.
However, due to the ornamentations, the algorithm cannot detect this internal symmetry. The similarly improvisational development goes on with a repetition of two phrases (o and p), but this repetition is not detected by the algorithm. Finally, the initial theme (a) comes back four times: in its first reoccurrence the first note is elongated (a’’), whereas the last two occurrences show a new rhythm (a’’’). The theme is concluded by repetitions of a rhythmic variant of cell p (p’) and a final descending scale q.
2 Controlling the Combinatorial Redundancy
2.1 Maximal Patterns and Closed Patterns
The core pattern extraction process results in a large set of candidates which, due to its size and poor quality, is of no direct interest for musicology or music information retrieval. In order to control the combinatorial explosion, filtering heuristics are generally added that select a sub-class of the results based on global criteria such as pattern length, pattern frequency (within a piece or among different pieces), etc.³ The main limitation of this method comes from the lack of selectivity of these global criteria. Another approach is based on the search for maximal patterns, i.e. patterns that are not included in any other pattern (Zaki 2005; Agrawal and Srikant 1995). This approach still leads to an excessive filtering of important
² A more detailed analysis of the piece is presented in (Lartillot in press).
³ See (Lartillot and Toiviainen 2007) for a complete review.
Fig. 2. Motivic analysis of Debussy’s Syrinx
structures. Closed patterns, finally, are patterns whose support is higher than the support of the patterns in which they are included (Zaki 2005). Filtering out non-closed patterns ensures a compact representation of the pattern configuration without any loss of information.
2.2 Multidimensionality of Music
A musical application of the closed pattern paradigm requires taking the parametric multidimensionality into account. For instance, a precise description of motive a of Syrinx is not straightforward. In particular, as shown in Figure 3, some of the occurrences of motive a share identical partial descriptions, annotated in plain lines on the score. Conversely, certain occurrences do not share the descriptions (represented in dotted lines in Figure 3) shared by all the other occurrences.
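The closed-pattern criterion of Section 2.1 can be sketched on a one-dimensional toy sequence. The sketch assumes that support means the number of occurrences of a pattern in the sequence and that inclusion means occurring as a contiguous sub-sequence; the function names and the interval sequence are hypothetical, and this is not the multi-dimensional algorithm of the paper.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def count_support(sequence, pattern):
    """Number of (possibly overlapping) occurrences of `pattern` in `sequence`."""
    m = len(pattern)
    return sum(sequence[i:i + m] == pattern for i in range(len(sequence) - m + 1))


def closed_patterns(supports):
    """Keep the patterns whose support is strictly higher than the support
    of every longer candidate pattern that includes them."""
    closed = {}
    for p, s in supports.items():
        absorbed = any(
            len(q) > len(p) and contains(q, p) and supports[q] == s
            for q in supports
        )
        if not absorbed:
            closed[p] = s
    return closed


# Hypothetical sequence of chromatic intervals.
sequence = (1, 2, 3, 9, 1, 2, 3, 7, 1, 2)
candidates = [(1, 2), (2, 3), (1, 2, 3)]
supports = {p: count_support(sequence, p) for p in candidates}
closed = closed_patterns(supports)
print(closed)
```

Here (2, 3) is discarded because it never occurs outside (1, 2, 3), while (1, 2) is kept because it has one occurrence that the longer pattern does not account for. No information is lost: the occurrences of (2, 3) can be recovered from those of (1, 2, 3).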
Fig. 3. List of occurrences of motive a. Plain lines indicate features shared with other occurrence(s) whereas dotted lines indicate characteristics specific to the current occurrence. The subclasses related to each occurrence, as defined in Figure 4 are indicated on the right.
There exists a combinatorial set of possible descriptions of the occurrences of motive a, in which the different variants would be represented explicitly. Fortunately, there exists an optimal and compact set of descriptions containing the complete pattern configurations. This compact taxonomy of motive a, represented in Figure 4, consists of the whole set of closed patterns, defined here in the multi-dimensional melodic-rhythmic space. For clarity, only three dimensions are indicated: pitch, octave register, and rhythmic values.
Fig. 4. Description of motive a subclasses. Annotations introduced in Figure 3 are repeated here.
2.3 Formal Concept – Representation of Patterns
More formally, the generalization of the closed pattern paradigm to the multidimensionality of music can be expressed using the terminology introduced by formal concept analysis (Ganter and Wille 1999). The pattern description of a sequence of notes $S$ is expressed as a formal context $(N(S), \Delta, I)$ (Ganter and Wille 1999) where:
− the set of objects is $N(S)$: the set of notes in $S$,
− the set of attributes is $\Delta$: the set of elementary musical descriptions of all the intervals preceding each note, and
− $I$ is the binary relation between $N(S)$ and $\Delta$, called incidence, defined by: $(n_i, \delta)$ belongs to $I$ if and only if the description $\delta$ is correct.
The derived description $C'$ of a set of notes $C$ from $N(S)$ is defined as the common description of all these notes:
$$C' = \{\delta \in \Delta \mid \forall n \in C,\ (n, \delta) \in I\}$$
The derived class $D'$ of a complex description $D$ of $\Delta$ is dually defined as the set of notes complying with this description:
$$D' = \{n \in N(S) \mid \forall \delta \in D,\ (n, \delta) \in I\}$$
The pattern discovery task consists in finding the exhaustive classes $D'$ sharing the same description $D$. The trouble is that many different descriptions $D_i$ may lead to the same class $D_i'$. These operations based on derivators establish a Galois connection between the power set lattices on $N(S)$ and $\Delta$ (Ganter and Wille 1999). The Galois connection leads to a dual isomorphism between two closure systems, whose elements, called formal concepts of the formal context $(N(S), \Delta, I)$, correspond exactly to the closed patterns $P = (C, D)$, verifying: $C \subseteq N(S)$, $D \subseteq \Delta$, $C' = D$ and $D' = C$. For a closed pattern $P = (C, D)$, $C$ is called the extent of $D$ and $D$ the intent of $C$. We may simply call $C$ and $D$ respectively the class and the description of $P$. Hence, for a set of patterns $P_i = (D_i', D_i)$ of the same class $D_i' = C$, the closed pattern $P = (C, D)$ is described using the derived operator $C'$ defined in the above equation: it contains all the elementary descriptions common to all notes of the class $C$. In other words, closed patterns are described as precisely as possible.
2.4 Specificity Relations
Closed patterns, or formal concepts, are naturally ordered by the subconcept-superconcept relation $(C_1, D_1) < (C_2, D_2)$, corresponding to an inclusion of $C_1$ in $C_2$ or, equivalently, an inclusion of $D_2$ in $D_1$ (Ganter and Wille 1999). This subconcept-superconcept relation can also be called the specificity relation: $(C_1, D_1)$ is more specific than $(C_2, D_2)$. For instance, in Figure 4, pattern a1 is more specific than pattern a1’.
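The two derivation operators and the resulting formal concepts can be tried out on a toy context. The sketch below is a brute-force enumeration that is only practical for a handful of notes; the function names, the note labels and the attribute strings are hypothetical, and the code is not the chronological algorithm used in the paper.

```python
from itertools import combinations


def common_description(notes, incidence, attributes):
    """C': the descriptions shared by every note in `notes`."""
    return frozenset(d for d in attributes if all(d in incidence[n] for n in notes))


def matching_notes(description, incidence):
    """D': the notes that comply with every description in `description`."""
    return frozenset(n for n in incidence if all(d in incidence[n] for d in description))


def formal_concepts(incidence):
    """Enumerate all formal concepts (C, D) with C' = D and D' = C.

    For any set of notes C, the pair (C'', C') is a formal concept, so
    closing every subset of notes yields all of them (exponential cost).
    """
    attributes = frozenset().union(*incidence.values())
    concepts = set()
    notes = list(incidence)
    for size in range(len(notes) + 1):
        for subset in combinations(notes, size):
            description = common_description(subset, incidence, attributes)
            concepts.add((matching_notes(description, incidence), description))
    return concepts


# Hypothetical context: each note is described by the interval preceding it.
incidence = {
    "n1": {"int=+2", "ioi=8th"},
    "n2": {"int=+2", "ioi=16th"},
    "n3": {"int=-1", "ioi=8th"},
}
concepts = formal_concepts(incidence)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

In this toy context the notes n1 and n2 form the class of the description {int=+2}, and the concept ({n1}, {int=+2, ioi=8th}) is more specific than it: its class is smaller and its description is larger.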
These specificity relations can be drawn directly between patterns in the pattern tree, forming a directed edge from the node related to the less specific pattern to the node related to the more specific one. The set of all these edges forms a graph called the pattern specificity graph. These specificity relations can also be applied to pattern occurrences. The algorithm implemented in our model (Lartillot 2005) is founded on a single chronological pass through the whole musical sequence. Hence, for each successive note ni, all the occurrences it comprises are considered together. This set of pattern
occurrences can be ordered along the specificity relation, by copying the subset of the pattern specificity graph associated with the patterns related to these occurrences. For each successive note ni, the specificity graph thus constructed is called the pattern occurrence specificity graph.
2.5 Cyclic Patterns
Combinatorial explosions can be caused by successive repetitions of the pattern itself (Cambouropoulos 1998). See Figure 5 for an illustration taken from Syrinx. The redundancy problem induced by the periodicity of patterns can be resolved in a simple and efficient manner by directly integrating a concept of cyclic pattern in the multidimensional closed pattern discovery framework. This requires a generalisation of specificity relations to cyclic patterns, and an adaptation of the concept of cyclic pattern to the multidimensionality of musical parameters. Figure 6 shows the taxonomy of the cyclic pattern l from Syrinx.
Fig. 5. The lower table describes this excerpt of Syrinx along various musical dimensions. Occurrences of motive l are highlighted by plain brackets above the score. The description of motive l is shown on the upper table in bold. The successive repetition of motive l logically implies an extension of the motive (in italics in the upper table, occurrences highlighted by dotted brackets over the score).
Fig. 6. Subclasses of motive l, and their occurrences in the score. Arrows indicate specificity relations: for instance, l1’ is more specific than l1, and so on.
3 From Monody to Polyphony
Our previous research was focused on the extraction of repeated sequences of contiguous descriptions from a single chain of multi-dimensional descriptions. In order to take into account more complex musical transformations, it is necessary to generalize the problem and, in particular, to tolerate note insertions and deletions. For this purpose, the initial chain of descriptions, commonly called the syntagmatic chain in linguistics, is transformed into a syntagmatic graph showing all the possible connections (or syntagmatic relations) between neighbouring notes. New patterns are formed through a progressive traversal of the syntagmatic graphs. We plan to generalize our approach to polyphony following the syntagmatic graph principle. We are developing algorithms that are able to construct, from polyphony, syntagmatic chains that represent distinct monodic streams. These chains may be intertwined, forming complex graphs to which the pattern discovery algorithm will subsequently be applied. The additional factors of combinatorial explosion resulting from this generalized framework will require further adaptive filtering mechanisms. Patterns of chords may also be considered in future research.
References
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, pp. 3–4. IEEE Computer Society, Los Alamitos (1995)
Cambouropoulos, E.: Towards a general computational theory of musical structure. Ph.D diss., University of Edinburgh (1998)
Conklin, D., Anagnostopoulou, C.: Representation and discovery of multiple viewpoint patterns. In: Proceedings of the International Computer Music Conference, Cuba, pp. 479–485. International Computer Music Association (2001)
Ganter, B., Wille, R.: Formal concept analysis: Mathematical foundations. Springer, Heidelberg (1999)
Lartillot, O.: Multi-dimensional motivic pattern extraction founded on adaptive redundancy filtering. Journal of New Music Research 34(4), 375–393 (2005)
Lartillot, O., Toiviainen, P.: Motivic matching strategies for automated pattern extraction. Musicae Scientiae Discussion Forum 4A, 281–314 (2007)
Zaki, M.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering 17, 462–478 (2005)
Lartillot, O.: Taxonomic categorisation of motivic patterns. Musicae Scientiae Discussion Forum 4B (in press)
Pitch Symmetry and Invariants in Webern's Sehr Schnell from Variations Op.27
Elaine Chew
University of Southern California Viterbi School of Engineering, Epstein Department of Industrial and Systems Engineering, Hsieh Department of Electrical Engineering, Los Angeles, California, USA
Abstract. We use the Argus algorithm as outlined in (Chew 2005, 2006) to measure (single or clustered) pitch changes in Webern's Sehr schnell, the second piece in his Variations Op.27. In previous analyses employing the Argus algorithm, the computational results have been used to determine points of statistically significant change, which correspond to key or section changes. Instead of focussing only on the peaks (points of significant change), this paper considers symmetries and invariants revealed by the numerical results, paying particular attention to the stationary points as reflected by the zero values on the graph. These zero points signify places with identical or symmetric mappings of the pitch(es) in consecutive time windows. We analyze the results for small window sizes of one, two, and three eighth notes. The findings give rise to a pitch geometry map inside the Spiral Array, centered on the radius through pitch A, that explains Webern's pitch choices in Sehr schnell.
1 Introduction
The Argus algorithm uses the Spiral Array (Chew 2000) pitch class representation, which consists of a helical organization of pitch class representations in three-dimensional space such that pitch classes related by Perfect Fifths are adjacent to each other at each quarter turn of the spiral, and vertical neighbors are related by Major Thirds. The algorithm compares the pitch(es) in consecutive windows of time. The pitch(es) in each window are mapped to their corresponding representations in the Spiral Array, and the center of effect (c.e.), the aggregate position of the pitch classes weighted by their relative durations, is calculated. Then, the algorithm computes the Euclidean distance between the c.e.'s from the two consecutive windows. The resulting number quantifies the distance the c.e. has moved. We illustrate the pitch c.e. distance measure with two examples, shown in Figure 1. In these examples, the window size has been set to one eighth note's duration, i.e. w = 1. In Example (a), each window contains one eighth note, {B♭} and {G#} respectively. In Example (b), each window contains three simultaneous notes for the full duration of one eighth note each: {F#, C, F} and {C#, F#, C}.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 240–246, 2009. © Springer-Verlag Berlin Heidelberg 2009
Pitch Symmetry and Invariants in Webern's Sehr Schnell
Fig. 1. Two examples of pitch distance, panels (a) and (b)
The resulting pitch distances are shown geometrically in Figure 2. The distance between {B♭} and {G#} is 4.1633, which is considerably larger than the distance between {F#, C, F} and {C#, F#, C}, 0.9737. The distances are shown as bold line segments in Figures 2(a) and 2(b).
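The two distances above can be reproduced with a short sketch of the c.e. distance. The text does not state the Spiral Array parameters, so the radius r = 1 and the height per fifth h = √(2/15) used below are assumptions; they are chosen because they reproduce the two reported values. The function names and the use of "b" for a flat sign are likewise choices made for this example.

```python
import math

# Position of each natural pitch name along the line of fifths, with C = 0.
FIFTHS_INDEX = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

R = 1.0                     # spiral radius (assumed)
H = math.sqrt(2.0 / 15.0)   # rise per perfect fifth (assumed)


def fifths_index(name):
    """Index of a spelled pitch name on the line of fifths, e.g. 'Bb' -> -2."""
    k = FIFTHS_INDEX[name[0]]
    for accidental in name[1:]:
        if accidental == "#":
            k += 7
        elif accidental == "b":
            k -= 7
        else:
            raise ValueError(f"unknown accidental in {name!r}")
    return k


def position(name):
    """Spiral Array position: a quarter turn and a rise of H per fifth."""
    k = fifths_index(name)
    return (R * math.sin(k * math.pi / 2), R * math.cos(k * math.pi / 2), k * H)


def center_of_effect(window):
    """Duration-weighted mean position of the pitches in a window.

    `window` maps pitch names to durations.
    """
    total = sum(window.values())
    points = [(position(name), weight) for name, weight in window.items()]
    return tuple(sum(p[i] * w for p, w in points) / total for i in range(3))


def ce_distance(left, right):
    """Euclidean distance between the c.e.'s of two windows."""
    return math.dist(center_of_effect(left), center_of_effect(right))


example_a = ce_distance({"Bb": 1}, {"G#": 1})
example_b = ce_distance({"F#": 1, "C": 1, "F": 1}, {"C#": 1, "F#": 1, "C": 1})
print(round(example_a, 4), round(example_b, 4))  # 4.1633 0.9737
```

With these parameters, B♭ and G# lie ten fifths apart on opposite sides of the spiral, which gives √(2² + (10h)²) ≈ 4.1633. In Example (b) the two centers of effect differ only in height, by 8h/3 ≈ 0.9737.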
Fig. 2. Geometric interpretation of the two pitch distance examples, panels (a) and (b)
For analysis of Webern's Sehr schnell from his Variations Op.27, the input to the Argus program is the list of pitch names present at each eighth note. Grace notes are grouped with the eighth note with which they are associated. No number is reported if the set of pitches in one or more of the windows is the empty set.
2 w = One Eighth Note
We have given a brief overview of the Argus algorithm with two illustrations. Now, let us consider the significance of the distances measured, especially when the window size is one eighth note long. The Spiral Array, like the harmonic network or tonnetz that pre-dates it and inspired its design, places in proximity pitches, and similar pitch clusters, related by (Perfect) Fifths and by (Major or Minor) Thirds. In addition to the inherited proximity of Fifths and Thirds, the three-dimensional organization of the Spiral Array also places pitches in neighboring triads (triads sharing two pitches) and in closely related tonalities (including compound combinations of Fifths and Thirds, such as Seconds) near one another. Returning to Figures 1 and 2, (a) compares the pitches {B♭} and {G#}, ten Perfect Fifths, or three Major Thirds and two Perfect Fifths (or other combinations thereof),
E. Chew
apart from each other. However one slices it, the two are quite distant from each other by any measure, as reflected by their c.e. distance of 4.1633. In previous uses of the Argus method, the window sizes have been chosen to be as large as possible to reveal sectional boundaries. It is thus interesting to now examine the case when the window is as small as one eighth note. This approach is particularly appropriate for examining the pointillistic figures in Webern's Op.27. Figure 3 shows the pitch distances for all consecutive windows of one eighth note length for the entire piece (including all repeats.) The x-axis shows the index of the last eighth note in the left analysis window, just before the beginning of the right analysis window. For example, the y-axis value at x = 1 gives the c.e. distance between the pitch(es) in the first and the second eighth notes.
Fig. 3. Pitch distance measure for window size of one eighth note
The dotted vertical lines (in red) mark the section boundaries as indicated in the score by the repeat signs. This graph is rather sparse and sketch-like; however, the repeats in the piece (the first and second halves are each played twice), and hence the [A][A][B][B] structure, are apparent in the graphical display. The high points are indicated by the dots (in red) at the peaks in the uppermost part of the chart. With the exception of the peaks at the middle of the [B] sections, these high points are all at 4.1633, marking the large distance between {B♭} and {G#}. The pair {B♭} and {G#} precedes the beginning of each section (in bars 1, 1' and 11, 11', using Lewin's convention of indicating the second time a bar is traversed by a prime (Lewin 1993)), and its mirror, {G#} and {B♭} (in the middle of bars 5 and 5'), punctuates the middle of the [A] sections. The middle of each [B] section is also punctuated by a distant interval, marked by the apex of the inverted "v", the result of the comparison of {F#} and {D♭} (in the second half of bars 16 and 16'), with a slightly larger c.e. distance of 4.2583. Apart from the peaks in the graph, the other distinctive feature is the existence of stationary points where y = 0, which occur numerous times in the chart. These zero points mark the appearance of the repeated pitch classes {A}{A} and {G}{G}. The {A}{A}'s are the isolated dots on the zero line (corresponding to bars 2, 2', 9, 9', 13, 13', and 19, 19'), and the {G}{G}'s are the lower vertices of the "v" shapes (corresponding to bars 12 and 12'). Unlike the two {G}'s, the two {A}'s are not only repeated pitch classes, they are identical pitches. Hence, the isolated dots on the zero line point out the only occurrences of repeated pitches and pitch classes, a clearly audible feature in the piece.
One more feature stands out when one examines the graph in Figure 3. The v-shaped contours in the graph, at the beginning of the [B] sections and near the end of the [A] sections, highlight the inter-pitch distance symmetries in the four-note sequences {C#}{F}{C}{F#} (in bars 10–11 and 10'–11') and {B}{G}{G}{B} (in bars 12–13 and 12'–13'), respectively.
3 w = Two Eighth Notes
Figure 4 documents the numerical results when the windows are two eighth notes long. As before, the [A][A][B][B] structure of the piece is apparent from the plot, as is the 4.1633 peak signifying the isolated (because it is preceded and followed by eighth note rests) {B♭}{G#} interval before each section boundary. The first occurrence of {B♭}{G#} does not show up in the chart because this implementation of the program only computes distances when both analysis windows are fully contained in the score.
Fig. 4. Pitch distance measure for window size of two eighth notes
Highlighted points on the zero line signify the times when the next two-eighth-note window generates exactly the same c.e. in the Spiral Array space as its preceding two-eighth-note window. Zero c.e. distance can occur when the note material is the same in the two consecutive windows. These stationary points are color-coded (from left to right): (green, green, red)², (blue, green, green)². As with the significant {B♭}{G#} interval, the audibly distinct repeated {A}'s in bars 2, 9, 13, and 19 (and 2', 9', 13', and 19') are also isolated (preceded and followed by eighth note rests). The {A}{A}'s account for numerous zeros in the graph, indicated by the green points. The symmetric {B, G}{G, B} pattern at the beginning of the [B] sections (in bars 12 and 12') leads to two more zeros, shown in deep blue. Identical pitch sets in consecutive analysis windows are not the only reason for a zero c.e. distance value. For example, the zero points marked in red are the result of the comparisons of {A} (last note in bars 9 and 9') and {C#, F} (in bars 10 and 10', respectively). Upon further examination, it is not surprising that {A} and {C#, F} should generate the same c.e.; after all, {A} is the midpoint between {F} and {C#} in the Spiral Array space, a Major Third from each, as highlighted by the longer dotted line in Figure 6. Finally, a feature made prominent by its rarity in this chart is the significant gap at the end of the [B] sections, created by the presence of the two quarter note rests in bars 21, 22 and 21', 22'.
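The midpoint argument for {A} and {C#, F} can be checked numerically. The sketch below is a hedged illustration: it assumes a spiral of radius 1 with height gain sqrt(2/15) per fifth, an unweighted c.e., and line-of-fifths indices with C = 0; none of these conventions are given in this section.

```python
import math

H = math.sqrt(2.0 / 15.0)  # assumed height per Perfect Fifth; radius assumed to be 1

def position(k):
    """Spiral point for line-of-fifths index k (C = 0, G = 1, F = -1, ...)."""
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

def center_of_effect(indices):
    """Unweighted mean of the pitch positions in one window."""
    points = [position(k) for k in indices]
    return tuple(sum(coord) / len(points) for coord in zip(*points))

A, F, C_SHARP = 3, -1, 7   # F and C# lie four fifths (a Major Third) either side of A

# Four fifths make one full turn of the spiral, so F, A and C# are stacked
# vertically and A sits halfway between the other two.
gap = math.dist(center_of_effect([A]), center_of_effect([C_SHARP, F]))
```

Up to floating-point rounding, `gap` is zero, which is why the comparison of {A} with {C#, F} produces a stationary point.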
4 w = Three Eighth Notes
In his analysis, Lewin (1993) refers to the overarching three-eighth-note grouping within the piece, in spite of the 2/4 time signature. Thus, it would be interesting to see if there are pitch patterns discernible at the three-eighth-note level. Figure 5 shows the c.e. comparison results for w = 3. As with the two-eighth-note window chart, a significant gap occurs near the end of the [B] sections, showing the relatively long silence preceding the section's end.
Fig. 5. Pitch distance measure for window size of three eighth notes
What is most surprising is the preponderance of stationary points: so many different pitch sets map to the same c.e.'s in the Spiral Array. The zero points are color-coded (left to right): (red, black, red, purple)², (blue)². Table 1 summarizes the pitch set comparisons.
Table 1. Pitch sets at stationary points
Color Code | Pitch Set 1 | Pitch Set 2
red | {A, A} | {C#, F}
black | {E, D} | {G#, B♭}
purple | {C, F#} | {B♭, G#}
blue | {G, B} | {B, G}
The blue points compare {G, B} and {B, G} (at bars 12–13 and 12'–13'), a trivial match; this is the only case where the pitch sets are identical in the two analysis windows. We now examine the remaining cases. The red points show the comparison results for {A, A} and {C#, F}, the equivalence of which has been established in the previous section. The black points show the comparison results for {E, D} and {G#, B♭}, and the purple points those for {C, F#} and {B♭, G#}. The black zero points correspond to the comparisons of the pitch sets {E, D} (in bars 4 and 4') and {G#, B♭} (in bars 5 and 5'). {E} and {D} are two steps apart on the Spiral Array pitch class spiral, and hence are diagonally opposite each other (see Figure 6). {G#} is a Major Third up from {E}, and {B♭} is a Major Third down from {D}. As can be seen in Figure 6, {G#} and {B♭} are also diagonally opposite each other, but one Major Third step wider each way. The purple stationary points correspond to the comparisons of the pitch sets {C, F#} and {B♭, G#}. Now, {C} is a Major Second (or two Perfect Fifths) up from {B♭}, and {F#} is a Major Second down from {G#}. Thus, stationary points can be caused by pitch pairs of the type {x_a, x_b}, {x_a + I, x_b − I}, where I is some interval, the operator "+" indicates a step up by I, and "−" indicates a step down by I.
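The two cases derived above, the black and the purple pairs, can be verified with a short sketch. It assumes a spiral of radius 1 with height gain sqrt(2/15) per fifth and measures intervals in steps along the line of fifths (a Major Third is four steps, a Major Second two); these conventions and the function names are illustrative assumptions. The check covers only the pairs discussed in the text and is not a proof for arbitrary pairs and intervals.

```python
import math

H = math.sqrt(2.0 / 15.0)  # assumed height per Perfect Fifth; radius assumed to be 1

def position(k):
    """Spiral point for line-of-fifths index k (C = 0)."""
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

def midpoint(a, b):
    """c.e. of a two-pitch set with equal weights."""
    return tuple((p + q) / 2 for p, q in zip(position(a), position(b)))

def same_center(pair1, pair2, tol=1e-9):
    """True if both pitch pairs share the same c.e. (within rounding)."""
    return math.dist(midpoint(*pair1), midpoint(*pair2)) < tol

# Line-of-fifths indices and interval sizes in fifth-steps.
B_FLAT, C, D, E, F_SHARP, G_SHARP = -2, 0, 2, 4, 6, 8
MAJOR_THIRD, MAJOR_SECOND = 4, 2

# Black points: {E, D} versus {E + M3, D - M3} = {G#, Bb}.
black = same_center((E, D), (E + MAJOR_THIRD, D - MAJOR_THIRD))
# Purple points: {Bb, G#} versus {Bb + M2, G# - M2} = {C, F#}.
purple = same_center((B_FLAT, G_SHARP),
                     (B_FLAT + MAJOR_SECOND, G_SHARP - MAJOR_SECOND))
```

Both `black` and `purple` evaluate to `True` under these assumptions.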
5 Center on A
Returning to the Webern piece, the symmetries highlighted by the stationary points in the Argus analysis provide evidence for the carefully constructed pitch geometry shown in Figure 6. The pitch symmetries collectively point to the importance of the pitch {A} in this composition. In each of the pairs {A, A}, {C#, F}, {B, G}, {G#, B♭}, {E, D}, and {C, F#}, the two pitches are mirror images of each other about the axis {A}.
[Figure 6: Spiral Array diagram with the labeled pitches D#, G#, B, E, C#, F#, G, C, D, A, E♭, A♭, F, B♭, D♭ and the center point O.]
Fig. 6. Pitch symmetries in Sehr schnell
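The mirror relation about {A} can also be expressed on the line of fifths alone. In the sketch below, the index table (C = 0) and the helper name `invert_about` are assumptions made for illustration; reflecting an index k about the index of A gives 2·3 − k.

```python
# Hypothetical lookup table: position on the line of fifths, with C = 0.
FIFTHS = {"Bb": -2, "F": -1, "C": 0, "G": 1, "D": 2, "A": 3,
          "E": 4, "B": 5, "F#": 6, "C#": 7, "G#": 8}
NAMES = {index: name for name, index in FIFTHS.items()}

def invert_about(name, axis="A"):
    """Mirror a pitch name about the axis pitch on the line of fifths."""
    return NAMES[2 * FIFTHS[axis] - FIFTHS[name]]

# The six pairs listed in the text.
pairs = [("A", "A"), ("C#", "F"), ("B", "G"), ("G#", "Bb"), ("E", "D"), ("C", "F#")]
mirrored = [invert_about(first) == second for first, second in pairs]
```

Every entry of `mirrored` is `True`: each pair consists of a pitch and its reflection about A.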
Another center exists at the confluence of the line segments {G#, B♭}, {E, D}, and {C, F#} in the Spiral Array (shown as a white ball in Figure 6). This second center, marked "O", is also the point at which a perpendicular from {A} touches the spine of the spiral. Additionally, the three line segments define a vertical plane separating {A} from {B, G}; the pitches of each pair are also mirror images about A. If one draws a line from {A} through O, it divides the line segment {B, G} into two equal halves. Further examination of the score reveals that each and every interval or pitch cluster pair, as grouped by the prevailing rests, is centered on {A}, on O, or on the intersection of {A}O and {B, G}. Pitch pairs can be quickly verified on Figure 6. All grace notes are {x_a + I, x_b − I}-type extensions of their parent notes {x_a, x_b}, and thus generate the same c.e. Therefore, we only need to check the four existing pitch cluster pairs. One can quickly verify, by inspection, that the corresponding c.e. pairs each have midpoints at O, and that each cluster is an inversion of its mate about the pitch A.
Some may argue that similar kinds of analyses may be derived from the one-dimensional line of Fifths. An advantage of the three-dimensional Spiral Array pitch class representation is the spatial visualization of pitch (set) inversions about some axis of symmetry, in this case the {A}O line. The model also incorporates all the properties of the line of Fifths, with additional properties that result from the proximity of Thirds and other intervals.
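The geometric claims about the second center O can be checked with a few lines of arithmetic. As before, this is a sketch under assumptions not stated in this section: radius 1, height gain sqrt(2/15) per fifth, and line-of-fifths indices with C = 0. O is taken to be the point on the spine (the vertical axis of the spiral) at the height of {A}.

```python
import math

H = math.sqrt(2.0 / 15.0)  # assumed height per Perfect Fifth; radius assumed to be 1

def position(k):
    """Spiral point for line-of-fifths index k (C = 0)."""
    return (math.sin(k * math.pi / 2), math.cos(k * math.pi / 2), k * H)

def midpoint(a, b):
    """Midpoint of the line segment joining two pitches."""
    return tuple((p + q) / 2 for p, q in zip(position(a), position(b)))

A, B, G = 3, 5, 1
O = (0.0, 0.0, 3 * H)  # foot of the perpendicular from A onto the spine

# The three segments named in the text, as index pairs.
segments = {"G#-Bb": (8, -2), "E-D": (4, 2), "C-F#": (0, 6)}
meet_at_O = all(math.dist(midpoint(a, b), O) < 1e-9 for a, b in segments.values())

# Continuing the line from A through O by the same length reaches the
# midpoint of the segment {B, G}.
beyond_O = tuple(2 * o - p for o, p in zip(O, position(A)))
bisects_BG = math.dist(beyond_O, midpoint(B, G)) < 1e-9
```

Both `meet_at_O` and `bisects_BG` come out `True` under these assumptions, in line with the description of Figure 6.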
Acknowledgements
I am grateful to Jeanne Bamberger for our discussions on the important features of this Webern piece: the repeated {A}'s, the silences. This material is based upon work supported by the National Science Foundation (NSF) under grant No. 0347988. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author, and do not necessarily reflect the views of NSF.
References
Chew, E.: Towards a Mathematical Model of Tonality. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA (2000)
Chew, E.: Regards on Two Regards by Messiaen: Post-Tonal Music Segmentation Using Pitch Context Distances in the Spiral Array. Journal of New Music Research 34(4), 341–354 (2005)
Chew, E.: Slicing it all ways: Mathematical models for tonal induction, approximation, and segmentation using the spiral array. INFORMS Journal on Computing 18(3), 305–320 (2006)
Lewin, D.: A Metrical Problem in Webern's Op. 27. Music Analysis 12(3), 343–354 (1993)
Computational Analysis Workshop: Comparing Four Approaches to Melodic Analysis
Chantal Buteau¹, Kamil Adiloğlu², Olivier Lartillot³, and Christina Anagnostopoulou⁴
¹ Brock University
² TU-Berlin
³ University of Jyväskylä
⁴ University of Athens
Abstract. We compare four computational approaches to melodic analysis with respect to several aspects of each approach: input type (monophonic or polyphonic), pattern identification type (strict or similar), analysis segmentation, aim of the approach, motivic pattern representation, and type of result representations. The four computational approaches considered are the following: a similarity neighbourhood approach by Adiloglu (Adiloglu and Obermayer 2006a,b), a multiple viewpoint representation and discovery approach by Anagnostopoulou (Anagnostopoulou, Share and Conklin 2006), a topological approach by Buteau (2005), and an approach based on multidimensional closed pattern mining by Lartillot (Lartillot and Toiviainen 2007).
1 Comparing Four Approaches to Melodic Analysis
We briefly compare four computational approaches to motivic analysis with respect to several aspects. The four computational approaches considered for the comparison are the following: (1) a similarity neighbourhood approach by Adiloglu [ADI] (Adiloglu and Obermayer 2006a; Adiloglu and Obermayer 2006b), (2) a multiple viewpoint representation and discovery approach by Anagnostopoulou [ANA] (Anagnostopoulou, Share and Conklin 2006; Conklin and Anagnostopoulou 2006), (3) a topological approach by Buteau [BUT] (2004 and 2005), and (4) an approach based on multidimensional closed pattern mining by Lartillot [LAR] (2008; Lartillot and Toiviainen 2007). The aim of this paper is to capture the most obvious differences and commonalities between the computational approaches to melodic and motivic analysis involved. It serves as a very first pragmatic approximation to the problem of comparative analysis. It does not yet address the conceptual basis for such an enterprise. A comparison of different definitions and associated data models for terms like "polyphonic" is not the subject of this short paper. For the following description, we refer the reader to the table below summarizing our comparison.
We start our comparison with the music that can be analyzed by the different approaches. [ADI and LAR] currently restrict their method to monophonic music, whereas [ANA and BUT] can also consider polyphonic music. Alternative approaches have been considered for the detection of motivic variations, based either on a numerical similarity threshold along different successive musical parameters [ADI and BUT], or on identification within the multidimensional parametric space using statistical learning [ANA] or logical generalization inferences [LAR]. Another significant difference between the approaches is the collection of motives in the score that is analyzed: [ADI and LAR] only consider motives with consecutive notes, [ANA] also admits motives with some specific holes, and [BUT] in addition considers non-contiguous motives. Furthermore, the method can make use of a score segmentation [ADI, ANA, and BUT], whether this is manual or dictated by the score, or may produce a type of segmentation along with the analysis by discovering patterns [ANA and LAR]. The four approaches are based on different theoretical backgrounds from formal music analysis, e.g. Ruwet (1987) or Réti (1951), and serve different purposes: the method may aim at identifying the shortest significant motives (for [ANA, BUT, and LAR]) and/or the largest significant motives (for [ADI, ANA, and LAR]). The approaches also differ in a further significant aspect: an approach yields either one analysis considering multi-dimensional motivic pattern representations [LAR and ANA], or several analyses each considering a one-dimensional pattern representation [ADI and BUT]. We mention that [LAR] uses a linear reduction strategy to deal with combinatorial issues. Finally, most of the implemented approaches [ADI, ANA, and LAR] yield numerical results, except for [BUT], which offers a diversity of result representations (Motivic Evolution Trees, weight graphs, dynamic clustering tables, and other spatial representations implemented in the software environment OpenMusic).
Table 1. Comparison of Four Computational Approaches of Melodic Analysis
Adiloglu Anagnostopoulou Approach Aspects [ADI] [ANA] monophonic music X polyphonic music X multi-dimensional pattern identity X similarity of patterns X contiguous motives only X using specific holes in motives X non contiguous motives predetermined segmentation X (rest) X no segmentation X aim: shortest significant motives X aim: largest significant pattern X X multi-dimensional pattern representation X in one analysis one-dimensional pattern representation X in several analyses linear reduction strategy (combinatorial issues) type of result representations numerical numerical
Buteau [BUT]
Lartillot [LAR] X
X X X X X X X
X X X X
X
several (2D and 3D)-graphic
X numerical
and other representations in OpenMusic
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 247–249, 2009. © Springer-Verlag Berlin Heidelberg 2009
References
Adiloglu, K., Obermayer, K.: A paradigmatic approach to extract the melodic structure of a musical piece. Journal of New Music Research 35(3), 221–236 (2006a)
Adiloglu, K., Obermayer, K.: Melodic topologies. In: Proceedings of the International Computer Music Conference (2006b)
Anagnostopoulou, C., Share, C., Conklin, D.: Xenakis' Keren (1986): A computational semiotic analysis. In: Definitive Proceedings of the International Symposium Iannis Xenakis, Athens (2006)
Buteau, C.: Motivic Spaces of Scores through RUBATO's MeloTopRUBETTE. In: Lluis-Puebla, E., Mazzola, G., Noll, T. (eds.) Perspectives in Mathematical and Computational Music Theory, pp. 330–342. epOs-Music, Osnabrück (2004)
Buteau, C.: Topological Motive Spaces, and Mappings of Scores Motivic Evolution Trees. In: Fripertinger, H., Reich, L. (eds.) Grazer Mathematische Berichte, Proceedings of the Colloquium on Mathematical Music Theory, Graz, Austria, May 6-9, pp. 27–54 (2005)
Conklin, D., Anagnostopoulou, C.: Discovery of segmental patterns in music. INFORMS Journal on Computing 18(3), 285–293 (2006)
Lartillot, O., Toiviainen, P.: Motivic matching strategies for automated pattern extraction. Musicae Scientiae, Discussion Forum 4A, 281–314 (2007)
Lartillot, O.: Automated extraction of motivic patterns and application to the analysis of Debussy's Syrinx. In: Klouche, T., Noll, T. (eds.) MCM 2007. CCIS, vol. 37, pp. 247–249. Springer, Heidelberg (2009)
Réti, R.: The Thematic Process in Music. Greenwood Press, Connecticut (1951)
Ruwet, N.: Methods of Analysis in Musicology. Music Analysis 6(1-2), 4–39 (1987)
Computer-Aided Investigation of Chord Vocabularies: Statistical Fingerprints of Mozart and Schubert
Eva Ferková, Milan Zdímal, and Peter Sidlík
Academy of Music and Drama Arts, Zochova 1, Bratislava, 81301 Slovakia
Abstract. We introduce and demonstrate an original software tool for determining a style fingerprint (from the point of view of harmony) of two world-famous composers, W. A. Mozart and F. Schubert. The new version of the ANALYSIS software (previously CACH, Ferkova 1982) for automatic analysis of classical harmonic structures provides a powerful extension of its previous music data processing.1
Presentation
The paper/presentation is divided into two parts:
1. Theoretical topics on problems of algorithmisation of harmonic analysis and on not yet resolved problems (work in progress) of tonal analysis.
2. A trial of the practical use of the first part of the ANALYSIS software, automatic chordal analysis and its statistical-evaluation results, applied to a set of Mozart's and Schubert's piano sonatas.
In particular, we have updated the software to make it more suitable for users by allowing automatic MIDI input of classical music (Ferkova and Zdimal, 2004). ANALYSIS also supports statistical evaluation, that is, a mathematical enumeration of the appearances of style features in harmony. Among the statistical outputs are (1) the number of appearances of certain chordal structures and their occurrence, and (2) the appearance of couples and triples of chords.2 We briefly recall the problems of using MIDI data in harmonic analysis. This popular format for structured music processing makes it especially difficult to create an algorithm for the detection of the tonal key. The detection of chordal structure is easy; some problems arise in extracting nonchordal (or melodic) tones.
1 All the algorithms and software are used in educational practice at the Academy of Music Art in Bratislava, Slovakia, and the response from students seems very encouraging. For snapshots of the graphical user interface see our online summary from the mcm2007.
2 In the next version of the program other types of information will be available: the number of appearances of cadences as specific progressions of harmonic functions, and the localization of certain keys in the global structure of the whole composition.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 250–256, 2009. © Springer-Verlag Berlin Heidelberg 2009
Structure of the chord (in number of semitones from the root tone) | Type of the chord (name of the chord) | Sign of the chord (in output)
4–3 | Major triad | +
3–4 | Minor triad | -
4–4 | Augmented triad | ++
3–3 | Diminished triad | --
4–3–3 | Dominant seventh chord | D7 (Maj-7)
3–3–3 | Diminished seventh chord | Dim7
3–3–4 | Diminished/minor seventh chord | Dm7
4–3–4 | Major seventh chord | Maj+7
3–4–3 | Minor seventh chord | Min-7
4–4–3 | Augmented seventh chord | Aug+7
3–4–4 | Minor/maj seventh chord | Min+7
Fig. 1. Structures, types, names and signs of chords
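The table in Figure 1 can be read as a lookup from stacked intervals to chord signs. The sketch below illustrates that reading only; it is not the detection algorithm of the ANALYSIS software. The function name, the reduction to pitch classes, and the search over rotations to find a root position are assumptions. Chords that match none of the eleven structures receive the sign X; compound chords (CC) are not handled.

```python
# Interval structures (semitones between successive chord tones) and signs,
# copied from Figure 1.
CHORD_SIGNS = {
    (4, 3): "+", (3, 4): "-", (4, 4): "++", (3, 3): "--",
    (4, 3, 3): "D7", (3, 3, 3): "Dim7", (3, 3, 4): "Dm7", (4, 3, 4): "Maj+7",
    (3, 4, 3): "Min-7", (4, 4, 3): "Aug+7", (3, 4, 4): "Min+7",
}

def chord_sign(midi_pitches):
    """Return the Figure 1 sign for a set of MIDI note numbers, or "X"."""
    pitch_classes = sorted({p % 12 for p in midi_pitches})
    # Try each pitch class as the root and stack the remaining ones above it.
    for i in range(len(pitch_classes)):
        stacked = pitch_classes[i:] + [p + 12 for p in pitch_classes[:i]]
        steps = tuple(b - a for a, b in zip(stacked, stacked[1:]))
        if steps in CHORD_SIGNS:
            return CHORD_SIGNS[steps]
    return "X"

c_major = chord_sign([60, 64, 67])          # C-E-G
c_major_inverted = chord_sign([64, 67, 72])  # E-G-C, first inversion
g_dominant_7 = chord_sign([67, 71, 74, 77])  # G-B-D-F
cluster = chord_sign([60, 62, 64])           # C-D-E, not a stack of thirds
```

Because the lookup works on pitch classes, inversions and octave doublings map to the same sign as the root-position chord.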
In tonal analysis (and subsequently in functional analysis, too), however, there are some problems regarding the interpretation of MIDI files. MIDI does not support a distinction of the basic tonal key of a composition through a key signature of sharps or flats at the beginning of the score (as is possible in traditional scores). Every pitch is represented only by a number; there is no other information about key signatures, accidentals or leading (altered) tones. The paper will address the question of how to enumerate and automatically evaluate manifestations of harmonic dynamics. We have proposed an approximate measuring tool for the numeric expression of the dynamic potentiality of harmonic structures, in accordance with Filip (1997). The statistical results will also bring a new look at this approximately measured dynamics on three levels (chordal, tonal and functional), as proposed in Ferkova 1999. ANALYSIS is divided into three parts:
1. Chordal analysis
2. Tonal analysis
3. Functional analysis
The presentation will show results of using the first part of the software, chordal analysis, on a set of Mozart's and Schubert's piano sonatas. The statistical enumeration is based on computing the probability of occurrence of basic chords (see Figure 1), of undetermined chords (marked with the sign X) and of compound chords (chords other than the basic ones, but built up of thirds; marked CC). The table in Figure 1 shows the structures of the 11 chords, as defined in Piston 1987, which are automatically detected and counted.
Fig. 2. Information window before processing analysis with button for opening window for analysis results
Fig. 3. Window of analysis results, scores and 3 lines of analysis: 1st line: chordal signs, 2nd line: tonal key, 3rd line: functional signs (J. S. Bach: Invention in A minor)
Fig. 4. Analysis results of chordal analysis of a composition with more melodic (nonchordal) tones (W. A. Mozart: Piano Sonata KV 332, 2nd Movement)
RESULTS
Table 1 lists the 16 variables (single chords and couples of chords, in percentage of frequencies) on which the composer, Mozart or Schubert, has an influence. They are sorted according to Pearson's parametric correlation, from the strongest in the top row to the weakest in the bottom row. Their influences are higher than the 1% level of significance. Positive values mean higher occurrence in Mozart's music, negative values higher occurrence in Schubert's music. For comparison,
• in the 2nd column there are Spearman's correlations,
• in the 3rd column, values of F of the parametric analysis of dispersion,
• in the 4th column, values of Z of the nonparametric analysis of dispersion, the so-called Mann-Whitney test.
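The variables entering these comparisons are percentage frequencies of single chords and of couples of consecutive chords. A minimal sketch of how such percentages could be tallied from a sequence of chord signs is given below. The function name, the normalisation (singles by the number of chords, couples by the number of adjacent pairs) and the sample sequence are assumptions for illustration; the sequence is invented and is not data from the sonatas.

```python
from collections import Counter

def percentage_frequencies(chord_signs):
    """Percentage of each single chord and of each couple of adjacent chords."""
    singles = Counter(chord_signs)
    couples = Counter(zip(chord_signs, chord_signs[1:]))
    n_chords = len(chord_signs)
    n_couples = max(n_chords - 1, 1)  # guard against a one-chord sequence
    single_pct = {sign: 100.0 * n / n_chords for sign, n in singles.items()}
    couple_pct = {pair: 100.0 * n / n_couples for pair, n in couples.items()}
    return single_pct, couple_pct

# Invented sequence of chord signs, for illustration only.
sequence = ["+", "D7", "D7", "+", "-", "D7", "+", "--"]
single_pct, couple_pct = percentage_frequencies(sequence)
```

In this toy sequence, `single_pct["D7"]` is 37.5 (three of eight chords), and the couple ("D7", "+") accounts for two of the seven adjacent pairs.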
Table 1.
Variable | Pearson (R) | Spearman (R) | Analysis of dispersion (F) | Mann-Whitney (Z)
1. /D7/ | -0.47 | -0.47 | 16.78 | -3.60
2. /D7,D7/ | -0.47 | -0.46 | 16.78 | -3.57
3. /Maj+7/ | 0.44 | 0.44 | 14.21 | 3.38
4. /--,Maj+7/ | 0.41 | 0.37 | 11.93 | 2.86
5. /--/ | 0.41 | 0.38 | 11.61 | 2.92
6. /+,Dm7/ | -0.37 | -0.43 | 9.21 | -3.30
7. /Dm7/ | -0.37 | -0.35 | 8.97 | -2.68
8. /Maj+7,Min-7/ | 0.36 | 0.33 | 8.87 | 2.55
9. /--,+/ | 0.36 | 0.29 | 8.82 | 2.25
10. /Maj+7,+/ | 0.35 | 0.33 | 8.26 | 2.53
11. /D7,Dim7/ | -0.34 | -0.37 | 7.62 | -2.83
12. /--,--/ | 0.34 | 0.28 | 7.68 | 2.14
13. /-,Dm7/ | -0.33 | -0.39 | 7.06 | -3.03
14. /-,--/ | 0.33 | 0.29 | 7.01 | 2.21
15. /Dm7,D7/ | -0.30 | -0.37 | 5.92 | -2.83
16. /-,-/ | -0.27 | -0.34 | 4.68 | -2.59
Arithmetic averages and medians of the same 16 categories of single chords and couples of chords are compared for the two compositional styles, Mozart's and Schubert's, in Table 2.
Table 2.
Variable | Average (median) Mozart (%) | Average (median) Schubert (%)
1. /D7/ | 10.73 (10.74) | 15.74 (15.89)
2. /D7,D7/ | 3.35 (2.68) | 7.13 (5.95)
3. /Maj+7/ | 7.90 (7.11) | 4.01 (3.38)
4. /--,Maj+7/ | 0.69 (0.34) | 0.11 (0.00)
5. /--/ | 8.88 (6.45) | 4.83 (4.05)
6. /+,Dm7/ | 0.04 (0.00) | 0.47 (0.00)
7. /Dm7/ | 0.88 (0.90) | 1.88 (1.68)
8. /Maj+7,Min-7/ | 0.60 (0.00) | 0.06 (0.00)
9. /--,+/ | 1.84 (1.55) | 0.83 (0.65)
10. /Maj+7,+/ | 2.85 (1.68) | 1.12 (0.85)
11. /D7,Dim7/ | 0.19 (0.00) | 0.59 (0.33)
12. /--,--/ | 4.03 (3.57) | 1.87 (1.10)
13. /-,Dm7/ | 0.11 (0.00) | 0.38 (0.21)
14. /-,--/ | 1.21 (0.76) | 0.58 (0.29)
15. /Dm7,D7/ | 0.05 (0.00) | 0.32 (0.00)
16. /-,-/ | 8.26 (6.16) | 11.81 (9.66)
The statistical differences between the two composers are evident: Mozart's music is typified by a higher occurrence of the major seventh chord (Maj+7) and the diminished triad (--), both singly and in couples. Schubert's music is typified by a higher occurrence of the dominant seventh chord (D7) and the diminished-minor seventh chord (Dm7), both singly and in couples.
CONCLUSION
These evident differences could be interpreted as possible differences between the musical styles of Viennese Classicism and early Romanticism, or could also be seen as a tendency in the evolution of harmony in the usage of certain types of chords.
References
Ferková, E., Zdímal, M.: Computer Aided Harmonic Analysis from Midi Input. In: Conference on Interdisciplinary Musicology, pp. 76–77. University of Graz, Graz (2004)
Ferková, E.: Computer Analysis of Classic Harmonic Structures. In: Selfridge-Field, E., Hewlett, W.B. (eds.) Computing in Musicology, pp. 85–89. CCARH, Menlo Park (1992)
Ferková, E.: Harmonic-Tonal Motion as a Reflection of the Development of Musical Form. In: Diderot Forum on Mathematics and Music. ed..., pp. 169–177. Oesterreichische Computer Gesellschaft, Vienna (1999)
Filip, M.: Vývinové zákonitosti klasickej harmónie. Národné Hudobné Centrum, Bratislava (1997)
Hofstetter, F.T.: The Nationalist Fingerprint in Nineteenth Century Romantic Chamber Music. Language Resources and Evaluation 13(2), 105–119 (1979)
Piston, W.: Harmony, 5th edn. W.W. Norton & Company, New York (1987); revised by Mark DeVoto
The Irrelative System in Tonal Harmony
Miroslaw Majchrzak
Institute of Art, Polish Academy of Sciences in Warsaw
Abstract. This paper is addressed to music theorists and musicologists specialising in harmony topics. It presents a computational approach to the investigation of tonal structure in musical pieces. With the use of this analytical system, the quantitative prevalence of chords classified by ranges of a given key in a musical piece can be determined.
1 Introduction
My paper concerns a statistical analytical method based on a harmonic material classification system which I refer to as the 'Irrelative System in Tonal Harmony'. This analytical method consists in assigning the chords appearing in a piece of music to individual key ranges, these being keys in their respective natural variety. In harmonic analysis according to Hugo Riemann (1893), a chord has a given harmonic function depending upon the key of a piece (e.g., the C major chord may act as the tonic in the key of C major or as the dominant in F major). Under the Irrelative System in Tonal Harmony, the C major chord is always classified under the key range 0 (zero), being the range for both the C major and A minor keys. Hence, the system is described as 'irrelative'. Dominance of a given key over the others is one of the main criteria for determining the main key of a musical piece. The method of analysis makes it possible to arrange, in a hierarchical order, the set of keys under which chords have been classified relative to the main key of the piece. The published key-determining methods (for example Krumhansl 1990, Temperley 2001, Chew 2006) appear to generate certain difficulties whenever a modulation occurs in the piece. As we carry out our analysis based on "The Irrelative System in Tonal Harmony", let us embark on a holistic view of the tonal structure, taking into consideration all the harmonic processes in a musical piece, modulations included.
2 Algorithm Enabling Classification of Chords
Using the analytical method in question, a diagram of the tonal structure of a piece can be produced, such tonal structure being understood as a quantitative relation of the key ranges under which specific chords have been classified. Let us mark the keys with consecutive integers: the sharp keys with positive numbers, the flat keys with negative numbers. The absolute value of the integer designates the number of accidentals in the key. For example, the number (2) marks the keys of D major and B minor; the number (-1) marks the keys of F major and D minor.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 257–265, 2009. © Springer-Verlag Berlin Heidelberg 2009
For any tone, we can precisely determine the keys it appears in. For instance, the tone G appears in the keys (2, 1, 0, -1, -2, -3, -4), whereas it does not appear in, e.g., the keys -7, 4, or 5. The successive columns in Table 1 represent: A, selected tones; B, the keys in which those tones appear.
Table 1.
A | B
... | ...
F sharp | (7, 6, 5, 4, 3, 2, 1)
B | (6, 5, 4, 3, 2, 1, 0)
A | (4, 3, 2, 1, 0, -1, -2)
D | (3, 2, 1, 0, -1, -2, -3)
C | (1, 0, -1, -2, -3, -4, -5)
F | (0, -1, -2, -3, -4, -5, -6)
B flat | (-1, -2, -3, -4, -5, -6, -7)
D flat | (-4, -5, -6, -7, -8, -9, -10)
... | ...
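The rows of Table 1 follow a regular pattern: each tone appears in seven consecutive keys. A small sketch of that pattern is given below, assuming the tones are indexed along the line of fifths with C = 0 (an indexing convention introduced here for illustration, as is the function name). Under that assumption a tone with index t appears in the keys t + 1 down to t − 5.

```python
# Hypothetical lookup table: position on the line of fifths, with C = 0.
FIFTHS = {"D flat": -5, "B flat": -2, "F": -1, "C": 0, "G": 1,
          "D": 2, "A": 3, "B": 5, "F sharp": 6}

def keys_containing(tone):
    """Keys (numbered by accidentals, sharps positive) whose scale contains the tone."""
    t = FIFTHS[tone]
    return list(range(t + 1, t - 6, -1))  # t + 1, t, ..., t - 5

g_keys = keys_containing("G")
```

`g_keys` reproduces the list given for the tone G, and the same call reproduces each row of Table 1.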
This is similarly so for any and each chord. For example, the tones of the A minor chord appear individually in the following keys: (4, 3, 2, 1, 0, -1, -2), (1, 0, -1, -2, -3, -4, -5), (5, 4, 3, 2, 1, 0, -1). Prior to making an attempt at classifying the chords, let us name the main related axiom: the entire approach is based on octave identification of all tones. The substratum for our chord classification is the arithmetic mean of the keys wherein the tones of a given diatonic chord appear. As part of the Irrelative System of Tonal Harmony, we will define it using the following formula:
\[ \text{arithmetic mean} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]
where x_1, x_2, x_3, ..., x_n are the keys wherein the tones of a given diatonic chord appear, and n is the number of all key occurrences.
Examples
1) E-G sharp:
\[ \text{arithmetic mean} = \frac{(5 + 4 + 3 + 2 + 1 + 0 - 1) + (9 + 8 + 7 + 6 + 5 + 4 + 3)}{7 + 7} = 4 \]
2) G-B-D-F:
\[ \text{arithmetic mean} = \frac{(2 + 1 + 0 - 1 - 2 - 3 - 4) + (6 + 5 + 4 + 3 + 2 + 1 + 0) + (3 + 2 + 1 + 0 - 1 - 2 - 3) + (0 - 1 - 2 - 3 - 4 - 5 - 6)}{7 + 7 + 7 + 7} = -0{,}25 \]
3) G-C sharp:
\[ \text{arithmetic mean} = \frac{(2 + 1 + 0 - 1 - 2 - 3 - 4) + (8 + 7 + 6 + 5 + 4 + 3 + 2)}{7 + 7} = 2 \]
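The three worked examples can be reproduced with exact rational arithmetic. The sketch below assumes the tones are indexed along the line of fifths with C = 0, so that a tone with index t appears in the keys t − 5 to t + 1; the index table and function names are illustrative assumptions. It also compares the result with the closed form (t_1 + … + t_m)/m − 2 mentioned in the Remark.

```python
from fractions import Fraction

# Hypothetical lookup table: position on the line of fifths, with C = 0.
FIFTHS = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5,
          "C sharp": 7, "G sharp": 8}

def keys_containing(t):
    """The seven keys in which the tone with fifths index t appears."""
    return range(t - 5, t + 2)

def arithmetic_mean(tones):
    """Mean over all key occurrences of all chord tones (the formula above)."""
    keys = [k for tone in tones for k in keys_containing(FIFTHS[tone])]
    return Fraction(sum(keys), len(keys))

def closed_form(tones):
    """Mean of the tone indices minus 2, as in the Remark."""
    return Fraction(sum(FIFTHS[tone] for tone in tones), len(tones)) - 2

example_1 = arithmetic_mean(["E", "G sharp"])
example_2 = arithmetic_mean(["G", "B", "D", "F"])
example_3 = arithmetic_mean(["G", "C sharp"])
```

The three values are 4, −1/4 and 2, matching the examples, and `closed_form` returns the same values for the same chords.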
Remark. The idea of evaluating a chord by the arithmetic mean of the keys to which it belongs is generally applicable to various concepts of key. In the particular diatonic case, however, where a key is a consecutive chain of fifths, the formula can be written equivalently as an expression of the chord tones t_1, t_2, …, t_m alone: (t_1 + t_2 + … + t_m)/m − 2. This Janus-faced constitution of our formula deserves to be discussed in a separate investigation. The particular role of the diatonic line of fifths in the present approach also offers a link to Chew's (2002) spiral array, which deserves to be explored as well.
Now, let us refer to all numeric values derivable from the above arithmetic-mean formula as the 'arithmetic mean space'. The arithmetic mean space is further divided into key ranges (KRs), each of which corresponds to a given number of key-signature accidentals. This will enable us to derive a statistical distribution series. The centre of each KR is the integer standing for the corresponding key. Below, a detail of a distribution series is pictured:
[Diagram: a) the centers of the key ranges, -2, -1, 0, 1, 2; b) the key ranges -2, -1, 0, 1, 2.]
where: a is the center of the KR; b is the KR. For example, the key range of the one-flat keys (F major, D minor) encompasses the interval of the arithmetic mean space between -0,5 and -1,5. The values linking two adjacent ranges belong to both the first and the second key range. E.g., 1,5 belongs to KR 1 and KR 2 (G major and E minor; D major and B minor). Let us take any chord, e.g. the C major chord, and calculate its arithmetic mean. This chord, which operates as the keynote or tonic in the key of C major and is mostly identified with this particular key, has an arithmetic mean equal to -0,33.... It thus fits within KR 0, the key range for the C major and A minor keys. Let us also call a '2KR chord' any chord whose arithmetic mean belongs to two adjacent KRs. E.g., the arithmetic mean of the C-E-G-B chord is 0,5; the chord belongs to both KR 0 and KR 1.
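The assignment of an arithmetic mean to one or two key ranges can be sketched as a small function. The function name is hypothetical, and the means are passed as exact fractions so that borderline values such as 0,5 are recognised reliably; a mean lying exactly halfway between two integers is reported as belonging to both adjacent KRs.

```python
import math
from fractions import Fraction

def key_ranges(mean):
    """Return the KR, or the two adjacent KRs, to which an arithmetic mean belongs."""
    m = Fraction(mean)
    half = Fraction(1, 2)
    if (m - half).denominator == 1:
        # Borderline value x,5: shared by the two neighbouring key ranges.
        low = int(m - half)
        return [low, low + 1]
    # Otherwise the nearest integer is the center of the single KR.
    return [math.floor(m + half)]

c_major = key_ranges(Fraction(-1, 3))      # C-E-G
c_major_7 = key_ranges(Fraction(1, 2))     # C-E-G-B, a 2KR chord
borderline = key_ranges(Fraction(3, 2))    # the value 1,5 from the text
```

Here `c_major` is `[0]`, `c_major_7` is `[0, 1]` and `borderline` is `[1, 2]`, as described above.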
M. Majchrzak
3 Chords
In order to understand and interpret the final outcomes fully, it is necessary to show how diatonic chords are assigned to the individual key ranges. Table 2 lists all the chords classified in key range 0; the chords in the table therefore appear without transposition. An exception to the rule is those chords whose arithmetic
Table 2.
Chord | Arithmetic mean | Key range
D | 0 | 0
C-E | 0 | 0
A-C (2KR) | -0,5 | -1 and 0
E-G (2KR) | 0,5 | 0 and 1
G-A | 0 | 0
C-B (2KR) | 0,5 | 0 and 1
F-E (2KR) | -0,5 | -1 and 0
B-F | 0 | 0
D-A (2KR) | 0,5 | 0 and 1
G-D (2KR) | -0,5 | -1 and 0
C-E-G | -0,33... | 0
A-C-E | 0,33... | 0
A-D-G | 0 | 0
B-D-F | 0 | 0
F-A-B | 0,33... | 0
G-B-F | -0,33... | 0
Bb-E-A | -0,33... | 0
G-C-F# | 0,33... | 0
C-D-A | -0,33... | 0
E-G-D | 0,33... | 0
C-G-B | 0 | 0
A-E-F | 0 | 0
C-D-E | -0,33... | 0
D-E-F | -0,33... | 0
C-D-B | 0,33... | 0
G-B-D-F | -0,25 | 0
B-D-F-A | 0,25 | 0
D-G-A-C (2KR) | -0,5 | -1 and 0
A-D-E-G (2KR) | 0,5 | 0 and 1
C-E-G-A | 0 | 0
F-A-C-E (2KR) | -0,5 | -1 and 0
C-E-G-B (2KR) | 0,5 | 0 and 1
C-D-E-A | 0,25 | 0
The Irrelative System in Tonal Harmony Table 2. (continued)
Chord | Arithmetic mean | Key range
C-D-E-G | -0,25 | 0
G-C-D-F# | 0,25 | 0
Bb-D-E-A | -0,25 | 0
Bb-G-A-E (2KR) | -0,5 | -1 and 0
F-D-E-B (2KR) | 0,5 | 0 and 1
C-F-A-B | 0 | 0
F-G-B-E | 0,25 | 0
G-B-E-F | 0,25 | 0
C-D-G-B | 0 | 0
F-G-A-B | 0 | 0
C-E-F-B | 0 | 0
F-B-D-E | 0,25 | 0
C-D-F-B (2KR) | -0,5 | -1 and 0
G-A-C-F# (2KR) | 0,5 | 0 and 1
C-G-A-B | 0,25 | 0
F-G-A-E | -0,25 | 0
F-G-D-E (2KR) | -0,5 | -1 and 0
C-D-A-B (2KR) | 0,5 | 0 and 1
C-D-E-G-A | 0 | 0
B-D-F-G-A | 0 | 0
Bb-D-E-G-A | -0,4 | 0
C-D-F#-G-A | 0,4 | 0
C-D-E-F-A | -0,4 | 0
C-D-E-G-B | 0,4 | 0
C-E-F-A-B | 0,2 | 0
C-E-F-G-B | -0,2 | 0
G-B-D-E-F | 0,2 | 0
C-D-F-A-B | -0,2 | 0
C-F-G-A-B | -0,4 | 0
F-G-A-B-E | 0,4 | 0
C-D-E-F-B | 0 | 0
F-G-A-D-E | -0,2 | 0
C-E-F-G-B | -0,4 | 0
C-D-F-G-A-B | -0,33... | 0
F-G-A-B-D-E | 0,33... | 0
F-G-A-C-D-E (2KR) | -0,5 | -1 and 0
C-D-E-G-A-B (2KR) | 0,5 | 0 and 1
C-E-F-G-A-B | 0 | 0
F-A-B-C-D-E | 0,16 | 0
F-G-B-C-D-E | -0,16 | 0
C-D-E-F-G-A-B | 0 | 0
mean is located on the borderline of two key ranges. E.g., the arithmetic mean of the C-E-G-B chord is 0,5; the chord belongs to both KR 0 and KR 1. One more chord of the same structure belongs to KR 0, namely F-A-C-E, which belongs to both KR -1 and KR 0. If any of the chords is transposed, its KR changes accordingly. Table 3 shows the affiliation of selected major chords to specific KRs.
Table 3.
Chord | KR
C major | 0
D major | 2
A flat major | -4
F major | -1
B major | 5
When we indicate the KRs to which chords belong, we notice that chords fulfilling the same harmonic function may lie at different distances from the key range of the tonic, depending on the mode. For instance, in C major the tonic falls in KR 0 and the dominant in KR 1, so the two chords appear in neighbouring KRs. In A minor, the minor tonic falls in KR 0, whereas the major dominant (the E major chord, which may also serve as the tonic of E major) is classified under KR 4. A description of how the classification of chords differs with the mode would, however, require a separate study.
4 Metrical Units
When analysing a musical piece, we calculate the arithmetic means of all the diatonic chords and then enter the chords’ metrical units in the distribution series. When the distribution series is filled in, the metrical units for the individual rhythmic values may be fixed in any manner, provided that chords with longer rhythmic values receive proportionally greater numbers than chords with shorter rhythmic values. For example, if we consider the chords G major (minim), C major (crotchet) and F major (quaver), the rhythmic values can be ascribed the following metrical units:
1) 4 for the G major chord, 2 for the C major chord and 1 for the F major chord, or
2) 8 for the G major chord, 4 for the C major chord and 2 for the F major chord.
Table 4.
Chord | Rhythmic value | Metrical units
G major | minim | 4 or 8
C major | crotchet | 2 or 4
F major | quaver | 1 or 2
If a quaver triplet is added to the above example, the rhythmic values can be ascribed, for example, the following metrical units: 12 for the G major chord, 6 for the C major chord, 3 for the F major chord and 6 (2+2+2) for the quaver triplet.
Table 5.
Chord | Rhythmic value | Metrical units
G major | minim | 12
C major | crotchet | 6
F major | quaver | 3
… | quaver triplet | 6 (2+2+2)
If several chords have the same duration, the same numerical value should be assigned to each of them. The number of metrical units of every 2KR chord is divided by 2, one half being entered in each of its two KRs. Thus, if the C-E-G chord contributes 2 metrical units to KR 0, then the C-E-G-B chord contributes 1 to KR 0 and 1 to KR 1 (see footnote 1).
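The bookkeeping described above can be illustrated with a short Python sketch. It is not taken from the paper: the function name `distribution_series` and the input format (a list of key-range/metrical-unit pairs) are hypothetical choices made for the illustration.

```python
from collections import defaultdict
from fractions import Fraction


def distribution_series(events):
    """Accumulate metrical units per key range.

    `events` is a list of (key_ranges, metrical_units) pairs, where
    key_ranges holds one KR for an ordinary chord or two adjacent KRs for a
    2KR chord. The units of a 2KR chord are split evenly between its two KRs.
    """
    series = defaultdict(Fraction)
    for key_ranges, units in events:
        share = Fraction(units, len(key_ranges))
        for kr in key_ranges:
            series[kr] += share
    return dict(series)


# Hypothetical input: C-E-G (KR 0) and C-E-G-B (KR 0 and KR 1), both lasting
# 2 metrical units, as in the example in the text.
events = [([0], 2), ([0, 1], 2)]
series = distribution_series(events)
for kr in sorted(series):
    print("KR", kr, "metrical units:", series[kr])
```

With this input, KR 0 receives 2 units from C-E-G plus 1 from C-E-G-B, and KR 1 receives the remaining 1 unit.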
5 Record Table
Having prepared a distribution series, we can determine how strongly each KR dominates over the other key ranges in a given piece of music, expressed as a percentage. This is done by applying the formula for the frequency of occurrence of a given characteristic:
frequency of occurrence of a given characteristic = ni / n
where: ni – the number of metrical units of a given KR, n – the sample size (the sum of all the metrical units).
Our analysis of pieces can be displayed in the form of record tables. The consecutive tables comprise:
1) General information regarding the piece in question, incl.: the name of the composer, the title, the key, the metrical units of selected rhythmic values, and the number of all the metrical units;
2) The number of metrical units of the diatonic chords subject to analysis (column ‘+’); the sum of the metrical units of non-diatonic chords, of rests, and of melodic lines without harmony, which are not taken into consideration for our analytic purposes (column ‘-’); the number of metrical units of non-diatonic chords (column ‘N-D’); the number of metrical units of melodic runs without harmony and of rests (column ‘U/R’). The frequency (%) is calculated as the quotient of the metrical units from a given column and of all the metrical units in a given piece.
1 Assuming that these chords have the same rhythmic values.
3) Distribution series – the number (expressed in metrical units) of chords from the individual KRs and the frequency of their appearance relative to the other KRs. The frequency (%) is likewise calculated as the quotient of the metrical units from a given column and of all the metrical units in a given piece.
4) Diagram of the frequencies of metrical units of the distribution series.
Example of record table
Composer: Chopin
Piece: Mazurka in A flat major, Op. 24 No. 3
Key: -4 (A flat major)
Metrical units (m. u.): quaver = 6, crotchet = 12
Total number of metrical units: 5712
Column | m. u. | Frequency
+ | 3894 | 68,2%
- | 1818 | 31,8%
N-D | 624 | 10,9%
U/R | 1194 | 20,9%

KR | m. u. | Frequency
KR -7 | 24 | 0,42%
KR -6 | 324 | 5,67%
KR -5 | 600 | 10,5%
KR -4 | 2058 | 36%
KR -3 | 216 | 3,78%
KR -2 | 0 | 0%
KR -1 | 96 | 1,68%
KR 0 | 576 | 10,1%
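[Diagram: frequency (%) of metrical units for KR -7 to KR 0 and for the columns ‘-’, N-D and U/R; vertical axis from 0 to 40. The legend distinguishes the tonic key range from the other key ranges.]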
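The percentages in the record table follow directly from the formula ni / n. As a minimal sketch (the variable and function names are hypothetical; the figures are copied from the record table for the Chopin Mazurka), they can be recomputed as follows:

```python
# Metrical units per key range, copied from the record table
# (Chopin, Mazurka in A flat major, Op. 24 No. 3).
KR_UNITS = {-7: 24, -6: 324, -5: 600, -4: 2058, -3: 216, -2: 0, -1: 96, 0: 576}
NON_DIATONIC_UNITS = 624            # column 'N-D'
UNHARMONISED_AND_REST_UNITS = 1194  # column 'U/R'
TOTAL_UNITS = 5712


def frequency(units, total=TOTAL_UNITS):
    """Frequency of occurrence n_i / n, returned as a percentage."""
    return 100 * units / total


diatonic_units = sum(KR_UNITS.values())                            # column '+'
excluded_units = NON_DIATONIC_UNITS + UNHARMONISED_AND_REST_UNITS  # column '-'

# Key ranges ordered from the most to the least frequent.
ranking = sorted(KR_UNITS, key=KR_UNITS.get, reverse=True)

print("column '+':", diatonic_units, round(frequency(diatonic_units), 1))
print("column '-':", excluded_units, round(frequency(excluded_units), 1))
for kr in ranking:
    print("KR", kr, KR_UNITS[kr], round(frequency(KR_UNITS[kr]), 2))
```

The two column totals add up to the 5712 metrical units of the piece, which is a quick consistency check on any record table built this way.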
The analytical method makes it possible to answer the following questions concerning a given musical piece:
1) The quantitative relation of the tonic key range to the other key ranges;
2) The occurrence of tonal centres lying at a distance from the tonic’s key range.
On the basis of the above tonal structure chart, the following information can be derived, inter alia. The distribution series is composed of eight KRs. The main range with regard to frequency of appearance is KR -4, the key range of A flat major and F minor, i.e. the key range corresponding to the Mazurka’s key signature. More than 35% of the chords appearing in this Mazurka fall within KR -4. KR -4 is not situated in the centre of the distribution series: three key ranges lie on its flat side and four on its sharp side along the circle of fifths. The span of the key ranges is set by KR -7 (i.e. C flat major and A flat minor) and KR 0 (i.e. C major and A minor). KR -4 is clearly dominant in the piece. As we move along the circle of fifths further and further away from KR -4, the number of metrical units in the successive key ranges decreases (except for KR -1 and KR 0). KR -5, the key range of D flat major and B flat minor, is ranked second. KR 0 is ranked third, although it lies far from KR -4. This range comprises, among others, the C major chord, which can be considered the dominant of F minor, i.e. the relative minor of A flat major.
Acknowledgement
I thank the Institute of Art, Polish Academy of Sciences in Warsaw, for a conference grant.
References
Chew, E.: The spiral array: An algorithm for determining key boundaries. In: Anagnostopoulou, C., Ferrand, M., Smaill, A. (eds.) ICMAI 2002. LNCS, vol. 2445, p. 18. Springer, Heidelberg (2002)
Chew, E.: Slicing it all ways: Mathematical models for tonal induction, approximation and segmentation using the spiral array. Informs Journal on Computing 18(3) (2006)
Krumhansl, C.L.: Cognitive Foundations of Musical Pitch. Oxford Psychology Series, vol. 17. Oxford University Press, Oxford (1990)
Riemann, H.: Vereinfachte Harmonielehre oder die Lehre von den tonalen Funktionen der Akkorde, London, New York (1893)
Temperley, D.: The Cognition of Basic Musical Structures. MIT Press, Cambridge (2001)
Mathematics and the Twelve-Tone System: Past, Present, and Future*
Robert Morris
Eastman School of Music, University of Rochester
1 Introduction
Certainly the first major encounter of non-trivial mathematics and non-trivial music was in the conception and development of the twelve-tone system from the 1920s to the present. Although the twelve-tone system was formulated by Arnold Schoenberg, it was Milton Babbitt whose ample but non-professional background in mathematics made it possible for him to identify the links between the music of the Second Viennese school and a formal treatment of the system. To be sure, there were also important inroads in Europe as well,1 but these were not often marked by the clarity and rigor introduced by Babbitt in his series of seminal articles from 1955 to 1973 (Babbitt 1955, 1960, 1962, 1973).
This paper has four parts. First, I will sketch a rational reconstruction of the twelve-tone system as composers and researchers applied mathematical terms, concepts, and tools to the composition and analysis of serial music. Second, I will identify some of the major trends in twelve-tone topics that have led up to the present. Third, I will give a very brief account of our present mathematical knowledge of the system and the state of this research. Fourth, I will suggest some future directions as well as provide some open questions and unproven conjectures.
But before I can start, we need to have a working definition of what the twelve-tone system is, if only to make this paper’s topic manageable. Research into the system eventually inspired theorists to undertake formal research into other types of music structure; moreover, of late, such research has identified twelve-tone music and structure as special cases of much more general musical and mathematical models, so that, for instance, serial music and Riemannian tonal theory are both aptly modeled by group theory, rather than demarked as fundamentally different or even bi-unique.
Thus I will provisionally define the twelve-tone system as the musical use of ordered sets of pitch-classes in the context of the twelve-pitch-class universe (or aggregate) under specified transformations that preserve intervals or other features of ordered sets or partitions of the aggregate. Thus the row, while it once was thought to be the nexus of the system, is only one aspect of the whole, even if the row embraces all of the characteristics I’ve mentioned: the aggregate, ordered pc-sets and, not so obviously, partitions of the aggregate. Thus an object treated by the twelve-tone system can be a series or cycle of any number of pitch-classes, with or without repetition or duplication, as well as multi-dimensional constructs such as arrays and networks, or sets of unordered sets that partition the aggregate.
* This paper was originally published in Perspectives of New Music 45(2): 76–107. The editors of these proceedings are grateful to the editors and publishers of Perspectives of New Music for their kind permission to reprint this article in this collection.
1 Luigi Verdi has documented some of this early European research (Verdi 2007).
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 266–288, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 The Introduction of Math into Twelve-Tone Music Research
Schoenberg’s phrase, “The unity of musical space,” while subject to many interpretations, suggests that he was well aware of the symmetries of the system (Schoenberg 1975). In theoretic word and compositional deed he understood that there was a singular two-dimensional “space” in which his music lived, that is, the space of pitch and time. Indeed, the basic transformations of the row, Retrograde and Inversion, plus Retrograde-Inversion for closure (and P as the identity) were eventually shown to form a Klein four-group (see Example 1).
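The group structure mentioned here can be checked mechanically. The following Python sketch is an illustration only: it assumes that I is pitch-class inversion about 0 and that R is order reversal, and it identifies each composite operation by its effect on a single test row (the Schoenberg Op. 36 row), which works because the four images of that row are all different. The function names are hypothetical.

```python
# Row of Schoenberg's Op. 36, used only as a test object (A = 10, B = 11).
ROW = [0, 1, 6, 2, 7, 9, 3, 4, 10, 11, 5, 8]


def prime(row):
    """P: the identity operation."""
    return list(row)


def retrograde(row):
    """R: reverse the order of the row."""
    return list(reversed(row))


def inversion(row):
    """I: pitch-class inversion about 0 (an assumption made for this sketch)."""
    return [(-pc) % 12 for pc in row]


def retrograde_inversion(row):
    """RI: inversion followed by retrograde."""
    return retrograde(inversion(row))


OPERATIONS = {"P": prime, "R": retrograde, "I": inversion,
              "RI": retrograde_inversion}


def cayley_table(row):
    """For every pair, name the operation equal to `first` followed by `second`."""
    images = {name: op(row) for name, op in OPERATIONS.items()}
    table = {}
    for first, op1 in OPERATIONS.items():
        for second, op2 in OPERATIONS.items():
            result = op2(op1(row))
            table[first, second] = next(
                name for name, image in images.items() if image == result)
    return table


table = cayley_table(ROW)
for first in OPERATIONS:
    print(first, [table[first, second] for second in OPERATIONS])
```

Every operation is its own inverse and the table is symmetric, which are the defining features of the Klein four-group.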
That this space is not destroyed or deformed under these operations gives it unity. Yet, from today’s standpoint, the details of this symmetry are quite unclear. What kinds of pitch-spaces? Pitch, or pitch-class, or merely contour? Is I mirror inversion or pitch-class inversion? Is RI a more complex operation than I or R alone? What about transposition’s interaction with the P, I, R, RI group? And so forth. The lack of clarity, which is actually more equivocal than I’ve mentioned (because there is no acknowledgement of the different conceptions of intervals between pitch entities), fostered misconceptions about the aural reality of the system on one hand and the justification of its application to structuring other so-called parameters of music on the other. Future research would correct this ambiguity, differentiating it into different musical spaces and entities.
Neither Schoenberg nor his students, nor even the next generation of European serial composers, ever addressed these questions. It was detailed analysis of the music of Schoenberg, Webern, and to some extent Berg, that led to clarity and rigor. The results of such studies beginning circa 1950 revealed that the first generation of twelve-tone composers had principled reasons for deploying rows in music. First, the system itself was shown to preserve musical properties such as interval and interval-class; Babbitt (1960) called this twelve-tone invariance.
Ex. 2. Twelve-tone invariance among ordered intervals in rows and pairs of rows.
Let the 12-element array P model a row. The interval between P_a and P_b is written as the function i: i(P_a, P_b) = P_b – P_a; –i(P_a, P_b) is the inversion of i(P_a, P_b).
–i(P_a, P_b) = i(P_b, P_a)
i(T_nP_a, T_nP_b) = i(P_a, P_b)
i(T_nIP_a, T_nIP_b) = i(P_b, P_a)
Let the array INT(P) be the interval succession of P; INT(P) = (i(P_0, P_1), i(P_1, P_2), ..., i(P_A, P_B))
INT(T_nP) = INT(P) (T_n preserves the interval succession of P, for all n.)
INT(T_nIP) = I(INT(P)) (T_nI inverts the interval succession of P, for all n.)
INT(RT_nP) = R(INT(IP)) (RT_n inverts and retrogrades the interval succession of P, for all n.)
INT(RIT_nP) = R(INT(P)) (RT_nI retrogrades the interval succession of P, for all n.)
i(P_a, T_nP_a) = n (The interval from pcs in P and T_nP at order position a is n, for all a and n.)
P_a + T_nIP_a = n (The sum of pcs at order number a in P and T_nIP is n, for all a and n.)
i(P_a, RT_nP_a) = –i(P_B–a, RT_nP_B–a) (The interval from pcs in P to RT_nP at order position a is the inversion of the interval from pcs in P to RT_nP at order position B–a, for all a and n.)
i(P_a, RT_nIP_a) = i(P_B–a, RT_nIP_B–a) (The interval from pcs in P to RT_nIP at order position a is the interval between pcs in P to RT_nIP at order position B–a, for all a and n.)
(After Babbitt (1960) and Martino (1961))
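Several of these identities are easy to confirm numerically. The sketch below is an illustration under stated assumptions rather than anything given in the sources cited: a row is modelled as a Python list of integers mod 12, the helper names are hypothetical, and only the four interval-succession identities and the two position-wise identities for T_n and T_nI are checked.

```python
def T(n, row):
    """Transpose every pc by n (mod 12)."""
    return [(pc + n) % 12 for pc in row]


def I(row):
    """Invert every pc (or interval) about 0."""
    return [(-pc) % 12 for pc in row]


def R(row):
    """Retrograde: reverse the order."""
    return row[::-1]


def INT(row):
    """Interval succession: i(P_a, P_a+1) = P_a+1 - P_a for adjacent pcs."""
    return [(b - a) % 12 for a, b in zip(row, row[1:])]


def invariances_hold(row):
    """Check the listed identities for all twelve transposition levels."""
    for n in range(12):
        tn, tni = T(n, row), T(n, I(row))
        checks = [
            INT(tn) == INT(row),            # INT(T_nP) = INT(P)
            INT(tni) == I(INT(row)),        # INT(T_nIP) = I(INT(P))
            INT(R(tn)) == R(INT(I(row))),   # INT(RT_nP) = R(INT(IP))
            INT(R(tni)) == R(INT(row)),     # INT(RT_nIP) = R(INT(P))
            all((q - p) % 12 == n for p, q in zip(row, tn)),   # i(P_a, T_nP_a) = n
            all((p + q) % 12 == n for p, q in zip(row, tni)),  # P_a + T_nIP_a = n
        ]
        if not all(checks):
            return False
    return True


# Test object: the Schoenberg Op. 36 row (A = 10, B = 11).
P = [0, 1, 6, 2, 7, 9, 3, 4, 10, 11, 5, 8]
print(invariances_hold(P))
```

Because the checks only use arithmetic mod 12, the same function can be applied to any row, not just the one chosen here.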
In Example 2, I have distilled identities from Babbitt (1960) and Martino (1961); these identities can easily be derived from the definition of rows, ordered intervals and the twelve-tone operators Tn, I, and R. (This example uses an array to model a row, but Babbitt (1960) uses a different concept: a row is a set of unordered pairs, each pair consisting of a pc and its order position in the row.)
Second, in addition to twelve-tone invariance, Babbitt and others showed that the rows used by Schoenberg, Berg, and Webern were not chosen capriciously, but depended on features such as shared ordered and unordered sets. Babbitt (1962) called this set-structure invariance (see Example 3).
Ex. 3. Rows related by shared unordered and ordered sets
Unordered sets shared by related rows. P is the row of Schoenberg’s Violin Concerto, op. 36.
P: 0 1 6 2 7 9 (G) | 3 4 A B 5 8 (H)
RT5IP: 9 0 6 7 1 2 (G) | 8 A 3 B 4 5 (H)
P: 0 1 6 (W) | 2 7 9 (X) | 3 4 A (Y) | B 5 8 (Z)
T4IP: 4 3 A (Y) | 2 9 7 (X) | 1 0 6 (W) | 5 B 8 (Z)
Ordered sets shared by related rows.
RT5IP: 9 0 6 7 1 2 8 A 3 B 4 5
TBP: B 0 5 1 6 8 2 3 9 A 4 7
[In the original layout, four ordered sets labelled J, K, L and M, shared by the two rows, are aligned beneath each row; this alignment is not recoverable here.]
Row succession by complementation or row linking is another aspect of this thinking (see Example 4).
Ex. 4. Row succession by complementation and linking
P is the row of Webern’s Variations for Orchestra, Op. 30.
Row succession by complementation (underlined pcs form aggregates from one row to the next)
P: 0 1 4 3 2 5 6 9 8 7 A B /
RT6P: 5 4 1 2 3 0 B 8 9 A 7 6 /
T5IP: 5 4 1 2 3 0 B 8 9 A 7 6 /
RTBIP: 0 1 4 3 2 5 6 9 8 7 A B /
P: 0 1 4 3 2 5 etc.
Row succession by linking
Via TnRI succession (a twelve-tone invariant); each row begins with the last two pcs of the row before it:
P: 0 1 4 3 2 5 6 9 8 7 A B
RT9IP: A B 2 1 0 3 4 7 6 5 8 9
T8P: 8 9 0 B A 1 2 5 4 3 6 7
Via T5 succession (a special invariance of this row); each row begins with the last seven pcs of the row before it:
P: 0 1 4 3 2 5 6 9 8 7 A B
T5P: 5 6 9 8 7 A B 2 1 0 3 4
TAP: A B 2 1 0 3 4 7 6 5 8 9
T3P: 3 4 7 6 5 8 9 0 B A 1 2
T8P: 8 9 0 B A 1 2 5 4 3 6 7
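The linking and complementation relations in Example 4 can be verified with a few lines of Python. This is a sketch with hypothetical helper names. Since the underlining of the original example is not visible in this text, the complementation check assumes that the aggregates are formed by the second hexachord of one row and the first hexachord of the next.

```python
def transpose(n, row):
    return [(pc + n) % 12 for pc in row]


def invert(row):
    return [(-pc) % 12 for pc in row]


def retrograde(row):
    return row[::-1]


def overlap(first, second):
    """Length of the longest ending of `first` that also begins `second`."""
    for k in range(len(first), 0, -1):
        if first[-k:] == second[:k]:
            return k
    return 0


def overlaps(chain):
    """Overlap between each pair of successive rows in a chain."""
    return [overlap(a, b) for a, b in zip(chain, chain[1:])]


def adjacent_hexachords_complete(chain):
    """Assumed reading of the underlining: the last six pcs of each row and
    the first six pcs of the next row together form an aggregate."""
    return all(set(a[6:]) | set(b[:6]) == set(range(12))
               for a, b in zip(chain, chain[1:]))


# Row of Webern's Op. 30 (A = 10, B = 11).
P = [0, 1, 4, 3, 2, 5, 6, 9, 8, 7, 10, 11]

ri_chain = [P, retrograde(transpose(9, invert(P))), transpose(8, P)]
t5_chain = [P, transpose(5, P), transpose(10, P), transpose(3, P),
            transpose(8, P)]
complement_cycle = [P, retrograde(transpose(6, P)), transpose(5, invert(P)),
                    retrograde(transpose(11, invert(P))), P]

print(overlaps(ri_chain))   # overlaps of the TnRI chain
print(overlaps(t5_chain))   # overlaps of the T5 chain
print(adjacent_hexachords_complete(complement_cycle))
```

The two-pc links hold for any row under a suitable TnRI, whereas the seven-pc links under T5 depend on the particular structure of this row.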
These examples demonstrate that musical objects and relations were supported and cross-related from one row to another to build musical continuity, association and form. Early pre-mathematical research also concerned itself with the relations of the system to tonality. Here are some of the specific questions that arose: was the first pc of a row a kind of tonic? Or was a row tonal if it contained tonal material such as triads and seventh chords? Or did the P and I rows participate in a duality like that of tonic and dominant? In general, tonality was either seen as opposed to the system or both were transcended by a Hegelian sublation into aspects of the same musical and universal laws. But a lack of clarity that conflated reference, quotation, suggestion, analogy, and instantiation made the question impossible to define, much less answer. This obsession with tonality retarded work on the vertical or harmonic combination of rows in counterpoint. Even after the set theories of Howard Hanson (1960), Hubert Howe (1965), and Allen Forte (1964, 1973) had become established, it was not until the 1980s that the problem was generalized to all types of rows and set-classes (Morris 1983). By this point in time, clarity about the nature of musical systems and their models helped make the tonality issue manageable. Benjamin Boretz’s “Meta Variations,” published serially in Perspectives of New Music from 1969 to 1973 is the seminal work on this topic. Understanding tonality as recursive but invariant among levels made it possible to conceive of the multiple order number function rows (Batstone 1972) that implement such properties to various degrees. And it was Babbitt who revealed that Schoenberg’s later “American” twelve-tone practice was founded on hierarchic principles so that an entire passage of music would be controlled by a
quartet of rows from a row-class preserving partial ordering and hexachordal partitions (Babbitt 1961). David Lewin mathematically elaborated and extended this notion in his early work on row nestings (Lewin 1962).
The early research focused on entities. The row was considered the core idea of the system, and specific types of rows, such as order-invariant rows or the all-interval rows (called AIS), were invented (or discovered) and discussed (see Example 5). The all-interval row of Berg’s Lyric Suite (also used in other of his works), with its T6R invariance, provides an example. Studies of various types of rows continued up until the 1980s, and I will provide examples below.
Ex. 5. Berg’s Lyric Suite row and other all-interval rows (AIS)
Berg Lyric Suite row:
C: 0 B 7 4 2 9 3 8 A 1 5 6
INT(C): B 8 9 A 7 6 5 2 3 4 1
C = RT6C
Wedge row:
D: 0 1 B 2 A 3 9 4 8 5 7 6
INT(D): 1 A 3 8 5 6 7 4 9 2 B
D = RT6D
NB: D = r6T9M5C
Mallalieu row:
E: 0 1 4 2 9 5 B 3 8 A 7 6
INT(E): 1 3 A 7 8 6 4 5 2 9 B
E = RT6E
rRM7-invariant row:
F: 0 5 8 9 3 A 2 4 1 B 7 6
INT(F): 5 3 1 6 7 4 2 9 A 8 B
F = r4RT9M7F
Krenek’s AIS row without special invariance:
G: 0 3 A 4 9 B 8 7 5 1 2 6
INT(G): 3 7 6 5 2 9 B A 8 1 4
The trope system of Hauer (1925), contemporaneous with the invention of the twelve-tone system, proposed a system of hexachordal partitions from which music could be made, but the use of tropes was not suitably differentiated from the use of rows, so they seemed to be rival systems rather than different, non-opposed ways of creating music structure. (Up until perhaps 1980 there was a similar lack of distinction between Perle’s cyclic sets and the twelve-tone row (Perle 1977).)
Questions of enumeration also were raised. How many rows? How many distinct related rows under transposition, inversion and/or retrograde (since some rows are invariant)? How many unordered sets of pitch-classes? How many tropes? How many chord-types? Not until the 1960s was it understood that answers to such questions were determined by what transformations one included as canonic, that is, as defining equivalence-classes. (This involves changes in the cyclic index of Burnside’s method of counting equivalence classes.)
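The defining property of the rows in Example 5 can be tested directly from the rows themselves. The following Python sketch is illustrative: the helper names are hypothetical, and the rows are written as base-12 digit strings (A = 10, B = 11) and parsed with Python's built-in `int(ch, 12)`.

```python
ROWS = {
    "C (Berg, Lyric Suite)": "0B742938A156",
    "D (wedge)": "01B2A3948576",
    "E (Mallalieu)": "0142958B... placeholder",
}
# The placeholder above is replaced below so that every row is spelled out
# exactly as in Example 5.
ROWS = {
    "C (Berg, Lyric Suite)": "0B742938A156",
    "D (wedge)": "01B2A3948576",
    "E (Mallalieu)": "014295B38A76",
    "F (rRM7-invariant)": "05893A241B76",
    "G (Krenek)": "03A49B875126",
}


def parse(text):
    """Turn a string of base-12 digits into a list of pitch-class integers."""
    return [int(ch, 12) for ch in text]


def intervals(row):
    """Ordered pitch-class intervals between adjacent pcs."""
    return [(b - a) % 12 for a, b in zip(row, row[1:])]


def is_all_interval(row):
    """True if the eleven adjacent intervals are exactly 1 through 11."""
    return sorted(intervals(row)) == list(range(1, 12))


def is_rt6_invariant(row):
    """True if retrograding the transposition by 6 reproduces the row."""
    return [(pc + 6) % 12 for pc in row][::-1] == row


for name, text in ROWS.items():
    row = parse(text)
    print(name, is_all_interval(row), is_rt6_invariant(row))
```

Computing the interval succession from the row, rather than typing it in, avoids copying errors in the INT lines.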
As I have pointed out, it was the lack of adequate formal descriptions and models that limited early work on the twelve-tone system. The introduction of mathematical tools changed all that. By the 1970s it became clear that the system was not only about things, but also about the ways in which these things were changed or kept invariant within the system. In 1978 Daniel Starr enunciated the entity/transformational distinction that is so familiar to us today. It took some time, however, before the difference between a binary group and a transformational group was appreciated; or to put it another way, that the set of transformations that formed a group was distinct from the objects it acted on; and that these objects might be not only pitch-classes, but sets, arrays, networks, etc., which in turn might suggest a variety of types of transformation groups (Lewin 1978, Morris 1978). This widened the scope of twelve-tone theory to encompass non-twelve-tone things such as tonal chords, scales, and the like.
The intervention of mathematical tools occurred in three stages. First was the use of mathematical terminology and symbols, including the use of numbers to identify pitch-classes, order numbers, and transpositional levels. Variable names (with subscripts) such as Sn or Pn, In, Rn, and RIn were used to name rows. However, this practice conflated the difference between a label denoting an entity versus a transformation. A second stage was the use of mathematical and logical concepts such as equivalence and relation, and the use of mathematical terms borrowed from real math or computer science such as “invariance” or “function.” Sometimes, strange terminology from the mathematical point of view resulted: such as the names “set-class” or “interval-vector”; or using the term “complement” to mean “inverse.” But at least these ideas and functions were more or less contextually well defined.
At this stage, concepts were generally used to describe the properties of musical entities. Perhaps the most important insight was Babbitt’s claim that the twelve-tone system was inherently permutational rather than combinational (Babbitt 1960). While this assertion is perhaps too categorical,2 Babbitt opened the door to the use of group theory in musical research. Researchers also adopted the language of set theory to describe musical properties and relations among sets of musical things. Nevertheless, confusion remained because the same terms were used for different kinds of things. For instance, in the 1970s the term “set” meant row at Princeton and unordered-pc-set at Yale. Moreover, technical labels did not address all the important differences. The distinction between interval and interval-class was not explicitly defined; later, the interval-class would finally be understood as the “distance” between pitch-classes or pitches, while the term “interval” would define a directed distance between two pitch-classes or a transformation of one to another. Sets and set-classes were still not adequately distinguished in the literature until around 1975, even after the publication of Allen Forte’s important book (1973), which does not explicitly make the distinction.
The third stage involved the use of mathematical reasoning in music theory. At first this reasoning would be alluded to, or presented in words, or in symbols in ad hoc ways. Sometimes this work was done behind the scenes, as in the proof of the complement theorem, which was asserted in the late 1950s but not explicitly proven in the literature until the 1980s.3 (See Example 6.)
2 Tonality and theories of chords involve permutation and aspects of the twelve-tone system involve unordered sets.
3 The theorem was enunciated as early as Hanson (1960), and sketches for a proof were given in Regener (1974) and Starr (1978). An elegant proof appears in Lewin (1987).
Ex. 6. Common tone and complement theorems
1. Transpositional Common Tone Theorem: #(A ∩ TnB) = MUL(A,B,n). The function MUL(A,B,n) is the multiplicity of i(a,b) = n for all a and b, where A and B are pcsets and a ∈ A, b ∈ B.
2. Inversional Common Tone Theorem: #(A ∩ TnIB) = SUM(A,B,n). The function SUM(A,B,n) is the number of sums a+b = n, for all a and b, where A and B are pcsets and a ∈ A, b ∈ B.
3. Complement Theorem: MUL(A´,B´,n) = 12 − (#A + #B) + MUL(A,B,n).
# is the cardinality operator; #X is the cardinality of X. ´ is the complement operator; A´ is the complement of A.
But it didn’t take long before there were ways to do something like professional mathematics in the body of a music theory paper. This led to some consensus about the nature of the terminology and formalisms used in music theory today, but sometimes these do not correspond one-to-one with mathematical treatment. With the use of real mathematics in music theory, theorists realized that there are branches of mathematics that could be applied to their problems; up to then many theorists constructed the mathematics needed from the ground up. Sometimes this work was original or unique from the mathematical point of view. Indeed, Milton Babbitt has remarked that John Tukey “observed casually that [Babbitt] had produced, inadvertently, some hitherto unknown and, apparently, not uninteresting theorems in group theory” (Babbitt 1976).
The transition from stage two to three was aided by the use of computers to model and/or enumerate aspects of the twelve-tone system. Starting circa 1970, many graduate programs (most notably at Princeton, Yale, and the Eastman School of Music) introduced faculty and students to computer programming via seminars and courses. The result was an appreciation of the need for correct and apt formalization of music theoretic concepts and reasoning. This paved the way for researchers to go directly into the math that underlay the design and implementation of the computer programs.
Moreover, the output of programs posed new puzzles. What was the structure underlying the output data? These three stages actually overlapped in the literature depending on the mathematical sophistication of both authors and readers. Some mathematical treatments of serial topics remained virtually unread until music theory as a whole caught up. For instance, Walter O’Connell (1963) wrote a mathematically interesting and prescient article in Die Reihe 8; however, theorists and composers have generally overlooked it even though it is the first published account of the multiplicative pitch-class operations, the order-number/pitch-number exchange operator, and networks of pitch-classes and transformations in multiple dimensions. Sometimes such work was not even published or, if published, criticized as irrelevant to music study, as unwanted applied mathematics. The prime example involves the classical papers by David Lewin on the interval function. Lewin’s sketch of the mathematical derivation of the function via Fourier analysis, published in JMT in 1959 and 1960, was not appreciated and developed until recently by young theorists such as Ian Quinn (2006).
3 Important Results and Trends
Perhaps the most important development in twelve-tone theory was the invention of invariance matrices by Bo Alphonce at Yale in 1974.
Ex. 7a. T- and I-matrices for a row and a hexachord/trichord pair
T-matrix E: E_i,j = P_i + IP_j; P = 0 1 6 2 7 9 3 4 A B 5 8
 |01627934AB58
0|01627934AB58
B|B05168239A47
6|6708139A45B2
A|AB4057128936
5|56B7028934A1
3|3495A067128B
9|9A3B46017825
8|892A35B06714
2|23849B56017A
1|12738A45B069
7|781924AB5603
4|45A6B1782390
I-matrix F: F_i,j = P_i + P_j; P = 0 1 6 2 7 9 3 4 A B 5 8
 |01627934AB58
0|01627934AB58
1|12738A45B069
6|6708139A45B2
2|23849B56017A
7|781924AB5603
9|9A3B46017825
3|3495A067128B
4|45A6B1782390
A|AB4057128936
B|B05168239A47
5|56B7028934A1
8|892A35B06714
T-matrix G: G_i,j = X_i + IY_j; X = {012478}; Y = {348}
 |012478
9|9AB145
8|89A034
4|4568B0
I-matrix H: H_i,j = X_i + Y_j; X = {012478}; Y = {348}
 |012478
3|3457AB
4|4568B0
8|89A034
Here T- and I-matrices were shown to display properties of pairs of ordered or unordered sets. In addition, Alphonce used them to analyze one passage of music in terms of another. Since the row-table (probably invented by Babbitt in the 1950s) is a special case of the T-matrix, the complex of rows was shown to be related to its generating row in ways supplementing those already formalized by earlier research such as the common-tone and hexachord theorem. Ex. 7a shows the T- and I-matrices for a row and a hexachord/trichord pair.
Ex. 7b. T- and I-matrices generate Lewin’s IFUNC
T-matrix G: G_i,j = X_i + IY_j; X = {012478}; Y = {348}
 |012478
9|9AB145
8|89A034
4|4568B0
IFUNC(X,Y) = [210132102222]
I-matrix H: H_i,j = X_i + Y_j; X = {012478}; Y = {348}
 |012478
3|3457AB
4|4568B0
8|89A034
IFUNC(X,IY) = [200232112122]
IFUNCn(X,Y) is the number of ns in the T-matrix.
IFUNCn(X,IY) is the number of ns in the I-matrix.
IFUNCn(X,Y) = MUL(X,Y,n)
IFUNCn(X,IY) = SUM(X,Y,n)
Corollary of Transpositional Common Tone Theorem: #(X ∩ TnY) = IFUNCn(X,Y).
Corollary of Inversional Common Tone Theorem: #(X ∩ TnIY) = IFUNCn(X,IY).
(# is the cardinality operator; #X is the cardinality of X.)

From one point of view, the T-matrix is a complete list of the directed intervals between the entities that generate it. Thus, as illustrated in Ex. 7b, T- and I-matrices generate Lewin’s IFUNC. Such matrices have many other functions and uses, such as spelling out the verticals in Stravinsky’s rotational arrays, since those arrays’ columns are the diagonals of the T-matrix. Ex. 7c shows that the Tonnetz is a T-matrix and illustrates the derivation of a rotational array from a T-matrix, and the transpositional combination of two unordered sets. Ex. 7d displays permutations between two rows and their interaction to form a determinate contour.

Ex. 7c. T-matrix, derived rotational array, transpositional combination and Tonnetz

T-matrix H: Hi,j = Xi + IXj
X = 0 A 7 9 2 8*

0A7928
209B4A
530271
31A05B
A85706
42B160

Rotational array derived from T-matrix H. (Diagonals of H become columns of rotational array.)

0A7928
09B4A2
027153
05B31A
06A857
042B16

Columns 0 and 3 are I-invariant; columns 1 and 5 and 2 and 4 are I-related.
Transpositional combination of {025} and {289A}
{023456789A} = {025} * {289A}
set-classes: 10-3[012345679A] = 3-5[025] * 4-5[0126]
X = {025}; Y = {289A}

  025
2|A03
8|469
9|358
A|247
The Tonnetz is the T-matrix for X = 0369 and Y = 048

  0369
0|0369
4|47A1
8|8B25

*X is the first hexachord of Stravinsky’s “A Sermon, A Narrative, and a Prayer.”
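As a hedged illustration of the rotational-array derivation in Ex. 7c, the sketch below builds the T-matrix of the hexachord 0 A 7 9 2 8 with itself and rotates row j left by j places, so that each wrapped diagonal of the matrix becomes a column of the array. The function names are illustrative only, and columns are compared as multisets.

```python
X = [0, 10, 7, 9, 2, 8]  # 0 A 7 9 2 8

def self_t_matrix(xs, n=12):
    """Row j, column i holds X_i + I(X_j) = X_i - X_j (mod n)."""
    return [[(x - y) % n for x in xs] for y in xs]

def rotational_array(xs, n=12):
    """Rotate row j of the T-matrix left by j places, so every row begins on 0."""
    h = self_t_matrix(xs, n)
    return [row[j:] + row[:j] for j, row in enumerate(h)]

def to_text(rows):
    """Render rows with A and B standing for 10 and 11."""
    return ["".join("0123456789AB"[v] for v in row) for row in rows]

H = self_t_matrix(X)
R = rotational_array(X)

# Columns of the rotational array, and their inversions, as sorted multisets
columns = [sorted(R[j][c] for j in range(6)) for c in range(6)]
inverted = [sorted((-v) % 12 for v in col) for col in columns]

print("\n".join(to_text(R)))
```

With the columns in hand, the I-relations listed under the array can be read off by comparing `inverted[c]` with the other entries of `columns`.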
In my 1987 book, invariance matrices underlie and unify many different aspects of serial theory including the relations of sets of transformations and mathematical groups. This is because a T-matrix is a group table or a part thereof.
There are many other important landmarks on the way to the present, and I will mention some of them as part of an identification of the important trends starting from about 1950. I believe the books and articles I shall cite are still of interest to us today and should be read by students who wish to go further into mathematical music theory, even if they are not studying anything directly connected to what I call “composition with pitch-classes.”

Ex. 8. Some combinatorial arrays
P is the row of Schoenberg, Op. 36. Each array column is an aggregate. Each array row is a transformation of P. Below each column is a multiset, identifying the 12-partition of the column.

8a.
P:   0 1 6 2 7 9 | 3 4 A B 5 8
RP:  8 5 B A 4 3 | 9 7 2 6 1 0
     6^2         | 6^2

[Arrays 8b through 8h are not recoverable from the extracted text. Surviving panel labels: 8b (P, RP, T3IP, RT3IP); 8e (P, T3P, RT3P); 8f (P, T4IP, RP); 8h (P, T4P, T9IP, RT3IP, RT7IP).]
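The defining property of these arrays, that every column is an aggregate, is easy to test. The sketch below checks it for array 8a (P over RP, cut into hexachords). It is a minimal illustration under one simplifying assumption that is not in the source: every array row is cut at the same order positions, which holds for 8a but not for arrays with ragged columns. All function names are illustrative.

```python
P = [0, 1, 6, 2, 7, 9, 3, 4, 10, 11, 5, 8]  # Schoenberg, Op. 36, as given in Ex. 8
RP = P[::-1]                                 # retrograde of P

def columns(array_rows, cuts):
    """Cut every array row at the same positions and collect segments column by column."""
    bounds = [0] + list(cuts) + [len(array_rows[0])]
    return [[row[a:b] for row in array_rows] for a, b in zip(bounds, bounds[1:])]

def is_aggregate(column):
    """True if the segments of a column together contain each pitch class once."""
    return sorted(pc for seg in column for pc in seg) == list(range(12))

def partition_of(column):
    """Segment lengths of a column, largest first (the column's 12-partition)."""
    return sorted((len(seg) for seg in column if seg), reverse=True)

array_8a = columns([P, RP], cuts=[6])

for col in array_8a:
    print(col, is_aggregate(col), partition_of(col))
```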
I’ve already mentioned Babbitt’s important articles on serial music. His earliest work, as documented in his 1947 Princeton dissertation (but not recommended for acceptance until 1992), was to develop the compositional practices of Schoenberg and Webern into a system of combinatoriality, that is, aggregate preservation among contrapuntal combinations of rows. (It is quite an achievement, for Babbitt was able to make much progress without the explicit distinction of pitch and pitch-class, operator and entity, and set and set-class, and without any explicit invocation of group theory.) Babbitt (1961, 1973), Donald Martino (1961), and Starr and I (Starr and Morris 1977-78) continued to develop the theory of combinatoriality. This work, generalized as two-dimensional arrays with rows or other types of pitch-class entities in the array rows and with aggregates or other pitch-class “norms” in the columns, de-emphasized the row. Ex. 8 provides some combinatorial arrays. It was established that while small combinatorial arrays depended on the properties of the generating row, larger and more elaborate arrays depended on more global principles, as in the rotational array of Stravinsky’s serial music. Consequently, the emphasis shifted from the row to the array so that the array might be considered the more basic musical unit (Winham 1970; Morris 1983, 1987). This was inherent in Babbitt’s serial music, which, while unnoticed for quite a time in the literature, had been composed from pairs of combinatorial rows rather than rows alone. This meant, as with the arrays, that musical realization was a matter of choosing and ordering out of the partial ordering given by the array columns.

Ex. 9. Lattice, poset, and order matrix derived from an array column

Array column:
2 7 9 3
5 B A 4
0 6 8 1

Lattice derived from array column (the two * nodes are the shared endpoints of the three chains):
* → 2 → 7 → 9 → 3 → *
* → 5 → B → A → 4 → *
* → 0 → 6 → 8 → 1 → *

Poset derived from lattice:
{ (2,7) (2,9) (2,3) (7,9) (7,3) (9,3)
  (5,B) (5,A) (5,4) (B,A) (B,4) (A,4)
  (0,6) (0,8) (0,1) (6,8) (6,1) (8,1) }
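The poset of Ex. 9, and the order matrix it determines, follow mechanically from the three segments of the array column. The sketch below is one way to compute them and to test whether a given ordering of the aggregate realizes the partial order; the function names and the two test orderings are illustrative and not taken from the source.

```python
from itertools import combinations

PC = "0123456789AB"
chains = ["2793", "5BA4", "0681"]  # the three array-row segments in the column

def poset_from_chains(chains):
    """All pairs (a, b) in which a precedes b within the same segment."""
    pairs = set()
    for chain in chains:
        pairs.update(combinations(chain, 2))
    return pairs

def order_matrix(pairs):
    """Row a, column b is '1' exactly when (a, b) belongs to the poset."""
    return ["".join("1" if (a, b) in pairs else "0" for b in PC) for a in PC]

def is_linear_extension(row, pairs):
    """True if the ordering of the aggregate respects every pair of the poset."""
    position = {pc: i for i, pc in enumerate(row)}
    return all(position[a] < position[b] for a, b in pairs)

poset = poset_from_chains(chains)
matrix = order_matrix(poset)

for label, line in zip(PC, matrix):
    print(label + "|" + line)
```

An ordering such as 2 5 0 7 B 6 9 A 8 3 4 1, which interleaves the three segments, passes `is_linear_extension`; the ascending ordering 0 1 2 ... B does not, since it places 3 before 9.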
Order matrix derived from poset:

  0123456789AB
0|010000101000
1|000000000000
2|000100010100
3|000000000000
4|000000000000
5|000010000011
6|010000001000
7|000100000100
8|010000000000
9|000100000000
A|000010000000
B|000010000010

Thus various types of posets of the aggregate and their possible realization as rows became the focus of this research. Lewin (1976) and Starr (1984) were the first to specify and formalize the use of posets in twelve-tone theory (see also Morris 1983, 1987, 1995a). An order lattice, poset, and order matrix derived from an array column are given in Ex. 9.

Ex. 10. Non-aggregate combinatorial array
[Array layout not recoverable from the extracted text.]
Array rows are members of set-class 6-32[024579] (C all-combinatorial hexachord).
Array columns are members of set-class 6-8[023457] (B all-combinatorial hexachord).

Eventually the array concept became detached from aggregates and rows so that it could model the preservation of harmonic relations among simultaneous linear presentations of any kinds of pitch or pitch-class entities. Such non-aggregate combinatoriality was useful in formalizing and extending aspects of the music of Carter and others. The topic extends into set-type saturated rows, two-partition graphs and the complement-union property. See Morris 1985, 1987. A simple example of a non-aggregate combinatorial array occurs in Ex. 10.

Research on partitions of the aggregate forms a related trend to combinatoriality. Babbitt was the first composer to use all 77 partitions of the number 12 in his music by inventing the all-partition array (Babbitt 1961, 1973). See Ex. 11. The earliest emphasis on partitions is that of Hauer, whose tropes are collections of 6/6 partitions grouped by transposition. There are 44 such tropes, but if we use transposition and inversion to group the partitions there are 35 tropes, the number of pairs of complementary hexachordal set-classes. Martino’s article of 1961 is an early development of the partitions of the aggregate followed more than 25 years later by Andrew Mead (1988), Harald Fripertinger (1992), Brian Alegant (1993), and Alegant and Lofthouse (2002).
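The two trope counts mentioned above can be reproduced by brute force. The sketch below treats a trope as an unordered pair consisting of a hexachord and its complement, then counts classes of such pairs under transposition alone and under transposition with inversion. The representation and the names are illustrative assumptions, not Hauer's or the author's formalism.

```python
from itertools import combinations

N = 12
AGGREGATE = frozenset(range(N))

def transpose(pcset, n):
    return frozenset((x + n) % N for x in pcset)

def invert(pcset):
    return frozenset((-x) % N for x in pcset)

def six_six(hexachord):
    """The unordered 6/6 partition formed by a hexachord and its complement."""
    return frozenset({hexachord, AGGREGATE - hexachord})

def count_classes(use_inversion):
    """Count classes of 6/6 partitions under Tn, or under Tn and TnI."""
    seen = set()
    classes = 0
    for combo in combinations(range(N), 6):
        pair = six_six(frozenset(combo))
        if pair in seen:
            continue
        classes += 1
        h = next(iter(pair))
        forms = [h, invert(h)] if use_inversion else [h]
        for form in forms:
            for n in range(N):
                seen.add(six_six(transpose(form, n)))
    return classes

tropes_tn = count_classes(use_inversion=False)
tropes_tni = count_classes(use_inversion=True)
print(tropes_tn, tropes_tni)  # 44 35
```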
Ex. 11. The 77 partitions of the aggregate (Self-conjugates are in boldface.)
[Table not recoverable from the extracted text. Columns give the number of parts (1 to 12); rows give the maximum length of a part (1 to 12).]
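Since the partitions of 12 are fixed by arithmetic, the content of Ex. 11 can be regenerated. The sketch below enumerates them, indexes them by maximum part length and number of parts as in the table, and picks out the self-conjugate ones. It also counts the partitions with at most four parts, the figure that the later discussion of 4-row arrays relies on. Names are illustrative.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def conjugate(p):
    """Transpose the Ferrers diagram of a non-empty partition."""
    return tuple(sum(1 for part in p if part > i) for i in range(p[0]))

all_partitions = list(partitions(12))

# Index by (maximum part length, number of parts), the two axes of Ex. 11
table = {}
for p in all_partitions:
    table.setdefault((p[0], len(p)), []).append(p)

self_conjugate = [p for p in all_partitions if conjugate(p) == p]
at_most_four = [p for p in all_partitions if len(p) <= 4]

print(len(all_partitions), self_conjugate, len(at_most_four))
```

For instance, `table[(6, 3)]` lists the three-part partitions whose largest part is 6, which is how the hexachord-plus-two-segments columns of an array would be classified.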
Ex. 12a. Types of rows

derived
all combinatorial
all interval rows
order invariant: P = r9RT4IP: 0 3 7 B 2 5 9 1 4 6 8 A (Berg, Violin Concerto)
all-trichord (Babbitt, Images)
Q: 0 1 B 3 8 A 4 9 7 6 2 5
trichords beginning at order positions 0, 3, 6, 9: 3-1 3-9 3-7 3-3
trichords beginning at order positions 1, 4, 7: 3-6 3-8 3-2
trichords beginning at order positions 2, 5, 8: 3-11 3-5 3-4

multiple order function rows
S: 0 1 B 4 8 5 9 A 7 3 2 6
segments of S: X: 0 1 B; Y: 4 8 5 9 A; Z: 7 3 2 6
RTAIS: 4 8 7 3 0 1 5 2 6 B 9 A
S embedded in successions of RTAIS: 487301526B9A487301526B9A487301526B9A
(X, Y, and Z lie in the first, second, and third statements respectively.)

set-type saturated rows (Morris, Concerto for Piano and Winds)
T: 0 1 4 7 8 A B 2 5 6 9 3 (0 1 4 7 8)
hexachords beginning at order positions 0 and 6: 6-28 6-49
hexachords beginning at order positions 2 and 8: 6-49 6-28
hexachords beginning at order positions 4 and 10: 6-28 6-49
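Three of the claims in Ex. 12a lend themselves to a quick check. The sketch below is a hedged illustration with names of my own choosing. Two assumptions are worth flagging: `set_class` returns the lexicographically smallest sorted form under transposition and inversion, which serves as a canonical label but is not guaranteed to match Forte's prime form for every set-class; and the code searches for left rotations, so it reports a rotation of 3 places, which is the same as a right rotation by 9 (which of the two the label r9 denotes is left open here).

```python
from itertools import combinations

def pcs(text):
    """Parse a string such as '037B2591468A' into a list of pitch classes."""
    return ["0123456789AB".index(ch) for ch in text if ch != " "]

def t_n_i(row, n):
    """TnI: map every pitch class x to n - x (mod 12)."""
    return [(n - x) % 12 for x in row]

def set_class(pcset):
    """Smallest sorted form under Tn and TnI (a canonical label, see lead-in)."""
    forms = []
    for s in (list(pcset), [(-x) % 12 for x in pcset]):
        for n in range(12):
            forms.append(tuple(sorted((x + n) % 12 for x in s)))
    return min(forms)

def is_subsequence(needle, haystack):
    """True if needle occurs in haystack in order, not necessarily adjacently."""
    it = iter(haystack)
    return all(x in it for x in needle)

# Order invariance of the Berg row: which left rotations of RT4IP give back P?
P = pcs("037B2591468A")
rt4ip = t_n_i(P, 4)[::-1]
rotations = [k for k in range(12) if rt4ip[k:] + rt4ip[:k] == P]

# All-trichord row: set-classes of the ten imbricated trichords of Q
Q = pcs("01B38A497625")
found = {set_class(Q[i:i + 3]) for i in range(10)}
missing = {set_class(c) for c in combinations(range(12), 3)} - found

# Multiple order function row: S inside repeated statements of RTAIS
S = pcs("01B4859A7326")
rtais = t_n_i(S, 10)[::-1]
embedded = is_subsequence(S, rtais * 3)

print(rotations, len(found), sorted(missing), embedded)
```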
As I’ve mentioned, studies of kinds of rows have led to generalities beyond rows. The next example provides a brief survey of some of these special rows. The types shown in Ex. 12a are not mutually exclusive, so a row might reside in all of these categories. (Ex. 12a does not show derived, all-combinatorial, and all-interval rows because I’ve already given examples of these types.) Ex. 12b illustrates the use of a self-deriving array with one of my short piano compositions. This kind of array allows each aggregate to be ordered according to the rows in the row-class of the generating row. These examples point out that research on derived rows, all-combinatorial rows, all-interval rows, order invariant rows, all-trichord rows, multiple order function rows (Batstone 1972, Morris 1976, 1977, Mead 1988, 1989, Scotto 1995), set-type saturated rows (Morris 1985) and self-deriving array rows (Starr 1984, Kowalski 1985) reflected new orientations to the use and function of the twelve-tone system, which developed, in turn, into considerations of various kinds of saturation in addition to aggregate completion, the embedding of one musical thing in itself or another, the preservation of properties among like entities such as ordering, transformations, and
set-structure. These topics are grounded in the cycles of transformations considered as permutations and the orbits of the permutation groups. These questions of preservation often hinge on whether pairs of transformations commute, and if their orbits and cycles are invariant under interval preserving transformations. Mead’s (1988) elaboration on the pitch-class/order number isomorphism introduced by Babbitt and O’Connell is another signal contribution to this topic for it allows any subset of an ordered pc entity to be characterized as batches of pcs at batches of order numbers or
vice versa; in this way, all partitions of the aggregate are available in each and every row and the difference between rows is based on the distributions of these partitions over the class of all rows.

The development of ways to extend adequately the relationships among pitch-classes to time and other musical dimensions was an unsolved problem until the advent of Stockhausen’s article “...how time passes...” (1959) and Babbitt’s (1962) time-point system. Such elaborations were further developed by Rahn (1972), Morris (1987) and especially David Lewin (1987), who constructed non-commutative temporal GISs that do not preserve simultaneity, succession or duration.

Another line of research concerns the construction of networks of pitches or other musical entities connected by succession, intersection or transformation. Perle’s (1977) elaboration of his cyclic sets together with Lansky’s (1973) formalization via matrix algebra, and the further generalizations to K-nets (Lewin 1990), represent one strand in network theory. Another strand is the use of networks of protocol pairs to create poset lattices for generalizing order relationships in serial music (Lewin 1976, Starr 1984). Yet another strand begins with similarity graphs among pcsets and set-classes (Morris 1980), two-partition graphs (Morris 1987), transformation networks (Lewin 1987) and some types of compositional spaces (Morris 1995a). In the interest of time and space, I’ve left out a great deal of important research including the application of pc theory to musical contour and time.
4 Present State of Research

Today, the nature of the twelve-tone system is well understood. In a few words, the field is supported by an application of mathematical group theory, where various kinds of groups act on pcs, sets, arrays, etc. The most important group is the affine group, including the Tn and Mm operations acting on Z12 or simply Z. Other subgroups of the background group S12 have been used to relate musical entities; these fall into two categories: the so-called context-sensitive groups, some of which are simply transitive, and groups that are normalized by operations in the affine group. Other branches of math having strong connections with group theory, such as semi-groups and fields, number theory, combinatorial analysis and graph theory, are often implicated in twelve-tone research.

In addition to the unification provided by group theory, topics in different branches of pitch-class theory have been connected and revealed to be instances of the same concept or model. For instance, I’ve shown that Forte’s K and Kh-relations are related via two-partitions to hexachordal combinatoriality (Morris 1987) or that Forte’s genera are actually the result of operations on his set-complexes (Morris 1997). In any case, our understanding of the twelve-tone system is so general that to divide the topic into set theory (à la Forte), cycles (à la Perle) and transformations (à la Lewin) is no longer viable. If there are any differences between these three orientations to the system, they are due to the applications.

What is more, when it became obvious that serial theory was actually an application of group theory, research shifted over from modeling serial composition and analysis to other aspects of music that involved symmetry. David Lewin’s (1987) work on general interval systems (GIS) and transformation networks represents this change of orientation. Thus, the development of the twelve-tone system has been so extended and ramified that there is no longer a need to distinguish this line of work from other mathematically informed branches of theory. Neo-Riemannian theory, scale theory, networks, and compositional spaces unify and interconnect music theory in hitherto unexpected ways. Thus the distinction between tonal and atonal may no longer be very meaningful; rather, distinctions between types and styles of music are much more context-sensitive and nuanced thanks to the influence of mathematics.
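To make the affine group mentioned in this section concrete, the sketch below represents each operation x → mx + n (mod 12), with m a unit of Z12, as a tuple of images, and checks closure under composition and the failure of commutativity. The representation and the names are illustrative assumptions rather than anything prescribed by the text.

```python
from math import gcd
from itertools import product

N = 12
UNITS = [m for m in range(N) if gcd(m, N) == 1]  # multipliers with an inverse mod 12

def affine(m, n):
    """The operation x -> m*x + n (mod 12), i.e. Mm followed by Tn, as a tuple."""
    return tuple((m * x + n) % N for x in range(N))

def compose(f, g):
    """Apply g first, then f."""
    return tuple(f[g[x]] for x in range(N))

GROUP = {affine(m, n) for m, n in product(UNITS, range(N))}

closed = all(compose(f, g) in GROUP for f in GROUP for g in GROUP)

T1 = affine(1, 1)       # transposition by 1
I = affine(11, 0)       # inversion, x -> -x
commute = compose(T1, I) == compose(I, T1)

print(UNITS, len(GROUP), closed, commute)
```

The group has 48 elements (four multipliers times twelve transpositions), and T1 and I already fail to commute, which is the kind of fact the questions about commuting transformations turn on.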
5 Future

While the twelve-tone system is no longer isolated from other aspects of music theory such as models for tonality, there are many research projects that can be identified to carry on previous work. One obvious direction is to ask what happens when we change the “twelve” in twelve-tone system. Carlton Gamer (1967a and b) was one of the first theorists to raise such issues. He showed that equal tempered systems of other moduli not only have different structures; they allow different types of combinatorial entities to be built within them. Another aspect that individuates mod-n systems is that their (multiplicative) units need not be their own inverses as they are in the twelve-tone system. Moreover, when n is a prime, all nonzero integers mod n are units. Jumping out of any modular system into the pitch-space, there are other ways of conceptualizing and hearing pitch relations, as in spectral composition.

Here I list a few more specific research issues.

What are the ranges for models of similarity between and among ordered sets (including rows)? A few models have been introduced: order-inversions (Babbitt 1961), BIPs (Forte 1973), and the correlation coefficient (Morris 1987). At the time of this writing, Tuukka Ilomaki is working on a dissertation on row similarity.

Generating functions and algorithms have been useful in enumerating the number of entities or equivalence classes such as rows, set-classes, partition-classes, and the like. Are there mathematical ways of generating entities of certain types such as all-interval series or multiple order function rows? Some preliminary results are found in Fripertinger (1992). Babbitt has pointed out that the famous multiple order function Mallalieu row can be generated by the enumeration of primitive roots. Can most or all multiple order function rows be similarly generated? Caleb Morgan has been working on this question and will soon publish the results.
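Two of the number-theoretic remarks above can be illustrated briefly. The sketch below lists the units of Z_n and those that are their own inverses, and then generates a row from the primitive root 2 modulo 13 by listing, for k = 1 to 12, the exponent e with 2^e ≡ k (mod 13). Two assumptions are mine: that the Mallalieu row is the row 0 1 4 2 9 5 11 3 8 10 7 6 named in the title of Morris (1976), and that the discrete-logarithm reading is what "enumeration of primitive roots" refers to.

```python
from math import gcd

def units(n):
    """Integers mod n that have a multiplicative inverse."""
    return [m for m in range(1, n) if gcd(m, n) == 1]

def self_inverse_units(n):
    """Units m with m * m = 1 (mod n)."""
    return [m for m in units(n) if (m * m) % n == 1]

def discrete_log_row(root, p):
    """For k = 1..p-1, the exponent e such that root**e = k (mod p)."""
    log = {pow(root, e, p): e for e in range(p - 1)}
    if len(log) != p - 1:
        raise ValueError("root is not a primitive root mod p")
    return [log[k] for k in range(1, p)]

row = discrete_log_row(2, 13)

# Reading the row at order numbers 2k mod 13 (k = 1..12) transposes it by 1,
# because log(2k) = log(2) + log(k) = 1 + log(k) (mod 12).
every_second = [row[(2 * k) % 13 - 1] for k in range(1, 13)]

print(units(12), self_inverse_units(12))
print(units(7), self_inverse_units(7))
print(row)
```

In mod 12 every unit is its own inverse, whereas in mod 7 all six nonzero residues are units but only 1 and 6 are self-inverse. The comparison of `every_second` with the row transposed by 1 shows one multiple-order-function property of the generated row.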
There is much work to be done on the generation of combinatorial and other arrays. For instance, it is an unproven conjecture that any row can generate a twelve-row, all 77-partition array, but only special rows can generate a 4-row, all 34-partition array. However, in the latter case, even the necessary criteria are not known. Bazelow and Brickle carried out an initial probe into this problem in 1976. A host of other similar problems surround the creation and transformation of arrays.

In twelve-tone partition theory, the Z-relation is generally understood, but what about in systems of other moduli? David Lewin (1982) showed that there were Z-triples in the 16-tone system. Does the Z-phenomenon have one root cause or many?

Multisets are of use for modeling doubling and repetition in voice-leading and weighted arrays. Even the most basic questions of enumeration and transformation of multi-pcsets have yet to be investigated.
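The question about the Z-relation in other moduli can at least be explored computationally. The sketch below groups the set-classes of a given cardinality in Z_n by interval vector and reports every group with more than one member. It assumes that set-classes are taken under transposition and inversion, and it uses the lexicographically smallest sorted form as a class label (not necessarily Forte's prime form). The names are illustrative.

```python
from itertools import combinations
from collections import defaultdict

def interval_vector(pcset, n=12):
    """Count each interval class 1..n//2 among the pairs of the set."""
    vector = [0] * (n // 2)
    for a, b in combinations(pcset, 2):
        d = (a - b) % n
        vector[min(d, n - d) - 1] += 1
    return tuple(vector)

def set_class(pcset, n=12):
    """Smallest sorted form under Tn and TnI, used as a class label."""
    forms = []
    for s in (list(pcset), [(-x) % n for x in pcset]):
        for t in range(n):
            forms.append(tuple(sorted((x + t) % n for x in s)))
    return min(forms)

def z_groups(size, n=12):
    """Groups of distinct set-classes of one cardinality sharing an interval vector."""
    by_vector = defaultdict(set)
    for combo in combinations(range(n), size):
        by_vector[interval_vector(combo, n)].add(set_class(combo, n))
    return [sorted(classes) for classes in by_vector.values() if len(classes) > 1]

z_tetrachords = z_groups(4)
print(z_tetrachords)  # [[(0, 1, 3, 7), (0, 1, 4, 6)]]
```

Calling `z_groups(size, 16)` for various sizes and looking for groups of three would be one way to search for the triples Lewin reports; no output for that case is claimed here.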
Existence proofs have been lacking to explain, for instance, why there are no all-interval rows that are also all-trichordal.4 Another open question: do 50-pc rings that imbricate an instance of each of the 50 hexachordal set-classes exist?
6 Conclusion

The introduction of math into music theoretic research has had a number of important consequences. At first, the work simply became more rigorous and pointed in the questions that it could ask and in the generality of the answers. On one hand, this led to the identification of different types of twelve-tone music and the models for each type within the twelve-tone system. On the other hand, group theory eventually unified what seemed to be different aspects of music so that the twelve-tone system could no longer completely be conceptually differentiated from tonality, modality, and even aspects of non-Western music. I say “not completely differentiated,” for there are other mathematical bases for music besides group theory. For instance, Schenkerian tonal theory is not modeled by groups of transformations; here we have tree structures. Thus there might be a distinction between serial and tonal music by transformation graphs that have no loops or cycles and those that do. Of course, graph theory would model these two ways of organizing transformations and entities accordingly, as different types of graphs. But since every graph whatsoever has at least one shortest spanning tree, graphs that do not have loops and cycles are involved with ones that do. It can easily be shown that “motivic” association among structural levels in Schenkerian theory is not hierarchical, i.e., tree-like, and that relationships among serial entities like rows can be recursive and/or hierarchically configured. The moral here is not that graph theory might be a better general model for music than group theory, but that no one mathematical theory is going to partition musical structures into closed categories or completely unify them. Thus, links between mathematical categories will have relevance for modeling the diversity of music, as Mazzola (2002) and his co-workers have begun to demonstrate.
In any case, while there can be no doubt that the way we regard music has been transfigured by the use of math in music theory, the music we study remains or remains to be written. But math is not music, and one must remember to distinguish between a task and the tools used to accomplish it. Nevertheless, a traditional verity has been falsified: the way we explore a given composition’s particularity is no longer different in kind from the way we associate and/or group different pieces, genres, and styles.
4 Here we mean all-trichordal in Babbitt’s sense of the term: a row that imbricates an instance of each of ten different trichordal set-classes, leaving out [036] and [048].

References

Alegant, B.: The 77 Partitions of the Aggregate. Ph.D. dissertation, University of Rochester (1993)
Alegant, B., Lofthouse, M.: Having Your Cake and Eating It, Too: The Property of Reflection in Twelve-Tone Rows (Or, Further Extensions on the Mallalieu Complex). Perspectives of New Music 40(2), 233–274 (2002)
Alphonce, B.: The Invariance Matrix. Ph.D. dissertation, Yale University (1974)
Babbitt, M.: Some Aspects of Twelve-Tone Composition. The Score and IMA Magazine 12, 53–61 (1955)
Babbitt, M.: Twelve-Tone Invariants as Compositional Determinants. Musical Quarterly 46, 245–259 (1960)
Babbitt, M.: Set Structure as a Compositional Determinant. Journal of Music Theory 5(2), 72–94 (1961)
Babbitt, M.: Twelve-Tone Rhythmic Structure and the Electronic Medium. Perspectives of New Music 1(1), 49–79 (1962)
Babbitt, M.: Since Schoenberg. Perspectives of New Music 12(1-2), 3–28 (1973)
Babbitt, M.: The Function of Set Structure in the Twelve-Tone System. Ph.D. dissertation. Princeton University, Princeton (1992)
Batstone, P.: Multiple Order Functions in Twelve-Tone Music. Perspectives of New Music 10(2) 11(1), 60–71, 92–111 (1972)
Bazelow, A.R., Brickle, F.: A Partition Problem Posed by Milton Babbitt (Part I). Perspectives of New Music 14(2), 280–293 (1976)
Boretz, B.: Meta-Variations: Studies in the Foundations of Musical Thought (I). Perspectives of New Music 8(1), 1–75 (1969)
Boretz, B.: Sketch of a Musical System (Meta-Variations, Part II). Perspectives of New Music 8(2), 49–112 (1970a)
Boretz, B.: The Construction of Musical Syntax (I). Perspectives of New Music 9(1), 23–42 (1970b)
Boretz, B.: Musical Syntax (II). Perspectives of New Music 9(2) 10(1), 232–270 (1971)
Boretz, B.: Meta-Variations, Part IV: Analytic Fallout (I). Perspectives of New Music 11(1), 146–223 (1972)
Boretz, B.: Meta-Variations, Part IV: Analytic Fallout (II). Perspectives of New Music 11(2), 156–203 (1973)
Forte, A.: A Theory of Set-Complexes for Music. Journal of Music Theory 8(2), 136–183 (1966)
Forte, A.: The Structure of Atonal Music. Yale University Press, New Haven (1973)
Forte, A.: Pitch-Class Set Genera and the Origin of Modern Harmonic Species. Journal of Music Theory 32(2), 187–334 (1988)
Fripertinger, H.: Enumeration in Music Theory (1992), http://www.unigraz.at/~fripert/musical_theory.html
Gamer, C.: Deep Scales and Difference Sets in Equal Tempered Systems. Proceedings of the American Society of University Composers 2, 113–122 (1967a)
Gamer, C.: Some Combinatorial Resources in Equal Tempered Systems. Journal of Music Theory 11(1), 32–59 (1967b)
Haimo, E., Johnson, P.: Isomorphic Partitioning and Schoenberg’s Fourth String Quartet. Journal of Music Theory 28, 47–72 (1984)
Hanson, H.: The Harmonic Materials of Twentieth-Century Music. Appleton-Century-Crofts, New York (1960)
Hauer, J.M.: Vom Melos zur Pauke: Eine Einführung in die Zwölftonmusik. Universal Edition, Vienna (1925)
Howe, H.S.: Some Combinatorial Properties of Pitch-Structures. Perspectives of New Music 4(1), 45–61 (1965)
Kowalski, D.: The Array as a Compositional Unit: A Study of Derivational Counterpoint as a Means of Creating Hierarchical Structures in Twelve-Tone Music. Ph.D. dissertation. Princeton University, Princeton (1985)
Lansky, P.: Affine Music. Ph.D. dissertation. Princeton University, Princeton (1973)
Lewin, D.: Intervallic Relations Between Two Collections of Notes. Journal of Music Theory 3(2), 298–301 (1959)
Lewin, D.: The Intervallic Content of a Collection of Notes, Intervallic Relations Between a Collection of Notes and its Complement: An Application to Schoenberg’s Hexachordal Pieces. Journal of Music Theory 4(1), 98–101 (1960)
Lewin, D.: A Theory of Segmental Association in Twelve-Tone Music. Perspectives of New Music 1(1), 276–287 (1962)
Lewin, D.: On Partial Ordering. Perspectives of New Music 14(2)-15(1), 252–259 (1976)
Lewin, D.: Some New Constructs Involving Abstract Pcsets, and Probabilistic Applications. Perspectives of New Music 18(2), 433–444 (1980a)
Lewin, D.: On Extended Z-Triples. Theory and Practice 7, 38–39 (1982)
Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press, New Haven (1987)
Lewin, D.: Klumpenhouwer Networks and Some Isographies that Involve Them. Music Theory Spectrum 12(1), 83–120 (1990)
Lewin, D.: Musical Form and Transformation: 4 Analytic Essays. Yale University Press, New Haven (1993)
Lewin, D.: Special Cases of the Interval Function Between Pitch-Class Sets X and Y. Journal of Music Theory 45(1), 1–30 (2001)
Martino, D.: The Source Set and Its Aggregate Formations. Journal of Music Theory 5(2), 224–273 (1961)
Mazzola, G. (Göller, S., Müller, S., contributors): The Topos of Music: Geometric Logic of Concepts, Theory, and Performance. Birkhäuser, Basel (2002)
Mead, A.: Some Implications of the Pitch–Class/Order–Number Isomorphism Inherent in the Twelve-Tone System: Part One. Perspectives of New Music 26(2), 96–163 (1988)
Mead, A.: Some Implications of the Pitch–Class/Order–Number Isomorphism Inherent in the Twelve-Tone System: Part Two. Perspectives of New Music 27(1), 180–233 (1989)
Messiaen, O.: Technique de mon langage musical. Leduc, Paris (1944)
Morris, R.D.: More on 0,1,4,2,9,5,11,3,8,10,7,6. In Theory Only 2(7), 15–20 (1976)
Morris, R.D.: On the Generation of Multiple Order Number Twelve-tone Rows. Journal of Music Theory 21, 238–263 (1977)
Morris, R.D.: A Similarity Index for Pitch-Class Sets. Perspectives of New Music 18(1-2), 445–460 (1980)
Morris, R.D.: Combinatoriality without the Aggregate. Perspectives of New Music 21(1-2), 432–486 (1983)
Morris, R.D.: Set-Type Saturation Among Twelve-Tone Rows. Perspectives of New Music 22(1-2), 187–217 (1985)
Morris, R.D.: Composition with Pitch-Classes: A Theory of Compositional Design. Yale University Press, New Haven (1987)
Morris, R.D.: Compositional Spaces and Other Territories. Perspectives of New Music 33(1-2), 328–359 (1995a)
Morris, R.D.: Equivalence and Similarity in Pitch and their Interaction with Pcset Theory. Journal of Music Theory 39(2), 207–244 (1995b)
Morris, R.D.: K, Kh, and Beyond. In: Baker, J., Beach, D., Bernard, J. (eds.) Music Theory in Concept and Practice. University of Rochester Press, Rochester (1997)
Morris, R.D.: Class Notes for Advanced Atonal Theory, vol. 2. Frog Peak Music (2001)
Morris, R.D., Alegant, B.: The Even Partitions in Twelve-Tone Music. Music Theory Spectrum 10, 74–103 (1988)
Morris, R., Starr, D.: The Structure of All-Interval Series. Journal of Music Theory 18(2), 364–389 (1974)
O’Connell, W.: Tone Spaces. Die Reihe 8, 34–67 (1963)
Perle, G.: Twelve-Tone Tonality. University of California Press, Berkeley (1977)
Polansky, L.: Morphological Metrics: An Introduction to a Theory of Formal Distances. In: Proceedings of the International Computer Music Conference. Compiled by James Beauchamp, Computer Music Association, San Francisco (1987)
Polansky, L., Bassein, R.: Possible and Impossible Melodies: Some Formal Aspects of Contour. Journal of Music Theory 36, 259–284 (1992)
Quinn, I.: General Equal-Tempered Harmony. Perspectives of New Music 44(2) 45(1), 144–159, 4–63 (2006)
Rahn, J.: On Pitch or Rhythm: Interpretations of Orderings of and In Pitch and Time. Perspectives of New Music 13, 182–204 (1974)
Rahn, J.: Basic Atonal Theory. Longman Press, New York (1980a)
Rahn, J.: Relating Sets. Perspectives of New Music 18(2), 483–502 (1980b)
Regener, E.: On Allen Forte’s Theory of Chords. Perspectives of New Music 13(1), 191–212 (1974)
Rogers, J.: Toward a System of Rotational Arrays. Perspectives of New Music 7(1), 80–102 (1968)
Schoenberg, A.: Composition with Twelve Tones (I). In: Style and Idea: Selected Writings. Univ. California Press, Berkeley (1975)
Scotto, C.G.: Can Non-Tonal Systems Support Music as Richly as the Tonal System? D.M.A. dissertation, University of Washington (1995)
Starr, D.V.: Sets, Invariance, and Partitions. Journal of Music Theory 22(1), 1–42 (1978)
Starr, D.V.: Derivation and Polyphony. Perspectives of New Music 23(1), 180–257 (1984)
Starr, D., Morris, R.: A General Theory of Combinatoriality and the Aggregate. Perspectives of New Music 16(1) 16(2), 3–35, 50–84 (1977-1978)
Stockhausen, K.: ...how time passes... Die Reihe 3, 4–27 (1959)
Verdi, L.: The History of Set Theory from a European Point of View. Perspectives of New Music 45(1), 154–183 (2007)
Winham, G.: Composition with Arrays. Ph.D. dissertation. Princeton University, Princeton (1970)
Approaching Musical Actions*

John Rahn
School of Music, University of Washington
Seattle, WA 98195, USA
So, an improvisation has been going on for some time, but its impetus is dying out, at first in a good way, all getting more quiet in a nice contrast to what has gone before, but soon, in fact now, we need a new idea, of course (inescapably) related to what we have been playing already, but one that will have a fresh effect and that can carry us into a fertile territory that will in some way complement what has gone before. I gather up into my mind and intuition some threads that have been woven into everything else so far, and form a tentative image of some new pattern to weave, and I act. The act is the public manifestation of my inner representation of my projection of the music we have played onto the screen of the future. The other musicians respond to this new musical context with their own representations, projections and actions, in a spreading web of new musical relations, represented individually and to some extent variously in each musician, and manifested publicly in our shared acoustic space, which serves as our blackboard – the space in which we communicate to each other.

Is the picture so different for a composer? She is sitting in a room. Nice trees wave in the breeze outside. She is digesting her breakfast egg over a cup of coffee. On her desk is a pile of pages of musical score, some 128 pages of her new piece for orchestra. She is half-way through the second of three large sections in the piece. She runs through the piece so far composed in her mind, pausing here and there to chew over some bit or other and revise her mental representation of the flow of the piece. She tries to form an image of a good way to continue further on into her second section. She sees the glimmer of an idea. But, it won’t work after all that loudness 30 measures earlier. She puts down her coffee and acts: she erases the brass from a 20-bar stretch, leaving only the strings. She sketches in an oboe part in the blank score….
I have referred in an earlier paper to this “wildfire of the musical swerve and flow” as “a sort of playful path in time through a field of temporally invariant relations.”1 David Lewin’s point of departure for the development of his transformational networks was that “...since music is something you do, and not just something you perceive (or understand), a theory of music cannot be developed fully from a theory of musical perception....”2 And again, Lewin says: ““If I am at s and wish to get to t, what characteristic gesture... should I perform in order to arrive there?” The question generalizes...: “If I want to change Gestalt 1 into Gestalt 2...., what sorts of admissible transformations in my space will do the best job?”....This attitude is ... the attitude of *
This paper was originally published in Perspectives of New Music 45(2): 57–75. The editors of these proceedings are grateful to the editors and publishers of Perspectives of New Music for their kind permission to reprint this article in this collection. 1 Rahn (2004). 2 Lewin (1986), p. 377. T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 289–302, 2009. © Springer-Verlag Berlin Heidelberg 2009
290
J. Rahn
someone inside the music, as idealized dancer and/or singer. No external observer (analyst, listener) is needed.”3 Lewin’s formulation talks about changing Gestalt 1 into Gestalt 2. Recent practice of transformational theory has focused on networks of pitch classes, the so-called Klumpenhouwer networks, or Knets, which themselves only transform individual pitch classes into other ones, and on the isographies that can obtain between such networks. The isographies are, typically, used analytically in a rather static fashion to associate small stretches of music represented as networks of a few pc, much as a traditional motivic analysis would associate two small motives which bear some resemblance to each other. We could quibble with Lewin’s basic formulation, too, as, taking some given (or found) thing and changing it into some other like thing. It seems altogether too focused on things. It may be an advance to think, as Lewin does, in terms of the transformation from one thing to the other, but the underlying granulation is rather grating. We want to think of music as growing. The thingness of music might lie in the magic metamorphosis from one thing, the music up to now as represented mentally and realized acoustically and in the score by acts, to a new and larger thing which is quite different, voilà, hey presto. Lewin-things are typically the same size, so they do not grow, and the transformations are structure-preserving, which means we are not surprised by the new thing because it is not really different. Are we convinced by an argument that the succession of the two similar things (in time or even in some ordered representation space) produces some really new thing, the one-then-theother? 
It is a kind of repetition, which has its place in music for sure, and one can take this pretty far – I have done so myself – but it can’t be the whole story.4 I want to recontextualize toward a representation of music which is more temporal, more complex, and located firmly within musical action. We can just begin by thinking about mathematical action, which in fact does model Lewin’s thing-to-thing transformational idea pretty well. You take one thing and transform it into another similar thing. The transformations form some algebraic entity such as a semigroup, monoid, group, etc. There is always an action involved in a Net, the action of the arrow labels on the node contents. Lewin, in GMIT, refers to the transformations he uses to label his arrows as elements of a semigroup; more properly they are elements of a monoid, due to the underlying definition of digraph in GMIT. The whole situation can be reconceptualized and redefined as I have in my recent JMM paper “Cool Tools,” to include polysemic and noncommutative Nets as well as Lewin-nets proper. There is still always at least an action of the arrow labels on the node contents.5 Let S be a monoid and A a non-empty set. There is a standard mathematical definition of a right [left] S-act as a mapping from A × S to A in which each pair of elements (a, s) maps to as, where a1 = a and a(st) = (as)t for all a in A and s, t in S. Where the identity is missing from S, this is called a semigroup act or “S-act”; where the identity element is present in S, one term for the action is unitary S-act, that is, a monoidal act. Since every group is a monoid, this also serves to define group action. 3
Lewin (1987), p. 159. 4 Rahn (1993). 5 Rahn (2007).
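The definition of a unitary right S-act given above can be checked mechanically on a small case. The following is only a sketch under stated assumptions: the state set A is taken to be the twelve pitch classes, and the monoid S is a hypothetical one chosen for illustration (all affine maps x → px + a on Z12, including multipliers that are not coprime to 12). The names `act`, `times`, and `is_invertible` are invented here and do not come from the literature cited.

```python
from itertools import product

MOD = 12  # assumption: the states are the 12 pitch classes

# Hypothetical monoid S: all affine maps x -> p*x + a on Z12, stored as (p, a).
# Unlike the T/M group, p need not be coprime to 12, so some maps cannot be undone.
S = [(p, a) for p in range(MOD) for a in range(MOD)]
IDENTITY = (1, 0)

def act(x, s):
    """Right action x·s of the monoid element s on the state x."""
    p, a = s
    return (p * x + a) % MOD

def times(s, t):
    """Monoid product st in right-action order: apply s first, then t."""
    (p1, a1), (p2, a2) = s, t
    return ((p1 * p2) % MOD, (p2 * a1 + a2) % MOD)

def is_unitary_act():
    """Check a·1 = a and a(st) = (as)t for every state and every pair s, t."""
    unit_ok = all(act(x, IDENTITY) == x for x in range(MOD))
    assoc_ok = all(
        act(x, times(s, t)) == act(act(x, s), t)
        for x in range(MOD)
        for s, t in product(S, repeat=2)
    )
    return unit_ok and assoc_ok

def is_invertible(s):
    """True if some t in S undoes s on both sides."""
    return any(times(s, t) == IDENTITY and times(t, s) == IDENTITY for t in S)
```

In this toy monoid the constant map (0, 5) sends every state to 5 and has no inverse, which is one concrete sense in which a machine modeled by a monoid, rather than a group, cannot be run backwards.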
Approaching Musical Actions
Definitions: A right [left] S-act is a mapping from A × S to A where each pair of elements (a, s) maps to as, where a1 = a and a(st) = (as)t for all a in A and s, t in S. Where the identity is missing from S, this is called a semigroup act; where the identity element is present in S, one term for the action is unitary S-act, that is, a monoidal act. An S-act is also called an S-automaton. It is a machine. A semiautomaton is an automaton without outputs. It is modeled as an act over a monoid in a natural way. In this case, A is the set of states, and S is the input monoid. In this way, in the theory of S-acts, we might as well speak of semiautomata instead of S-acts.6 Clearly then, Lewin transformational networks are semiautomata. (They do not have output.) When I pointed this out in Rahn (1994), David Lewin emailed me that some factory in Japan had in fact used his transformational theory to set up the production system.7 One hopes that the model was adjusted to produce output. The advantage of semigroups and monoids over groups as a general model for machines is that not all machines can run backwards. Indeed, if we want to model musical acts as taking place in irreversible time, we will need to escape groups and inhabit monoids. Let’s review the specific situation for Knets before escaping Knets. Of course there is the action of the arrow-label group on the node-contents, but this is relatively uninteresting in Knets because the node contents, individual pcs, are so simple, with no internal structure. Can you imagine our Composer (let us call her Isobel) meditating on the pitch class Bb? Delving imaginatively into its internal structure so as to find a way to move to some other pitch class? I do not speak of its spectral evolution and so on, just its quality as a Bb pitch.
As Jimmy Durante is reputed to have said of Bb, “That’s a Good Note,” but as a Good Note, it resides securely within itself, without necessity of change, a model of Parmenidean Being: “Being is without beginning and without end, whole, unique, imperturbable, and complete.”8 We need to follow up our idea of how to grow a piece larger from some representation of its earlier stage. For this, we need at least a representation that is complex enough to characterize a moment of music-til-now. Approaching this ideal, we could try to use as data objects sets of pc, orderings of pc, or sets or orderings of more complex data sub-objects such as “notes,” represented as n-tuples of dimensional values such as start time, pitch, and so on, or yet more complex entities. We will address some of these later on. Knets themselves have enough structure within them to begin to be interesting as data objects – they are a more specific representation of a tune or harmony than the set of pc that are their node contents, for example, since they assert a structure among the pc. Isobel might revolve in her mind some Knet-representable structure of pc so as to come up with a following or larger compositional motive or harmony – though the issue remains of what might motivate her choice of some particular new motivic instance among all those with similar structures. So, given a music-thing represented as a Knet, we need to define a way of getting from the initial Knet to another Knet as a mathematical action on Knets. That is, we need to construct a Net whose 6
Kilp, Knauer, and Mikhalev (2000), pp. 43–45. 7 Rahn (1994). 8 Parmenides from Rahn (2004).
node-contents are Knets, a kind of recursive Net, as Klumpenhouwer and Lewin have themselves discussed.9 I will suppose for Nets in general,10 not just for Knets or Lewin-nets, this Principle of Action on Nets: In order to define an action by some algebraic entity H on a Net as a whole it suffices to define an action on an arbitrary edge of the Net, that is, an action of any element h of H on the content of the first node x, the content of the second node y, and on the label (or color) of the arrow g from the first node to the second node (see Example 1).
Example 1. Action of H on a Net whose arrows are labeled in G
In Knets, arrow-labels form a group, Tn/TnI. Groups may act on themselves by left multiplication, by right multiplication, or by conjugation. However, if we wish to preserve structure in the result, we need the action of the group on itself to be an isomorphism, that is, an automorphism. Recall that if H is a normal subgroup of G, it maps into itself as a set of elements under conjugation by any element of G. So, if H is 9
Lewin (1990). The discussion I develop below specifically about actions on Knets arrives at results consistent with those in Lewin’s Appendix A and B in this article, which lay out the inner automorphisms of the T/I and T/M group, but my discussion focuses on (dynamic) actions, not (static) isomorphic relations, and uses a different line of argument. The idea of conjugate action within the T/M group has a complex history in music theory. As far as I know, the specifics on conjugation within the T/M group, along with many other useful ideas, were first published by Daniel Starr, in his excellent article “Sets, Invariance, and Partitions” (Starr 1978). On p. 28, Starr develops a formula for conjugation with a result equivalent to my Example 6, in a different format. But Starr uses this to discuss the invariance of sets of pc, focussing on the more usual action of the group on sets of pc rather than the action of the group on itself. I had forgotten about Starr’s earlier presentation, and am grateful to Robert Morris for reminding me of it. The demonstration and proofs here are independent. Robert Morris also presented the inner automorphisms of the TTOs in Morris (1987), p. 169. You would have to change the order of operations on Morris’s table heads and rows, flip around the main diagonal (so rows interchange with columns), change names of operators and their arguments, and adjust the arithmetic in the subscripts, to get my Example 7. Again, my development here is independent of Morris’s. See also Morris (2001). It is possible that Starr’s article also lurked, half-forgotten, in David Lewin, since the notation for the automorphisms in Lewin (1990) as “F” is so close to the notation Starr uses (Starr 1978, p. 11ff, and Table 4), which is “F = <a, b>” such that the action on pc is F(x) = ax + b, with the multiplier written first and the transpositional subscript second in the ordered pair, like Lewin’s notation. (Lewin cites Morris (1987), but not Starr (1978).)
However, Lewin does not speak in terms of actions at all, which may have led him (seeking to represent a distinction inherent in the notion of action, without action) to cast the automorphisms as a different group than the group of TTOs, when actually it is all the same group, acting on itself by conjugation. 10 Nets generalize Lewin-nets, which generalize Knets. See the definitions in Rahn (2007).
any normal subgroup of G, then by the definitions, G acts by conjugation on H as automorphisms of H, permuting the elements of H. If H = G (which is possible because G is a normal subgroup of itself), this action is called an inner automorphism of G. Klumpenhouwer himself investigated this situation for structure-preserving actions under the term “network isomorphism,” for the case that H = G = the T/I group (isomorphic to D24) and node contents are individual pc. If the action on the node contents is known, the problem to be solved in general is finding the action on the arrow-label, as shown in the commutative diagram of Example 2. Example 3 shows the solution here, where the group H = G = T/I. This case is simply the (inner) automorphisms of the T/I group, so that the action on the arrow-labels is by conjugation, and ? = $hgh^{-1}$. The bottom right corner element of the diagram in Example 2 is then $hg(x) = hgh^{-1}h(x)$, as shown in Example 3, with the action on group elements h.g collapsing to simple composition of group elements.
Example 2. The Net action commutative diagram – problem in general
Example 3. Solution for groups H=G
The calculation of $hgh^{-1}$ is straightforward if slightly tedious, for the four cases. Example 4 shows the four commutative action diagrams for the four cases within the “network isomorphism” action set up in Example 3. A more interesting version of this action is the generalization to the action of the automorphism group whose elements are of the form $T_aM_p : x \mapsto px + a$, where each p is coprime to the modulus of some equal tempered system of pc. For the case ETS = 12, I will refer to this as the T/M group, the familiar full group of TTOs. Both T/I and T are normal subgroups of the full T/M group. Isomorphic action by conjugation still works if the group of arrow labels is any normal subgroup of the group acting on the Nets. For example, let H be the T/M group and G, the arrow-label group, be the group of transpositions, $T_nM_1$, or the T/I group, that is, $T_nM_{1,11}$ – or alternatively, of course, another full T/M group, $T_nM_{1,5,7,11}$. Note that the familiar pcs that are the node-content of Knets can themselves be construed as the abelian group of transpositions, T, which is isomorphic to Z12.
Example 4. The four commutative action diagrams for the four cases within the T/I group
It would be interesting to explore T/M for arbitrary ETS, as the group structures alter considerably. For example, when the modulus is itself a prime, such as 19 or 53, every nonzero element is coprime to it and generates the entire group cyclically. The number 12 is in fact unusually rich – superabundant – in factors and therefore Z12 is unusually poor in cyclic generators. To define specifically the action of the T/M group on Nets with arrow labels in any of its normal subgroups, we have to solve the diagram in Example 2 for the question mark ? for the case in which g is some $T_aM_p$ and h is some $T_nM_q$, where a and n are elements of Z12 and p and q are 1, 5, 7, or 11. We know this will be $hgh^{-1}$ so we just have to calculate what that is. The first step is to find $h^{-1}$. For the left inverse, we set $h^{-1}T_nM_q = T_0M_1$ and solve the equation to get $h^{-1}(x) = q^{-1}(x - n)$ (Example 5). Solving for the right inverse gets the same, as it should since automorphisms form a group. This result only makes sense for q coprime to the modulus. In the present case (modulus 12), $q^{-1} = q$ since $q^2 = 1$, and the formula reduces as shown in Example 5. Written in the form $T_nM_q$, the inverse of $T_nM_q$ is $T_{-qn}M_q$.
\[
h^{-1}\,T_nM_q = T_0M_1 \;\Rightarrow\; h^{-1}(x) = q^{-1}(x - n) = q(x - n) = qx - qn, \quad\text{so}\quad (T_nM_q)^{-1} = T_{-qn}M_q
\]
Example 5. Inverse of $T_nM_q$
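The inverse formula can be checked by brute force over the whole T/M group for modulus 12. This is a minimal sketch, not part of the original argument; the function names `apply_tm`, `inverse_tm`, and `inverse_is_two_sided` are hypothetical, and an operation $T_nM_q$ is represented simply by its two subscripts.

```python
MOD = 12
UNITS = [1, 5, 7, 11]  # the multipliers coprime to 12

def apply_tm(n, q, x):
    """T_n M_q (x) = q*x + n (mod 12)."""
    return (q * x + n) % MOD

def inverse_tm(n, q):
    """Inverse of T_n M_q according to Example 5: T_(-qn) M_q, returned as (n', q')."""
    return ((-q * n) % MOD, q)

def inverse_is_two_sided(n, q):
    """Check that the claimed inverse undoes T_n M_q on the left and on the right."""
    n_inv, q_inv = inverse_tm(n, q)
    left = all(apply_tm(n_inv, q_inv, apply_tm(n, q, x)) == x for x in range(MOD))
    right = all(apply_tm(n, q, apply_tm(n_inv, q_inv, x)) == x for x in range(MOD))
    return left and right

# Every one of the 48 operations should pass the check.
all_pass = all(inverse_is_two_sided(n, q) for n in range(MOD) for q in UNITS)
```

The check relies on the fact that each multiplier 1, 5, 7, 11 squares to 1 mod 12; for another modulus one would have to compute the multiplicative inverse of q explicitly.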
The next step is to calculate the conjugation $hgh^{-1}$. Set $h = T_nM_q$ and $g = T_aM_p$. The calculations are shown in Example 6. Written in the form of $T_nM_q$, the conjugation of $T_aM_p$ by $T_nM_q$ is $T_{-pn+n+qa}M_p$. This solves the diagram of Example 2 for ? within the T/M group, $T_nM_p$, p = 1, 5, 7, 11, modulus = 12.
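\[
\begin{aligned}
hgh^{-1}(x) &= T_nM_q\,T_aM_p\,T_{-qn}M_q(x) \\
&= T_nM_q\,T_aM_p(qx - qn) \\
&= T_nM_q(pqx - pqn + a) \\
&= pq^2x - pq^2n + qa + n \\
&= px - pn + n + qa \\
&= T_{-pn+n+qa}M_p(x)
\end{aligned}
\]
Example 6. Conjugation in $T_nM_p$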
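As a sanity check on the conjugation formula, one can compose h, g, and the inverse of h pointwise and compare the result with the closed form for every choice of n, q, a, and p. The sketch below assumes modulus 12 and uses hypothetical helper names (`tm`, `conjugate_formula`, `conjugate_brute`); it is an illustration, not the derivation itself.

```python
MOD = 12
UNITS = [1, 5, 7, 11]

def tm(n, q):
    """Return T_n M_q as a function on pitch classes: x -> q*x + n (mod 12)."""
    return lambda x: (q * x + n) % MOD

def conjugate_formula(n, q, a, p):
    """Closed form: h g h^-1 = T_(-pn + n + qa) M_p, returned as (subscript, multiplier)."""
    return ((-p * n + n + q * a) % MOD, p)

def conjugate_brute(n, q, a, p):
    """Compose h, g, and h^-1 pointwise and return the resulting table of values."""
    h, g, h_inv = tm(n, q), tm(a, p), tm((-q * n) % MOD, q)
    return [h(g(h_inv(x))) for x in range(MOD)]

def formula_matches_everywhere():
    """Compare the closed form with the pointwise composition for all 48 x 48 pairs."""
    for n in range(MOD):
        for q in UNITS:
            for a in range(MOD):
                for p in UNITS:
                    t, m = conjugate_formula(n, q, a, p)
                    table = [tm(t, m)(x) for x in range(MOD)]
                    if table != conjugate_brute(n, q, a, p):
                        return False
    return True
```

For instance, `conjugate_formula(1, 11, 9, 1)` gives the conjugate of $T_9$ by $T_1I$ (writing I as $M_{11}$).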
We can arrive at a full 4 by 4 matrix of specific conjugations within the T/M group by substituting all the combinations of p’s and q’s into the T(-pn + n + qa)Mp formula from Example 6. The matrix is given in Example 7. Each matrix entry is a solution for the ? of Example 2 for one combination of values for p and q. All this works for non-commutative Nets as well as Lewin-nets. 11
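Rows give q, the multiplier of $h = T_nM_q$; columns give p, the multiplier of $g = T_aM_p$; subscripts are taken mod 12.
\[
\begin{array}{c|cccc}
hgh^{-1} & p = 1 & p = 5 & p = 7 & p = 11 \\ \hline
q = 1 & T_{a}M_1 & T_{-4n+a}M_5 & T_{-6n+a}M_7 & T_{2n+a}M_{11} \\
q = 5 & T_{5a}M_1 & T_{-4n+5a}M_5 & T_{-6n+5a}M_7 & T_{2n+5a}M_{11} \\
q = 7 & T_{7a}M_1 & T_{-4n+7a}M_5 & T_{-6n+7a}M_7 & T_{2n+7a}M_{11} \\
q = 11 & T_{-a}M_1 & T_{-4n-a}M_5 & T_{-6n-a}M_7 & T_{2n-a}M_{11}
\end{array}
\]
Example 7. Conjugations of the T/M group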
In a sense, all of this is just bookkeeping. Isobel may be more focused on the node-objects, or perhaps on the arrows of the Net. Once Isobel decides to act mathematically on some Net representation, acting on the objects entails corresponding alterations in the arrows, and vice versa. 11
For a non-commutative example, just take T1I acting on the Net with pc 0, 9, and 8 as nodes, and arrows T9 from 0 to 9, T11 from 9 to 8, but T8I from 0 to 8. T9 and T11 do not compose to T8I so this is not commutative, and is not properly a Knet at all (perhaps a “KNet”). Using the table in Example 7, T1I acting on this Net maps contents 0 to 1, 9 to 4, and 8 to 5, and maps arrows T9 to T3, T11 to T1, and T8I to T6I, as it should. Any additional arrows in a polysemic version would also check out.
Note that non-commutative Nets are a significant generalization, in that they make possible many analytical assertions in the form of Nets that are not possible when the Nets are constrained to be commutative, as are Knets. No longer are you restricted to the few, tightly constrained Knet arrangements of three-node Nets, for example, if you want to assert three-node Nets at all.
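The non-commutative example of this footnote can be replayed in a few lines. The sketch below is an illustration under assumptions: an operation $T_nM_q$ is stored as the pair (n, q), so that inversion $T_nI$ is (n, 11), and a Net is stored as a list of node contents plus a list of (source, target, label) arrows. The function names are hypothetical.

```python
MOD = 12

def apply_tm(op, x):
    """op = (n, q) stands for T_n M_q; the inversion T_n I is (n, 11)."""
    n, q = op
    return (q * x + n) % MOD

def conjugate(h, g):
    """h g h^-1 in the form T_(-pn + n + qa) M_p, with h = (n, q) and g = (a, p)."""
    (n, q), (a, p) = h, g
    return ((-p * n + n + q * a) % MOD, p)

def act_on_net(h, nodes, arrows):
    """Apply h to every node content and conjugate every arrow label by h."""
    new_nodes = [apply_tm(h, x) for x in nodes]
    new_arrows = [(i, j, conjugate(h, g)) for i, j, g in arrows]
    return new_nodes, new_arrows

def arrows_hold(nodes, arrows):
    """True if each label really maps its source content to its target content."""
    return all(apply_tm(g, nodes[i]) == nodes[j] for i, j, g in arrows)

# The Net of the footnote: contents 0, 9, 8; arrows T9, T11, and T8I.
nodes = [0, 9, 8]
arrows = [(0, 1, (9, 1)), (1, 2, (11, 1)), (0, 2, (8, 11))]
T1I = (1, 11)
new_nodes, new_arrows = act_on_net(T1I, nodes, arrows)
```

Because each arrow is checked on its own, nothing in this sketch requires the arrows of a triangle to compose, which is what allows it to handle the non-commutative case.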
Remember that this is all part of our quest for more complex representations that might work as part of a model of musical actions – Isobel’s acts. To further this quest, we can generalize this action of the T/M group in several ways. First, as noted, it is valid for Knets, and for more general Nets which include polysemic and noncommutative Nets, as discussed in my JMM paper “Cool Tools.” Second, it is valid for any modulus, as noted – that is, any ETS. Third, we can generalize the action to more complex node-contents than single pcs, so long as the group has an interpretation in which it can act on the node-contents. We defined such an action for node-objects that are themselves Knets, above. It is even easier when the node-objects have less structure than Nets. For example, we are familiar with the notion that an action on a set of pcs is defined in terms of the set of images of the individual pcs under the action on pcs. Similarly, the action on an ordering of pcs can be defined as the ordering of the images of the pcs under that action. Therefore there is an action on Lewin-nets proper, and in general on Nets, whose node-contents are sets of pcs or orderings of pcs rather than individual pcs. An action on a set of Nets can be defined in terms of the set of images of the individual Nets under that other action. Since a chain is a Net, and a chain-hom-set as defined in “Cool Tools” is a set of chains that are subnets of some larger Net, that is, a set of Nets, we can define a T/M action on chain-hom-sets, so long as the node contents of each chain are themselves the pcs or sets of pcs or orderings of pcs, etc, acted on by the T/M group. These serve also to illustrate a principle that constrains recursion of action or even more generally, layered action: there must be a bottom level in which node content is simple in the sense that the action at that level is defined on it directly. 
By definition of action on a Net given in Example 1, action on a Net is action on some data set S which is the node contents, and action on the algebraic entities labeling the arrows of the Net. At each level, all actors must act directly on the arrow labels of the arrows at that level, and at least indirectly on the underlying data set S. All the levels must be consistent in that their algebraic entities and data objects eventually “fall through” to the same bottom level. Let’s look briefly at the case of an action on linear orderings of pcs. This is of interest in that the orderings themselves can be interpreted in various useful ways. Of course the orderings may be syntactic, as in serialism, or temporal-linear, as in a representation of a motive. In such cases, conventionally, the action on the ordering is uniform – the same group element acts identically on each of the pcs in the ordering, as shown in Example 8.
\[
g(pc_1\, pc_2 \cdots pc_n) = g(pc_1)\, g(pc_2) \cdots g(pc_n)
\]
Example 8. Homogeneous uniform action on orderings
This is a simple two-level action. It could form part of more complex actions, for example, as the next to the bottom and bottom levels of an action on Nets of sets of sets of Nets of orderings of orderings …of orderings of pc. But consider another action on orderings of pcs which is not uniform (Example 9). In this case, each pc at the bottom level is subject to a different operation within a larger action on the ordering of pc. Of course, g would have to be defined as having components gi which act in an appropriate fashion. In this case, the component actions
are independent, so we can use the direct product of the underlying group acting on the pcs with itself n times, $G^n$, where n is the number of elements in the orderings acted on.
\[
g = g_1 g_2 \cdots g_n, \qquad g(pc_1\, pc_2 \cdots pc_n) = g_1(pc_1)\, g_2(pc_2) \cdots g_n(pc_n)
\]
Example 9. Homogeneous non-uniform action on orderings, g an element of $G^n$
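The difference between the uniform and the non-uniform action on orderings can be made concrete with a short sketch. It assumes the group is T/M mod 12 and that an ordering is a plain list of pitch classes; the helper names and the sample chord are hypothetical and chosen only for illustration.

```python
MOD = 12

def tm(n, q=1):
    """T_n M_q as a function on pitch classes (q defaults to 1, a plain transposition)."""
    return lambda x: (q * x + n) % MOD

def uniform_action(g, ordering):
    """Uniform case: the same group element acts on every pc of the ordering."""
    return [g(pc) for pc in ordering]

def nonuniform_action(gs, ordering):
    """Non-uniform case: g = (g_1, ..., g_n) in G^n acts componentwise."""
    if len(gs) != len(ordering):
        raise ValueError("need one group element per position")
    return [g(pc) for g, pc in zip(gs, ordering)]

chord = [0, 4, 7]  # hypothetical voice-sorted C major triad
transposed = uniform_action(tm(2), chord)
# Move only the middle voice down a semitone; the other voices get the identity T_0.
voice_led = nonuniform_action([tm(0), tm(11), tm(0)], chord)
```

Since positions in the list are preserved, each voice can be tracked from chord to chord, which is the point of using orderings rather than sets or multisets.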
There has been considerable attention paid recently to transitions and relations between chords that are sorted into voices in some way, that is, abstractly, by register, by instrument, or whatever. Voice-sorted chords are properly represented as ordered n-tuples of pitches or pc, so that (unlike in multisets) you can keep track of the voices.12 For any of these problems one can use the action of Example 8, for G = T or T/I or T/M or some restriction of these such as moving at most one or two voices by 1 or 2 semitones within group T.13 Examples 8 and 9 are labeled as “homogeneous uniform action” and “homogeneous non-uniform action.” A uniform action on a complex object operates on each element of the complex object by the same element of the algebraic entity, e.g. the same group operation. In Example 8, each pc in the ordering pci is operated on by g. The non-uniform action of Example 9 operates on each element of the ordering pci by a different group element gi. A homogeneous action is a multilevel action in which the same algebraic entity is used at all levels, as is the case in Examples 8 and 9. A more complex example would be using the T/M group at each level of the “action on Nets of sets of sets of Nets of orderings of orderings …of orderings of pc,” with each action of the T/M group homogeneously “falling through” to the next level and eventually to the pcs at the bottom level. Clearly, a homogeneous action may be either uniform or non-uniform. The idea of a heterogeneous action is a bit more complicated. It is possible that at different levels of a multilevel action, different algebraic entities are used to move to the next level. At each level, the action to the next level must be properly defined on the objects and arrows. Note that this action may be either uniform or non-uniform. Let the action from the i-th to the i+1-th level be notated Acti. 
The specification of each Acti proceeds as usual; Examples 8 and 9 are examples of such a specification for uniform and non-uniform actions, respectively. See Example 10. Level-hetero actions are layered, but not recursive.
12 A multiset simply indexes its elements by their multiplicities, so a given content element can appear more than once, but the elements are still unordered; it still has no way of tracking a particular element-position from set to set.
13 Brandon Derfler is writing a Ph.D. dissertation at the University of Washington on parsimonious voice-leading chord spaces using this idea, entitled “Single Voice Transformations: A Model for Parsimonious Voice Leading.”
Example 10. Heterogeneous Action
There is one more distinction we can make among complex actions. Define a note (we could even use the term “sound,” but this might get confused with the aural sensation) as an (ordered) n-tuple (list, vector) of dimensional values in a way familiar from computer music; for example <start-time, duration, pitch, loudness, usw>. This representation might be quite complex; I have written and used Csound instruments that take over 40 parameters. However, for easiest mathematical treatment we would want to make sure that all the parameters were independent of one another. The case of parameters that are partly dependent on each other, for example the usual treatment of attack, duration, sustain and decay controls in computer music, is more complicated. Suppose then we have a note with n independent parameters $p_i$, represented as a list or n-tuple. In general, each parameter may have a different perceptual space, which may require a different algebraic object – let’s just say, group – to act on the objects in that space. If the parameters are independent, we can define an action on the notes using the direct product group of the respective groups for each parameter in the list. (See Example 11.) The action in Example 11 is non-uniform because each parameter is acted on by a different group element, but in addition, the groups from which the group elements are taken also vary from parameter to parameter. Instead of being an exponentiation of some one group G as in Example 9, this is the direct product group of a number of different groups. Yet this action remains homogeneous in the sense that the same action is applied at all its levels (one transition). We call it a “level-homogeneous group-heterogeneous non-uniform action.”
\[
G = G_1 \times G_2 \times \cdots \times G_n, \qquad g = g_1 g_2 \cdots g_n \in G, \qquad g(p_1\, p_2 \cdots p_n) = g_1(p_1)\, g_2(p_2) \cdots g_n(p_n)
\]
Example 11. Level-homogeneous group-heterogeneous non-uniform action on orderings
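A direct-product action on notes can be sketched as follows. Everything specific here is an assumption made for illustration: the four parameters, their units, and the choice of group for each one (addition on start time, pitch, and loudness; multiplication by a positive factor on duration). The class names `Note` and `NoteOp` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    start: float     # seconds
    duration: float  # seconds
    pitch: int       # semitones, e.g. a MIDI note number
    loudness: float  # dB

@dataclass(frozen=True)
class NoteOp:
    """One element of the direct product G_1 x G_2 x G_3 x G_4 (all four groups assumed)."""
    shift: float = 0.0    # (R, +) acting on start time
    stretch: float = 1.0  # (R > 0, *) acting on duration
    transpose: int = 0    # (Z, +) acting on pitch
    gain: float = 0.0     # (R, +) acting on loudness

    def __call__(self, note):
        """Act on a note componentwise, one group element per parameter."""
        return Note(note.start + self.shift,
                    note.duration * self.stretch,
                    note.pitch + self.transpose,
                    note.loudness + self.gain)

    def then(self, other):
        """Group product, taken componentwise: apply self first, then other."""
        return NoteOp(self.shift + other.shift,
                      self.stretch * other.stretch,
                      self.transpose + other.transpose,
                      self.gain + other.gain)

note = Note(0.0, 1.0, 60, -6.0)
f = NoteOp(shift=1.0, stretch=2.0, transpose=7)
g = NoteOp(shift=0.5, stretch=0.5, gain=-3.0)
```

The componentwise product in `then` only gives a valid action because the parameters are treated as independent; parameters that constrain one another would need a more elaborate structure than a direct product.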
So actions can be: level-homo or level-hetero, group-homo or group-hetero. If group-hetero, then non-uniform. If group-homo, then either uniform or non-uniform. Ignoring “hard” dependencies such as “if group-hetero then non-uniform,” there would be a total of 8 such classifications, a sort of Eight-Fold Way. If we bring other entities such as rings, fields, and modules into the picture, the scheme exfoliates too madly to bother with. Admittedly, this classification scheme for actions is rather complicated. However, this is one of its virtues, leading us away from any simplistic tendencies and toward ideas of mathematical actions that are complex enough perhaps to begin to model Isobel’s musical acts.
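The count of classifications can be enumerated directly. The sketch below treats the three distinctions as independent binary choices and then applies the one hard dependency named in the text; the label strings and the function name `is_consistent` are hypothetical.

```python
from itertools import product

LEVEL = ("level-homo", "level-hetero")
GROUP = ("group-homo", "group-hetero")
FORM = ("uniform", "non-uniform")

# Ignoring dependencies, every combination of the three binary choices is a label.
all_labels = list(product(LEVEL, GROUP, FORM))

def is_consistent(label):
    """The hard dependency from the text: group-hetero forces non-uniform."""
    _, group, form = label
    return not (group == "group-hetero" and form == "uniform")

consistent = [label for label in all_labels if is_consistent(label)]
```

Under these assumptions the unrestricted scheme has 8 labels, and 6 of them remain once the dependency is enforced.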
Finally, let’s take a different approach to Isobel’s representations.14 We informally tie together three notions: relation, net, and space. Define an n-ary relation R in the usual way as any subset of the Cartesian product of n sets. If all the sets in the Cartesian product are the same, the product is homogeneous; if not, heterogeneous. If it is homogeneous, we call it a relation “on” its unique underlying set. Considering R as a relation, we can ask if it is reflexive, symmetric, transitive, connected, and so on. This can also be considered an n-dimensional net, or digraph, when a net is defined as Lewin did originally, identifying arrows with ordered pairs (tuples, in the general case) of objects. This is also interpretable under certain conditions as the total note-space, with each point in it a note as defined above. Each subspace, and in particular each dimension, would have its own metric, defining its interval structure, and so on in the familiar development of transpositions and other isometries and isomorphisms. We can color this net or relation in more or less elaborate ways by giving a formal interpretation of it, as defined in Example 12 for homogeneous relations on a set S (the definition for heterogeneous relations follows mutatis mutandis).
Example 12. Definition of an interpretation
A simplified version of this idea of interpretation can serve to define the more general Net idea, by adding a coloring, an (n+1)th place to each n-tuple which is occupied by a list of arrow labels. (There is more to it than this, but I am leaving it out to simplify; for a fuller formal semantics, see my article “Network Models.”) Each such list of arrows is homdirectR(x, y), the list of all arrow-labels directly from point x to point y (path of length one). This would be an alternative to my construction in “Cool Tools.” A coloring can also be viewed as a binary relation, between the n-tuple points in the space (or if it is viewed as a graph, arcs) and the colors in the (n+1)th place. In general any n-ary relation can be built recursively from binary relations. The set of all homogeneous binary relations on a set Y, that is, the power set of Y × Y, with composition of binary relations defined in the usual way (see Example 13), is a monoid, with the diagonal of all and only <x, x> as its identity relation. Monoidal acts are defined for it (see the definition of S-act). So we could define a Net whose node-contents are binary relations, and whose arrow-labels are also binary relations that act on the node content! This would open up quite a new field of inquiry about Nets. 14
What follows is indebted to Rahn (1994).
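For binary relations R and S on a set Y, the composition RS is
\[
RS = \{\langle x, z\rangle \in Y \times Y \mid \text{there exists } y \in Y \text{ such that } \langle x, y\rangle \in R \text{ and } \langle y, z\rangle \in S\}
\]
Example 13. Composition of binary relations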
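The claim that the binary relations on a set form a monoid under composition, with the diagonal as identity, can be verified exhaustively on a very small set. The sketch below makes two assumptions: relations are stored as frozensets of pairs, and composition is read left to right (R first, then S). The function names are hypothetical. The last function illustrates the kind of act suggested in the text, where a relation used as an arrow label acts on a relation used as node content by right multiplication.

```python
from itertools import chain, combinations

def compose(r, s):
    """R then S: all <x, z> with <x, y> in R and <y, z> in S for some y (order assumed)."""
    return frozenset((x, z) for (x, y1) in r for (y2, z) in s if y1 == y2)

def diagonal(y_set):
    """The identity relation: all and only the pairs <x, x>."""
    return frozenset((x, x) for x in y_set)

def all_relations(y_set):
    """The power set of Y x Y, i.e. every homogeneous binary relation on Y."""
    pairs = [(x, z) for x in y_set for z in y_set]
    subsets = chain.from_iterable(combinations(pairs, k) for k in range(len(pairs) + 1))
    return [frozenset(sub) for sub in subsets]

def is_monoid(y_set):
    """Check the identity law and associativity over every relation on Y."""
    rels = all_relations(y_set)
    ident = diagonal(y_set)
    unit_ok = all(compose(ident, r) == r == compose(r, ident) for r in rels)
    assoc_ok = all(compose(compose(r, s), t) == compose(r, compose(s, t))
                   for r in rels for s in rels for t in rels)
    return unit_ok and assoc_ok

def act(node_content, arrow_label):
    """A relation acting on a relation by right multiplication (one possible act)."""
    return compose(node_content, arrow_label)
```

The exhaustive check is only practical for tiny sets, since a set of k elements carries 2^(k*k) relations; for Y = {0, 1} there are 16.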
Finally, consider phrase-structure grammars, which have been used as models of Schenkerian-type theories of tonal music and therefore are one kind of plausible candidate for external representations of Isobel’s internal representations of piece-til-now and piece-to-come. Any such grammar is a kind of finite-state machine and therefore can be formalized as an S-act.
Example 14. Grammar and metagrammar
These grammars produce structures that are trees, which are a kind of graph or net (or relation!) that is partially ordered in a particular way. Note that the grammar does tie together in this way the ideas of mathematical action, relation, and net. Example 14 is taken from my 1994 article, “Network Models.” It shows a little invented phrase-structure grammar for a kind of non-tonal music, purely as a methodological illustration, formalized as a formal theory with axioms and with inference rules in the form of Emil Post productions, along with one particular derivation sequence modelling the structure in this grammar of one of the musical pieces, that is, theorems or sentences, producible by this grammar. Imagine Isobel imagining her piece in this way, that is, a basic vocabulary of pc sets, a secondary vocabulary emphasizing pc sets of Tn-type (0 3 7) and (0 2 5) and emphasizing transformation of any pc set by T1. She realizes that one stretch of her music, perhaps even the music-til-now, is the particular production within this grammar shown in Example 14 as A Derivation. Clearly, many other stretches of music can be represented as sentences within this grammar, but not just any stretch of any music. So the grammar as a whole lends a certain flavor to the piece, which from Isobel’s standpoint, is good. But what does Isobel do next? She would like to move to some closely related new stretch of music, but it should not be so closely related that it does not sound like something new. To generate just any new stretch from the grammar would probably not constrain the production enough, that is, the new thing would not necessarily be close enough to the earlier thing. Isobel realizes that the particular derivation sequence that produced her piece-til-now would constrain new production more than the grammar as a whole, but in a way that need not produce something too close.
She can theorize the production sequence, as shown in the “meta-grammar” in Example 14, as itself a kind of finite-state machine. This meta-grammar is the bottom diagram of Example 14. Isobel can operate this machine to get a next slice of her piece that is closely, but not too closely related, to what has come before. We have come to the end of this for now, leaving further development and applications of all these ideas to later research. I will feel successful if this has focused attention on the formal modelling of creative musical acts, and has illustrated and encouraged thinking in terms of models that are not simplistic, but are complex enough to be credible.
References
Kilp, M., Knauer, U., Mikhalev, A.: Monoids, Acts, and Categories, with Applications to Wreath Products and Graphs. de Gruyter, Berlin (2000)
Lewin, D.: Music Theory, Phenomenology, and Modes of Perception. Music Perception 3(4), 327–392 (Summer 1986)
Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press, New Haven (1987)
Lewin, D.: Klumpenhouwer Networks and Some Isographies That Involve Them. Music Theory Spectrum 12(1), 83–120 (Spring 1990)
Morris, R.: Composition with Pitch Classes. Yale University Press, New Haven (1987)
Morris, R.: Class Notes for Advanced Atonal Theory. Frog Peak Music, Lebanon, NH (2001)
Rahn, J.: Repetition. Contemporary Music Review 7, 49–58 (1993)
Rahn, J.: Some Remarks on Network Models for Music. In: Atlas, R., Cherlin, M. (eds.) Musical Transformation and Intuition: Essays in Honor of David Lewin, pp. 225–235. Ovenbird Press, Roxbury (1994)
Rahn, J.: The Swerve and the Flow: Music’s Relationship to Mathematics. Perspectives of New Music 42(1), 130–150 (Winter 2004)
Rahn, J.: Cool Tools: Polysemic and noncommutative Nets, subchain decompositions and cross-projecting pre-orders, object-graphs, chain-hom-sets and chain-label-hom-sets, forgetful functors, free categories of a Net, and ghosts. Journal of Mathematics and Music 1(1), 7–22 (2007)
Starr, D.: Sets, Invariance, and Partitions. Journal of Music Theory 22(1), 1–42 (1978)
A Transformational Space for Elliott Carter's Recent Complement-Union Music*
John Roeder
University of British Columbia School of Music
Abstract. Elliott Carter's recent music exploits a special combinatorial property of the all-trichord hexachord. I show how this property can be reconceived in terms of interesting and analytically significant musical transformations: three involutions on the pitch-class aggregate which constitute a Klein four-group, and which have a natural interpretation as the symmetry group on a particular 12-vertex geometrical structure. Accordingly the opening of Carter's Figment II for solo cello can be analyzed transformationally as a complete traversal of this structure by just a few, striking, characteristic gestures.
Scholars of Elliott Carter's music have noticed refinements in his compositional technique over the last two decades. The composer affirms, "From about 1990, I have reduced my vocabulary of chords more and more to the 6-note chord no. 35 and the 4-note chords nos. 18 and 23, which encompass all the intervals" (Carter 2002, ix).1 The hexachord set-class (sc) 012478 he mentions is unique in the sense that it contains an instance of every one of the 12 classes of trichord. Accordingly scholars call it the all-trichord hexachord (or ATH). It also exhibits what Robert Morris (1990, 182) has theorized as the "complement union property" (CUP): there exist two set-classes X and Y such that the all-trichord hexachord always results from the union of any X-class set and any disjoint Y-class set. Only 21 of the 50 hexachord set-classes exhibit this property (considered as a pair, the all-interval tetrachord-classes Carter mentions also exhibit it). But the all-trichord hexachord is nearly unique in manifesting the complement union property two ways: X and Y can be set-classes 048 and 016, or 0167 and 04. In other words, the union of any augmented trichord with any three other notes that form a 016 trichord will always be an ATH; and the union of any 0167 tetrachord with two other notes that form a major third or minor sixth will always be an ATH. A particularly clear exploitation of the complement union property of the all-trichord hexachord is the first section of Carter's Figment II–Remembering Mr. Ives *
This research is supported by a Standard Research Grant from the Social Sciences and Humanities Research Council of Canada. 1 In this paper I will refer to sets of pitch classes (pcs) by enclosing pitch-letter names—or their standard integer representations, C=0, C#=Db=1, etc.—in set brackets. Set-classes (scs), that is, sets of pc-sets related by transposition or inversion, are named by their prime form, without brackets or commas. Carter, though, uses his own system for denoting set classes; his tetrachord no. 18 means the set-class with prime form 0146, his tetrachord no. 23 is 0137, and his hexachord no. 35 is 012478. T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 303–310, 2009. © Springer-Verlag Berlin Heidelberg 2009
J. Roeder
for solo cello (2001). Example 1 shows an annotated score in which segments of the music are bracketed and labeled with circled integers. During each segment an all-trichord hexachord is formed the same way. Two perfect fifths are presented a tritone apart; so they form a member of sc 0167. To them is added an 04-class dyad (ic4), the notes of which are labeled and boxed on the Example. The result of this combination is, in every case, an all-trichord hexachord. Different types of brackets on the Example indicate how the successive ATHs group into regions characterized by the member of sc 0167 that they hold in common. For instance, every one of the dotted-bracketed ATHs, numbered 5 to 8, contains the pc set {E,F,Bb,B}, or {4,5,t,e}, so their succession is labeled as the "45te region." This bracketing clarifies one aspect of the musical form. The piece begins with what I will call an opening fanfare, in which the fifths and major tenth are exposed for the first time. Starting in m. 4 the music then presents four ATHs, each composed of
Example 1. Carter, Figment II–Remembering Mr. Ives, mm. 1-22, and its all-trichord hexachords
A Transformational Space for Elliott Carter's Recent Complement-Union Music
the same 0167-type set combined with an ic4 dyad. The ic4 dyads are disjoint with each other and with the 0167-type set, so the series of four ATHs completes the pc aggregate. Next, the music presents four more similarly related ATHs, sharing the same 0167-type set, and combining it in turn with four mutually disjoint ic4 dyads. Then the music does the same thing based on a third 0167-type tetrachord. Starting from m. 4, then, twelve ATHs are presented, based on three disjoint members of sc 0167. These three 0167-type sets combine to complete the aggregate on a larger scale. Many subtle details of the passage support this abstract compositional design. After {D#4, G#3} is repeated in the fanfare, a strong contrast of pitch content articulates the beginning of the 0167 ATH-region in m. 4, specifically, the two new pcs F# and C#, the latter of which is registrated as the highest pitch yet. Also at this moment, the dynamics begin to continuously crescendo and diminuendo, rather than change abruptly as they did in the preceding fanfare. Each ic4 in mm. 4-9 is orchestrated as a major-third pitch-simultaneity, giving the 0167 ATH-region aural consistency. Starting at the breath mark at m. 9, the ic4s change to minor-sixth pitch intervals, helping to articulate the beginning of the next ATH-region, 45te, centered on the {Bb, F, E, B} tetrachord. The registral, rhythmic, and dynamic climax of the passage, in m. 13, also ups the harmonic ante by presenting in quick succession a minor sixth, a major third, then a major seventh. This last simultaneity interval, which has not appeared previously in the piece, triggers the beginning of the 8923 ATH-region, which is also marked by a temporary disruption of the display of the complement union property. That is, rather than forming 0167s, the perfect fifths are incorporated into the two all-interval tetrachords marked on the score. This contrast, however, helps prepare for closure, which is achieved around m. 
17 by restoring the complement-union partition of the all-trichord hexachords into 0167s and 04 dyads; the sense of return is supported further by the 04 dyads' presentation as major-third pitch intervals, as at the beginning of this piece. Indeed, Carter cleverly contrives mm. 19-21 to recall both the dyads of the opening fanfare and the cadential-sounding open-string C2 of the 0167 region. This account implies a simple combinatorial conception of this music, identifying how each instance of the all-trichord hexachord is composed of the same subset types. It is also consistent with the composer's conception to the extent that it is apparent in the recently published Harmony Book, in which Carter systematically tabulates these set combinations (Carter 2002, 159). Example 2a shows the relevant excerpt from that book, in which hexachord no. 35 is shown to result from combining tetrachord no. 2 with four different major thirds. Accordingly, Guy Capuzzo (1999; 2004) analyzes CUP-manifesting passages of ATHs in Carter's Gra, Changes, and other recent compositions, as realizations of compositional spaces (graphs) whose nodes contain sets belonging to two different subset types, and whose edges signify how they combine into ATHs. An example of the kind of graph Capuzzo draws for the ATH is shown in Example 2b. The central node of the graph contains the pc set {4,5,t,e}, conforming to Carter's Harmony Book example; surrounding it are nodes containing instances of ic4 that do not intersect with {4,5,t,e} or with each other. Edges connecting two nodes indicate that the contents of the nodes combine into an ATH. Capuzzo describes such spaces for other CUP-manifesting set-classes, including the pair of AITs.
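The combinatorial claims made so far can be checked by brute force. The following Python sketch is offered only as an illustration, not as part of Carter's or Capuzzo's apparatus. The helper names (`canonical`, `members`, `cup_unions`) are hypothetical, and `canonical` labels a set class by the lexicographically smallest transposed or inverted form that starts on 0. That label agrees with the prime form 012478 for the ATH, but it is an assumption of this sketch and is not guaranteed to match published prime-form tables for every set class.

```python
from itertools import combinations

def canonical(pcs):
    """Label a pc set by the lexicographically smallest of its transposed
    and inverted forms that begins on 0 (a stand-in for the prime form)."""
    forms = []
    for sign in (1, -1):                      # the set itself, then its inversion
        image = {(sign * p) % 12 for p in pcs}
        for start in image:                   # move each member to 0 in turn
            forms.append(tuple(sorted((p - start) % 12 for p in image)))
    return min(forms)

def members(form):
    """All pc sets belonging to the set class of `form`."""
    target = canonical(form)
    return [frozenset(c) for c in combinations(range(12), len(form))
            if canonical(c) == target]

def cup_unions(x_form, y_form):
    """Unions of every X-class set with every disjoint Y-class set."""
    return [x | y for x in members(x_form) for y in members(y_form)
            if not x & y]

ATH = canonical((0, 1, 2, 4, 7, 8))

# Trichord classes found inside one ATH, compared with all trichord classes.
classes_in_ath = {canonical(t) for t in combinations((0, 1, 2, 4, 7, 8), 3)}
all_trichord_classes = {canonical(t) for t in combinations(range(12), 3)}

# The two partitions named in the text: 0167 + 04 and 048 + 016.
tetrachord_plus_dyad = cup_unions((0, 1, 6, 7), (0, 4))
triad_plus_trichord = cup_unions((0, 4, 8), (0, 1, 6))

# The ic4 dyads that are disjoint from {0,1,6,7}.
dyads_for_0167 = sorted(sorted(d) for d in members((0, 4))
                        if not d & {0, 1, 6, 7})

print(len(all_trichord_classes), classes_in_ath == all_trichord_classes)
print(all(canonical(u) == ATH for u in tetrachord_plus_dyad))
print(all(canonical(u) == ATH for u in triad_plus_trichord))
print(dyads_for_0167)
```

The last line lists the four mutually disjoint ic4 dyads that, together with {0,1,6,7}, complete the aggregate, which is the design described for the first region of the passage.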
Example 2. (a) Carter, Harmony Book, p. 159 (b) Graph showing the CUP of ATH
However, the highly limited hexachordal vocabulary of Figment II also seems ideally suited to be understood in terms of musical transformations (Lewin 1987; 1993). This involves defining a family of objects of the same type, structured by a semigroup (at least) of transformations, each of which changes one object to another in the same family. On that basis one may construct a transformational space — a network whose nodes represent the objects, linked into a structure by a minimal set of transformations (Gollin 2000). Transformational analysis also identifies characteristic ways that objects are transformed during the piece. These characteristic transformations are conceived as gestures that correspond to distinctive motions in the space, which give the gestures location, direction, landmarks, and orientation. Repetitions of gestures, or changes of gestures, can be understood to articulate musical form. Generally speaking, transformational analyses add focus and temporal orientation to more combinatorial accounts. Capuzzo himself provides a basis for a more transformational conception of Carter's recent complement-union music. If one considers the family of ATHs that share a member of 0167, and the twelve-tone operations that hold the 0167-type subset fixed, then those operations form a simply transitive group of characteristic transformations that engage the [0167] structure at another level. Capuzzo does not develop this idea much further, though, perhaps because the space (and the excerpt he analyzes from Gra) involves only four ATHs. However, a recent article by Adrian Childs (2006) develops a transformational system for CUP-manifesting all-interval tetrachords (AITs). Childs conceives of the transformations among AITs as contextual, partial transpositions, which hold one dyad of an AIT fixed while transposing the other to yield another AIT. 
Although partial transpositions in general do not have a semigroup structure, the constraints of CUP ensure that these particular ones do. Childs's analyses demonstrate how these transformations succinctly express AIT-preserving voice-leading in Carter's music; however, he does not propose a transformationally structured space in which characteristic gestures can be heard to act. Now, following Lewin's typical strategies, one might analogously define transformations that preserve the 0167s in each quadruple of related ATHs (Capuzzo suggests this as a possible extension of his work). Two of these would be contextual inversions and the other would be T6. For the task of analyzing Figment II, however, this solution seems overly general in two respects. For one thing, the passage does not obviously feature hexachords; rather, it focuses on dyads, with the ATHs acting as a
less immediately salient harmonic background. Also, only three of the six 0167s, and only 12 of the 24 ATHs, appear, so it does not seem appropriate to invoke the idea of a contextual inversion that would apply to all ATHs. A more appropriate and analytically expressive structure results from considering three involutions on the pitch-class aggregate that are not pc inversions. Written in cyclic notation, they are: S = (01) (67) (23) (89) (45) (te), F = (07) (16) (29) (38) (4e) (5t), and T = (06) (17) (28) (39) (4t) (5e). The label S stands for Semitone, because it preserves specific (but not all) dyads in ic1. F (for Fifth) preserves specific dyads in ic5, and T (for Tritone) preserves all the ic6 dyads. They also preserve every member of tetrachord class 0167 that appears in Figment II; for example S({2,3,8,9}) = {2,3,8,9}, because S transforms 2 to 3, 3 to 2, 8 to 9, and 9 to 8. (They do not have the same effect on members of sc 0167 that do not appear in this passage; for example F({1,2,7,8}) = {0,3,6,9}, not {1,2,7,8}.) To see how these involutions are manifested musically, consider in Example 1 the succession from ATH 1 to ATH 2, which pivots on the two perfect fifths in mm. 6-7 that they hold in common. The succession from the first of these fifths to the second can be regarded as the transformation of {6,1} into {0,7} by either S or T; F, which exchanges pcs 6 and 1, leaves {6,1} in place. (The T transformation even supports hearing these fifths as having harmonic roots, as it transforms the root of one into the root of the other.) Interestingly, the operation T also transforms the ic4 dyad in ATH 1, {4,8}, to the ic4 dyad in ATH 2, {t,2}. Indeed, the series of dyads at the beginning of ATH 2, <{6,1},{0,7},{t,2}>, is the retrograde of the T-transformation <{t,2},{0,7},{6,1}> of the series of dyads that end ATH 1, <{4,8},{6,1},{0,7}>.
So we can regard the background change from ATH 1 to ATH 2 as an elision of these two series, that is, what Lewin (1987) would call a retrograde T-chain of the ATH 1 dyad succession. Similarly, the change from ATH 3 to ATH 4 can be heard as the elision of the dyad succession <{5,9},{6,1},{0,7}>, in mm. 8-9, with its retrograde T-transform, <{6,1},{0,7},{e,3}>.
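The action of the three involutions on dyads and tetrachords can be traced in a few lines of Python. This is a minimal sketch under stated assumptions: the function names (`involution`, `image`, `retrograde_transform`) are hypothetical, pcs t and e are written as 10 and 11, and each involution is entered as the list of two-cycles given above.

```python
PC = {**{str(i): i for i in range(10)}, "t": 10, "e": 11}

def involution(cycles):
    """Build a pc mapping from two-cycles written like '01 67 23'."""
    mapping = {}
    for a, b in cycles.split():
        mapping[PC[a]], mapping[PC[b]] = PC[b], PC[a]
    return mapping

S = involution("01 67 23 89 45 te")
F = involution("07 16 29 38 4e 5t")
T = involution("06 17 28 39 4t 5e")

def image(op, pcs):
    """Apply an involution to every member of a pc set."""
    return frozenset(op[p] for p in pcs)

def retrograde_transform(op, series):
    """Transform each dyad of a series, then reverse the order."""
    return [image(op, dyad) for dyad in reversed(series)]

# The three 0167 tetrachords heard in the passage are fixed by S, F and T.
REGIONS = [frozenset(s) for s in ({0, 1, 6, 7}, {4, 5, 10, 11}, {2, 3, 8, 9})]
preserved = all(image(op, r) == r for op in (S, F, T) for r in REGIONS)

# An 0167 that does not occur in the passage is moved by F.
moved = image(F, {1, 2, 7, 8})

# The dyads that end ATH 1, and the retrograde of their T-transform.
end_of_ath1 = [frozenset(d) for d in ({4, 8}, {6, 1}, {0, 7})]
start_of_ath2 = retrograde_transform(T, end_of_ath1)

print(preserved, sorted(moved))
print([sorted(d) for d in start_of_ath2])
print(sorted(image(S, {6, 1})), sorted(image(T, {6, 1})), sorted(image(F, {6, 1})))
```

The final line shows how each involution treats the fifth {6,1}: S and T carry it to {0,7}, while F exchanges its two pcs and so returns the same dyad.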
Example 3. Transformational analysis of all of Figment II, mm. 1-22
Example 3 shows how every ATH succession within each region can be analyzed similarly as the transformation of an ic4 dyad and two perfect fifths by one of the involutions S, F, and T. Each region alternates two transformations in a distinctive way, creating form. The first region focuses on T, using it twice in alternation with F. The beginning of the second region is articulated by the introduction of a contrasting,
as yet unheard transformation S, but makes some accommodation with established procedure by alternating it with the T transformation that was focused on earlier. The last region focuses on the hitherto subsidiary transformation F. The return of T in alternation with F here gives the passage not only consistency but also closure, via the rounding of an ABA form. As shown by the brackets at the bottom of the example, in each region appears a different number of retrograde chains (involving at some point every one of the three involutions), peaking in the middle region, the only time span when ATHs are transformed by S. The involutions also suggest a transformational space suitable for analyzing Figment II. Since SF = FS = T, FT = TF = S, and TS = ST = F, they, along with the identity, constitute a Klein four-group, which has a natural geometrical interpretation as the group of symmetries of a rectangle (a subgroup of the symmetries of a square). To illustrate, Example 4 shows three corresponding symmetries acting on squares whose vertices are labeled with the pcs of {0,1,6,7}. Under this particular labeling scheme, T acts like flipping the square about a vertical axis, as shown from the first square to the second in the example, F acts like flipping the square about a horizontal axis, and S acts like the rotation of the square by 180 degrees. Comparing the leftmost and rightmost squares, which are identical, shows that by successively applying F then S after T, the square is transformed back to its original orientation.
Example 4. Isomorphism of the group of ATH-preserving involutions to the Klein 4-group
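The group relations stated above (SF = FS = T, FT = TF = S, TS = ST = F) can be confirmed by composing the permutations directly. The sketch below is illustrative only; `perm` and `then` are hypothetical helper names, and each involution is stored as a 12-element tuple indexed by pc.

```python
from itertools import product

def perm(pairs):
    """Permutation of the 12 pcs (as a tuple) that swaps each listed pair."""
    p = list(range(12))
    for a, b in pairs:
        p[a], p[b] = b, a
    return tuple(p)

IDENTITY = tuple(range(12))
S = perm([(0, 1), (6, 7), (2, 3), (8, 9), (4, 5), (10, 11)])
F = perm([(0, 7), (1, 6), (2, 9), (3, 8), (4, 11), (5, 10)])
T = perm([(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11)])

def then(f, g):
    """Apply f first, then g."""
    return tuple(g[f[x]] for x in range(12))

NAMES = {IDENTITY: "I", S: "S", F: "F", T: "T"}

# Composition table; a KeyError here would mean the set is not closed.
table = {(NAMES[f], NAMES[g]): NAMES[then(f, g)]
         for f, g in product(NAMES, repeat=2)}

for row in "ISFT":
    print(row, [table[(row, col)] for col in "ISFT"])
```

Every element is its own inverse and the table is symmetric, which is the defining pattern of the Klein four-group.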
Now consider the toroidal figure in Example 5a. It is composed of three squares, each with its vertices labeled by the pitch classes from one of the three members of sc 0167 in the music's regions (pc t is shown as 10, and e as 11). Vertical sides of the squares represent F transformations, and their horizontal edges represent T transformations; for example, F transforms the pc 2 at the lower right corner to the pc 9 at the vertex connected vertically to it, and T transforms 2 to the pc 8 at the vertex connected horizontally to it within the same square. The vertices of different squares are connected by edges to form triangles whose vertex pcs belong to sc 048. Each ATH in Example 1 is represented on such a figure by the vertices of a sc-0167 square, plus two vertices, one from each of the other two squares, that form a 048-type trichord with one of the vertices of the first square. For instance, ATH 1 is composed of pcs {0,1,6,7}, the vertices of one square, plus the pcs 4 (from the {4,5,t,e} square) and 8 (from the {8,9,2,3} square), which form a 048-type trichord with 0. In Example 5a, the pcs of the {0,1,6,7} square are shown as two arrows indicating two members of ic5, the dyads {0,7} and {1,6}, and the pcs of the 048 trichord are shown as the tail of one of those arrows (at the pc 0) plus the two gray balls on pcs 8 and 4 forming ic4. Shading highlights the 0167 square and the 048 triangle. Similarly, Example 5b shows ATH 2 to be composed of the ic5 vertex pairs of the same {0,1,6,7} square plus the pcs t and 2, an ic4 that forms a 048-type trichord with pc 6.
Example 5. Geometrical representation of the succession from ATH1 to ATH2 to ATH3
In this space the pitch-class transformations that manifest CUP in Example 2 can be visualized as distinctive gestures that change the location of the ic5s and the ic4. This is suggested by the juxtaposition of Examples 5a and 5b. The T-transformation of ATH 1 to ATH 2 takes the solid arrow directed from 6 to 1 in Example 5a and translates it horizontally across the {0,1,6,7} square, so that it now connects 0 to 7. T similarly takes the dotted arrow directed from 0 to 7 and translates it horizontally to connect 6 to 1. The vertices of the triangle involving the pcs 0, 4, and 8 are translated horizontally across their respective squares to the vertices 6, 10, and 2, respectively. Thus the T transformation preserves the square- and triangle-relationships among the ATH-forming vertices, and it manifests the complement union property of the ATH by pairing the same {0,1,6,7} square with two different ic4 dyads, {4,8} and its T-transform {10,2}. Likewise the juxtaposition of Examples 5b and 5c clearly displays the invariants that characterize the vertical transformation F from ATH2 to ATH3, and so highlights their complement union property under its action. F takes the dotted arrow directed from 6 to 1 in Example 5b and translates its ends vertically, so it is now directed from 1 to 6. Similarly F exchanges the pcs 0 and 7 connected vertically by the solid arrow. The vertices of the triangle involving the pcs 6, 10, and 2 are translated vertically across their respective squares to the vertices 1, 5, and 9, respectively. The S transformation, not shown in the Example, effects a different and distinctive geometrical action, moving balls and arrow ends
diagonally across the square, but also maintains the geometrical relationships among them, thus manifesting CUP. The change from one 0167 region to another, as between ATH4 and ATH5 in mm. 9-10, involves another symmetry of this three-dimensional figure: a rotation by 120°, along the edges of the triangles, from one square to another. In effect then, the space is a Cayley diagram that manifests the coset structure of the group actions. It also provides a striking medium to visualize the succession of twelve ATHs that structures the passage, which was described in connection with Example 1. As suggested by Example 5, the succession of four CUP-related ATHs corresponds to the systematic union of an 0167 square with each of the four unique ic4 pairs of the remaining pcs, each pair consisting of pcs from the other two squares. Visually, the progression through the three CUP-manifesting quadruples of ATH repeats this combinational process for each of the three 0167 squares in the space, so that eventually every vertex functions as a member of both a square and a triangle. Accordingly the passage can be understood transformationally as a complete traversal of this structure by just a few, striking, characteristic gestures. A three-dimensional computer animation that can be viewed at theory.music.ubc.ca/~trx/analyses.html makes vivid these transformational gestures, and the musical form of Figment II that they manifest within the space. As an analysis, it subsumes the combinatorial and contextual-inversion perspectives into a temporally oriented mode of listening.
References
Capuzzo, G.: Variety within Unity: Expressive Means and Their Technical Ends in the Music of Elliott Carter, 1983–1994. Ph.D. dissertation, University of Rochester (1999)
Capuzzo, G.: The Complement Union Property and the Music of Elliott Carter. Journal of Music Theory 48(1), 1–24 (2004)
Carter, E.: Harmony Book. Hopkins, N., Link, J.F. (eds.). Carl Fischer, New York (2002)
Childs, A.: Structural and Transformational Properties of All-Interval Tetrachords. Music Theory Online 12(4) (2006)
Gollin, E.: Representations of Space and Conceptions of Distance in Transformational Music Theories. Ph.D. dissertation, Harvard University (2000)
Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press, New Haven (1987)
Lewin, D.: Musical Form and Transformation: 4 Analytic Essays. Yale University Press, New Haven (1993)
Morris, R.: Pitch Class Complementation and its Generalizations. Journal of Music Theory 34(2), 175–245 (1990)
Networks
Tom Johnson
75 rue de la Roquette, 75011 Paris
Tel.: 33 1 43 48 90 57
www.tom.johnson.org
I was pleased to learn that it would be possible to make a little exhibition of drawings here in Berlin, as well as presenting my lecture, because the Networks I’m working on now are as much visual as aural, and I think these structures need to be seen as well as heard. Of course, those who are simply reading this text will only see the drawings that are included here, but hopefully this will be enough to convey the general idea. Many composers make graphs, tables, or charts of some sort to calculate the details of their music, and I have been doing this for a long time, but finding the system visually has become a particularly important part of the Networks I have been working on since 2005, which have mostly to do with harmony and with defining groups of chords. My earlier music also sometimes concerned chord groups and combination theory, the most obvious example being The Chord Catalogue, but this interest took a new turn when a young Dutch composer, Samuel Vriezen, showed me how he had composed a group of 11 five-note chords such that each chord had two notes in common with each other chord, and a mathematician friend, Jean-Paul Allouche, suggested that I investigate block designs. This is a relatively new kind of combination theory that I don’t think Vriezen had ever studied either, and which, in fact, is not widely known even to mathematicians. The principles are rather simple, however, and after studying the subject a bit, I realized that these networks of subgroups or chords could lead me to rich and yet unknown musical materials. Let me begin with the example of a block design known as (7, 3, 1). That means that 7 notes (elements) are divided into chords (sub-groups) of 3 notes, in such a way that each pair of notes comes together in exactly one chord. One can do it in this way: (1,2,3), (3,4,7), (2,4,6), (2,5,7), (1,6,7), (3,5,6), (1,4,5), but one can also solve the problem in this way: (1,2,4), (2,3,7), (4,6,7), (2,5,6), (1,5,7), (1,3,6), (3,4,5).
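The defining property of these two solutions is easy to test by counting pairs. The following Python sketch is only an illustration; the function names `pair_counts` and `is_block_design` are hypothetical, and the check covers just the pair condition described above (every pair of notes in the same number of chords).

```python
from collections import Counter
from itertools import combinations

def pair_counts(blocks):
    """How many chords contain each pair of notes."""
    return Counter(pair for block in blocks
                   for pair in combinations(sorted(block), 2))

def is_block_design(blocks, v, k, lam):
    """True if the chords form a (v, k, lam) design on the notes 1..v."""
    counts = pair_counts(blocks)
    notes = {note for block in blocks for note in block}
    return (notes == set(range(1, v + 1))
            and all(len(set(block)) == k for block in blocks)
            and len(counts) == v * (v - 1) // 2      # every pair occurs
            and set(counts.values()) == {lam})       # and equally often

first = [(1, 2, 3), (3, 4, 7), (2, 4, 6), (2, 5, 7), (1, 6, 7), (3, 5, 6), (1, 4, 5)]
second = [(1, 2, 4), (2, 3, 7), (4, 6, 7), (2, 5, 6), (1, 5, 7), (1, 3, 6), (3, 4, 5)]

print(is_block_design(first, 7, 3, 1))            # each pair in exactly one chord
print(is_block_design(second, 7, 3, 1))
print(is_block_design(first + second, 7, 3, 2))   # combined: each pair in two chords
```

Because the two solutions share no chord, their 14 chords together place every pair of notes in exactly two chords.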
How can we see the relationships between these three-note chords? Is there a way to combine the two solutions? How do we find the beginning, the end, the continuity of the logic? How can we find the nerve of the system? Well, the answer to all these questions is to begin drawing pictures. In this way I found four different representations, all of which are in the exhibition, but I particularly want you to see the network in this way, where the white triangles represent the first solution, the shaded triangles represent the second, and the 14 chords are all shown twice:
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 311–317, 2009. © Springer-Verlag Berlin Heidelberg 2009
Of course, mathematicians have been studying such structures for a long time, and I was fortunate to make contact with two mathematicians from the University of Vermont who took an interest in my drawings and wrote some comments, which are posted along with the drawings in the exhibition space. About this particular drawing, Jeffrey Dinitz, who specializes in combinatorial designs, and Dan Archdeacon, who studies topological graph theory, sent me the following text: This shows four representations of the universal coverings of K7 (the complete graph on 7 vertices) on the torus. In each case the seven shaded triangles form one Fano plane (the projective plane of order 2) and the seven white triangles form another. One can see here how (7,3,2) combines two (7,3,1) systems, one of white triangles and one of shaded triangles. To hear this network, we can simply assign the numbers to a scale of seven notes, and read the white circle followed by the gray circle, as I have done in the music notation below. Of course the symmetries would be the same using any seven-note
scale, but I rejected dozens of candidates before finding this one, where the notes and chords sound truly equal and the music seems to homogenize. I am going to play the 14 chords on the piano, but of course it is not really piano music, and could be heard equally well on another instrument (or instruments):
Could you hear the order? Could you hear that each note occurred the same number of times, that each pair of notes came together the same number of times, that everything is in perfect balance? Well, I must admit that I don’t really hear this either, but it is remarkable how clearly one hears if there is a mistake, so the ear is somehow sensitive to what is going on. And of course, this is a new way of listening, and any new way of listening does require a bit of training. No doubt in a few years we will have learned how to perceive such patterns more easily. Another question is whether we can call this a piece of music. I didn’t really compose it. I simply found it within a mathematical phenomenon, and for some time I felt that such things were simply models or prototypes, rather than actual pieces of music. At the same time, when I try to expand the progression, to add a melody, to move the music through a series of variations, the result always seems vulgar. The sequence is much more satisfying in its natural state, and since the natural numbers really are a part of nature, this can be regarded as a little gem found in nature. It is quite lovely just as a diamond in the rough and does not need to be cut and polished. Another block design with musical potential is (13,4,1), a collection of 13 four-note chords. In other cases there are many ways of forming a group of symmetrical subgroups, but in the case of (13,4,1) there is only one solution. Of course, you can always exchange the twos with the eights, for example, but this will just be a morphism of the basic block design. Again each note and each pair of notes occur the same number of times, and again it seems necessary to draw some pictures in order to uncover the nerve of the system. In this case the essence of the structure seems to emerge best in the following diagram. Let us listen to the sequence of two times 13 chords by alternating between the inner and the outer circles. 
You probably won’t be able to count fast enough to be sure that all 13 notes occur the same number of times, and that the chords appear twice each, but I think you can hear that the music is somehow turning in a circle, and if I play a wrong note, you will hear that there was a mistake.
The comment offered by Dinitz and Archdeacon concerning (13,4,1) is probably too technical for most musician readers, but let me quote it for those who will understand: This is the projective plane of order 3. It can be obtained from the (9,3,1)-design by adding 4 new points {1, 2, 3, 4} and a new block containing them. Then to each block in the ith parallel class of the (9,3,1) design add the point i. Again the resulting chord sequence is terribly short to be considered a musical composition, but it is a complete system nonetheless, and as I continue to study all these networks of chords, I have gradually concluded that such systems must be considered finished objects. If I try to “develop” the material the way composers are taught to do, I don’t really improve anything. Now, in August 2007, as I revise this text for the printed edition, I am also preparing an edition of about a dozen of these little Networks.1 Of course, some combinatorial designs have more blocks and take more time. A good example is (9,3,1). The basic solution here consists of only 12 three-note chords, in which each note is used four times. But this group of 12 chords can be expanded to what the mathematicians call a large (9,3,1), combining seven different solutions to the problem. Now we have a system of 7 * 12 = 84 chords, in which all 84 combinations of the nine notes, taken three at a time, are included exactly once. That is difficult to draw, so I’ll just give you the music.
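The construction quoted from Dinitz and Archdeacon can be followed step by step in code. The sketch below rests on several assumptions that are not in the text: the (9,3,1) design is built as the lines of a 3 x 3 grid with coordinates taken modulo 3 (a standard way to obtain such a design, not necessarily the numbering used in the drawings), the nine notes are numbered 0 to 8, the four new points are numbered 9 to 12, and all function and variable names are hypothetical.

```python
from itertools import combinations
from math import comb

def is_design(blocks, v, k, lam):
    """True if every pair of the v points lies in exactly lam blocks of size k."""
    counts = {}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] = counts.get(pair, 0) + 1
    points = set().union(*blocks)
    return (len(points) == v
            and all(len(b) == k for b in blocks)
            and len(counts) == comb(v, 2)
            and set(counts.values()) == {lam})

def point(x, y):
    """Number the grid position (x, y) as a note 0..8."""
    return 3 * x + y

# Four parallel classes of three chords each: slopes 0, 1, 2 and the verticals.
classes = [[frozenset(point(x, (m * x + c) % 3) for x in range(3))
            for c in range(3)] for m in range(3)]
classes.append([frozenset(point(c, y) for y in range(3)) for c in range(3)])
nine_note_design = [block for cls in classes for block in cls]

# Add one new point to every chord of the i-th class, plus one chord
# made of the four new points themselves.
new_points = [9, 10, 11, 12]
thirteen_note_design = [block | {new_points[i]}
                        for i, cls in enumerate(classes) for block in cls]
thirteen_note_design.append(frozenset(new_points))

print(is_design(nine_note_design, 9, 3, 1), len(nine_note_design))
print(is_design(thirteen_note_design, 13, 4, 1), len(thirteen_note_design))
print(comb(9, 3), 7 * len(nine_note_design))   # all three-note chords on nine notes
```

The last line reproduces the count given for the large (9,3,1): the 84 possible three-note chords on nine notes are exactly enough for seven solutions of 12 chords each.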
1 Available from Editions 75. Visit www.tom.johnson.org for more info.
I will not take the space here to include the three unique solutions of (10,4,2), but since we are so attached to our 12-tone chromatic tradition, I want to show you my realization of (12,4,3), so that you can see some 12-tone music that comes directly from a block design. Incidentally, Jeffrey Dinitz told me that mathematicians can prove the existence of 14 million non-isomorphic solutions for the (12,4,3) problem. Many of these are known as “resolvable,” which means that the 33 chords of this group can be divided into 11 groups of three chords, each of which includes all 12 notes. Here is the drawing and the music notation for my (12,4,3). It is resolvable, so each measure, each group of three chords, contains the complete scale:
It is also possible to form a large (15,3,1), which makes the 13 * 7 * 5 = 455 chords of Kirkman’s Ladies, a 13-minute piece that I wrote in 2005. Another block design, known as 4-(12,6,10), produced the 330 chords of Block Design for Piano (2006), an 18-minute composition. Now I am completing a Septet, which transforms the 11 chords of (11,5,2) into 10 different solutions. And enough new possibilities arise that I expect to be exploring this area for some years to come.2 This is of course only a minuscule introduction to block designs and to the chord groups one can find within this branch of combination theory. The definitive book on the subject, the Handbook of Combinatorial Designs (Chapman and Hall/CRC, second edition 2007), edited by Charles J. Colbourn and Jeffrey H. Dinitz, provides about a thousand pages of supplementary reading for those who wish to go further.
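The chord counts quoted for these designs follow from a standard counting formula: a t-(v, k, λ) design has λ · C(v, t) / C(k, t) blocks, where an ordinary (v, k, λ) design is the case t = 2. The Python sketch below applies that formula to the designs named in the text; the function name `block_count` is hypothetical.

```python
from math import comb

def block_count(t, v, k, lam):
    """Number of blocks in a t-(v, k, lam) design: lam * C(v, t) / C(k, t)."""
    numerator = lam * comb(v, t)
    if numerator % comb(k, t):
        raise ValueError("parameters do not give a whole number of blocks")
    return numerator // comb(k, t)

chords_12_4_3 = block_count(2, 12, 4, 3)       # the (12,4,3) design
groups_12_4_3 = chords_12_4_3 // (12 // 4)     # three chords cover all 12 notes
chords_11_5_2 = block_count(2, 11, 5, 2)       # the (11,5,2) design
chords_4_design = block_count(4, 12, 6, 10)    # the 4-(12,6,10) design
chords_15_3_1 = block_count(2, 15, 3, 1)       # one (15,3,1) solution
large_15_3_1 = comb(15, 3)                     # every three-note chord on 15 notes

print(chords_12_4_3, groups_12_4_3)
print(chords_11_5_2, chords_4_design)
print(chords_15_3_1, large_15_3_1, large_15_3_1 // chords_15_3_1)
```

The last line shows why the large (15,3,1) has 13 * 7 * 5 chords: each solution has 35 chords (7 * 5), and 13 disjoint solutions use up all 455 three-note chords.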
2 After the Berlin conference and finishing the "Networks" edition, I wrote a 12-minute Septet taken from (11,5,2) that was premiered in The Hague in September. The following months I spent most of my composing time with the 55 chords of a particular solution of (11,4,6), which became a series of drawings and also a 20-minute piece for two electric keyboards. Currently I am concentrating on a number of different solutions to (12,4,3), again with both drawings and compositions. There are over 17 million ways to form the 33 blocks of this design, and the resulting music is totally devoted to the equality of the 12 notes and at the same time has almost no resemblance to serial traditions. TJ, March 2008.
From Mathematica to Live Performance: Mapping Simple Programs to Music
Katarina Miljkovic
New England Conservatory of Music
Abstract. This paper focuses on selected simple programs used to model generative processes for basic elements of music material such as rhythm, pitch and texture, as well as large-scale works of music. After presenting decisions on sound mapping procedures, I’ll introduce the system NKM, A New Kind of Music, designed by Peter Overmann, director of software technology for the Mathematica programming environment. NKM is a system controlled by cellular automata (CA), modeling a number of processes in nature. The CA presented in the paper belong to a group of elementary rules that encapsulate four classes of complexity, from simple to universally complex, conceived by Stephen Wolfram and presented in his book A New Kind of Science. All of the examples were generated in Mathematica, the software by Wolfram Research Inc. Mathematical basis for the examples can be found in the book A New Kind of Science.1
1 Background
…just as a researcher into nature strives to discover the rules of order that are the basis of nature, we must strive to discover the laws according to which nature, in its particular form “man”, is productive. And this leads us to the view that the things treated by art in general, with which art has to do, are not “aesthetic”, but that it is a matter of natural laws … all discussion of music can only take place along these lines. (Anton Webern, The Path to New Music)
My interest in generative music came from a deep belief in close connections between processes in nature and the organization of sound in music. Graphics generated by simple programs in A New Kind of Science that can “capture the essence of the complexity--and beauty--of many systems in nature” (Wolfram Research), attracted my attention soon after the book was published. The remarkable gestural content resembled graphic scores (Ex.1-5). It seemed as if the pattern propagation, ranging from simple, repetitive, self-similar, random, to universally complex, captured the essence of musical processes across styles and cultural traditions.
1 The book can be accessed online at http://www.wolframscience.com/nksonline/toc.html
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 318–329, 2009. © Springer-Verlag Berlin Heidelberg 2009
The following examples show two kinds of graphic representation of elementary rules. The graphs in the left column model the evolution of a one-dimensional CA in which the next state of each cell depends on the cell itself and its two nearest neighbors, plotted in two dimensions with successive steps as successive rows. The graphs in the right column represent the oscillation of the total number of active cells at each step of evolution. The first column suggests the nature of pattern propagation, while the second can easily be perceived as the dynamic potential of a statistical sound complex.
[Figures. Ex. 1a-b, Class 1: Rule 4, Rule 172. Ex. 2a-b, Class 2, and Ex. 3a-b, Class 2-3: Rule 9, Rule 90, Rule 82, Rule 210. Ex. 4a-b, Class 3: Rule 45.]
K. Miljkovic
[Figure. Ex. 5a-b, Class 4: Rule 110.]
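The two kinds of graph described above (the space-time evolution of a rule and the number of active cells per step) can be reproduced with a few lines of code. The following is a minimal sketch in Python rather than Mathematica; the function names, the cyclic boundary and the single-cell starting row are assumptions made for illustration, not details taken from the NKM system.

```python
def step(cells, rule):
    """Advance one row of an elementary CA by one step (cyclic boundary, assumed)."""
    n = len(cells)
    out = []
    for i in range(n):
        # Read (left, center, right) as a 3-bit number between 0 and 7.
        idx = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        # Bit number idx of the rule number is the new state of cell i.
        out.append((rule >> idx) & 1)
    return out


def evolve(rule, width, steps):
    """Return rows 0..steps, starting from a single black cell in the middle."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


def active_counts(rows):
    """Number of black (active) cells in each step of the evolution."""
    return [sum(row) for row in rows]


if __name__ == "__main__":
    for rule in (4, 90, 110):
        rows = evolve(rule, width=31, steps=15)
        print("Rule", rule)
        for row in rows:
            print("".join("#" if c else "." for c in row))
        print("active cells per step:", active_counts(rows))
```

The printed grid corresponds to the left-column graphs and the list of counts to the right-column graphs; other starting rows would give different pictures for the same rule.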
Examples 1-5 demonstrate distinctly different behavior in each class of complexity that can be associated with the following musical equivalents: Class 1, with stable, pedal-like, prolonged sonorities (Ex.1); Class 2, with a range of simple repetitive modules (Ex.2); Class 2-3, (on the edge of randomness) with self-similar, repetitive musical gestures on multiple scales (Ex.3); Class 3, with random, irregular textures, statistical complexes, outbursts of sound (Ex.4); Class 4, with multilayered, well balanced, complex musical texture (Ex.5). Musical potential of the elementary rules was evident; the main question was how to proceed with mapping. As frequently noted by the authors on generative music, careful programming can produce good results (Miranda 2001, 1), potentially mimicking any style or composer. Here, the intent was to investigate intricate structures of elementary rules and reveal them through new configurations of sound. In terms of mapping, the premise was that sound mapping corresponding to pattern propagation (McAlpine, Miranda, and Hoggar 1999, 25) would most likely reach this goal.
2 Data Gathering
The first trials were directed toward modeling basic elements of music. They were limited to a small number of steps in the evolution of different rules. Each cell on a vertical axis, black or white, from bottom up, corresponded to an ascending step of a previously chosen scale (see Ex. 6). A basic rhythmic unit, an eighth note, was a step on a horizontal axis. A black cell was note-on, a white cell was note-off, a rest. Using elementary rules, it was quite easy to model stepwise motion, interval leaps and various textures.
[Figure. Ex. 6a-d: monophonic texture (Rule 16, Rule 2); homophonic texture (Rule 18); polyphonic texture (Rule 110).]
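The mapping described in this section can be sketched as follows. This is an illustrative reading only: the scale (one octave of C major as MIDI note numbers), the function name and the convention that index 0 of each row is the lowest pitch are all assumptions, since the text does not fix them.

```python
# Hypothetical scale: one octave of C major as MIDI note numbers (an assumption).
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]


def grid_to_notes(rows, scale=SCALE):
    """Map a CA grid to (onset_in_eighth_notes, midi_pitch) pairs.

    rows[t][i] is the state of cell i at step t. Step t becomes the t-th
    eighth note; cell i (counted from the bottom) becomes scale degree i.
    A black cell (1) is a note-on, a white cell (0) is a rest.
    """
    notes = []
    for t, row in enumerate(rows):
        for i, cell in enumerate(row):
            if cell:
                notes.append((t, scale[i]))
    return notes


if __name__ == "__main__":
    # One black cell climbing by one position per step: stepwise motion.
    rows = [
        [1, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
    ]
    print(grid_to_notes(rows))
```

A row with one black cell gives a single melodic note, while a row with several black cells gives simultaneous pitches, which is how the same mapping can yield monophonic, homophonic or polyphonic textures.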
The next phase was data gathering through aural observation of thousands of 30-second-long sound samples generated by simple programs. In addition to elementary rules, substitution systems, random walk and Turing machines were tested. The process was like mining and unearthing patterns that were sometimes musically associative (Cope 2003, 11) and often pleasing. With significant help in programming on the part of the Wolfram Research team, different mapping procedures were developed. Soon, Wolfram Research grouped the rules according to their potential to generate specific musical genres: world music, pop, ambient, jazz, to mention only a few. In 2005, Peter Overmann created the NKM system, based on CA. Features of the system made cycling through rules, changing initial conditions and generating MIDI files fast and easy. Multiple filters articulated sound structures making them more transparent. Possibilities of partitioning CA regions offered multiple choices for orchestration. A variety of scales, metric articulation and tempi, together with a wider register range and longer duration of samples, were provided.2 In spite of significant work on mimicking musical styles, the advancement was, from my perspective, too fast, lacking necessary time for reflection and detailed musical analysis of data. A limited duration of sound samples associated with a frequent cycling through the rules prevented observation of long-range processes and development of musical material, a finding often associated with generative music in particular, and algorithmic composition in general (Klouche 2005, 331 and 353). The next important step was to try to create longer musical pieces. After intense collaboration with Peter Overmann, I decided to generate a piece for violin and piano by a single rule, a predetermined set of initial conditions and musical parameters thus testing the potential of rules to preserve an interesting musical activity for an extended time period.
3 Large Scale Piece 3.1 Choice of a Rule The texture, with a melodic line and piano accompaniment, was an important factor in considering the choice of a rule and initial conditions. Dynamic unfolding of a melodic line, rhythmic differentiation between piano and violin and constant variation of musical material seemed to be necessary components for building the piece. After testing a number of simple programs, Rule 30 proved to be the best choice because of its high level of rhythmic activity and constant, unpredictable interval shifts. 3.2 Partitioning Mapping included multiple steps. The sound fabric of Rule 30 was generated by the procedure outlined in example 6. The algorithm for a melodic line traversing the fabric of the rule was developed by analyzing and then modeling Cantus Firmi and a number of themes from classical repertoire. Three melodic lines were generated, one for the violin part (Ex.7a) and two in the lower region. For the piano accompaniment (Ex.7b), blocks of black cells longer than five units were articulated by pitches of extended duration as an additional layer in the piano part (Ex. 7c). 2
A reduced version of the application can still be accessed at http://tones.wolfram.com
Ex. 7a (excerpt)
Ex.7b (excerpt)
Ex.7c (excerpt)
4 Initial Conditions
The width of the rule was limited to 16 cells with initial conditions {1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1}.³ It is worth mentioning that a search for initial conditions that created a texture revealing counterpoint of three well-shaped melodic lines took some time.
5 Choice of Musical Parameters
The chosen scale was characterized by an equal number of semitones and whole tones, 2,1,2,2,2,1,1,1,⁴ and interesting internal symmetries and asymmetries.⁵ It was, to some extent, consistent with random data of the rule: the probability of occurrence of white cells, disrupting steady pulsation and open chromatic motion thus creating intervals larger than a semitone, is roughly 50%.⁶ Tempo was 180 basic units per minute. Because of the non-periodic nature of the rule, it was difficult to identify metric grouping. The score (derived from the sound file) was written in 4/4, with measures being orientation points rather than metric units.
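The musical parameters above can be turned into a pitch ladder for the 16 cells. The sketch below is a minimal illustration under stated assumptions: the starting pitch (MIDI note 60) and the idea that the interval pattern simply repeats from one octave to the next are not given in the text, and the names are hypothetical.

```python
from itertools import accumulate, cycle, islice

INTERVALS = [2, 1, 2, 2, 2, 1, 1, 1]  # scale steps in semitones, as given in the text
WIDTH = 16                            # number of cells, as given in the text
TEMPO = 180                           # basic units per minute, as given in the text
BASE_PITCH = 60                       # assumption: MIDI note number of the lowest cell


def pitch_ladder(base, intervals, n):
    """Pitches for n cells, repeating the interval pattern upward from base."""
    steps = islice(cycle(intervals), n - 1)
    return list(accumulate(steps, initial=base))


def unit_seconds(tempo):
    """Duration in seconds of one basic unit at the given tempo."""
    return 60.0 / tempo


if __name__ == "__main__":
    print(pitch_ladder(BASE_PITCH, INTERVALS, WIDTH))
    print(unit_seconds(TEMPO))
```

The pattern sums to twelve semitones, so it closes at the octave, and it contains four whole tones and four semitones, which is the "equal number" mentioned in the text.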
6 The Outcome
Example 7a shows development of self-similar gestures in the violin part while example 7b demonstrates counterpoint between three lines. The occurrence of a large field of white cells in the middle of the example proved to be an important feature, pushing all three lines to the lower register, narrowing the range and introducing an element of contrast into the texture. Another contrasting layer was pointillistic texture across the entire register range, obtained by sustained pitches of six or more durational units. The result was a coherent structure with elements of contrast propelling the musical flow. The composition (titled Awakening) had multiple concert performances. It was received very well. Comments by trained musicians focused on the beauty of the synthesis of a well-known language and “something new” that they couldn’t quite define.
3
1 represents a black, active cell, 0 represents a white cell.
4
1 represents a semitone, 2 represents a whole tone (two semitones).
5
The pitch collection is known as Raga Bahar and Miyan Ki Malhar according to their underlying scales in Indian music.
6
Statistical analysis of data that proved my assumptions was performed by Todd Rowland, Wolfram Research Inc. and communicated to me by e-mail on 1/25/2008.
7 Generative Pitch Collections and Rhythmic Grouping
The previously described composition proved Rule 30 to be a powerful music generator. Similar procedures applied to other elementary rules pointed to a wide range of possibilities for generating longer compositions. Still, there were unresolved problems: the mapping relied on pitch collections outside of the system of a chosen rule, and the form of the piece was to some extent arbitrary. Characteristics of the rule were preserved through texture and rhythmic activity while the identity of frequency collections was, partially, left out. I felt a need to go back a few steps and make a deeper musical exploration of the elementary rules. A possible solution was to use equal temperament for basic frequency units (the 12-tone chromatic scale), apply it to all of the elementary rules, and then explore differentiation between pitch collections and rhythmic grouping generated by: a) different classes, b) different rules of the same class, and c) different initial conditions of the same rule. The outcome was as expected. Different classes generated distinctly different pitch and rhythm, the rules of the same class generated a variety of similar materials, and change of initial conditions created a number of variations of the same material. The following examples demonstrate a proposal for pre-compositional procedures for CA music, based on a detailed examination of features embedded in elementary rules.
8 Mapping
Mapping was slightly modified. This time, each cell on the horizontal axis, from left to right, corresponded to an ascending semitone and the duration of an eighth note. I alternated between two ways of music mapping: mapping A articulated each black cell while mapping B articulated only the first in a block of consecutive black cells and extended its duration for the length of the entire block. Ex. 8a
Rule 127
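The two mappings just described can be sketched in code. This is a minimal illustration, not the author's implementation: the function names, the starting pitch (MIDI note 60) and the (onset, pitch, duration) tuple layout are assumptions.

```python
def mapping_a(row, base=60):
    """Mapping A: every black cell is articulated as a note of one unit.

    Cell i sounds at time i (in eighth notes) on pitch base + i semitones.
    Returns (onset, pitch, duration) tuples.
    """
    return [(i, base + i, 1) for i, cell in enumerate(row) if cell]


def mapping_b(row, base=60):
    """Mapping B: only the first cell of each block of consecutive black
    cells is articulated, and it is held for the length of the block."""
    notes = []
    i, n = 0, len(row)
    while i < n:
        if row[i]:
            j = i
            while j < n and row[j]:
                j += 1  # scan to the end of the block of black cells
            notes.append((i, base + i, j - i))
            i = j
        else:
            i += 1
    return notes


if __name__ == "__main__":
    row = [1, 1, 1, 0, 1, 0, 0, 1, 1]
    print(mapping_a(row))
    print(mapping_b(row))
```

For a row of twelve black cells, `mapping_a` yields twelve successive semitones and `mapping_b` yields one pitch held for twelve units, which is the contrast drawn for Rule 127 in the next paragraph.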
For example, in mapping A, each step of evolution of Rule 127 (Ex. 8a), Class 1, generates a chromatic scale while in mapping B, each step generates an extended duration of a single pitch, depending on the width of the CA. In mapping B, CA from Class 1, with a careful choice of initial conditions and a filter controlling clusters, could produce interesting vertical sonorities. Sometimes they are a mixture of prolonged and repetitive sounds, as in Rule 5 (Ex.8b). Ex. 8b
Rule 5
8.1 Rule 90 Rule 90 from Class 2-3 generated a revolving whole tone scale and a regular pulsation. Using twelve-tone equal temperament, the mapping of each step in evolution resulted in a combination of major seconds (interval 2) and major thirds (interval 4) outlining a whole tone scale. Steps of evolution with the highest density of active, black cells generated uniform pulsation without rhythmic grouping. Steps of evolution with a lower density, because of the presence of fields of white cells, revealed groups divided by periods of silence (Ex.9a). Simultaneous mapping of multiple rows resulted in a texture characterized by stratification of pitch and rhythm. Because of its transparency and periodicity, the entire Class 2-3 had the potential to quickly generate pleasing musical results. Ex. 9a-d
Rule 90
Steps 15-16, graphic representation
Step 15, notation
Step 16, notation
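The whole-tone character of Rule 90 noted in Section 8.1 can be checked numerically. The sketch below assumes a single black cell as the starting row and a lattice wide enough that the pattern never reaches the edges; the author's own width and initial conditions are not stated, so the particular intervals found at a given step may differ from the notated examples.

```python
def rule90_rows(width, steps):
    """Rows 0..steps of Rule 90 from a single black cell (cyclic lattice).

    Under Rule 90 the new state of a cell is the XOR of its two neighbors.
    """
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = [row[i - 1] ^ row[(i + 1) % width] for i in range(width)]
        rows.append(row)
    return rows


def black_positions(row):
    return [i for i, cell in enumerate(row) if cell]


def intervals(row):
    """Distances in cells (semitones under the mapping) between successive black cells."""
    pos = black_positions(row)
    return [b - a for a, b in zip(pos, pos[1:])]


if __name__ == "__main__":
    for t, row in enumerate(rule90_rows(64, 16)):
        print(t, intervals(row))
```

Under these assumptions every interval within a row is an even number of semitones, so each row stays inside a single whole-tone collection, and the parity of the black positions flips from one step to the next, so successive rows alternate between the two whole-tone collections.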
8.2 Rule 30 Time and frequency intervals in musical texture generated by Rule 30 were impossible to reduce to any pattern. Still, a surprising level of coherence emerged through aural observation of single rows (randomly chosen from different regions of the rule) that could be explained by statistical analysis of a large quantity of data from multiple rows. Example 10c represents step 32 of Rule 30 in mapping A. We can identify open chromatic motion disrupted by insertion of larger intervals caused by frequent occurrences of blocks of white cells. In order to reduce open chromatic motion, I applied mapping B. In spite of a lower level of activity, the main characteristics of the rule remained untouched (example 10d). The absence of periodic behavior produced an effect of constant shift, forward motion and musical expectation. Clearly, the rule’s most important feature was its potential to constantly generate fresh aural information. Ex. 10a-d
Rule 30
Steps 31-32, graphic representation
Step 32, mapping A
Step 32, mapping B
Further, I mapped each step of evolution as a vertical sonority. The first 13 steps, beginning with one cell and then gradually widening (Ex.10a), resulted in a slow emergence of vertical sonorities. Regardless of randomness of the rule, the chords in a sequence were differentiated enough to be perceived as some kind of a progression. The reason was the property of the rule. Statistical analysis of Rule 30 revealed that 47% of the total number of intervals larger than a semitone were major seconds (interval 2); 26%, minor thirds (interval 3); 12%, major thirds (interval 4); 6%, perfect fourths (interval 5); 3%, tritones (interval 6). The interval content of Rule 30 seemed to be best described by exponential decay. This balance created a musical texture with unified chromatic and diatonic segments and potential to generate interesting and unpredictable tension-resolution vertical relationships. 8.3 Rule 110 Mapping a randomly chosen step in the evolution of Rule 110 (Ex. 11c-d) at first did not show any obvious periodicity (Ex. 11c). The cells exhibited a slight tendency to group by seven, but only in the first 14 basic units. The reason was the choice of region of the rule. Ex. 11a-d
Rule 110
Steps 31-32, graphic representation
Step 32, Mapping A
Step 32, Mapping B
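Returning to the interval statistics reported for Rule 30 in Section 8.2: one way to read the "exponential decay" remark is through an idealized model in which every cell is black with probability 1/2, independently of the others. This model is an assumption introduced here for illustration and is not taken from the analysis cited in the text. Under it, an interval of n semitones between successive black cells has probability (1/2)^n, so among intervals larger than a semitone the share of interval n is (1/2)^(n-1): 50%, 25%, 12.5%, 6.25% and about 3.1% for intervals 2 to 6, close to the reported 47%, 26%, 12%, 6% and 3%. The sketch below computes both the idealized shares and the empirical shares for any set of rows; all names are hypothetical.

```python
from collections import Counter


def gap_intervals(row):
    """Distances in cells between successive black cells of one row."""
    pos = [i for i, cell in enumerate(row) if cell]
    return [b - a for a, b in zip(pos, pos[1:])]


def share_above_semitone(rows):
    """Share of each interval size among all intervals larger than 1."""
    counts = Counter(iv for row in rows for iv in gap_intervals(row) if iv > 1)
    total = sum(counts.values())
    return {iv: counts[iv] / total for iv in sorted(counts)}


def geometric_share(interval):
    """Idealized share of an interval n >= 2 if cells were fair coin flips."""
    return 0.5 ** (interval - 1)


def rule30_rows(width, steps):
    """Rows of Rule 30 from a single black cell (cyclic lattice, assumed)."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        # Rule 30: new state = left XOR (center OR right).
        row = [row[i - 1] ^ (row[i] | row[(i + 1) % width]) for i in range(width)]
        rows.append(row)
    return rows


if __name__ == "__main__":
    print({n: geometric_share(n) for n in range(2, 7)})
    print(share_above_semitone(rule30_rows(401, 200)))
```

Running the script prints the idealized shares next to shares measured on Rule 30 rows, so the two can be compared directly; the measured values depend on the width, the number of steps and the region of the rule that is sampled.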
Example 11a clearly demonstrates wide regions of a periodic background disrupted by occasional propagation of irregular descending structures. The steps, presented in the examples 11c-d, belonged to a region where transition from periodic background to irregular structures occurred. Examples 12a-d present four steps of evolution of Rule 110, each 36 cells wide (three octaves). The first two (12a-b) are from a region with uniform background while the second two (12c-d) are from regions with irregular structures. Ex. 12a-b
Steps from a region of periodic background, mapping A
In both of the preceding examples, a revolving cycle of 14 basic units can be identified. Each subsequent cycle will be transposed by a major second up (interval 2). After six cycles (84 basic units), the first one will repeat, thus creating a new cycle on a large scale. On a small scale, it is possible to subdivide the 14-unit cycle into two sub-cycles of seven basic units (sometimes clearly articulated) where the transposition interval will then be a perfect 5th up (interval 7). Three possible cycles, each on a different scale, introduce structural hierarchy and interesting new possibilities for composing. The following steps of evolution demonstrate two degrees of deviation from periodic background. While there are small irregularities at the beginning of examples 12a-b, examples 12c-d illustrate the emergence of randomness. Ex. 12c-d
Steps from a region of irregular behavior, mapping A
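The cycle arithmetic for Rule 110 given above follows from the mapping itself: each cell stands for one semitone and one time unit, so a pattern that recurs p cells further along sounds p semitones higher. The sketch below checks the figures quoted in the text, assuming octave equivalence (pitch classes modulo 12); the function names are hypothetical.

```python
from math import gcd


def transposition(period_cells):
    """Pitch-class transposition, in semitones, after one period of p cells."""
    return period_cells % 12


def cycles_until_return(period_cells):
    """Number of periods after which the starting pitch class recurs."""
    return 12 // gcd(12, period_cells % 12)


def units_until_return(period_cells):
    """Length, in basic units, of the resulting large-scale cycle."""
    return period_cells * cycles_until_return(period_cells)


if __name__ == "__main__":
    for p in (14, 7):
        print(p, transposition(p), cycles_until_return(p), units_until_return(p))
```

For the 14-unit cycle this gives a transposition of 2 (a major second) and a return after six cycles, or 84 units. For the 7-unit sub-cycle it gives a transposition of 7 (a perfect fifth), and two such steps add up to 14 semitones, again a major second modulo the octave, so the three levels of cycle are mutually consistent.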
In example 12c, the 14-unit cycle is preserved in spite of modifications. Example 12d demonstrates dissolution of the cycle preserving only a few segments. It is difficult to identify it even if we are to take deviations into account. In case of a much longer example, we would observe the settling of random behavior and transition back to 14-unit cycles and periodic structure. In conclusion, Rule 110, incorporating cycles, a fine balance of simple periodicity, small deviations and randomness, has the potential to generate multilayered musical structures. It is interesting that generating longer compositions with this rule proved to be more difficult than in the cases of rules from Classes 1-3. Working with the cycles, capturing transitions from periodic to random behavior with the pre-determined set of initial conditions is not an easy task and requires sophisticated programming.
9 New Ground
The previously described steps are an attempt to lay out a ground for composing with the aid of simple programs from A New Kind of Science. Mappings were introduced that transparently and intuitively reveal characteristics of pattern propagation, allow analysis and classification of sound structures and provide tools for navigation through the music by means of 256 elementary rules leading to an endless number of simple programs. These steps have become a point of departure for a broadly expanded exploration of the vast sound world of cellular automata.
Acknowledgements I am indebted to Stephen Wolfram, Peter Overmann, Jason Cawley, Todd Rowland and Ed Pegg for their support, work on programming and generosity to introduce me to the world of cellular automata and A New Kind of Science.
References
Cope, D.: Computer Analysis of Musical Allusions. Computer Music Journal 27(1), 11–28 (2003)
DuBois, R.L.: Applications of Generative String-Substitution Systems in Computer Music (2003), http://www.music.columbia.edu/~luke/dissertation/dissertation.pdf
Klouche, T.: Aspekte einer Systematik computerbasierter Musikwissenschaft. In: Kulturbesitz, P. (ed.) Jahrbuch des Staatlichen Instituts für Musikforschung, Günther Wagner, pp. 316–353. Mainz (2005)
Weisstein, E.W.: MathWorld–A Wolfram Web Resource, http://mathworld.wolfram.com
McAlpine, K., Miranda, E., Hoggar, S.: Making Music with Algorithms: A Case-Study System. Computer Music Journal 23(2), 19–30 (1999)
Miranda, E.R.: Granular Synthesis of Sounds by Means of Cellular Automata. Leonardo 28(4) (1995)
Miranda, E.R.: Evolving Cellular Automata Music: From Sound Synthesis to Composition. In: Proceedings of the Workshop on Artificial Life Models for Musical Applications - ECAL 2001, Prague University of Economics (September 2001)
Miranda, E.R.: Composing Music with Computers. Focal Press, Oxford (2002)
Webern, A.: The Path to New Music. In: Reich, W. (ed.) Translated by Leo Black, Bryn Mawr, Pa. T. Presser Co. in assoc. with Universal, London (1963)
Wolfram, S.: A New Kind of Science. Wolfram Media, Champaign (2002)
Xenakis, I.: Formalized Music. Pendragon Press, New York (1992)
Nonlinear Dynamics of Networks: Applications to Mathematical Music Theory
Jonathan Owen Clark
Brunel University, London, UK
1 Introduction and Musical Motivation
Algebraic approaches to modelling and the theory of dynamical systems are important aspects of theories of mathematics and music. Group-theoretic approaches have been used for some time in models of pitch-class, tuning and interval etc. More recent approaches by Mazzola (2002) and others strikingly extend this algebraic formulation into the realm of modules and categories. And the theory of dynamical systems has found musical applications in both algorithmic music creation (for example in the compositions of Agostino Di Scipio), and the physical modelling of musical instruments (in the work of Xavier Rodet and others at IRCAM). This paper presents a variation on how these dual approaches may be brought together, with particular applications to Mathematical Music Theory. In particular we consider the nonlinear dynamics of a generalised form of networks. In the work of mathematicians Ian Stewart and Martin Golubitsky these networks are considered as directed graphs of cells and edges, where the cells have internal dynamics given by a system of ordinary differential equations (ODEs) and the edges represent coupling effects between the cells. A simplified summary of some of the main results of the theory is given in Section Two. Algebraic factors are crucial: group symmetries of the network play an important role in understanding the time series outputs of particular cells in the network and the resulting global pattern formation. They imply a ‘catalogue’ of different forms of behaviour from which the actual behaviour is ‘selected’. In addition, in cases of approximate symmetry or networks with repeated sub-units, both groupoids and local internal symmetries can be shown to provide partial generalisations. From the point of view of Mathematical Music Theory, what is striking about the network dynamics of such systems is their idiomatically ‘musical’ nature.
Equilibria, periodic, multirhythmic and rotating wave states are supported, and the bifurcation theory (the study of abrupt changes in the existence and distribution of singularities in the associated vector field) reveals even more exotic behavior, such as synchronised chaos and ‘bubbling’. Traditional work on utilising nonlinear methods in both composition and synthesis has concentrated on setting up a particular dynamical system whose parameters are varied and mapped musically in real time: a ‘bottom-up’ approach. The paper stresses a potential route for a different methodology. We consider a ‘top-down’ model that takes as its starting point the catalogue of solutions of a network itself. The group-theoretic properties of the network force these behavioural motifs irrespective of the local parametric conditions. A discussion of the musical and philosophical implications of this proposed methodology is given in Section Three, and aims at providing a connection between the mathematical ontology of Gilles Deleuze and the morphogenetics of musical events. In this section we also consider some applications to algorithmic composition and show a method where generative music derived from the catalogue of different oscillation patterns may be ‘switched’ using a simple Markov Chain architecture.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 330–339, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Nonlinear Dynamics of Networks
The material in this section follows the material in a number of papers and publications by Golubitsky and Stewart (1988, 2000, 2006). A coupled cell network is considered as a directed, finite graph whose nodes (or ‘cells’) represent state variables and whose edges represent couplings between the variables. The cells have intrinsic systems of ODEs which govern the internal dynamics of each, so that the network is at a basic level an interacting high-dimensional dynamical system. To make this precise, suppose $\dot{x} = f(x)$, where $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}^n$ is smooth, and let $\gamma : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map. We say the system has symmetry $\gamma$ if $\gamma x(t)$ is a solution to the above equation whenever $x(t)$ is a solution. It follows easily that $\gamma$ is a symmetry if and only if $f$ satisfies the condition $f(\gamma x) = \gamma f(x)$. In contrast the symmetry group of the network is defined as the automorphism group of its graph. Denote this by $\Gamma$. This group permutes the nodes and edges of the graph and each such network symmetry induces a symmetry of the corresponding ODEs as defined above, so that these two definitions of symmetry are in fact equivalent. If $\Sigma \subset \Gamma$ is a subgroup, then $\mathrm{Fix}(\Sigma) = \{x \in \mathbb{R}^n : \sigma x = x, \ \forall \sigma \in \Sigma\}$ is a flow-invariant subspace. For a given $x$, the set of all $\sigma \in \Gamma$ with $\sigma x = x$ is a subgroup of $\Gamma$ known as the isotropy subgroup of $x$. Now suppose that $x(t)$ is a periodic solution to the system, so that $x(t) = x(t + T)$ for some period of time $T$. Define
$$H = \{\gamma \in \Gamma : \gamma\{x(t)\} = \{x(t)\}\}, \qquad K = \{\gamma \in \Gamma : \gamma x(t) = x(t), \ \forall t\}$$
The first are spatiotemporal symmetries, the second spatial symmetries. The first correspond to phase-relations in the behavior of the cells, that is one or more of the cells have identical periodic dynamics except for fixed phase shifts, and the second to synchrony of cells, that is cells with identical waveforms that are in phase. Phase-shifts correspond to actions of the circle group $S^1$, i.e. for each $h \in H$, there is a phase shift $\theta(h) \in S^1$ such that $h x(t) = x(t + \theta(h))$. Moreover, $\theta : H \to S^1$ is a homomorphism with kernel $K$. It follows that $H/K$ is isomorphic to a finite subgroup of $S^1$ and hence is cyclic. Moreover, since fixed-point subspaces are flow-invariant, $K$ is an isotropy subgroup for the action of
$\Gamma$ on $\mathbb{R}^n$. In fact, one can show that, with certain caveats, the converse holds too, namely that cyclicity of $H/K$ and isotropy of $K$ imply the existence of periodic solutions; we state the full result below.
H/K-Theorem (Golubitsky and Stewart 2006, p.316). Let $\Gamma$ be the symmetry group of a coupled cell network in which all the cells are coupled and the internal dynamics of each cell is at least two-dimensional. Let $K \le H \le \Gamma$ be a pair of subgroups. Then there exist periodic solutions to some coupled cell system with spatiotemporal symmetries $H$ and spatial symmetries $K$ if and only if $H/K$ is cyclic and $K$ is an isotropy subgroup. Moreover the system can be chosen so that the periodic solution is asymptotically stable.
But the possible states of the system are only half the story. Typically in dynamical systems theory one is concerned with the states that the system supports, and then with transitions between them as parameters are varied; the latter is known as bifurcation theory. Having classified the possible spatiotemporal and spatial symmetries of a network with the H/K Theorem we now ask how these may arise from symmetry-breaking bifurcations. This is a question, mathematically, about the existence of solutions. The usual context for bifurcations is the loss of stability, as a single parameter $\lambda$ is varied, of a $\Gamma$-invariant equilibrium $x_0$ in a $\Gamma$-invariant system of ODEs $\dot{x} = f(x, \lambda)$. We may assume that the bifurcations are located at $\lambda = 0$ and occur when the Jacobian $J = (df)_{x_0, 0}$ has eigenvalues on the imaginary axis. If these are at $\pm i$ then the bifurcation is defined as a Hopf bifurcation. This represents a conversion in the cell dynamics from a steady-state equilibrium to a periodic state. The major result in this area that classifies the existence of periodic states through Hopf bifurcation is the following.
Equivariant Hopf Theorem (Golubitsky and Stewart 2000, p.91).
With the above conditions, define a $\mathbb{C}$-axial subgroup $\Sigma \subset \Gamma \times S^1$ to be an isotropy subgroup of the action of $\Gamma \times S^1$ on the center subspace $E^c$ of $J$ with $\dim \mathrm{Fix}(\Sigma) = 2$. Then for each such $\Sigma$, there exists a unique branch of periodic solutions to the system with spatiotemporal symmetries $\Sigma$.
We now illustrate these results with a number of examples of coupled cell networks (each with a small number of cells).
Example 1. A three-cell network with identical cells and identical bidirectional coupling (indicated by the bidirectional arrow).
[Diagram: cells 1, 2 and 3 arranged in a triangle, each pair joined by a bidirectional arrow.]
The ODEs for each cell may be written
$$\dot{x}_1 = g(x_1, \overline{x_2, x_3}), \qquad \dot{x}_2 = g(x_2, \overline{x_3, x_1}), \qquad \dot{x}_3 = g(x_3, \overline{x_1, x_2})$$
where $x_1, x_2, x_3 \in \mathbb{R}^k$, $k \ge 2$, for some function $g : (\mathbb{R}^k)^3 \to \mathbb{R}^k$. The overline indicates that $g$ is invariant under permutation of the second and third coordinates. This property reflects the identical coupling whose effect comes from the precise form of $g$. Here, the network has symmetry group $S_3 \cong D_3$ generated by $\sigma = (12)$, which swaps cells 1 and 2, and $\tau = (123)$, which cycles all three cells. By the H/K Theorem, periodic solutions are characterized by pairs $(H, K)$ equal to $(\mathbb{Z}_3(\tau), 1)$, $(\mathbb{Z}_2(\sigma), \mathbb{Z}_2(\sigma))$, and $(\mathbb{Z}_2(\sigma), 1)$. The first corresponds to a rotating wave where each cell has identical dynamics but with successive $T/3$ phase shifts. The second forces two cells to be in phase, and the third forces two of the cells to be out of phase (by exactly half a period), with the third cell oscillating at twice the frequency of the other two. This latter solution is of course of special interest in Mathematical Music Theory, indicating as it does a ‘symmetry-forced’ frequency-doubling oscillation, something that has obvious parallels with Fourier Spectra. All three periodic solutions occur as Hopf bifurcations. To see this, write the action of $D_3 \times S^1$ on $\mathbb{R}^4$ (identified in this case with $\mathbb{C}^2$) as
$$\phi(z_1, z_2) = (e^{-i\phi} z_1, e^{i\phi} z_2), \qquad \kappa(z_1, z_2) = (z_2, z_1), \qquad \theta(z_1, z_2) = (e^{i\theta} z_1, e^{i\theta} z_2)$$
where $\phi \in \mathbb{Z}_3$ acts by rotation through $2\pi/3$ and $\theta \in S^1$. Then there are three $\mathbb{C}$-axial subgroups, $\mathbb{Z}_2(\kappa, 0)$, $\mathbb{Z}_3 = \langle(\phi, \phi)\rangle$, and $\mathbb{Z}_2(\kappa, \pi)$, each with two-dimensional fixed-point subspaces, and corresponding to the above H/K pairs $(\mathbb{Z}_2(\sigma), \mathbb{Z}_2(\sigma))$, $(\mathbb{Z}_3(\tau), 1)$, and $(\mathbb{Z}_2(\sigma), 1)$ respectively.
Example 2. A three-cell network with identical cells and identical unidirectional coupling.
[Diagram: cells 1, 2 and 3 arranged in a triangle and joined in a directed ring.]
In this case the ODEs take the form
$$\dot{x}_1 = g(x_1, x_3), \qquad \dot{x}_2 = g(x_2, x_1), \qquad \dot{x}_3 = g(x_3, x_2)$$
This time, the unidirectionality of the coupling reduces the symmetry and hence the number of periodic solutions. The symmetry group is now $\mathbb{Z}_3$, and since this group has no non-trivial isotropy subgroups, the only supported periodic state inherited from the bidirectional ring is the rotating wave state, resulting from a Hopf bifurcation corresponding to the $\mathbb{C}$-axial subgroup $\mathbb{Z}_3 = \langle(\phi, \phi)\rangle$.
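The reduction of symmetry between Examples 1 and 2 can be spot-checked numerically using the condition $f(\gamma x) = \gamma f(x)$ from the start of this section. The sketch below is illustrative only: the cell laws `g_bi` and `g_uni` are hypothetical choices, the cells are one-dimensional for brevity (the theorems above assume at least two dimensions per cell), and the check is made at a single sample point rather than proved for all $x$.

```python
import math
from itertools import permutations


def g_bi(x, y, z):
    # Hypothetical cell law, symmetric in its two coupling inputs y and z.
    return -x + math.tanh(y) + math.tanh(z)


def g_uni(x, y):
    # Hypothetical cell law with a single coupling input y.
    return -x + math.tanh(y)


def f_bi(x):
    # Example 1: every cell is coupled to both of the others.
    return [g_bi(x[i], x[(i + 1) % 3], x[(i + 2) % 3]) for i in range(3)]


def f_uni(x):
    # Example 2: cell i receives input only from cell i - 1 (cyclically).
    return [g_uni(x[i], x[(i - 1) % 3]) for i in range(3)]


def act(p, x):
    # Permute coordinates: the new vector has x[p[i]] in position i.
    return [x[p[i]] for i in range(3)]


def symmetries(f, x=(0.3, -1.1, 0.7), tol=1e-12):
    """Permutations p for which f(p.x) equals p.f(x) at the sample point x."""
    found = []
    for p in permutations(range(3)):
        lhs = f(act(p, x))
        rhs = act(p, f(list(x)))
        if all(abs(a - b) < tol for a, b in zip(lhs, rhs)):
            found.append(p)
    return found


if __name__ == "__main__":
    print("bidirectional ring:", symmetries(f_bi))
    print("unidirectional ring:", symmetries(f_uni))
```

With these choices all six permutations of the cells pass the test for the bidirectional ring, while only the three rotations pass for the unidirectional ring, in line with the groups $S_3 \cong D_3$ and $\mathbb{Z}_3$ named in the text.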
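Example 3. The above scenario (Example 1) for a three-cell symmetric network can be generalised. In the case of rings of $N$ identical cells with bidirectional coupling, the symmetry group is $D_N$ and it can be shown (Golubitsky et al. 1988, p.368) that the required $\mathbb{C}$-axial subgroups form triples (here $\zeta = 2\pi/N$, and $\mathbb{Z}_2^c = \{(0, 0), (\pi, \pi)\}$)
$$(\mathbb{Z}_2(\kappa),\ \mathbb{Z}_N,\ \mathbb{Z}_2(\kappa, \pi)), \qquad (\mathbb{Z}_2(\kappa) \oplus \mathbb{Z}_2^c,\ \mathbb{Z}_N,\ \mathbb{Z}_2(\kappa, \pi) \oplus \mathbb{Z}_2^c), \qquad (\mathbb{Z}_2(\kappa) \oplus \mathbb{Z}_2^c,\ \mathbb{Z}_N,\ \mathbb{Z}_2(\kappa\zeta) \oplus \mathbb{Z}_2^c)$$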
depending on whether $N$ is odd, or congruent to 2 or 0 modulo 4, respectively. For convenience we show below a complete table of waveforms for $N = 12$.
0     1     2     3     4     5     6     7     8     9     10    11
A     A     A     A     A     A     A     A     A     A     A     A
A     A+1   A+2   A+3   A+4   A+5   A+6   A+7   A+8   A+9   A+10  A+11
A     B     C     2D    C1/2  B1/2  A1/2  B1/2  C1/2  2D    C     B
A     B     C     C1/2  B1/2  A1/2  A1/2  B1/2  C1/2  C     B     A
A     A+2   A+4   A+6   A+8   A+10  A     A+2   A+4   A+6   A+8   A+10
A     B     B1/2  A1/2  B1/2  B     A     B     B1/2  A1/2  B1/2  B
2A    B     B     2A    B1/2  B1/2  2A    B     B     2A    B1/2  B1/2
A     A+3   A+6   A+9   A     A+3   A+6   A+9   A     A+3   A+6   A+9
A     2B    A1/2  2B    A     2B    A1/2  2B    A     2B    A1/2  2B
2A    B     2A    B1/2  2A    B     2A    B1/2  2A    B     2A    B1/2
A     A+4   A+8   A     A+4   A+8   A     A+4   A+8   A     A+4   A+8
A     B     B     A     B     B     A     B     B     A     B     B
2A    B     B1/2  2A    B     B1/2  2A    B     B1/2  2A    B     B1/2
A     A+5   A+10  A+3   A+8   A+1   A+6   A+11  A+4   A+9   A+2   A+7
A     B     C     2D    C1/2  B1/2  A1/2  B1/2  C1/2  2D    C     B
A     A1/2  B     C     C     B     A     A1/2  B1/2  C1/2  C1/2  B1/2
A     A1/2  A     A1/2  A     A1/2  A     A1/2  A     A1/2  A     A1/2
Here A, B, C, D are waveforms subject to ‘internal’ $(\kappa, \theta)$ symmetry. The notation A + n indicates the waveform A with a phase-shift of n/12ths of the total period $T$ of the network, 2A means an oscillation at twice the frequency, and A1/2 indicates a waveform with a $T/2$ phase-shift. The table also contains periodic solutions arising from the nonstandard as well as standard actions. Essentially the solutions split into blocks of $q$ sets of size $h$ where $q = \gcd(h, j)$, $0 \le j \le N - 1$. We omit the details.
One of the surprising aspects of the theory of Stewart and Golubitsky is that non-trivial global symmetry is not, in general, necessary to prove the existence of periodic solutions in a coupled cell network, as the next example shows.
Example 4. A coupled cell network that has trivial symmetry group but that supports a travelling wave solution.
[Diagram: cells 1, 2 and 3 joined in a directed ring, with a directed chain leading from cell 3 through cells 4, 5, 6 and 7.]
Here the ODEs may be given the form
$$\dot{x}_1 = g(x_1, x_3), \qquad \dot{x}_j = g(x_j, x_{j-1}), \quad j = 2, \ldots, 7$$
If we restrict our attention to the states in which
$$x_1 = x_4 = x_7, \qquad x_2 = x_5, \qquad x_3 = x_6$$
then the equations of this network are identical to those in Example 2 (the unidirectional ring on three cells). Hence this network is a ‘quotient’, and it can then be shown (Golubitsky and Stewart 2006, p. 324) that the rotating wave of the 3-cell network ‘lifts’ to the 7-cell network, producing a periodic solution in which cells 1,4 and 7 are synchronous, as are 6 and 3 (with a 2T /3 phase-shift), and as are 5 and 2 (with a T /3 phase-shift). This notion can be made precise by defining an equivalence relation, on the cells, such that cells that are in the same equivalence class have related ‘input sets’ (defined as the set of cells from which a particular cell receives coupling and whose associated variables appear in the ODE component). The dynamics are then determined by a network, the quotient of the whole G by the equivalence relation, G/ (the graph-theoretic quotient) whose cells correspond to clusters of synchronous cells (equivalence classes) and whose edges are defined to preserve the input type defined by the equivalence relation . In the above case the equivalence classes are {1, 4, 7}, {2, 5}, {3, 6}. See Golubitsky and Stewart 2006 (Section 9) for more details, where this leads to the conclusion that the structure is governed not by the symmetry group of the network (which in this case is trivial), but the symmetry groupoid of the network, which measures, in some sense, the existence of ‘local’ symmetries. Other Examples. In Golubitsky and Stewart 2006, many examples of lowdimensional coupled cell networks are given, including a 3-cell network with asymmetric coupling, with an associated synchrony subspace which equations defining the standard R¨ ossler attractor. This is an intriguing example in which the synchronous dynamic of cells can be chaotic. 
Such a network, when perturbed, exhibits transverse dynamics that are approximately synchronous: a state in which cells repeatedly obtain and then lose synchrony intermittently (a phenomenon called ‘bubbling’). The details are complex and are omitted.
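The quotient construction for the 7-cell network can be checked numerically. The sketch below is only an illustration under stated assumptions: the coupling function `g` is a hypothetical choice (the text deliberately leaves it unspecified), and forward Euler integration is used for simplicity. It shows that a state started in the synchrony subspace x1 = x4 = x7, x2 = x5, x3 = x6 stays there and follows the 3-cell ring; it does not attempt to reproduce the rotating wave itself.

```python
import math

def g(x, y):
    """Hypothetical cell dynamics (an assumption): state x driven by one input y."""
    return -x + math.tanh(2.0 * y)

def ring7_field(x):
    """x1' = g(x1, x3); xj' = g(xj, x_{j-1}) for j = 2..7 (0-based list)."""
    return [g(x[0], x[2])] + [g(x[j], x[j - 1]) for j in range(1, 7)]

def ring3_field(y):
    """Unidirectional ring on three cells (the quotient network)."""
    return [g(y[0], y[2]), g(y[1], y[0]), g(y[2], y[1])]

def euler(field, state, dt=0.01, steps=2000):
    """Integrate with the forward Euler method (chosen for brevity only)."""
    for _ in range(steps):
        derivative = field(state)
        state = [s + dt * d for s, d in zip(state, derivative)]
    return state

def lift(y):
    """Lift a 3-cell state to the synchrony subspace of the 7-cell network."""
    return [y[0], y[1], y[2], y[0], y[1], y[2], y[0]]

y0 = [0.3, -0.1, 0.7]
x_end = euler(ring7_field, lift(y0))
y_end = euler(ring3_field, y0)

# Largest difference between the 7-cell trajectory and the lifted 3-cell one.
max_gap = max(abs(a - b) for a, b in zip(x_end, lift(y_end)))
print(max_gap)
```

Any other choice of `g` should give the same agreement, since the argument depends only on which cells feed which, not on the form of the coupling.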
3 Discussion and Applications
3.1 Nonlinear Dynamics and Musical Ontology
We now discuss some connections between the theory of nonlinear dynamics and the topic of the morphogenesis of musical or sonic events, and we will refer here to Manuel DeLanda’s reading of the mathematical ontology of Gilles Deleuze (DeLanda 2002). Central to this reading is DeLanda’s interpretation of the Deleuzian concept of multiplicities. He defines these in relation to well-known concepts in nonlinear dynamics and complexity theory as ‘nested sets
J.O. Clark
of vector fields related to each other by symmetry-breaking bifurcations, together with the distributions of attractors which define each of its embedded levels’ (DeLanda 2002, p. 30). An ontological distinction is then drawn between the parts of this model that carry information about the actual world (possible ‘states’ of the model are actualised as the trajectories of a particular dynamical system) and the part which, in principle, is never actualised, something that Deleuze calls the virtual. This latter concept should not be confused with the common usage of the word ‘virtual’, that of the ‘virtual reality’ of digital simulations; it refers instead to the embedded levels of singularity structure on manifolds, which can change through bifurcation. As Deleuze states in Difference and Repetition, p. 208, ‘the reality of the virtual consists of the differential elements and relations along with the singular points that correspond to them’, and the virtual is, in some sense, part of the object itself, coiled up inside it, ‘as though the object had one part of itself in the virtual into which it is plunged as though into an objective dimension’. That the singularities themselves are never actualised is clear from the mathematics; trajectories in dynamical phase space approach attractors asymptotically, getting ever closer but never reaching them. But they are nevertheless real entities that structure the possible phase space paths: as Deleuze puts it, ‘the singularities preside over the genesis of the trajectories’. The above discussion of the nonlinear dynamics of networks can now be interpreted similarly in this context and related to an ‘immanent’ musical ontology. Events in sonic space have complex spectrotemporal and spatiotemporal unfoldings, each of which can be interpreted as one of many possible actualisations of that event, structured by singularities and differential relations.
We saw in Section Two how a coupled cell system can be ‘dormant’ in a steady-state equilibrium but, as a result of a codimension-one bifurcation, spontaneously oscillate with a characteristic period. Further bifurcations are possible, from period-doubling cascades to chaos. It has been known for some time that Hopf bifurcations in particular, like those described earlier, play a role in the emerging spectral dynamics of musical tones, and many authors have utilised this phenomenon in the physical modelling of musical instruments (for example see Rodet and Vergez 1999). Particularly important here is how such bifurcations can be seen to structure a (different) sense of musical time; spontaneous bifurcations could result in the emergence, or not, of spectral components in a musical tone. And the timings of these events suggest a conception of musical time as being more like a nested set of levels of periodicities which are fundamentally asymmetric. As complexity theorist Ilya Prigogine puts it, a process ‘in the regime of uniform steady state...ignores time. But once in the periodic regime, it suddenly ‘discovers’ time in the phase of the periodic motion....We refer to this as the breaking of temporal symmetry’ (Nicolis and Prigogine 1989, p. 21). In addition, other parts of the morphogenetic evolution of sonic events can be interpreted in this framework, such as attack transients, a phenomenon that certainly relates to the concept, in nonlinear dynamics, of ‘relaxation time’: the time taken for a trajectory to settle into a behavior within its basin of attraction. Equally important is what this ontological perspective entails for the
relationship between chance and determinism in sonic morphogenetics. In Example Three we saw that a ring of 12 identically coupled cells can undergo Hopf bifurcation so that an equilibrium is replaced by one, and only one, of 17 different periodic solutions. The question is: which one? Whilst the bifurcation itself is in some sense deterministic (when the critical value is reached, we know the system will bifurcate), only chance, in the form of a particular perturbation of the environmental circumstances at the time of bifurcation, will decide which periodic state is reached. Many solutions are possible for the same parameter value. For the most part, however, the study of dynamical systems within musical contexts has involved consideration of particular systems, which are modelled in real time, with bifurcation points simulated deterministically. And one must perhaps draw a distinction here between theoretical models for musical morphogenesis and ‘real-world’ applications. These latter applications propose a model which is then ‘matched’ to a ‘real’ musical situation, so that, for example, a dynamical system modelling the tone production of a particular instrument is deemed successful if its output matches the spectrogram of the instrument being modelled under a quantitative analysis. But in Deleuzian ontology, there is a deeper isomorphism here: ‘that a given phase space and the physical system it models may be both actualizations of the same virtual multiplicity. That, in turn, implies the crucial relation between model and reality is not one of resemblance (phase space trajectories resembling plotted series of laboratory measured values) but one of co-actualization’ (DeLanda 2002, p. 180). It could be argued, however, that such qualitative descriptions of the spatial and temporal unfolding of sonic events are often subsumed under more quantitative or informatical approaches.
What is perhaps missing from Mathematical Music Theory at present are theoretical models of the inherent uncertainty and contingency that underlie the temporal evolution of even the simplest sonic events. Indeed, this is really a question of scale. Large-scale properties of musical events (rhythm, harmony, even pitch) are relatively well studied, and such concepts represent musical entities that have, in some sense, already ‘coalesced’ from small-scale microevents. But a qualitative theory of these ‘intensive’ and immanent events is perhaps still needed. And it could be that the generality of the theory of coupled cell networks outlined above could form part of such a theory. One of the advantages of this approach is that it is, to a certain degree, model-independent. We have assumed nothing in the examples about the nature of the coupling function, for instance, or, for that matter, the internal dynamics of each individual cell. To a large extent, these considerations are secondary to the local or global symmetry conditions imposed on the system that ‘force’ the cell dynamics. But some significant problems remain; many assumptions need to be made, such as identical coupling and the lack of plasticity of edge-formation (new coupling arising in the network spontaneously, etc.), even though non-identical coupling and plasticity probably do arise ‘in nature’. And the groupoid case, which may be the most flexible owing to its less stringent requirements on the global symmetries, still has a sketchy bifurcation theory, which is known to be related to the (intractable) representation
theory of nonsemisimple associative algebras (see Golubitsky and Stewart 2006, p. 360). But it nevertheless may represent a step in the right direction, and more research is required here.
3.2 Applications to Algorithmic Composition
We end with a description of a sound installation (Clark 2005) that takes the catalogue of the possible oscillations in a ring of N cells (described in Example Three) as a starting point. The installation incorporates an eight-channel speaker system linked to a MaxMSP patch that dictates the sonic output of each cell, or speaker. Larger numbers of speakers may also be used, with the diffusion throughout the speaker grid handled by third-party MaxMSP spatialisation tools. The periodic solutions act as modulators of the amplitude envelope of particular audio buffers, to create a polyphonic texture in which the phase-locking and phase-shifts of the individual channels can clearly be heard. This reflects one of the artistic motivations behind the installation, namely an approach to sound spatialisation that specifically seeks to explore both phase and synchrony as compositional material. Phase in particular is perhaps often overlooked as a compositional determinant, the tape loop experiments of Steve Reich in the 1960s notwithstanding, and it is perhaps surprising that multi-channel installation set-ups, with the opportunities for spatial separation they afford, have not been used to explore this parameter more fully. In the MaxMSP patch that generatively creates phase and synchrony patterns in the loudspeakers, the choice of a particular (bifurcatory) oscillation type is dictated by chance and uses the Markov model of Visell (2004). Visell uses the standard implementation, considering a set of $K$ states $S = \{s_1, s_2, \ldots, s_K\}$ modelling a time progression of events, and a set of transition probabilities $T_{ij} = P(s_j \mid s_i)$, together with an $M$-dimensional parameter space from which a generated sequence of vectors of parameters $v_1, v_2, \ldots$ is selected. A set of probability distributions $P(v_j \mid s_k)$ is defined, indicating the probability for a vector $v_j$ to have been generated by state $s_k$. Visell models the $P(v_j \mid s_k)$ as weighted Gaussian mixtures.
One can also perform a simple training algorithm which updates the matrix $T_{ij}$ so that the model ‘learns’ from its own output. We can now consider this model in our setting. Each Markov state is identified with a single branch of periodic bifurcating solutions. We assume these are preceded in each case by a neutral ‘off’ state, and that, for musical reasons, the output decays back to this state after a certain period of time. Similarly, we may treat the period T as a variable in our model. Small values of T can be used, for example, to modify microsound parameters. The M-dimensional parameter vector sampled from the Gaussian mixtures is used to model musical parameters in each individual channel, including spectral content of the waveforms. One can also, as Visell points out, link multiple networks simultaneously into a ‘network of networks’, the sonic result of which is a kind of ‘network polyphony’.
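The chance-driven selection of oscillation types described above can be sketched in a few lines. This is a minimal illustration and not the MaxMSP patch itself: the transition matrix `T`, the three branch states, the mixture parameters and the diagonal-covariance Gaussians are all hypothetical values chosen for the example. State 0 plays the role of the neutral ‘off’ state, to which every branch returns.

```python
import random

def next_state(T, i, rng):
    """Draw the next state j with probability T[i][j] = P(s_j | s_i)."""
    return rng.choices(range(len(T[i])), weights=T[i], k=1)[0]

def sample_parameters(mixture, rng):
    """Sample a parameter vector from a weighted Gaussian mixture.

    `mixture` is a list of (weight, means, stds) components; independent
    coordinates (diagonal covariance) are an assumption of this sketch.
    """
    weights = [weight for weight, _, _ in mixture]
    _, means, stds = rng.choices(mixture, weights=weights, k=1)[0]
    return [rng.gauss(m, s) for m, s in zip(means, stds)]

# State 0 is 'off'; states 1-3 stand for three branches of periodic solutions.
T = [
    [0.0, 0.5, 0.3, 0.2],  # from 'off', choose a branch by chance
    [1.0, 0.0, 0.0, 0.0],  # each branch decays back to 'off'
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

# Hypothetical 2-dimensional parameter space (M = 2), e.g. period and gain.
MIXTURES = {
    1: [(0.7, [1.0, 0.5], [0.1, 0.05]), (0.3, [2.0, 0.8], [0.2, 0.05])],
    2: [(1.0, [0.5, 0.6], [0.05, 0.1])],
    3: [(1.0, [4.0, 0.3], [0.5, 0.05])],
}

def generate(T, mixtures, steps, seed=0):
    """Return a list of (state, parameter vector or None) events."""
    rng = random.Random(seed)
    state, events = 0, []
    for _ in range(steps):
        state = next_state(T, state, rng)
        params = sample_parameters(mixtures[state], rng) if state in mixtures else None
        events.append((state, params))
    return events

events = generate(T, MIXTURES, steps=8)
for state, params in events:
    print(state, params)
```

A training step of the kind mentioned in the text would re-estimate the rows of `T` from the generated sequence; it is omitted here because the text does not specify the update rule.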
References
Clark, J.O.: Immanence. A sound installation first shown as part of the ‘HTMLes’ CyberArt Bienalle at the Monument National, Montreal (2005)
DeLanda, M.: Intensive Science and Virtual Philosophy. Continuum, London (2002)
Golubitsky, M., Stewart, I.: Nonlinear Dynamics of Networks: The Groupoid Formalism. Bulletin of the AMS 43(3), 305–364 (2006)
Golubitsky, M., Stewart, I.: The Symmetry Perspective: From Equilibrium to Chaos in Phase Space and Physical Space. Birkhäuser, Basel (2000)
Golubitsky, M., Stewart, I., Schaeffer, D.G.: Singularities and Groups in Bifurcation Theory, vol. II. Springer, Berlin (1988)
Mazzola, G.: The Topos of Music. Birkhäuser, Basel (2002)
Nicolis, G., Prigogine, I.: Exploring Complexity. W. H. Freeman, New York (1989)
Rodet, X., Vergez, C.: Nonlinear Dynamics in Physical Models: From Basic Models to True Musical-Instrument Models. Computer Music Journal 23(3), 35–49 (1999)
Visell, Y.: Spontaneous Organisation, Pattern Models, and Music. Organised Sound 9(2), 151–165 (2004)
Form, Transformation and Climax in Ruth Crawford Seeger’s String Quartet, Mvmt. 3
Edward Gollin
Williams College

Abstract. The paper reckons the progression of permuted registral states in Ruth Crawford Seeger’s String Quartet as a walk through a Cayley graph of the symmetric group, S4. The graph privileges the 2-cycles that exchange adjacent voices in the four-voice quartet texture, imposing a metric upon the group elements that allows one to distinguish permutations that are contextually close or ‘smooth’ from those that are distant or ‘agitated.’
The paper explores the transformational structure of voice permutations in the third movement of Ruth Crawford Seeger’s String Quartet (1931), extending and offering an alternative to an analysis presented in an article by Ellie Hisama (Hisama 1995). Hisama argues against a ‘traditional’ view of the movement’s structure in which the music reaches a singular climax (of dynamics, ambitus, density) three-quarters of the way through the piece. She instead suggests that the intertwining lines of the ensemble project multiple climaxes in the structural domain of ‘twistedness’, i.e. the degree to which the relative registral positions of the ensemble depart from a ‘standard’ quartet ordering.

Example 1. Ruth Crawford Seeger, String Quartet, iii, measures 25–32.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 340–346, 2009. © Springer-Verlag Berlin Heidelberg 2009

Example 1 presents mm. 25–32 of the work, illustrating a texture typical of the movement in its approach to the traditional climax: sustained tones in each instrument lie within a narrow ambitus; individual voices leap over or under their neighbors, creating a variety of ‘registral states’, particular registral dispositions of the instruments
in the ensemble. The distinct registral states in Example 1 are identified beneath the score. The first state shown in m. 25 is the standard quartet ordering (high to low: vn1, vn2, vla, vc). On the third beat of m. 25, violin 1 crosses below violin 2, creating a new registral state (high to low: vn2, vn1, vla, vc). On beat 2 of m. 26, the cello leaps up and crosses above the viola, creating a third distinct registral state (high to low: vn2, vn1, vc, vla). Additional voice crossings in the subsequent measures lead to the final registral state of the passage at m. 32, a complete registral inversion of the standard quartet ordering (high to low: vc, vla, vn2, vn1). At the bottom of Example 1, I indicate what Hisama calls the “degree of twist” of each registral state. For a given sonority, the degree of twist tallies the number of voice pairs whose registral ordering represents an inversion of the standard quartet ordering.1 For instance, the ‘standard,’ referential ordering at the opening of Example 1 has no inverted pairs, and hence has a degree of twist 0. The second state has one inverted voice pair (violin 2 over violin 1) and thus has degree of twist 1. The third state has two inverted voice pairs (violin 2 over violin 1, cello over viola) and thus has degree of twist 2. The sonority on beat three of bar 29 has four inverted voice pairs (violin 2 over violin 1, cello over violin 1, viola over violin 1, cello over viola) and thus has degree of twist 4. The final state, in bar 32, has degree of twist 6 (all voice pairs are inverted relative to those of the standard ordering) and thus represents a state of maximal twistedness in a four-voice sonority.

Example 2. A summary of registral states in measures 1–63.

1 Hisama (1995, 298).
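The degree of twist described above is an inversion count, and the values quoted for Example 1 can be reproduced with a short sketch. The function name and the tuple encoding of registral states (instruments listed high to low) are assumptions made for illustration; the state for bar 29 is inferred from the four inverted pairs listed in the text rather than read from the score.

```python
from itertools import combinations

STANDARD = ("vn1", "vn2", "vla", "vc")  # standard quartet ordering, high to low

def degree_of_twist(state, standard=STANDARD):
    """Count voice pairs whose registral order inverts the standard order."""
    rank = {instrument: i for i, instrument in enumerate(standard)}
    # combinations() keeps positional order, so `upper` sounds above `lower`.
    return sum(1 for upper, lower in combinations(state, 2)
               if rank[upper] > rank[lower])

states = {
    "m. 25": ("vn1", "vn2", "vla", "vc"),
    "m. 25, beat 3": ("vn2", "vn1", "vla", "vc"),
    "m. 26, beat 2": ("vn2", "vn1", "vc", "vla"),
    "m. 29, beat 3": ("vn2", "vc", "vla", "vn1"),  # inferred from the listed pairs
    "m. 32": ("vc", "vla", "vn2", "vn1"),
}

twists = {label: degree_of_twist(state) for label, state in states.items()}
for label, twist in twists.items():
    print(label, twist)
```

For four voices there are six voice pairs, which is why 6 is the maximum.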
Example 2 lists the registral states of instruments in the first 63 measures of the quartet movement. Each registral state is identified by the measure in which it first appears.2 Hisama’s degree of twist is indicated above each state. The two maximally-twisted four-voice states (asterisked at mm. 32 and 56), in Hisama’s view, represent climaxes in the domain of “twistedness.” The gradual motion toward and away from these two maxima, progressing through more and less twisted states, supports what Hisama has termed a ‘feminist,’ as opposed to the traditional ‘masculine,’ narrative of form in the work.3 Rather than considering the notion of ‘climax’ to reside simply in the twisted registral states of the ensemble, the present paper adopts a transformational interpretation of Hisama’s analysis, considering the ways in which climax can be understood to reside also in the structure of the permutations between them. In one sense, the ‘structure’ of the permutations is manifest as the symmetric group, S4, comprising the transformations between the 24 possible permuted states of the quartet. Yet a transformational view of the quartet movement as simply a reification of S4 through projection of the largely undifferentiated members of the group is not particularly informative. Indeed, it simply exchanges a view of the work as a succession of states with a view of the work as a succession of permutations between those states, themselves objects of a different sort. One way to assert distinctions and impose a hierarchy among group elements is through a combinatorial representation of the group, that is, through a group presentation.4 A group presentation represents a group as a set of generators, {a, b, c, ...}, together with a set of defining relators, {W, U, V, ...}, written
$$\langle a, b, c, \ldots ;\; W, U, V, \ldots \rangle. \qquad (1)$$
The generators are group elements that (together with their inverses) can be combined to express any group element. Group elements are expressed symbolically as words in the symbols a, b, c, … that represent the ordered combination (reading from left to right) of the generating elements signified by those symbols. The defining relators are words equivalent to the identity element that express how the group structure constrains the combination of generators. Most groups admit multiple distinct presentations. For instance, the symmetric group S3 (equivalently the dihedral group D3) can be represented either by a presentation on two distinct order-2 generators:
$$\langle b, c;\; b^2, c^2, (bc)^3 \rangle \qquad (2)$$
or by a presentation on an order-3 generator and an order-2 generator:
$$\langle b, c;\; b^3, c^2, cbc^{-1}b \rangle. \qquad (3)$$
Examples 3(a) and (b) depict the presentations given by [2] and [3], respectively, as Cayley graphs. The graphs reveal how the two presentations differently organize the six elements of the group, imposing distinct hierarchies upon those elements based

2 Brackets embracing violins 1 and 2 at mm. 19 and 19a indicate that the two instruments play in unison; the default position of violin 1 above violin 2 in those two states is assumed.
3 Hisama (1995, 305).
4 Magnus et al. (1976) is the classic work on combinatorial group theory. Gollin (2000) explores various music-transformational applications of a combinatorial group perspective.
upon the distances between elements in the graph. In Example 3a, for instance, any group element has only two immediate neighbors, two elements are distant by two steps, and one element is antipodal, lying three steps distant on the graph. The graph of Example 3b (and its underlying presentation) by contrast imposes a less articulated hierarchy among elements of the group: any element has three immediate neighbors and the remaining two elements are only two steps distant.

Example 3. Two presentations of S3, represented as Cayley graphs.
Example 4. A concrete instantiation of the presentation from Example 3a.
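The distance profiles attributed to the two graphs of Example 3 can be checked by a breadth-first search over the six orderings of three objects. The sketch below is illustrative only: the concrete choice of permutations (two adjacent swaps for the first presentation; a cyclic rotation, its inverse and one swap for the second) is an assumption, since the abstract graphs leave the generators uninstantiated.

```python
from collections import deque

def swap(i, j):
    """Return a move that exchanges positions i and j of an ordering."""
    def move(state):
        items = list(state)
        items[i], items[j] = items[j], items[i]
        return tuple(items)
    return move

def rotate(state):
    """Order-3 move: cyclic rotation of the ordering."""
    return state[1:] + state[:1]

def rotate_back(state):
    """Inverse of `rotate`."""
    return state[-1:] + state[:-1]

def distance_profile(start, moves):
    """Sorted graph distances from `start` to every reachable ordering."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for move in moves:
            neighbor = move(state)
            if neighbor not in distance:
                distance[neighbor] = distance[state] + 1
                queue.append(neighbor)
    return sorted(distance.values())

start = (0, 1, 2)
# Two order-2 generators, as in presentation [2].
profile_a = distance_profile(start, [swap(0, 1), swap(1, 2)])
# An order-3 generator (with its inverse) and an order-2 generator, as in [3].
profile_b = distance_profile(start, [rotate, rotate_back, swap(0, 1)])
print(profile_a)  # [0, 1, 1, 2, 2, 3]
print(profile_b)  # [0, 1, 1, 1, 2, 2]
```

The first profile matches the hexagonal layout (two neighbors, two elements at two steps, one antipode); the second matches the flatter hierarchy with three neighbors.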
The graphs of Example 3 are graphs of abstract presentations of S3: nodes are unlabeled, and can be instantiated by a variety of elements of the abstract group. Example 4 instead presents a concrete instantiation of the abstract graph of Example 3(a), associating the abstract generators b and c with particular elements of a concrete group. In this case, the graph represents the group of registral permutations of three instruments (vn2, vla, vc) in a three-voice texture. Generators have been instantiated by the particular order 2 elements that exchange instruments in neighboring voices: b = (AT), the permutation that exchanges instruments in the Alto and Tenor voices of a given three-voice sonority; c = (TB), the permutation that exchanges instruments in the Tenor and Bass voices of a given three-voice sonority. Each node has been labeled to represent a particular registral ordering of the instruments. In the case of the Crawford Seeger quartet, given its gradual twisting and untwisting of adjacent lines, (AT) and (TB) form a natural set of generators against which to compare and measure permutations in the three-voice texture that precedes entry of the fourth part (mm. 14–19).
On Example 2, arrows beneath the three-voice registral states are labeled using words in the symbols b and c that describe the permutations between successive states. The words, moreover, describe a path through the graph of Example 4, a path that represents the progression of states. Two aspects of the path/progression are notable. First, word lengths increase over the course of the three-voice section: two words, c and b (describing a motion on the graph from 12:00 to 10:00, and 10:00 to 8:00 respectively) are followed by two statements of the word cb (describing motions from 8:00 to 4:00 and 4:00 to 12:00 respectively). Because the presentation, by virtue of its choice of generator elements, values ‘smooth’ transitions (those that exchange single pairs of neighboring elements), the progression from simple to compound transformations reflects an increase in transformational or permutational complexity or ‘agitation.’ Second, the progression of states completes a full circuit on the graph; the return to the opening state formally ends the section, and the introduction of a fourth voice ensues.

Example 5. A Cayley graph of S4 on the order-2 generators that exchange adjacent voices in a quartet texture.
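The circuit described above can be traced by applying the words c, b, cb, cb in turn, reading each word from left to right. In this sketch the starting ordering (vn2, vla, vc, high to low) is an assumption, since Example 2 is not reproduced in the text; the generator encoding follows the instantiation b = (AT), c = (TB) given for Example 4.

```python
# b = (AT) exchanges the upper two voices, c = (TB) the lower two.
GENERATORS = {"b": (0, 1), "c": (1, 2)}

def apply_word(state, word, generators=GENERATORS):
    """Apply the generators named in `word`, reading left to right."""
    for symbol in word:
        i, j = generators[symbol]
        voices = list(state)
        voices[i], voices[j] = voices[j], voices[i]
        state = tuple(voices)
    return state

start = ("vn2", "vla", "vc")   # assumed opening state, high to low
words = ["c", "b", "cb", "cb"]

path = [start]
for word in words:
    path.append(apply_word(path[-1], word))

for state in path:
    print(state)
```

The concatenated word is cbcbcb, the cube of cb, so the return to the opening state is what the relator (bc)^3 of presentation [2] predicts.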
One can similarly present S4 in generating symbols {a, b, c}, instantiated by the three permutations that exchange adjacent voice parts in a four-voice texture (SATB), a = (SA), b = (AT), c = (TB). The complete presentation
$$\langle a, b, c;\; a^2, b^2, c^2, (ac)^2, (ab)^3, (bc)^3 \rangle \qquad (4)$$
is represented as a Cayley graph on Example 5.5 The graph is analogous to the familiar Tonnetz in pitch space; like the graph of Example 4, it imposes a metric upon the quartet’s registral states, allowing one to measure the transformational ‘distance’ between any two states as the word length of that transformation in the group presentation, a distance measured in terms of the space’s ‘privileged’ generating permutations. Beneath the four-voice registral states of Example 2, the permutations between successive states of the quartet are expressed as words in the presentation [4], or equivalently, as pathways through the voice-permutation space of Example 5. Inspection of the voice permutations on Example 2, reckoned as words in the symbols {a, b, c}, reveals that while the progression of registral states becomes twisted and untwisted during the approach to the traditional climax (Hisama’s multiclimax view), the word lengths between successive states gradually increase as one approaches the traditional dynamic/registral climax. That is, whereas the first 45 measures exclusively feature permutations of word length 1 and 2, two words of length 3 (both statements of the word cba) lead to the registral states at mm. 49 and 56. The words of length 3, which signify a greater degree of transformational ‘agitation,’ fortify, from a transformational perspective, a traditional understanding of the movement’s form as a gradual progression toward a singular climax. At the same time, the sequence of word lengths in voice-permutation space can be understood to support Hisama’s view that states of maximal twistedness constitute points of local climax: the first state with maximal degree of twist (m. 32) arrives amid a run of four length-2 permutations; the second state with maximal degree of twist (m. 56) is attained by a length-3 permutation. The transformational structure of the permutations thus articulates and underscores the structure of the registral-state maxima.6
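The metric imposed by presentation [4] can be computed directly by breadth-first search over the 24 registral states. The sketch below is a hedged illustration: the tuple encoding and function names are assumptions, and the comparison with the degree of twist relies on a standard property of adjacent transpositions (distance from the standard ordering equals the number of inversions) rather than on a claim made in the text.

```python
from collections import deque
from itertools import combinations

STANDARD = ("vn1", "vn2", "vla", "vc")               # high to low
GENERATORS = {"a": (0, 1), "b": (1, 2), "c": (2, 3)}  # (SA), (AT), (TB)

def apply_word(state, word):
    """Apply adjacent-voice exchanges, reading the word left to right."""
    for symbol in word:
        i, j = GENERATORS[symbol]
        voices = list(state)
        voices[i], voices[j] = voices[j], voices[i]
        state = tuple(voices)
    return state

def word_lengths_from(start):
    """Shortest word length from `start` to every state of the graph."""
    length = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in GENERATORS:
            neighbor = apply_word(state, symbol)
            if neighbor not in length:
                length[neighbor] = length[state] + 1
                queue.append(neighbor)
    return length

def inverted_pairs(state):
    """Number of voice pairs out of standard order (the degree of twist)."""
    rank = {instrument: i for i, instrument in enumerate(STANDARD)}
    return sum(1 for upper, lower in combinations(state, 2)
               if rank[upper] > rank[lower])

lengths = word_lengths_from(STANDARD)
print(len(lengths), max(lengths.values()))    # 24 6
print(lengths[apply_word(STANDARD, "cba")])   # 3
print(all(lengths[s] == inverted_pairs(s) for s in lengths))  # True
```

Distances between two arbitrary states are obtained the same way by starting the search from either of them, which is how the word lengths between successive states of Example 2 would be measured.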
5 The graph is a Schlegel diagram (a distorted planar projection) of a truncated octahedron. Schlegel diagrams are discussed in Loeb (1991).
6 A different transformational view of the permutations in the passage can be gleaned by observing permutations among instruments rather than among voices. Such a view might reflect the intuition that incremental changes of state are those that exchange instrument pairs that are timbrally similar. Lewin (2007, xvii–xx) explores distinctions between voice permutations and instrument permutations in the analysis of works in triple counterpoint.

References
Hisama, E.: The Question of Climax in Ruth Crawford’s String Quartet, Mvt. 3. In: Marvin, E.W., Hermann, R. (eds.) Concert Music, Rock, and Jazz since 1945: Essays and Analytic Studies, pp. 285–312. University of Rochester Press, Rochester (1995)
Gollin, E.: Representations of Space and Conceptions of Distance in Transformational Music Theories. Ph.D diss., Harvard University (2000)
Lewin, D.: Generalized Musical Intervals and Transformations (Corrected edn. with New Author’s Preface). Oxford University Press, New York (2007)
Loeb, A.L.: Space Structures: Their Harmony and Counterpoint. Birkhäuser, Boston (1991)
Magnus, W., Karrass, A., Solitar, D.: Combinatorial Group Theory: Presentations of Groups in Terms of Generators and Relations. Dover Publications, New York (1976)
A Local Maximum Phrase Detection Method for Analyzing Phrasing Strategies in Expressive Performances
Eric Cheng1,* and Elaine Chew2
1 Hsieh Department of Electrical Engineering, University of Southern California Viterbi School of Engineering
2 Epstein Department of Industrial and Systems Engineering, Hsieh Department of Electrical Engineering, University of Southern California Viterbi School of Engineering
Abstract. This paper proposes a Local Maximum Phrase Detection (LMPD) method for the analysis of phrasing strategies in expressive performances. The LMPD method systematically extracts a quantitative representation of phrasing strategy by equating the occurrence of a local maximum in the loudness curve with the occurrence of a phrase or sub-phrase. We further define mathematical descriptors for phrase strength and volatility, and phrase typicality, for comparing phrasing strategies among performances. Phrase strength measures the prominence or clarity of a phrase, and the volatility is defined as the standard deviation of the phrase strengths within a performance. Phrase typicality quantifies the degree to which a phrase loudness peak location is characteristic among the performances polled. The ideas behind these descriptors extend to phrase information derived from tempo variation. We illustrate the LMPD method using preliminary results from its application to eleven commercially available audio recordings of a solo violin Bach Sonata.
1 Introduction
Previous research into expressive phrasing strategies has generally been local in nature. That is, the majority of studies have discussed how performers vary musical parameters within a single phrase or near a single phrase boundary (see, for example, Cambouropoulos (2001), Gabrielsson (1987), Langner and Goebl (2003), and Repp (1992)). While certain local phrasing strategies, such as the clarifying of boundaries with declines in tempo and dynamics, are well documented, relatively little is known about higher-level phrasing strategies, i.e., how performers choose to segment a piece into phrases. In this paper, we aim to develop tools for understanding phrasing strategies from this more global perspective. To that end, we propose the Local Maximum Phrase

* Eric Cheng is now a Research Analyst with Areté Associates.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 347–353, 2009. © Springer-Verlag Berlin Heidelberg 2009
Detection (LMPD) method, which derives a quantitative representation of phrasing strategies by identifying the number and locations of phrases throughout a performance, and defines mathematical descriptors to quantify the characteristics of each phrase. These methods provide the means to quantitatively compare and contrast global phrasing strategies. We extract beat-level1 tempo and loudness data from music audio using manual onset detection and a psycho-acoustic model of loudness. By superimposing author-annotated phrase boundaries over the tempo/loudness data, we find that local maxima in the loudness curve are reliable indicators for phrase or sub-phrase occurrences in the piece examined. This observation leads to the LMPD method, which first equates the occurrence of a local maximum in the loudness curve with that of a phrase or subphrase. Then, the phrase strength, which measures how strongly the phrase stands out against the background, is calculated as the average loudness increase from the adjoining local minima; and, the phrase volatility is computed as the standard deviation of all phrase strengths. Finally, each phrase is assigned a value, the phrase typicality, that quantifies the popularity of its location among the performances polled. To demonstrate the efficacy of the LMPD method, we apply the method to performance data obtained from commercially available audio recordings, and present the preliminary results.
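The overview above translates into a short procedure, sketched here on a made-up beat-level loudness curve. Several details are assumptions of this sketch rather than statements of the method: the endpoints of the curve stand in for a missing adjoining minimum, plateaus are ignored (strict inequalities only), and the population standard deviation is used for volatility. Phrase typicality is omitted because it needs data from several performances.

```python
from statistics import pstdev

def local_maxima(values):
    """Indices of strict interior local maxima of the loudness curve."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

def local_minima(values):
    """Indices of strict interior local minima of the loudness curve."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] > values[i] < values[i + 1]]

def phrase_strengths(values):
    """Average loudness rise from the two adjoining minima to each maximum."""
    minima = local_minima(values)
    strengths = {}
    for peak in local_maxima(values):
        # Fall back on the curve endpoints when no interior minimum adjoins.
        left = max([m for m in minima if m < peak], default=0)
        right = min([m for m in minima if m > peak], default=len(values) - 1)
        rise_left = values[peak] - values[left]
        rise_right = values[peak] - values[right]
        strengths[peak] = (rise_left + rise_right) / 2
    return strengths

# Hypothetical beat-level loudness values (sones), for illustration only.
loudness = [1.0, 2.0, 4.0, 3.0, 2.0, 3.0, 5.0, 2.0, 1.0]

strengths = phrase_strengths(loudness)
volatility = pstdev(strengths.values())
print(strengths)   # {2: 2.5, 6: 3.5}
print(volatility)  # 0.5
```

Each key of `strengths` is the beat index of a detected phrase or sub-phrase, so the number of keys gives the phrase count for the performance.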
2 The Method
This section presents our method for performance analysis, and consists of: (1) a description of our method for tempo and loudness extraction; (2) arguments for focusing on loudness data; and (3) the proposal of the local maximum phrase detection method, including the extraction of phrases, and of the mathematical descriptors.
2.1 Data Extraction
We first extract beat onset times manually using a marking tool in Final Cut Pro. Then, we use the onset times to calculate beat-level tempo. To compute loudness data, we first calculate a loudness waveform, in sones, using a MATLAB implementation of the PEAQ2 standard (Kabal 2002). The waveform is then smoothed using a Gaussian window, and sampled at each onset time, to obtain beat-level data. To validate the accuracy of the extracted data, we compare loudness values for a single recording to a manually plotted reference curve representing our own perception. The smoothing window width is optimized so that the loudness data best matches the reference curve.
2.2 The Case for Loudness
We extracted performance data (tempo and loudness) from eleven commercially available audio recordings of the Andante movement of J. S. Bach's Sonata No. 2 for

1 We use the term “beat-level” to refer to the estimation of tempo and loudness through a sampling of onset times and loudness values at the natural frequency with which one would tap along to the beat of the piece (cf. Sethares, in this volume).
2 Perceptual Evaluation of Audio Quality, a standard of the International Telecommunication Union.
solo violin, BWV 1003. We chose this piece for its regular pulse and unambiguous phrase structure, qualities that simplified both data extraction and analysis. The performances were by Ehnes, Enescu, Grumiaux, Heifetz, Kremer, Menuhin, Milstein (1956 and 1975), Mintz, Szeryng, and Szigeti. To devise a systematic method of phrase detection, we first annotated phrase boundaries using only the score as our guide. Then, as shown in Figures 1 and 2, we superimposed these boundaries over plots showing tempo and loudness data for all performances to see whether any identifiable patterns arose.
Fig. 1. Phrase boundaries superimposed over tempo data
Fig. 2. Phrase boundaries superimposed over loudness data
E. Cheng and E. Chew
In the two figures, vertical lines denote starts of phrases, and each trajectory represents a single performer's performance data. Observe that the loudness trajectories appear to be more consistently related to the annotated phrase boundaries than their tempo counterparts. In particular, phrases are well characterized by a crescendo/decrescendo arch similar to that mentioned in several past studies, such as Gabrielsson (1987), Sundberg, Friberg and Bresin (2003), and Todd (1992). In some cases, phrases are characterized by two or more sub-arches, suggesting that those performers chose to divide the annotated phrases into subphrases. In contrast, the tempo strategies are less systematically related to the phrase boundaries, with a greater diversity of trajectories. This is not entirely unexpected. Beran and Mazzola (1999) and Repp (1992) found pronouncedly different tempo strategies in their investigations of piano performances of Schumann’s “Träumerei.” The greater diversity of tempo strategies, which was confirmed by analyzing the average interperformer correlations for the tempo and loudness data (r_tempo = 0.4862 and r_loudness = 0.7627), led us to conclude that, for this piece, loudness is a more reliable parameter for phrase detection, and in particular, that a crescendo/decrescendo arch is a reliable indicator for the occurrence of a phrase.

2.3 Local Maximum Phrase Detection

If we assume that each phrase is characterized by a crescendo/decrescendo arch, then each phrase should also be associated with a local maximum in loudness. The LMPD method uses this local maximum as a mathematical indicator for the existence of a phrase. The method consists of two steps: (1) record number and locations of local loudness maxima for each performance; and, (2) interpret each local maximum as a phrase or sub-phrase. The total number of local maxima in a performance provides a global measure of the degree to which a performer highlights local vs. global phrase structure, while the locations of the local maxima allow us to compare different phrase subdivision strategies. This method also allows us to define additional mathematical descriptors to further quantify the characteristics of phrasing strategy. These descriptors are discussed in the next two sections.

2.3.1 Phrase Strength and Volatility

We define the phrase strength (P.S.) of a phrase to be equal to the average loudness difference between its local maximum and the two adjoining local minima:
\[ \text{P.S.} = \frac{1}{2}\left[(M_i - m_j) + (M_i - m_k)\right] \tag{1} \]
where M_i is the loudness value of the local maximum, and m_j and m_k are the loudness values of the two adjoining local minima as shown in Figure 3. P.S. values allow us to measure the prominence or clarity of a particular phrase.
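The two LMPD steps and Equation 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, a local maximum is taken to be a beat strictly louder than both neighbours (plateaus are ignored), and, as a further assumption, the first or last beat of the series stands in for a missing adjoining minimum.

```python
def local_extrema(values):
    """Return (maxima, minima): indices of strict interior turning points."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            maxima.append(i)
        elif values[i - 1] > values[i] < values[i + 1]:
            minima.append(i)
    return maxima, minima


def phrase_strengths(values):
    """Map each local-maximum index to its phrase strength (Equation 1).

    Assumption: when no local minimum exists on one side of a maximum,
    the first or last sample of the series is used in its place.
    """
    maxima, minima = local_extrema(values)
    bounds = [0] + minima + [len(values) - 1]
    strengths = {}
    for i in maxima:
        left = max(b for b in bounds if b < i)    # adjoining minimum m_j
        right = min(b for b in bounds if b > i)   # adjoining minimum m_k
        strengths[i] = 0.5 * ((values[i] - values[left])
                              + (values[i] - values[right]))
    return strengths


# Hypothetical beat-level loudness values (not data from the study).
loudness = [2.0, 4.0, 6.0, 3.0, 1.0, 5.0, 2.0]
print(phrase_strengths(loudness))  # {2: 4.5, 5: 3.5}
```

The number of keys in the returned dictionary is the count of detected phrases or sub-phrases, and the keys themselves give their beat locations.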
Fig. 3. Phrase strength parameters
Fig. 4. Top: Average performance trajectory. Bottom: Number of performers placing a local maximum at a particular location.
The phrase volatility (P.V.) of a particular performance is defined to be the standard deviation of all P.S. values in the performance. Thus, the greater the variability in phrase strengths, the greater the phrase volatility.
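As a small illustration, phrase volatility can be computed directly from a list of P.S. values. The text does not say whether the population or the sample standard deviation is meant; the sketch below assumes the population form, and the function name is hypothetical.

```python
import statistics


def phrase_volatility(strengths):
    """Standard deviation of the phrase strengths of one performance.

    Assumption: population standard deviation (statistics.pstdev);
    use statistics.stdev instead if the sample form is intended.
    """
    return statistics.pstdev(strengths)


# Hypothetical P.S. values for two performances.
even_phrasing = [4.0, 4.0, 4.0]     # identical strengths, zero volatility
uneven_phrasing = [4.5, 3.5]        # differing strengths
print(phrase_volatility(even_phrasing), phrase_volatility(uneven_phrasing))
```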
2.3.2 Phrase Typicality

The phrase typicality (P.T.) of a phrase quantifies the popularity of its location. It is defined to be the proportion of other performers who also place a local maximum at the location of the phrase in question. Mathematically, the phrase typicality is given by:
\[ \text{P.T.} = \frac{1}{N-1}\left[M(i) - 1\right] \tag{2} \]
where M(i) is the total number of performers placing a local maximum at location i, and N is the total number of performers. Thus, the greater the number of performers placing a local maximum at a particular location, the greater the phrase typicality of a phrase at that location. Figure 4 shows how M(i) varies from location to location. In the sample performance trajectory, on the top half of Figure 4, the vertical dotted lines indicate local maxima. The histogram on the lower half of Figure 4 shows the number of performers placing a local maximum at a particular location, equivalent to M(i) in Equation 2. The histogram shows, for example, that the peak at bar five is highly typical, and shows up in eleven of the twelve recordings, while a peak at the beat just before it was observed in only one recording.
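Equation 2 can be sketched as follows. The performer labels and beat locations are hypothetical placeholders rather than data from the study, and the function name is an assumption made for illustration.

```python
from collections import Counter


def phrase_typicality(maxima_by_performer):
    """Map each location i to P.T. = (M(i) - 1) / (N - 1) (Equation 2).

    maxima_by_performer: dict of performer -> iterable of beat locations
    at which that performer places a local loudness maximum.
    """
    n = len(maxima_by_performer)
    if n < 2:
        raise ValueError("at least two performers are needed")
    # M(i): number of performers with a local maximum at location i.
    counts = Counter(loc for locs in maxima_by_performer.values()
                     for loc in set(locs))
    return {loc: (m - 1) / (n - 1) for loc, m in counts.items()}


# Hypothetical local-maximum locations for three performers.
maxima = {"A": [4, 9], "B": [4], "C": [4, 8]}
print(phrase_typicality(maxima))  # {4: 1.0, 9: 0.0, 8: 0.0}
```

For the peak quoted above, M(i) = 11 and N = 12 give P.T. = 10/11, roughly 0.91, whereas a peak found in a single recording has P.T. = 0.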
3 Conclusion and Discussion

In conclusion, we have presented a novel method for the analysis of phrasing strategies in expressive performances by equating the occurrence of a local maximum in loudness with the occurrence of a phrase. Preliminary results suggest that a local maximum in loudness is a meaningful expressive event that reliably indicates the presence of a phrase. We defined mathematical descriptors to quantify the characteristics of each phrase. Until we conduct listening tests, we can only hypothesize about the perceptual significance of these descriptors.

The fact that we found the loudness data to vary more systematically with the phrase structure than the tempo information, in our case study using Bach’s Sonata for solo violin, BWV 1003, may be an artefact of this particular sonata, Bach’s compositions, the Baroque genre, or of solo violin performances of this kind of music. In a study by Chuan and Chew (2007), the authors found tempo data to reflect well the phrases projected in performances of Chopin’s Preludes (Nos. 1 and 7) for piano, and used tempo variations in Kissin’s and Rubinstein’s recorded performances to analyze the performers’ phrasing strategies. While tempo variation in expressive performance is common in performances of Chopin’s music, or of other compositions of the Romantic genre, performers typically refrain from significant tempo deviations in the performances of music by Bach and his contemporaries. Thus, the expressive strategies employed in the projection of perceptual groupings in such music may be more confined to the management of dynamics in the performance. We posit that the LMPD method and its accompanying mathematical descriptors, while defined here based on loudness information, would extend well to the quantifying of properties of phrases extracted from tempo data. More comprehensive analyses of the twelve Bach recordings will follow.
Acknowledgements

This material is based upon work supported by a Frank H. Buck Scholarship, and by the National Science Foundation under grant No. 0347988. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors, and do not necessarily reflect the views of the Frank H. & Eva B. Buck Foundation, or of the National Science Foundation.
References

Beran, J., Mazzola, G.: Analyzing musical structure and performance – a statistical approach. Statistical Science 14(1), 47–79 (1999)
Cambouropoulos, E.: The local boundary detection model (lbdm) and its application in the study of expressive timing. In: Proc. of the Intl. Computer Music Conf. (2001)
Chuan, C.-H., Chew, E.: A Dynamic Programming Approach to the Extraction of Phrase Boundaries from Tempo Variations in Expressive Performances. In: Proc. of the Intl. Conf. on Music Information Retrieval (2007)
Gabrielsson, A.: Once again: the theme from Mozart’s piano sonata in A major (K. 331). In: Action and Perception in Rhythm and Music, pp. 81–103. Royal Swedish Academy of Music, Stockholm (1987)
Kabal, P.: An Examination and Interpretation of ITU-R BS. 1387. McGill University, Perceptual Evaluation of Audio Quality (2002)
Langner, J., Goebl, W.: Visualizing expressive performance in tempo-loudness space. Computer Music Journal 27(4), 69–83 (2003)
Repp, B.H.: Diversity and commonality in music performance: An analysis of timing microstructure in Schumann’s Träumerei. Journal of the Acoustical Society of America 92(5) (1992)
Sethares, W.A.: Rhythm and Transforms, Perception and Mathematics. In: Klouche, T., Noll, T. (eds.) MCM 2007. CCIS, vol. 37, pp. 347–353. Springer, Heidelberg (2009)
Sundberg, J., Friberg, A., Bresin, R.: Attempts to reproduce a pianist’s expressive timing with director musices performance rules. Journal of New Music Research 32(3), 317–325 (2003)
Todd, N.P.M.: The dynamics of dynamics: a model of musical expression. Journal of the Acoustical Society of America 91(6), 3540–3550 (1992)
Subgroup Relations among Pitch-Class Sets within Tetrachordal K-Families

Jerry G. Ianni¹ and Lawrence B. Shuster²

¹ Department of Mathematics, Engineering, and Computer Science, LaGuardia Community College/CUNY, Long Island City, New York, United States of America
[email protected]
² Department of Music and Dance, University of Massachusetts at Amherst, Amherst, Massachusetts, United States of America
[email protected]
In 1990 and 1991, Henry Klumpenhouwer and David Lewin introduced Klumpenhouwer networks (K-nets) as theoretical tools that display transformational interpretations of dyads contained within pitch-class multisets (Lewin 1990; Klumpenhouwer 1991). Informally, K-nets are directed graphs that employ pitch classes as nodes and elements of the T/I group as edges. In order for a K-net to be well defined, its edges must commute throughout the directed graph and its nodes must map to adjacent nodes according to the corresponding edge transformations. Several types of K-nets emerged by varying the cardinalities of the underlying pitch-class multisets, the number of constituent dyads subject to transformational interpretation, the number of transpositional and inversional operators employed, and the relative positions of these operators. We will work exclusively with two common types of K-nets: trichordal K-nets and box-style tetrachordal K-nets. See Examples 1a and 1b, respectively, for representatives of these two types.

Example 1. Two Klumpenhouwer Networks
We will assume that the reader is familiar with various forms of isography, K-classes, and K-families (O’Donnell 1998; Lambert 2002). However, it is important to note that Lambert's conception of K-classes and K-families as harmonic spaces places emphasis on the sonorities defined by the pitch-class multisets. Accordingly, these objects are defined as collections of pitch-class multisets, not as collections of K-nets.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 354–364, 2009. © Springer-Verlag Berlin Heidelberg 2009
The sole criterion for a pitch-class multiset to be a member of a K-class or a K-family is that there exists a transformational interpretation of its dyads that satisfies the relations of (one of) the generating K-net diagram(s). Another way to understand this criterion is to restrict formally the notion of strong isography to pitch-class multisets as follows: the pitch-class multisets S and T are called strongly isographic if there are K-net interpretations of S and T, respectively, that are strongly isographic as K-nets. Under this perspective, the pitch-class set [1, 2, 4] is observed to be a member of the I3/I5 trichordal K-class from the T2 K-family via the K-net of Example 1a. On the other hand, if the transformational interpretation of the dyad [2, 4] is changed to a T10 arrow emanating from 4, then [1, 2, 4] is observed to be a member of the I5/I3 trichordal K-class from the T10 K-family. One also finds that the T2 trichordal K-family is identical to the T10 trichordal K-family using this point of view.

Our main goals are to probe the algebraic structure of trichordal and box-style tetrachordal K-classes, to develop some voice-leading applications among the pitch-class multisets that appear in a given K-class, and to show how these relations inform the harmonic design of arrays in the music of Igor Stravinsky.

In 2002, Philip Lambert observed the following characteristic in many K-classes: whenever a pitch-class multiset from a given K-class is transposed by six semitones, the resulting pitch-class multiset belongs to the same K-class. He called it the T6-partner (Lambert 2002). It can be proven easily that this characteristic holds in general for arbitrary K-classes. Thus, if C is a K-class, we are motivated to define a symmetry of C to be an element of the T/I group that acts on C by permuting its members. The group action under consideration is function evaluation. It then follows that both T0 and T6 are symmetries of every K-class.
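The membership criterion can be checked mechanically for the set [1, 2, 4]. The sketch below is illustrative only: Example 1a is not reproduced in the text, so the arrow layout (I3 from 1 to 2, T2 from 2 to 4, I5 from 1 to 4) is an assumption inferred from the K-class labels, and all function names are hypothetical.

```python
def T(n):
    """Transposition operator T_n on pitch classes mod 12."""
    return lambda x: (x + n) % 12


def I(n):
    """Inversion operator I_n on pitch classes mod 12."""
    return lambda x: (n - x) % 12


def arrows_hold(arrows):
    """True if every (source, operator, target) arrow maps source to target."""
    return all(op(src) == dst for src, op, dst in arrows)


def paths_commute(first, second, direct):
    """True if applying `first` then `second` equals `direct` on all of Z12."""
    return all(second(first(x)) == direct(x) for x in range(12))


# Assumed reading of Example 1a: I3 joins 1 and 2, T2 sends 2 to 4, I5 joins 1 and 4.
t2_reading = [(1, I(3), 2), (2, T(2), 4), (1, I(5), 4)]
# Alternative reading described in the text: a T10 arrow emanating from 4.
t10_reading = [(1, I(5), 4), (4, T(10), 2), (1, I(3), 2)]
# T6-partner of [1, 2, 4] under the same I3 / T2 / I5 interpretation.
t6_partner = [(7, I(3), 8), (8, T(2), 10), (7, I(5), 10)]

print(arrows_hold(t2_reading), paths_commute(I(3), T(2), I(5)))
print(arrows_hold(t10_reading), paths_commute(I(5), T(10), I(3)))
print(arrows_hold(t6_partner))
```

All five checks print `True`, which is consistent with [1, 2, 4] and its T6-partner [7, 8, 10] belonging to the same K-class.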
Other elements of the T/I group can be symmetries of one or more particular K-classes. For example, I4 and I10 are symmetries of the I3/I5 trichordal K-class, and I5 and I11 are symmetries of the I0/I10 K-class from the box-style tetrachordal T3/T7 K-family. See Examples 2 and 3, respectively, for details on the actions of these symmetries. The three results below demonstrate that symmetries of K-classes can be sorted algebraically in very meaningful ways. We will only prove the first result in this paper. The other two are more technical and require supporting lemmas. However, we will provide a heuristic and a brief outline of the arguments for Theorems 2 and 3 after presenting their statements.

Example 2. Four Symmetries – I3/I5 trichordal K-class
J.G. Ianni and L.B. Shuster
Proposition 1. Let C be a K-class. Then the set G consisting of all symmetries of C is a subgroup of the T/I group.

Proof. The action of the T/I group on the set consisting of all pitch-class multisets induces an action on its power set. C is an element of this power set, and G is the stabilizer subgroup of C under the induced action. ❚

We call the stabilizer subgroup G of Proposition 1 the (full) group of symmetries for C. As an exercise, the reader can verify that the groups of symmetries for the K-classes given in Examples 2 and 3 are {T0, T6, I4, I10} and {T0, T6, I5, I11}, respectively.

Example 3. Four Symmetries – I0/I10 K-class from the Box-style tetrachordal T3/T7 K-family
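The exercise can also be carried out by brute force, computing the stabilizer of a K-class under all 24 T/I operators. This is a sketch under stated assumptions: K-class members are generated from the general forms [a, j − a, a + n] (trichordal) and [a, a + m, j − a, j − a + n] (box-style) used in the proof sketches of Theorems 2 and 3, each multiset is stored as a sorted tuple, and the function names are hypothetical.

```python
def trichordal_k_class(n, j):
    """Multisets [a, j - a, a + n] for all a in Z12 (operators Tn, Ij, Ij+n)."""
    return {tuple(sorted((a % 12, (j - a) % 12, (a + n) % 12)))
            for a in range(12)}


def box_k_class(m, n, j):
    """Multisets [a, a + m, j - a, j - a + n] for all a in Z12."""
    return {tuple(sorted((a % 12, (a + m) % 12, (j - a) % 12, (j - a + n) % 12)))
            for a in range(12)}


def symmetries(k_class):
    """Names of the T/I operators that permute the members of k_class."""
    ops = {}
    for k in range(12):
        ops[f"T{k}"] = lambda x, k=k: (x + k) % 12
        ops[f"I{k}"] = lambda x, k=k: (k - x) % 12

    def image(op):
        # Apply the operator to every pitch class of every member.
        return {tuple(sorted(op(x) for x in member)) for member in k_class}

    return {name for name, op in ops.items() if image(op) == k_class}


# I3/I5 trichordal K-class (T2 K-family) and I0/I10 box-style K-class (T3/T7).
print(sorted(symmetries(trichordal_k_class(2, 3))))
print(sorted(symmetries(box_k_class(3, 7, 0))))
```

Under these assumptions the two printed groups are {T0, T6, I4, I10} and {T0, T6, I5, I11}, matching the groups stated above.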
Theorem 2. Let C be a trichordal K-class, and let G be its group of symmetries. Suppose that the transformational operators that specify C are Tn, Ij, and Ij+n, where n ≤ 6 and j ≤ 11 are non-negative integers.
a. If n = 1, 3, or 5, then G = {T0, T6}.
b. If n = 0, then G = {T0, T6, Ij, Ij+6}.
c. If n = 2, then G = {T0, T6, Ij+1, Ij+7}.
d. If n = 4, then G = {T0, T6, Ij+2, Ij+8}.
e. If n = 6, then G = {T0, T3, T6, T9, Ij, Ij+3, Ij+6, Ij+9}.
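The five cases of Theorem 2 can be transcribed into a small lookup function. This is merely a restatement of the theorem's case table in code, with a hypothetical function name and operators written as strings; it proves nothing by itself.

```python
def theorem2_group(n, j):
    """Group of symmetries predicted by Theorem 2 for operators Tn, Ij, Ij+n."""
    if not (0 <= n <= 6 and 0 <= j <= 11):
        raise ValueError("expected 0 <= n <= 6 and 0 <= j <= 11")
    group = {"T0", "T6"}
    if n in (1, 3, 5):                       # case a
        return group
    if n == 6:                               # case e
        group |= {"T3", "T9"}
        shifts = (0, 3, 6, 9)
    else:                                    # cases b, c, d: n = 0, 2, 4
        shifts = (n // 2, n // 2 + 6)
    group |= {f"I{(j + s) % 12}" for s in shifts}
    return group


# I3/I5 trichordal K-class from the T2 K-family: n = 2, j = 3.
print(sorted(theorem2_group(2, 3)))  # ['I10', 'I4', 'T0', 'T6']
```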
Informal Sketch of Proof. To facilitate discussion of the proof of Theorem 2, we introduce some terminology. A transpositional dyad is a dyad whose constituent pitch classes have been declared to be related transpositionally. Similarly, an inversional dyad is a dyad whose constituent pitch classes have been declared to be related inversionally. Using this terminology, observe that a pitch-class multiset of cardinality three contains one transpositional dyad and two inversional dyads relative to each of its K-net interpretations. Let C be a trichordal K-class, and let σ be a symmetry of C. Since σ is an element of the T/I group, its action preserves the interval classes of the constituent dyads of each member of C. It can be proven that C contains a member S whose transpositional dyad’s interval class is distinct from the interval classes of the two inversional dyads. Thus, σ maps the transpositional dyad of S onto the transpositional dyad of
σ(S) and the two inversional dyads of S onto the two inversional dyads of σ(S). The identification of all the symmetries proceeds by examining the general forms of S, each transpositional symmetry Tk, and each inversional symmetry Il, respectively: S = [a, b, a + n], Tk(S) = [a + k, b + k, a + n + k], and Il(S) = [l – a, l – b, l – a – n]. Use the facts that the transpositional (inversional) dyads map to each other to obtain algebraic restrictions on the value(s) of k and l. By analyzing these restrictions, we can derive Theorem 2. ❚

In preparation for the statement of Theorem 3, we introduce a standard form for each box-style tetrachordal K-class C. It can be proven that C is generated by a unique K-net diagram whose transpositional arrows Tm and Tn satisfy the following conditions: m ≤ n are both non-negative integers, both transpositional arrows are oriented horizontally from left to right as in Example 1b, the Tm arrow is on the bottom, and either
a. m ≠ 0 and m + n ≤ 12; or
b. m = 0 and n ≤ 6.
We say that C is a member of the box-style tetrachordal K-family indexed by (m, n). From the restrictions on possible values of m and n, we can deduce that there are forty-three distinct box-style tetrachordal K-families. We now state Theorem 3.

Theorem 3. Let C be a box-style tetrachordal K-class, and let G be its group of symmetries. Suppose that the transformational operators that specify C are Tm, Tn, Ij, and Ij+m+n, where j ≤ 11 is a non-negative integer and the ordered pair (m, n) indexes the standard form of C.
a. If m + n is odd and n ≠ 6, then G = {T0, T6}.
b. If m + n is odd and n = 6, then G = {T0, T3, T6, T9}.
c. Suppose m + n is even, and choose r such that m + n = 2r. If n ≠ 6, then G = {T0, T6, Ij+r, Ij+r+6}.
d. Suppose m + n is even, and choose r such that m + n = 2r. If n = 6, then G = {T0, T3, T6, T9, Ij+r, Ij+r+3, Ij+r+6, Ij+r+9}.

Informal Sketch of Proof. Unfortunately, it does not seem possible to use the same heuristics to discover a proof for Theorem 3 as we did for Theorem 2. To illustrate one difficulty, observe that the I11/I5 K-class from the box-style tetrachordal T1/T5 K-family consists of pitch-class multisets that each contain inversional and transpositional dyads with the same interval class. Thus, we cannot easily rule out the possibility of a symmetry for a box-style tetrachordal K-class mapping a transpositional dyad onto an inversional dyad. However, we can make progress by considering the general form of a member of C: [a, a + m, j – a, j – a + n]. Observe that the sum of the pitch classes in each member of C is the same, namely 2j + m + n. The only transpositional operators that preserve this sum are T0, T3, T6, and T9. So, none of the other transpositional operators can possibly be a symmetry of C. By exploring the invariant sum property in conjunction with general forms for transpositional symmetries Tk and inversional symmetries Il, we can derive Theorem 3 through algebraic analysis. ❚
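Three of the counting and arithmetic claims in this passage are easy to confirm numerically: the number of standard-form index pairs, the invariant sum 2j + m + n, and the transpositions that preserve it. The sketch below assumes the standard-form conditions read "m ≤ n, and either m ≠ 0 with m + n ≤ 12, or m = 0 with n ≤ 6"; the function names are hypothetical.

```python
def standard_form_indices():
    """All (m, n) with m <= n satisfying the standard-form conditions."""
    return [(m, n)
            for m in range(13)
            for n in range(m, 13)
            if (m != 0 and m + n <= 12) or (m == 0 and n <= 6)]


def box_members(m, n, j):
    """General form [a, a + m, j - a, j - a + n] for every a in Z12."""
    return [(a % 12, (a + m) % 12, (j - a) % 12, (j - a + n) % 12)
            for a in range(12)]


def sum_preserving_transpositions():
    """T_k adds 4k to the sum of a tetrachord, so the sum survives iff 4k = 0 mod 12."""
    return [k for k in range(12) if (4 * k) % 12 == 0]


print(len(standard_form_indices()))                      # 43
print({sum(s) % 12 for s in box_members(3, 7, 0)})       # {10}
print(sum_preserving_transpositions())                   # [0, 3, 6, 9]
```

For the T3/T7 K-family with j = 0, every member sums to 2j + m + n = 10 (mod 12), and only T0, T3, T6, and T9 leave that sum unchanged, as the sketch of the proof states.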
If S is a member of a trichordal K-class C, then S can be written in the form [a, b, a + n] for suitably chosen non-negative integers a, b, and n. In this representation, the
transpositional dyad of S is [a, a + n], and the inversional dyads are [a, b] and [b, a + n]. Klumpenhouwer specifically noted that strong isography is preserved throughout the progression of trichordal pitch-class multisets obtained by setting the transpositional dyad [a, a + n] in contrary motion to the singleton [b] (Klumpenhouwer 1991). This overall progression can be represented by [a + j, b – j, a + n + j] for j = 0, 1, 2, …, 11. This voice-leading context provides a basis to perceive aurally that T6 is a symmetry of every trichordal K-class: contrary motion by 6 semitones leads to the same pitch-class multiset as parallel motion by 6 semitones. This fact corresponds to the mathematical observation that [a + 6, b – 6, a + n + 6] is the same pitch-class multiset as [a + 6, b + 6, a + n + 6]. Is there a basis to perceive aurally the inversional symmetries that sometimes appear? In the trichordal case, suppose that the two voices of the transpositional dyad [a, a + n] are set in contrary motion towards each other. At the same time, suppose that the singleton voice b is split into two voices that move in contrary motion away from each other. This overall progression is represented by tetrachordal pitch-class sets [a + j, b – j, b + j, a + n – j] for j = 0, 1, 2, …, 11. The inversional dyads [a, b] and [b, a + n] of the original trichord correspond to the inversional dyads [a + j, b – j] and [b + j, a + n – j], respectively, in the progression of tetrachordal pitch-class sets. Observe that a + b = (a + j) + (b – j) and b + (a + n) = (b + j) + (a + n – j). Thus, even though the number of semitones separating the voices of the transpositional dyad [a + j, a + n – j] varies, we find that double axial isography is preserved throughout the progression. Moreover, if the pair of voices from the original transpositional dyad meet, then the number of semitones of motion j satisfies a + j = a + n – j or 2j = n. 
At the precise moment of meeting, the original singleton voice b will have expanded into a dyad [b – j, b + j] whose voices are (b + j) – (b – j) = 2j = n semitones apart. We obtain a trichordal pitch-class set [b – j, a + j = a + n – j, b + j] that belongs to the same K-class as [a, b, a + n]. However, the roles of the voices have switched, and the overall correspondence of the pitch-class sets manifests the inversional symmetry Ia+b+j. Since n = 2j is even and a + b + j is the average of a + b and a + b + n, this disclosure of Ia+b+j as an inversional symmetry is consistent with Theorem 2. A similar basis for the aural perception of inversional symmetries of box-style tetrachordal K-classes is also available. The key idea for this voice leading is to set the two voices of one transpositional dyad in contrary motion towards each other and to set the two voices of the other transpositional dyad in contrary motion away from each other. If these pairs of dyads exchange their transpositional distances, then the resulting tetrachordal pitch-class multiset will be an inversional image of the original tetrachord that belongs to the same K-class. We emphasize that our voice-leading model does not preserve strong isography among the tetrachordal pitch-class multisets obtained in the progression because the corresponding transpositional dyads do not have identical interpretations. However, double axial isography is preserved through the unfolding of the two wedge formations between the individual voices. This fact provides the basis for analytical application of various linear relationships in the arrays of Igor Stravinsky, and it also extends the theoretical framework developed by Philip Stoecker in his work on (single) axial isography (Stoecker 2002). In this context, recall that two trichordal K-nets are axially isographic if they have an inversional operator in common.
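The two trichordal voice-leading progressions described above can be traced numerically. The following sketch uses the trichord [a, b, a + n] = [2, 1, 4] (that is, the set [1, 2, 4] with n = 2) as a worked instance; the function names are hypothetical and the code only illustrates the arithmetic, not an analytical method from the paper.

```python
def contrary_motion(a, b, n, j):
    """Transpositional dyad moves up j semitones while the singleton moves down j."""
    return ((a + j) % 12, (b - j) % 12, (a + n + j) % 12)


def wedge(a, b, n, j):
    """Dyad voices converge while the singleton splits: [a+j, b-j, b+j, a+n-j]."""
    return ((a + j) % 12, (b - j) % 12, (b + j) % 12, (a + n - j) % 12)


def axis_sums(chord):
    """Index sums of the two inversional dyads of a wedge chord."""
    w, x, y, z = chord
    return ((w + x) % 12, (y + z) % 12)


a, b, n = 2, 1, 2

# Contrary motion by 6 semitones equals parallel motion (T6) by 6 semitones.
print(contrary_motion(a, b, n, 6), tuple((x + 6) % 12 for x in (a, b, a + n)))

# Both index sums stay fixed along the wedge: double axial isography.
print({axis_sums(wedge(a, b, n, j)) for j in range(12)})   # {(3, 5)}

# The dyad voices meet when 2j = n, here j = 1; the result is the I_(a+b+j) image.
meeting = set(wedge(a, b, n, 1))
inverted = {(a + b + 1 - x) % 12 for x in (a, b, a + n)}
print(sorted(meeting), sorted(inverted))                   # [0, 2, 3] [0, 2, 3]
```

In this instance the meeting point yields the image of [1, 2, 4] under I4, which is one of the inversional symmetries that Theorem 2 assigns to the I3/I5 trichordal K-class.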
Example 4. Tetrachordal sets in Stravinsky’s Threni (mm. 405-417)
Example 4 presents the concluding chorale from Stravinsky’s Threni, measures 405-417, as analyzed by Joseph Straus (2001) in his text Stravinsky’s Late Music. Example 5 demonstrates the four-part pre-compositional array Stravinsky employed to generate the harmonies of the chorale. The four-part array consists of a series of twelve vertical columns and four horizontal rows. Each horizontal row contains a complete statement of the four primary row forms, designated P, I, R and IR respectively. Each vertical column of the array consists of a collection of four pitch-class multisets that Stravinsky employs as the harmonies of the chorale, evident by the segmentations indicated by Straus. The twelve sonorities of the array are evenly distributed between two phrases that combine to form a melodic period (see Example 5). Phrase 1, performed by the chorus (mm. 405-408), contains sonorities 1-7. Phrase 2, performed by the soloists (mm. 408-411), contains sonorities 7-12. The cadence in measure 408 establishes a formal articulation that defines the binary phrase structure of the passage. Continuity is established by use of harmonic dovetailing with the final sonority of phrase 1, set 7, also serving as the initial sonority of phrase 2. In the second period of the chorale (mm. 412-419), the same binary phrase structure recurs with the twelve sonorities of the array now appearing in retrograde order.

Example 5. Four-part array from Stravinsky’s Threni (mm. 405-417)
A diverse collection of sonorities appears within the array as shown in Example 5. These include seven distinct set-class types representing six trichords, five tetrachords and a single dyad {(03), (0235), (015), (026), (0127), (0347), (0369)}. Isolated “pockets” of harmonic correspondence result from the repetition of the trichordal set class (026) in order positions 4 and 5 and again in order positions 8 and 9 of the array. In both instances, the transformation that maps the two subclasses onto one another is I9. Similar instances of set-class equivalence include the (015) trichords that appear in order positions 3 and 6, and the (0127) tetrachords present in order positions 7 and 10. In both cases, the transformation that maps each respective subclass onto the other is also I9. The distribution of these various set-class types establishes a simple symmetrical pattern that provides an important source of harmonic shaping and reinforces the binary phrase structure of the passage. While successful in characterizing the individual sonorities that comprise the array and revealing instances of harmonic correspondence between them, set-class theory remains incapable of disclosing the structural forces responsible for promoting unity and coherence between all the sonorities within the array.
Example 6. Double Axial Isography in Stravinsky’s Threni (mm. 405-417)
Additional forms of correspondence demonstrated by Straus prove more rewarding. Straus illustrates how Stravinsky’s four-part array can be divided along a horizontal axis in order to produce pairs of inversionally related dyads as demonstrated in Example 6. For each respective order position within the array, pitch classes contained in the P and I row forms sum to the common index sum 6, whereas all pitch classes in the R and IR row forms sum to the index sum 0. The succession of inversionally related pairs of dyads establishes an important source of abstract unity and consistency amongst the sonorities of the array in pitch-class space. Different organizational features are revealed when we shift focus from consideration of dyads to sets of larger cardinalities. Example 6 interprets the twelve sonorities of the array as a series of box-style tetrachordal K-nets. In order to maintain the important inversional relationships reflected within the array, the I-related dyads remain paired in the subsequent K-net interpretation. Accordingly, the first dyad consisting of elements of the P and I row forms appears as the left-hand inversional dyad within the tetrachordal box-style graph configuration and the second dyad that includes pitch classes from the R and IR row forms appears as the right-hand inversional dyad in each box-style tetrachordal graph. Limited instances of isographic correspondence are apparent within the succession of tetrachordal K-nets. Graphs 11 and 12 are related by strong isography at. The strong correspondence exhibited between these sonorities provides a stark contrast within the otherwise disparate succession of unrelated K-classes. As a result of the retrograde distribution of array sonorities, these same K-classes recur as the initial sonorities of the first phrase of the second period (mm. 412-417). 
Thus, strong isography characterizes both the termination of the first melodic period (graphs 11 and 12) and the onset of the second period (graphs 12 and 11). These correspondences emerge as important aural markers within the harmonic design of the passage. Moreover, the harmonic overlap observed between melodic periods reflects, on a larger structural
level, Stravinsky’s previous use of harmonic dovetailing apparent in the repetition of graph 7 located at the juncture between phrases 1 and 2 in the first melodic period. Graphs 4 and 5 are also related by strong isography, as well as by subclass affiliation as members of the same (026) trichordal set-class. Apart from these localized instances, however, traditional forms of isographic correspondence are unable to establish a broader context in which to characterize all the sonorities of the array as a unified, as opposed to diffuse, succession of harmonies. As a consequence of inversional symmetry, we observe that each of the twelve tetrachordal K-nets shares an identical collection of I-operators as demonstrated in Example 6. Each graph contains an I6 operator located on the left side of the graph, and an I0 operator on the right side. The common I-operators typify the large-scale double wedge voice-leading motion that connects each sonority within the array to its neighbors. In each case, the successions of dyads that sum to 6 establish one wedge motion, and those dyads that sum to 0 establish the remaining wedge motion. Whereas Stoecker’s original formulation of single axial isography preserves one inversional operator between two trichordal K-nets, double axial isography preserves both of the inversional operators between two box-style tetrachordal K-nets. While conventional isographies require an angle of hearing that emphasizes the presence of invariant, or in the case of negative isography, complementary, transpositional operators, double axial isography requires us to shift orientation in order to focus on the invariant inversional operators present in each tetrachordal K-net. The invariant inversional operators function as dual inversional axes with each successive tetrachordal pitch-class set providing a different rotation of pitch classes about the inversional axes.
Our notation for double axial isography, as it appears in Example 6, is an operator designated by <(x, y)> where the terms, x and y, correspond to the difference between respective top and bottom T-operators between contiguous graphs. These inversional relationships are manifest in pitch-class space in the form of double wedge voice-leading patterns that incorporate all sonorities of the array as members of a single large-scale voice-leading configuration. While successful in revealing a linear perspective of how the sonorities within the array are connected, axial isography provides little information regarding the vertical organization of the sonorities and the factors that establish harmonic affiliations between them. Consideration of the groups of algebraic symmetries that generate the pitch-class sets associated with individual K-class representatives provides additional perspective concerning the harmonic organization of the array sonorities. Whereas axial isography is construed on the basis of identical I-labels reflected between network graphs, correspondences involving algebraic symmetries are contingent upon common inversional sums apparent when the corresponding I-operators of one graph sum to the same amount as the I-labels of another graph. Because each graph in the array from Threni consists of identical pairs of inversional operators (I0 and I6), their corresponding sums will consequently be the same: 6. However, in other cases, pairs of I-operators belonging to discrepant K-classes will differ while their respective sums remain the same. For instance, despite their different labels, we know from Theorem 3 that the group of symmetries for each of the following K-classes contains at least four common operators: I1/I5, I2/I4, I3/I3, I4/I2, I5/I1, I6/I0. In each case, the respective I-operators sum to 6. Specifically, the group of symmetries for each representative K-class contains the operators T0, T6, I3, and I9.
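The link between a common inversional sum and common inversional symmetries can be made explicit. In the notation of Theorem 3 the two I-operators are Ij and Ij+m+n, so their indices sum to 2j + m + n = 2(j + r), and the inversional symmetries Ij+r and Ij+r+6 are exactly the Ik with 2k equal to that sum mod 12. The sketch below, with a hypothetical function name, solves this congruence for the six K-class labels listed above.

```python
def inversional_symmetries_for_sum(index_sum):
    """All k in Z12 with 2k = index_sum (mod 12); empty when the sum is odd."""
    return [k for k in range(12) if (2 * k) % 12 == index_sum % 12]


# K-class labels from the text, written as pairs of I-operator indices.
labels = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (6, 0)]
for p, q in labels:
    print(f"I{p}/I{q}:", inversional_symmetries_for_sum(p + q))  # [3, 9] each time
```

Every label yields I3 and I9, which together with T0 and T6 give the four common operators named above. An odd index sum returns an empty list, in line with parts a and b of Theorem 3, where no inversional symmetries occur.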
The single exception is presented in graph 9 (see Example 6). Due to the high degree of internal symmetry inherent in this pitch-class set [T, 2, 4] and its corresponding K-net interpretation (K-class I0/I6), additional symmetrical correspondences are also manifest. In addition to the operators {T0, T6, I3, I9} that are shared between all K-classes in the example, this particular set also includes the additional operators {T3, T9, I0, I6} and thus manifests the fourth case stated in Theorem 3, part d. Despite the presence of these additional operators, all sonorities in the array are unified by the fact that the group of symmetries of every K-class contains the subgroup of common symmetries {T0, T6, I3, I9}. This unity extends beyond the fact that only T0 and T6 are guaranteed as common operators. The result is an elegant algebraic unity that transcends traditional K-class distinctions by establishing a new theory of harmonic correspondence based on the presence of equivalent transformational operators, as opposed to equivalent objects. The specific pitch-class collections that appear in the array from Threni represent one of many possible transformational pathways within the larger harmonic network characterized by the common group of symmetries. Any given transformational pathway within the network can potentially employ all of the transformational operators included in the algebraic subgroup, fewer, or even none in cases where only a single set-class representative is present in the particular pathway realization. All potential pathways, however, will involve a voice-leading design consisting of the double wedge motions characteristic of axial isography. In the array from the Threni excerpt, Stravinsky elected to navigate a transformational pathway that emphasizes the I9 operator within the greater harmonic space informed by the subgroup of common symmetries {T0, T6, I3, I9}. Of the twelve sonorities present in the array, there are four instances of set-class duplication.
Order positions 4, 5, 8 and 9 present (026) trichordal subclasses related by I9. The (0127) tetrachords that appear in order positions 7 and 10 are also related by I9, as are the (015) trichordal subclasses that appear in order positions 3 and 6. The four remaining sets in order positions 1, 2, 11 and 12 employ only the identity transformation T0 because they are not affiliated by subclass relation with any other member sonority of the array. Axial isography and correspondences involving subgroups of common symmetries go hand in hand. Axial isography provides a model that characterizes how successive sonorities within the array are connected through wedge voice-leading pathways in pitch-class space. Together, axial isography and subgroup correspondences establish a means by which to confirm our musical intuitions of Stravinsky’s chorales as distinct, unified wholes. Similar organizational features are reflected in the arrays that appear in numerous other works by Stravinsky, including Movements, Agon, Epitaphium, and The Flood.

Acknowledgements. Jerry G. Ianni wishes to acknowledge two sources of grant support at LaGuardia Community College for the research presented in this paper: the Educational Development Initiative Team and the Division of Academic Affairs. Both authors are very grateful to Philip Lambert for many helpful comments and suggestions. We also gratefully acknowledge the valuable comments, suggestions, and insights offered by Norman Carey, David Headlam, Patty Howland, Robert Morris, Thomas Noll, and Joe Straus.
J.G. Ianni and L.B. Shuster
References
Armstrong, M.A.: Groups and Symmetry. Springer, New York (1988)
Klumpenhouwer, H.: A Generalized Model of Voice-Leading for Atonal Music. Ph.D. dissertation, Harvard University (1991)
Lambert, P.: Isographies and Some Klumpenhouwer Networks They Involve. Music Theory Spectrum 24(2), 165–195 (2002)
Lewin, D.: Klumpenhouwer Networks and Some Isographies that Involve Them. Music Theory Spectrum 12(1), 83–120 (1990)
Lewin, D.: A Tutorial on Klumpenhouwer Networks, Using the Chorale in Schoenberg’s Opus 11, No. 2. Journal of Music Theory 38(1), 79–101 (1994)
O’Donnell, S.: Klumpenhouwer Networks, Isography, and the Molecular Metaphor. Intégral 12, 53–80 (1998)
Stoecker, P.: Klumpenhouwer Networks, Trichords and Axial Isography. Music Theory Spectrum 24(2), 231–245 (2002)
Straus, J.N.: Stravinsky’s Late Music. Cambridge University Press, Cambridge (2001)
K-Net Recursion in Perlean Hierarchical Structure
Gretchen C. Foley
University of Nebraska-Lincoln
1 Introduction
Klumpenhouwer networks, or K-nets, are graphic representations of the intervallic relationships among elements of a set.1 Theorist David Lewin has suggested that K-nets may be applicable to Perle cycles, entities referred to by George Perle in his theory of twelve-tone tonality as cyclic sets. These entities are created through the alternation of inversionally related interval cycles. The present study seeks to broaden the applicability of K-nets in Perle’s theory by exploring their recursive nature at varying levels of structure.
2 K-Nets and Perle Cycles
The interval cycle provides the foundation for Perle’s system of twelve-tone tonality. Perle uses two inversionally related interval cycles to form a cyclic set, hereinafter referred to by the more recently coined term Perle cycle. Ex. 1 shows one such Perle cycle formed by inversionally related interval 7 cycles. In this formation any given pitch class (pc) forms a pair of sums with its neighbor notes. These sums repeat throughout the Perle cycle and are called tonic sums. The tonic sums provide the Perle cycle with its name; the Perle cycle in Ex. 1 is 2,9. Further, the cyclic interval itself may be determined by subtracting the first tonic sum from the second (tonic sum 9 – tonic sum 2 = 7, the cyclic interval).
Example 1. Perle cycle 2,9
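The construction just described can be sketched in a few lines of Python. This is only an illustration: the function name `perle_cycle` is hypothetical, and the starting pc 0 is an assumption, since the notated starting point of Ex. 1 is not recoverable from the text. Each new pc is obtained by subtracting the previous pc from the next tonic sum, mod 12.

```python
def perle_cycle(sum_a, sum_b, start=0, length=24):
    """Return pcs whose adjacent pairs alternately sum to sum_a and sum_b (mod 12).

    Hypothetical helper; `start` is an assumed starting pitch class.
    """
    pcs = [start % 12]
    for i in range(length - 1):
        tonic_sum = sum_a if i % 2 == 0 else sum_b
        pcs.append((tonic_sum - pcs[-1]) % 12)
    return pcs


cycle = perle_cycle(2, 9)
adjacent_sums = [(x + y) % 12 for x, y in zip(cycle, cycle[1:])]
ascending = cycle[0::2]    # every other pc: one interval cycle
descending = cycle[1::2]   # the remaining pcs: its inversional partner
cyclic_interval = (9 - 2) % 12

print(cycle)
print(ascending, descending, cyclic_interval)
```

Under these assumptions the alternate pcs trace an ascending and a descending interval 7 cycle, and the difference of the tonic sums gives the cyclic interval 7, as stated above.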
Although the two bracketed trichordal segments in the example belong to two different set classes (scs) – sc 3-9 (027) and sc 3-4 (015), respectively – they share similarities in their internal structure since they both derive from the same symmetrical Perle cycle. Fig. 1 displays these segments graphically as K-nets in order to view their internal relationships.
1 Lewin named these networks after his former student Henry Klumpenhouwer, who extended Lewin’s original conception of networks to include inversional as well as transpositional relationships (Lewin 2003).
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 365–374, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. K-nets of trichordal segments from Perle cycle 2,9
The nodes of the K-nets are filled with the pcs of the two segments, and the arrows connecting the nodes show the specific relationships between them. The first two networks of Fig. 1 are strongly isographic because their graphs are isomorphic: they have identical node and arrow configurations, with the same transformations associated with corresponding arrows. The T7 value represents the cyclic interval 7, and the I2 and I9 values represent the tonic sums formed by the adjacent pcs in the Perle cycle. Thus, despite the fact that the two segments are members of different scs, they both comprise identical internal relationships, as uncovered by the K-nets. Moreover, any pair of trichordal segments from the particular Perle cycle of Ex. 1 can be shown to be strongly isographic. The third network in Fig. 1 displays this recursion in the more abstract K-graph; the arrows indicate the transformational operations shared by all trichordal segments of the Perle cycle. Any three adjacent pcs from this Perle cycle can be fitted into the nodes of this K-graph.

The alignment of the ascending and descending interval 7 cycles in Ex. 1 is one of twelve possible alignments. The interval cycles may rotate in relation to one another, creating a different set of repeating tonic sums. Rotating the descending interval 7 cycle twice results in a new Perle cycle, identified in Ex. 2 as 0,7.
Example 2. Perle cycle 0,7
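The claim that any three adjacent pcs of Perle cycle 2,9 fit the same K-graph can be checked with a small sketch. The helper names are hypothetical, and the cycle is generated from an assumed starting pc of 0. Each trichord is read as two I arrows through the middle pc and one T arrow between the outer pcs, with the T arrow oriented from the pc on the lower I arrow to the pc on the higher one.

```python
def perle_cycle(sum_a, sum_b, start=0):
    """Hypothetical helper: 24 pcs whose adjacent sums alternate sum_a, sum_b (mod 12)."""
    pcs = [start % 12]
    for i in range(23):
        pcs.append(((sum_a if i % 2 == 0 else sum_b) - pcs[-1]) % 12)
    return pcs


def knet_signature(a, b, c):
    """Arrows of the three-node K-net on adjacent pcs a, b, c.

    Returns (lower I value, higher I value, T value), where the T arrow runs
    from the outer pc on the lower I arrow to the outer pc on the higher one.
    """
    (i_low, x), (i_high, z) = sorted([((a + b) % 12, a), ((b + c) % 12, c)])
    return i_low, i_high, (z - x) % 12


cycle = perle_cycle(2, 9)
segments = [(cycle[i], cycle[(i + 1) % 24], cycle[(i + 2) % 24]) for i in range(24)]
signatures = {knet_signature(*seg) for seg in segments}
print(signatures)
```

Every segment yields the same triple of arrows (I2, I9, T7), which is one way of seeing why the T arrow is fixed once the two tonic sums are fixed.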
The K-nets of all trichordal segments from this new Perle cycle will be strongly isographic with one another. However, a comparison of these bracketed segments with those from Ex. 1 reveals that their K-nets are not strongly isographic, due to the new tonic sums resulting from the shifted cycle. Instead, the K-nets are said to be positively isographic. In Fig. 2 K-graphs representing trichordal segments from both Perle cycles show identical T values between corresponding nodes, but with the consistent difference of ten (t) in their corresponding I values. Lewin expressed this relationship with the symbol ⟨Tt⟩.2 This value also represents the difference between the corresponding tonic sums of the two Perle cycles, as given below the K-graphs.3
Fig. 2. ⟨Tt⟩ relationship between Perle cycles 2,9 and 0,7
2 Lewin (2003, 88).
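As a rough illustration of positive isography, the sketch below compares two K-graphs written as tuples (T value, first I value, second I value). The function name and the tuple encoding are assumptions made for this example only. The graphs are positively isographic when their T values agree and their I values differ by one constant n (mod 12); n = 0 is the special case of strong isography.

```python
def positive_isography(net_a, net_b):
    """Return n if net_b's I values exceed net_a's by a constant n (mod 12)
    and the T values are identical; otherwise return None.

    Nets are encoded as (T value, first I value, second I value).
    """
    t_a, *i_a = net_a
    t_b, *i_b = net_b
    if t_a != t_b:
        return None
    differences = {(j - i) % 12 for i, j in zip(i_a, i_b)}
    return differences.pop() if len(differences) == 1 else None


graph_2_9 = (7, 2, 9)   # K-graph of Perle cycle 2,9
graph_0_7 = (7, 0, 7)   # K-graph of Perle cycle 0,7
n = positive_isography(graph_2_9, graph_0_7)
print(n)
```

For these two graphs the constant is ten, the same value as the difference between the corresponding tonic sums of the two cycles.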
3 K-Nets, Arrays, and Axis-Dyad Chords
Any two Perle cycles in a vertical alignment form an array, which takes its name from the combined Perle cycles; in Ex. 3 two aligned Perle cycles form the array 2,9/0,7. Furthermore, just as the interval cycles within a Perle cycle are not fixed in relation to one another, the Perle cycles themselves may rotate within the larger array. Such rotation creates two types of vertical alignments between the Perle cycles. Ex. 3 illustrates the first type, in which the corresponding ascending and descending forms are aligned. Perle refers to this type as a difference alignment. In a sum alignment (not shown), the ascending cycle of one Perle cycle is aligned with the descending cycle of the other.
Example 3. Array 2,9/0,7
The array functions as a resource from which to reap melodic and harmonic material. Perle segments arrays into units of varying size; the main unit is the axis-dyad chord, a hexachordal collection formed by combining trichordal segments from each of the Perle cycles. In the array above, one such axis-dyad chord may be constructed from two vertically aligned trichords 7 7 2 and 7 5 2.4 If the 0,7 Perle cycle were rotated two places to the left, the resulting axis-dyad chord would comprise trichords 7 7 2 and 2 t 9. Due to the symmetrical nature of the array, all axis-dyad chords from the same array in the same type of alignment (either difference or sum) will be isomorphic. The first K-graph in Fig. 3 represents the recursive fit of all axis-dyad chords drawn from that same array’s difference alignment, while the second K-graph represents the recursive fit of all other axis-dyad chords drawn from that same array’s sum alignment. Chords with strongly isographic K-nets may also be generated by symmetrical rotation, that is, moving the upper and lower trichordal segments by the same amount in the opposite direction. Hence, the dotted lines connecting the nodes in the upper and lower trichords indicate the variable status of the two Perle cycles of the array in relation to one another. Conversely, axis-dyad chords resulting from an asymmetrical shifting of the Perle cycles instead will have positively or negatively isographic K-nets. The latter (not shown) comprise complementary T values between corresponding nodes and identical sums formed by corresponding I values.5
3 Lewin identified one other relationship exhibited by K-nets, negatively isographic networks. Sets in such relationships have networks of identical node and arrow configurations, but their T values are complementary rather than identical, and their corresponding I values show the same sum rather than the same difference. Lewin symbolized negatively isographic networks as ⟨In⟩; here n equals the sum of the analogous I arrows.
Fig. 3. K-graphs of axis-dyad chords in difference and sum alignments
4 K-Nets and Array Relationships
The closest relationships among arrays are those of transposition and inversion. Transposing an array entails adding a constant even integer to each of the four tonic sums. For example, the arrays 0,7/4,e and 4,e/8,3 are related by T4, since 4 is added to each of the tonic sums. At the same time, transposed arrays also retain the same cyclic intervals, which together form the array’s interval system. The interval system for the two transpositionally related arrays mentioned above is 7,7. Consequently, all arrays related by transposition will share strongly isographic relationships among their tonic sums, as shown in Fig. 4.
4 Perle recognizes repetitions of a pc within a segment as independent entities rather than multiple instances of a single member of a collection. Since any collection may include pc duplication, a hexachordal collection in Perle’s terms may contain fewer than six distinct elements. The same obtains for dyads, trichords, and tetrachords.
5 For more information on these relationships, see Foley (2002).
Arrays related by inversion share the same sum between corresponding tonic sums. In addition, their interval systems are complementary rather than identical. In Fig. 5 the array 0,7/4,e comprises interval system 7,7, while the inversionally related array 2,7/t,3 is built on the interval system 5,5. Fig. 5 also demonstrates that inversionally related arrays will have negatively isographic K-nets.
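The transposition and inversion relations among arrays can be sketched as follows. An array is encoded here as a pair of tonic-sum pairs, with t = 10 and e = 11; the function names are hypothetical and the encoding is an assumption for illustration. Transposition adds an even constant to every tonic sum and leaves the interval system unchanged; inversionally related arrays show one constant sum between corresponding tonic sums and complementary interval systems.

```python
def interval_system(array):
    """Cyclic intervals of both Perle cycles: second tonic sum minus first (mod 12)."""
    (a, b), (c, d) = array
    return ((b - a) % 12, (d - c) % 12)


def transpose(array, n):
    """Add the even constant n to each of the four tonic sums."""
    if n % 2:
        raise ValueError("array transposition uses an even constant")
    return tuple(tuple((s + n) % 12 for s in cycle) for cycle in array)


def inversion_sum(array_x, array_y):
    """Return the constant sum of corresponding tonic sums, or None if there is none."""
    sums = {(s + t) % 12
            for cx, cy in zip(array_x, array_y)
            for s, t in zip(cx, cy)}
    return sums.pop() if len(sums) == 1 else None


array = ((0, 7), (4, 11))        # 0,7/4,e
inverted = ((2, 7), (10, 3))     # 2,7/t,3

print(transpose(array, 4), interval_system(transpose(array, 4)))
print(inversion_sum(array, inverted), interval_system(inverted))
```

With the arrays cited in the text, T4 produces 4,e/8,3 with interval system 7,7, while 2,7/t,3 pairs with 0,7/4,e at the constant sum 2 and carries the complementary interval system 5,5.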
Fig. 4. Strong isography between tonic sums of transpositionally related arrays
Fig. 5. Negative isography between tonic sums of inversionally related arrays
5 K-Nets, Interval Systems, Modes, and Keys
Every array contains three pairs of tonic sums, the primary, secondary, and tertiary tonic sum pairs, formed by the horizontal, vertical, and diagonal pairs, respectively. Fig. 6 represents these three pairs as K-nets. In the first K-net, Fig. 6a, the interval system of array 0,7/4,e is realized as the intervallic distance between the tonic sums in each horizontal pair, namely 7,7. All arrays with the same interval system of 7,7, such as array 2,9/1,8 or array 6,1/5,0, can be fitted into the corresponding K-graph below the first K-net. Arrays of differing interval systems may also be related, via modes and keys. These relationships establish connections across the Perle cycles within the array.
Fig. 6 a-c. Three pairs of tonic sums in array 0,7/4,e

The mode reflects the difference between the corresponding tonic sums between the two Perle cycles. In the second K-net, Fig. 6b, array 0,7/4,e has a modal designation of 4,4 (as 4-0 and e-7). Arrays of any combination of interval systems may belong to the same mode if they preserve the same differences between their corresponding tonic sums. For example, the array e,2/3,6 (interval system 3,3) also belongs to mode 4,4, as does array 0,1/4,5 (interval system 1,1). All such arrays whose corresponding tonic sums differ by four may recur in Fig. 6b’s corresponding K-graph of mode 4,4. In the same way, mode 0,0 contains all arrays whose tonic sums differ by zero, and so on. A mode with two different elements in its name, such as mode t,8, results from arrays whose interval systems comprise different cyclic intervals, such as arrays 1,6/e,2 (interval system 5,3) and 1,7/e,3 (interval system 6,4). Each array may belong to only one mode, of which there are 144 in the universe of twelve-tone tonality.

The key reflects the sums of symmetrically aligned tonic sums between the two Perle cycles in an array. The third K-net, Fig. 6c, shows how array 0,7/4,e has a key designation of e,e (0+e and 7+4). As with modes, keys may relate arrays of the same or different interval systems. Therefore, key e,e comprises all arrays whose oppositely aligned tonic sums sum to eleven, such as arrays 3,6/5,8 (interval system 3,3) or 4,5/6,7 (interval system 1,1). The corresponding K-graph of Fig. 6c represents all such arrays in key e,e. Likewise, key 7,3 contains all arrays whose symmetrically aligned tonic sums add up to 7 and 3, such as array 1,t/5,6 (interval system 9,1), all of which can be fitted into a K-graph whose diagonal arrows indicate I7 and I3 transformations. Each array may belong to only one key, of which there are 144 in total.
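The mode and key designations lend themselves to a short computational sketch. The encoding of an array as ((a, b), (c, d)) for the array a,b/c,d and the function names are assumptions for illustration (t = 10, e = 11). The mode takes the differences of vertically aligned tonic sums; the key takes the sums of diagonally aligned ones.

```python
from itertools import product


def mode(array):
    """Differences between corresponding tonic sums of the two Perle cycles."""
    (a, b), (c, d) = array
    return ((c - a) % 12, (d - b) % 12)


def key(array):
    """Sums of the diagonally (symmetrically) aligned tonic sums."""
    (a, b), (c, d) = array
    return ((a + d) % 12, (b + c) % 12)


print(mode(((0, 7), (4, 11))), key(((0, 7), (4, 11))))   # array 0,7/4,e
print(mode(((1, 6), (11, 2))))                           # array 1,6/e,2
print(key(((1, 10), (5, 6))))                            # array 1,t/5,6

# Enumerate every array of four tonic sums and count distinct designations.
all_arrays = [((a, b), (c, d)) for a, b, c, d in product(range(12), repeat=4)]
mode_count = len({mode(x) for x in all_arrays})
key_count = len({key(x) for x in all_arrays})
print(mode_count, key_count)
```

The enumeration at the end simply confirms the count of 144 possible modes and 144 possible keys, since each designation is an ordered pair of values mod 12.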
6 K-Nets and Synoptic Arrays
As described above, relationships among arrays include transposition, inversion, and membership in the same mode or key. Yet collections of arrays lacking these associations may be related at more fundamental levels. Perle establishes such connections in his concept of synoptic arrays, in which reside arrays with related interval systems. A synoptic array divides into two categories, synoptic modes and synoptic keys.6 Just as a mode includes all those arrays whose individual tonic sums in the component Perle cycles differ by the same value, a synoptic mode includes all those arrays whose cyclic intervals between the tonic sums differ by the same value. There are seven different synoptic modes, numbered from 0 to 6, determined by the interval class difference between the two cyclic intervals in the interval system. Fig. 7 illustrates the array 0,7/4,e and its interval system of 7,7 in the first K-net. The difference between the cyclic intervals in this interval system is zero, and so the array belongs to synoptic mode 0. Likewise, other arrays in synoptic mode 0 display a difference of zero between their cyclic intervals. For example, the interval system of array 5,6/3,4 is e,e and the interval system for array 1,4/6,9 is 3,3. The more abstract K-graph to the right of the K-net in Fig. 7 represents the recursion of all arrays in synoptic mode 0. Consequently, all arrays in the same synoptic mode have interval systems related by transposition. Each of the seven synoptic modes represents a large grouping of arrays. There are twelve combinations of interval systems that differ by 1; within each of these interval systems there are 144 arrays. Thus, each synoptic mode contains 1,728 (as 12 × 144) different arrays whose interval system differs by the same interval class.
Fig. 7. K-nets of arrays in synoptic mode 0
In contrast to the synoptic mode, a synoptic key includes all those arrays whose component cyclic intervals sum to the same value. Just as a key comprises all arrays whose symmetrically aligned tonic sums in the component Perle cycles add up to the same sum, a synoptic key comprises all those arrays whose cyclic intervals between the tonic sums add up to the same sum. There are seven synoptic keys, numbered from 0 to 6. While the arrays 0,7/4,e, 1,6/3,0 and 2,5/2,1 all have differing interval systems, their component cyclic intervals sum to the same amount, namely sum 2, thereby relating the arrays via membership in synoptic key 2. These relationships are evident in the K-nets and K-graph of Fig. 8. As with the synoptic mode, there are 1,728 different arrays in each synoptic key.
Fig. 8. K-nets of arrays in synoptic key 2
6 Perle’s nomenclature is more precise, first dividing “master arrays” into master modes and master keys, to distinguish between arrays given in difference and sum alignments (Perle 1996, 103), then dividing “synoptic arrays” into synoptic modes, comprising master arrays with corresponding cyclic interval differences, and synoptic keys, comprising master arrays with corresponding cyclic interval sums (Perle 1996, 195).
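A minimal sketch of the synoptic designations follows, using the same hypothetical array encoding ((a, b), (c, d)). The synoptic mode is taken as the interval class of the difference between the two cyclic intervals, as stated in the text. Reducing the sum of the cyclic intervals to an interval class for the synoptic key is an assumption made here so that the result falls in the range 0 to 6; the text gives only the numbering, not the reduction.

```python
def interval_system(array):
    """Cyclic intervals of both Perle cycles: second tonic sum minus first (mod 12)."""
    (a, b), (c, d) = array
    return ((b - a) % 12, (d - c) % 12)


def interval_class(n):
    """Fold an interval mod 12 into the range 0..6."""
    n %= 12
    return min(n, 12 - n)


def synoptic_mode(array):
    i, j = interval_system(array)
    return interval_class(i - j)


def synoptic_key(array):
    # Assumption: the sum of the cyclic intervals is reduced to an interval class.
    i, j = interval_system(array)
    return interval_class(i + j)


examples = [((0, 7), (4, 11)), ((1, 6), (3, 0)), ((2, 5), (2, 1))]
print([interval_system(x) for x in examples])
print([synoptic_key(x) for x in examples])
print(synoptic_mode(((0, 7), (4, 11))), synoptic_mode(((1, 4), (6, 9))))
```

The three arrays cited for synoptic key 2 have different interval systems whose components each sum to 2 mod 12, while arrays with interval systems 7,7 and 3,3 both fall in synoptic mode 0.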
7 K-Nets and Tonality
Tonality represents the most abstract relation among arrays in Perle’s theory, and is based on the concept of the axis of symmetry. Any symmetrical collection of pcs theoretically contains an axis around which the various pcs are symmetrically positioned. The smallest such collection is a dyad. For every even sum n the two axial points are calculated as n/2 and n/2 + 6. For every odd sum n the two axial points are calculated as (n±1)/2 and ((n±1)/2) + 6. Arrays represent concatenations of dyads; therefore arrays with the same aggregate sum will be symmetrically related and have the same axis of symmetry. Arrays so related fall into one of three tonalities as categorized by Perle. Tonality 0 includes all arrays whose tonic sum aggregates are 0, 4, or 8. The axial points consist of two repeated even integers, and so are simply transpositions of one another by an even value of Tn, as shown in Fig. 9. Tonality 1 constitutes all arrays whose tonic sum aggregates are represented by an odd integer. Their axes are also transpositionally related, and include one even and one odd integer that differ by 1.7 Third, all arrays whose tonic sum aggregates are 2, 6, or t belong to Tonality 2. Their axes consist of a pair of repeated odd integers, also transpositionally related by an even value of Tn. Although Tonality 0 and Tonality 2 both contain arrays of even aggregate sums, the arrays of the former are not symmetrically equivalent to those of the latter, because their axes are from even/even and odd/odd integer pairs, respectively. Perle contends that all arrays from the same category of aggregate sum belong to the same tonality and share the same axis of symmetry. Thus, the symmetrical relations extend from pcs within a single dyad to massive groupings of arrays.
7 Perle further subdivides Tonality 1 based on transpositional equivalence: the first encompasses arrays with aggregate sums 1, 5, and 9; the second, arrays of aggregate sums 3, 7, and e (Perle 1996, 141).
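The axis calculation and the three-way classification by aggregate sum can be sketched as below. The function names are hypothetical, and each axial point is written as a pair of pcs (a repeated integer for even sums, two adjacent integers for odd sums), which is one reading of the formulas in the text. The functions take the aggregate sum directly, since the text does not spell out how it is derived from an array.

```python
def axis_points(n):
    """Two axial points for sum n (mod 12), each written as a pair of pcs."""
    n %= 12
    if n % 2 == 0:
        low = high = n // 2
    else:
        low, high = (n - 1) // 2, (n + 1) // 2
    return [(low, high), ((low + 6) % 12, (high + 6) % 12)]


def tonality(aggregate_sum):
    """Classify an aggregate sum: 0, 4, 8 -> 0; odd -> 1; 2, 6, t -> 2."""
    s = aggregate_sum % 12
    if s % 2:
        return 1
    return 0 if s % 4 == 0 else 2


for s in range(12):
    print(s, tonality(s), axis_points(s))
```

Printing all twelve sums shows the pattern the text describes: even/even axes for Tonality 0, odd/odd axes for Tonality 2, and mixed pairs differing by 1 for Tonality 1.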
Fig. 9. (a) Tonality 0. (b) Tonality 1. (c) Tonality 2.
8 Summary
This study focused on the relational aspects of Perle’s theory, from the most basic level of Perle cycles through increasingly abstract entities, culminating in Perle’s conception of tonality. The recursion of K-nets effectively demonstrated the structural integrity at the core of Perle’s twelve-tone tonality and infused in all its hierarchical layers.
References
Foley, G.C.: Arrays and K-Nets: Transformational Relationships Within George Perle’s Theory of Twelve-Tone Tonality. Indiana Theory Review 23, 69–97 (2002)
Lewin, D.: Thoughts on Klumpenhouwer Networks and Perle-Lansky Cycles. Music Theory Spectrum 24(2), 196–230 (2003)
Perle, G.: Twelve-Tone Tonality, 2nd edn. University of California Press, Berkeley (1996)
Webern’s Twelve-Tone Rows through the Medium of Klumpenhouwer Networks
Catherine Nolan
University of Western Ontario
The theory of Klumpenhouwer networks (K-nets) in contemporary music theory continues to build on the foundational work of David Lewin (1990) and Henry Klumpenhouwer (1991), and has tended to focus its attention on two principal issues: recursion between pitch-class and operator networks and modeling of transformational voice-leading patterns between pitch classes in pairs of sets belonging to the same or different Tn/TnI classes.1 At the core of K-net theory lies the duality of objects (pitch classes) and transformations (Tn and TnI operators and their hyper-Tn and hyper-TnI counterparts). Understood in this general way, K-net theory suggests other avenues of investigation into aspects of precompositional design, such as connections between K-nets and Perle cycles, K-nets and Stravinskian rotational or four-part arrays, and between K-nets and row structure in the “classical” twelve-tone repertoire. K-net theory and analytical application are contingent on consistently textured musical spaces. That is to say, isographic relations depend in the first instance on graphs of the same size, which in turn reflect some basic musical consistency, such as the same number of voices in a musical passage, the same number of transformations in multi-level network recursions, or the same size of pitch-class segments in a cycle or row. The twelve-tone rows of Webern, known for their high degree of intervallic control, invite closer investigation of their even partitions from a transformational perspective. K-nets offer an effective technology for such an investigation because of their predisposition toward trichords and tetrachords, and because of their capacity for revealing dynamic relations within and between equivalent and non-equivalent pitch-class sets of the same size.
The transformational as opposed to taxonomic perspective on relations between successive row segments offers new insights into the integrity of Webern’s rows, and poses beneficial questions about K-net interpretations of aggregate partitions for future investigation. This essay focuses on twelve-tone rows of Webern whose disjunct trichords and tetrachords belong to one or two set classes, and represents the rows as successions of three- or four-node K-nets in order to examine properties and constraints of network isographies. For the most part, I refrain from discussing Webern’s instantiations of the rows in his twelve-tone works, and focus on the implications of their K-net representations. That is, the K-net representations of Webern’s rows are, in general, considered for their abstract properties rather than as models of specific passages of music.
1 Tn/TnI classes refer to the equivalence classes of pitch-class sets related by transposition (Tn) and/or inversion (TnI).
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 375–385, 2009. © Springer-Verlag Berlin Heidelberg 2009
Rows whose disjunct segmental trichords or tetrachords belong to a single set class
Three of Webern’s rows are composed of segmental trichords or tetrachords belonging to a single set class: the row for the Concerto for Nine Instruments, Op. 24; the row for the String Quartet, Op. 28; and the row for the First Cantata, Op. 29. The row for Webern’s Concerto for Nine Instruments, Op. 24, is derived from a generating 3-3 [014] trichord, and the trichords are internally ordered so as to relate to one another by the four serial transformations. The row can be interpreted as a series of isographic three-node K-nets in three different ways, each reflecting one of the trichord’s three interval classes as a transposition operator, as shown in Figure 1.
Fig. 1. (a)–(c) Op. 24 row: K-net interpretations of disjunct trichords
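The three readings of the Op. 24 row can be explored with a short sketch. The row form used below (B, B-flat, D, E-flat, G, F-sharp, G-sharp, E, F, C, C-sharp, A, with t = 10 and e = 11) is the commonly cited one and is an assumption here, since the figure itself is not reproduced; the function names are hypothetical. For each disjunct trichord the code finds the dyad spanning the chosen T interval, forms the two I arrows to the remaining pc, and then reports the constant difference between the I values of successive networks, where 0 indicates strong isography and any other value positive isography.

```python
# Assumed row form for Op. 24 (not given in the text).
OP24_ROW = [11, 10, 2, 3, 7, 6, 8, 4, 5, 0, 1, 9]


def trichord_knet(trichord, t):
    """I arrows of the three-node K-net whose T arrow spans interval t.

    Returns (I value at the T arrow's source, I value at its target).
    """
    for x in trichord:
        for y in trichord:
            if x != y and (y - x) % 12 == t:
                z = next(p for p in trichord if p not in (x, y))
                return ((x + z) % 12, (y + z) % 12)
    raise ValueError("trichord has no dyad spanning interval %d" % t)


def isography_succession(row, t):
    """Constant I-value differences between successive trichordal K-nets."""
    nets = [trichord_knet(row[i:i + 3], t) for i in range(0, 12, 3)]
    result = []
    for net_a, net_b in zip(nets, nets[1:]):
        diffs = {(b - a) % 12 for a, b in zip(net_a, net_b)}
        result.append(diffs.pop() if len(diffs) == 1 else None)
    return result


for t in (4, 1, 3):
    print("T%d reading:" % t, isography_succession(OP24_ROW, t))
```

Under the assumed row, the T4 reading gives a difference of 0 throughout, while the T1 and T3 readings each alternate between two complementary nonzero differences, so that every other network is identical.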
The succession of graphs in Figure 1a shows the shared T4-arrow in each network, and its networks are all strictly identical, or strongly isographic. The successions in Figure 1b and 1c preserve a T1- and T3-arrow in each network respectively, and their constituent networks are positively, not strongly, isographic. In Figure 1b and 1c, pairs of complementary […] and […] operations separate alternating strongly isographic networks.2 The first and third graphs in Figure 1b belong to the same K-class, as do the second and fourth graphs, while successive graphs do not, and likewise for the graphs in Figure 1c. Philip Lambert employs the K-net representation shown in Figure 1a to model the opening statement of the first movement of Op. 24, and points to the registral separation of pitches of the T4 dyad from the remaining pitch in each trichord to support this segmentation. (See Lambert 2002, 174–76.) While the correspondence between registral separation and K-net construction is convincing, I would argue that the K-net representations shown in Figures 1b and 1c show equally convincing correspondences with musical parameters other than registral separation. For example, in the same passage, the T1 dyads of Figure 1b articulate the total registral space occupied by the trichord, and the T3 dyads of Figure 1c articulate each trichord’s total durational space. We must consequently question whether a succession of strong isographies such as that shown in Figure 1a is somehow phenomenologically or otherwise superior to a succession of positive isographies as shown in Figures 1b and 1c.

The row for the String Quartet, Op. 28, is similarly derived from serial transformations of a generating 4-1 [0123] tetrachord, and can be represented in three different ways as a series of four-node K-nets, as shown in Figure 2.3 Each network includes a disjunct pair of the six intervals of the 4-1 tetrachord indicated by the two transposition operators. While the individual networks are configured differently in each representation, each succession displays a series of positively isographic relations. The singular positive isographic type in the three K-net representations of the row for Op. 28 seems appropriate given the limited intervallic resources of the 4-1 [0123] row segments, but the absence of strong isography between adjacent or even alternate networks may seem at odds with the K-net representations of the derived row for Op. 24, given the similar generation of the two rows.

Like the row for Op. 24, the disjunct segmental trichords of the row for the First Cantata, Op. 29, all belong to set class 3-3 [014], but in the Op. 29 row the transformations do not extend to order relations within the trichords.4 (This, of course, has no effect on K-net formation.) Yet the disjunct trichords of the Op. 29 row can again be represented in three different ways as a succession of isographic three-node K-nets, as shown in Figure 3.
2 The abbreviations “t” and “e” are used for 10 and 11 respectively.
3 Wherever possible, I show only strong or positive isographies, and invoke negative isography only where strong or positive isography does not obtain. The disjunct tetrachords in Figure 2a, for example, because of the inversional as well as transpositional symmetry of the 4-1 set type, can be reconfigured to show a succession of […] followed by […] network isographies. Adjacent disjunct trichords in Figure 1a are inversionally related; hence alternate networks can similarly be reconfigured to show a succession of […] network isographies.
4 Despite the superficial similarity of the rows for Op. 24 and Op. 29 as aggregate partitions of the same type (3^4) and their identical set-class membership, they belong to different mosaics, because the mosaics are unrelated by Tn or TnI. Mosaics that are generated by the same set class(es), but are distinct, are described by Morris and Alegant as Z-related. (See Morris and Alegant 1988, 79–81.) The Op. 24 and Op. 29 rows also have different degrees of symmetry; the Op. 24 row has a degree of symmetry 4 (the unordered pitch-class content of its disjunct trichords is preserved under four Tn or TnI operations), while the Op. 29 row has a degree of symmetry 2.
Fig. 2. (a)–(c) Op. 28 row: K-net interpretations of disjunct tetrachords
Unlike the network isographies of the disjunct trichords of the Op. 24 row shown in Figure 1, however, no representation results in a succession of strongly isographic networks. The representation shown in Figure 3a, however, repeats the pattern of alternating […] and […] isographies seen in Figures 1b and 1c, reflecting the membership of the first and third trichords in the same K-class, and likewise the second and fourth trichords. In Figures 3b and 3c a new pattern of network isographies emerges in the recurrence of the […] operation between the first and second network and between the third and fourth network, a consequence of the T2 relation between corresponding pitch classes. While all the K-nets in Figure 3b and 3c belong to the same family, no pairs of K-nets belong to the same K-class. (See Lambert 2002.)
Fig. 3. (a)–(c) Op. 29 row: K-net interpretations of disjunct trichords
Since any pair of pitch-class sets belonging to the same set class can be modeled by isographic K-nets, the exclusivity of strong and positive isography in the K-net representations of the rows just discussed is to be expected. The perception of a more direct phenomenological viability when networks are related by strong rather than by positive isography comes into question, however, when one considers multiple K-net representations of the same objects in which contrasting types of isography may be equally convincingly supported by musical intuition.

Rows whose disjunct segmental tetrachords belong to two set classes
The rows for the Symphony, Op. 21, the Variations for Orchestra, Op. 30, and the First Cantata, Op. 29, share not only a pattern of set-class membership in their disjunct segmental tetrachords, but also, as mosaics, the same degree of symmetry. Each of these rows shares a simple a b a pattern in the set-class membership of its constituent tetrachords: that is, the first and last tetrachords belong to one set class, while the middle tetrachord belongs to a different set class.
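The a b a pattern of tetrachordal set classes can be checked with a brute-force prime-form routine. The row forms below are the commonly cited ones and are assumptions here, since the rows are not spelled out in the text (t = 10, e = 11); the function names are hypothetical. The prime form is chosen by comparing candidates from the right, which is Rahn's convention; for trichords and tetrachords it gives the same labels as Forte's.

```python
# Assumed row forms (not given in the text).
ROWS = {
    "Op. 21": [9, 6, 7, 8, 4, 5, 11, 10, 2, 1, 0, 3],
    "Op. 30": [9, 10, 1, 0, 11, 2, 3, 6, 5, 4, 7, 8],
    "Op. 29": [3, 11, 2, 1, 5, 4, 7, 6, 10, 9, 0, 8],
}


def prime_form(pcs):
    """Prime form of a pc set under transposition and inversion (Rahn's ordering)."""
    pcs = sorted(set(p % 12 for p in pcs))
    candidates = []
    for form in (pcs, sorted((-p) % 12 for p in pcs)):
        for i in range(len(form)):
            rotation = form[i:] + form[:i]
            candidates.append(tuple((p - rotation[0]) % 12 for p in rotation))
    # Most compact: smallest last element, then second-to-last, and so on.
    return min(candidates, key=lambda c: c[::-1])


def segment_classes(row, size):
    """Prime forms of the disjunct segments of the given size."""
    return [prime_form(row[i:i + size]) for i in range(0, 12, size)]


for name, row in ROWS.items():
    print(name, segment_classes(row, 4))
```

Calling `segment_classes(row, 3)` on the same rows gives the trichordal partitions in the same way.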
The row for the Symphony, Op. 21, is unique in that any K-net representation of its disjunct tetrachords (belonging to set classes 4-1 [0123], 4-9 [0167], and 4-1 [0123]) exhibits only strong isography. Figure 4 shows the succession of strongly isographic four-node K-nets whose complementary T1 and Te arrows reflect the only interval class shared by the two set classes. Each network can be reconfigured to show two T1-arrows, which results in contrasting I-arrows with no change to the strong isography between successive networks.
Fig. 4. Op. 21 row: K-net interpretation of disjunct tetrachords
Fig. 5. (a)–(b) Op. 30 row: K-net interpretations of disjunct tetrachords
The row for the Variations for Orchestra, Op. 30, is constructed similarly to the row for Op. 21 in terms of the pattern of set-class membership of its disjunct tetrachords (4-3 [0134], 4-17 [0347], 4-3 [0134]), but two different successions of isographic K-nets are possible, one built upon the sharing of two instances of interval class 3, the other built upon the common 3-3 [014] subset. The succession of networks for the Op. 30 row shown in Figure 5a appears similar to that of the Op. 21 row shown in Figure 4, with the complementary T-arrows and identical I-arrows in each network, but here the adjacent networks are positively, not strongly, isographic. The succession of networks for the Op. 30 row in Figure 5b highlights the trichordal set class (3-3 [014]) that is a shared subset of set classes 4-3 [0134] and 4-17 [0347]. While the networks in Figure 5a and 5b bear different structural features and the network operators between successive network pairs differ, the composite operation relating the first to the third network in both cases is the same, […]: that is, […] + […] = […] + […] = […]. This, of course, correlates to the equivalence of set-class membership between the first and third tetrachords.

The disjunct segmental tetrachords of the row for the First Cantata, Op. 29, display the same a b a pattern of set-class membership: 4-2 [0124], 4-1 [0123], and 4-2 [0124]. Two representations of four-node K-nets are possible, each highlighting one of the two shared subsets (3-1 [012] and 3-2 [013]). These are shown in Figure 6. Figure 6a reveals a succession of positively isographic networks, while Figure 6b reveals alternating positively and negatively isographic networks.
(a)
(b) Fig. 6. Op. 29 row: K-net interpretations of disjunct tetrachords
The presence of two set classes among the disjunct tetrachords in Webern’s rows results in K-net successions whose isographies range from exclusively strong isography to a repeating positive isography to contrasting positive isographies to contrasting positive and negative isographies in the same succession. The wide range of possibilities in such a small sample reflects the relatively large number of tetrachordal set classes and interrelations. (See Lambert 2002, 178.) Three rows that have been discussed because of the set class membership of their disjunct tetrachords, the rows for Opp. 21, 28, and 30, also possess disjunct trichords
C. Nolan
that similarly belong to only two set classes, and also display a symmetrical pattern, a b b a, in their succession. The rows for Op. 21 and Op. 30 consist of disjunct trichords belonging to the same two set classes in complementary ordering: 3-2, 3-3, 3-3, 3-2 in the row for Op. 21, and 3-3, 3-2, 3-2, 3-3 in the row for Op. 30. The disjunct trichords of the row for Op. 28 belong respectively to set classes 3-2, 3-4, 3-4, 3-2. Figures 7 and 8 show the two three-node K-net successions for the rows for Opp. 21 and 30, reflecting the two shared interval classes between set classes 3-2 and 3-3, and Figure 9 shows the single three-node K-net succession for the row for Op. 28, reflecting the single shared interval class between set classes 3-2 and 3-4. The K-net representations of the rows just mentioned reveal the recurrence of three isographic patterns that we have already observed: the association of strong isography with complementaryand in the same succession, creating strong isography between adjacent and non-adjacent networks (Figure 7a and 7b); the alternation of two positive isographies (Figure 8a and 8b); and a succession of identical positive isographies (Figure 9).
(a)
(b) Fig. 7. Op. 21 row: K-net interpretations of disjunct trichords
Although I have referred only in passing to Webern’s actual musical settings of twelve-tone rows (particularly since Webern set single rows in linear fashion infrequently), I will briefly discuss in greater detail one setting from the first movement of the First Cantata, Op. 29. Because of Webern’s proclivity for creating polyphonic spaces out of quartets of row transformations consisting of pairs of Tn- and TnI-related rows, K-nets can provide an effective resource for interpreting the coherence of such passages, particularly in homophonic moments of extreme textural consistency.
(a)
(b) Fig. 8. Op. 30 row: K-net interpretations of disjunct trichords
Fig. 9. Op. 28 row: K-net interpretations of disjunct trichords
Figure 10a shows a score reduction of Op. 29, first movement, mm. 1-6. Figure 10b sketches the pitch-class content of m. 1, and 10c sketches the pitch-class content of m. 6. Each horizontal strand in 10b and 10c represents the opening 3-3 [014] trichord of two pairs of inversionally related rows, and the four 3-3 trichords in m. 6 complete the rows begun in m. 1. In both m. 1 and m. 6, the lower two strands are intoned by the cello and viola, while the upper two strands are intoned by the trombone and trumpet. The T- and I-arrows in the K-net representations reflect the transformational relation between the pairs of rows, linking the strong isography of the four-node K-nets representing three different set classes directly to the work’s compositional design. In m. 6, the return to the homophonic texture, the instrumentation, and the pitch classes of the first sonority (with the exchange of dyads between the string and the brass pairs) announce a clear reference to the opening. The identical graphs of the six networks in the two passages reflect coherence of register and texture in a manner that is more concrete, through its association with the textural setting, than the relationships among the row transformations alone can express.
(a)
(b)
(c) Fig. 10. Op. 29 (First Cantata): (a) mm. 1-6, score reduction; (b) K-net representations, m. 1; (c) K-net representations, m. 6
K-net theory has established itself as an important area of music theory, but it seems that it is still in its youth. Further investigation of properties of K-classes and families of four-node and larger networks is needed, as well as further consideration of the explanatory power of network isography. Also needed perhaps is the extension of the range of isographic types beyond the canonical strong, positive, and negative isographies. Michael Buchler’s recent critical study of K-net theory (Buchler 2007) raises important theoretical and analytical questions about K-nets and dual transformations, recursive hierarchies, and other issues, and the invited responses to Buchler’s study affirm the continuing currency and vibrancy of K-net theory. (See Foley 2007, Klumpenhouwer 2007, Losada 2007, Nolan 2007, O’Donnell 2007, Stoecker 2007, and Tymoczko 2007.) This brief study of Webern’s rows through the medium of Klumpenhouwer networks identifies the need for further investigation of the interaction of K-net formation and even partitions of the aggregate. The representations of Webern’s rows as successions of three- and four-node K-nets confirm the fluidity of set-class identification in connection with network isography, yet they also reinforce the significance of pitch-class subset content in assessing alternative K-net representations of the same objects, and raise questions about the perceived hierarchy of strong, positive, and negative isography. It is clear that the range of questions that arise through the continuing study and application of K-net theory will not recede any time soon.
References
Buchler, M.: Reconsidering Klumpenhouwer Networks. Music Theory Online 13(2) (2007)
Foley, G.: The Efficacy of K-Nets in Perlean Theory. Music Theory Online 13(3) (2007)
Klumpenhouwer, H.: Reconsidering Klumpenhouwer Networks: A Response. Music Theory Online 13(3) (2007)
Klumpenhouwer, H.: A Generalized Model of Voice-Leading for Atonal Music. Ph.D. dissertation. Harvard University (1991)
Lambert, P.: Isographies and Some Klumpenhouwer Networks They Involve. Music Theory Spectrum 24(2), 165–195 (2002)
Lewin, D.: Klumpenhouwer Networks and Some Isographies That Involve Them. Music Theory Spectrum 12(1), 83–120 (1990)
Losada, C.: K-nets and Hierarchical Structural Recursion: Further Considerations. Music Theory Online 13(3) (2007)
Morris, R., Alegant, B.: The Even Partitions in Twelve-Tone Music. Music Theory Spectrum 10, 74–101 (1988)
Nolan, C.: Thoughts on Klumpenhouwer Networks and Mathematical Models: The Synergy of Sets and Graphs. Music Theory Online 13(3) (2007)
O’Donnell, S.: Embracing Relational Abundance. Music Theory Online 13(3) (2007)
Stoecker, P.: Without a Safety (k)-Net. Music Theory Online 13(3) (2007)
Tymoczko, D.: Recasting K-nets. Music Theory Online 13(3) (2007)
Isographies of Pitch-Class Sets and Set Classes
Tuukka Ilomäki
Sibelius Academy, Finland
1 Introduction

One of the major differences between tonal and atonal music is the significantly larger number of available pitch-class sets in the latter. The categorization of pitch-class sets and the analysis and classification of their relations stand as a major strand in 20th-century music theory. The equivalence relation induced by the group of transpositions and inversions is the de facto standard classification of pitch-class sets. Music theorists, however, have recently suggested alternative approaches, for instance, relations based on similarity or voice leading. Klumpenhouwer networks, or K-nets, present yet another approach, this one based on transformation theory. K-nets provide a way to represent relations beyond the limits of set classes. Several authors have contributed to this research: Henry Klumpenhouwer, David Lewin, Philip Lambert, and many others. Furthermore, the idiomatic theory of George Perle provides means to tackle the K-nets (Perle 1977; Lewin 2002).

A K-net contains some specified contents as vertices and transformations as arrows between the vertices. Typically, the vertices are pitch classes and the arrows are transpositions and inversions. The diagrams commute, and for all arrows the pitch classes at the arrowheads are the images of the pitch classes at the arrowtails under the transformations attached to the arrows. The motivation for this paper is to examine what the isography of K-nets (with distinct pitch classes as vertices) reveals about the pitch-class sets involved.

[Figure: center, two K-nets on the pitch classes {0, 1, 6} (arrows T5, I1, I6) and {0, 5, 8} (arrows T5, I1, I8); left, the K-graphs obtained by omitting the nodes; right, the isographic pcsets obtained by omitting the transformations.]
Fig. 1. From K-nets to K-graphs and from K-nets to the isography of pitch-class sets
The middle part of Figure 1 depicts a typical pair of positively isographic K-nets. Two K-nets with pitch classes {0, 1, 6} and {0, 5, 8} are isographic via. Taking this isography of K-nets as the starting point, the focus can be set either on the transformations or on the contents of the vertices. In other words, some information is omitted or ignored. First, to the left the contents of the vertices are omitted, leaving a K-graph. David Lewin (1990) has pursued this direction in his studies of the relations between transformation networks. Second, to the right the transformations are omitted, leaving the underlying pitch-class sets. This direction, and the distinction between moving left or right in this way, has been less explored. Philip Lambert (2002) has studied trichordal K-class families with fixed configurations of transpositions, inversions and pitch-class locations. In this paper, however, I focus on the underlying pitch-class sets, with fixed configurations of transpositions and inversions, but allowing any configuration of pitch-class locations. My aim here is to explore what kind of pitch-class sets can be arranged into strongly or positively isographic K-nets using maximally even configurations: 2+1 for trichords, 2+2 for tetrachords, etc.1 I will move from isographic pitch-class sets to isographic set classes and group set classes by shared isographic potential.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 386–391, 2009. © Springer-Verlag Berlin Heidelberg 2009
2 Isography of Pitch-Class Sets and Set Classes

The term ‘isography’ has been used to denote an automorphism of transformation networks (Lewin 1990) or, more specifically, a certain type of automorphism, since the automorphisms involving the M-operation are usually excluded. The change of context from transformation networks to pitch-class sets makes it necessary to redefine isography. I retain the term even if the two are algebraically rather different domains.

Definition 1. Two pitch-class sets are (strongly) isographic if and only if the pitch classes can be arranged into K-nets that are (strongly) isographic.

By definition, isography of K-nets implies isography of the pertinent pitch-class sets. Two K-nets with pitch classes of two isographic pitch-class sets are not necessarily isographic, however, since the isography of K-nets depends both on the pitch classes and on their configuration. For instance, K-nets with pitch classes {0, 1, 6} and {0, 5, 8} are isographic only if the transpositional relation in the K-nets is between pitch classes 1 and 6 and pitch classes 0 and 5, respectively (as in Figure 1). Note that the isography relation between pitch-class sets is not transitive, as the concrete legitimating K-nets may change from one related pair to the next.

In Definition 2 the isography of set classes is derived from the isography of pitch-class sets. Proposition 1 and Corollary 1 guarantee that the isography of set classes is well defined. Hence, the isography of set classes does not depend on the selection of pitch-class sets representing the set classes.

Definition 2. Two set classes are (strongly) isographic if and only if there are two pitch-class sets belonging to the set classes that are (strongly) isographic.

Note that Definition 2 does not imply that any two pitch-class sets belonging to (strongly) isographic set classes are necessarily (strongly) isographic.2

1 Without loss of generality it suffices to deal only with positive isography since, in the context of set classes, negative isography of pitch-class sets A and B implies positive isography of pitch-class sets A and TnIB.
2 Definition 2 is thus analogous to the definition of the inclusion relation between set classes. For example, we may state that the set class 3-1[012] is included in the set class 4-1[0123]. This does not imply, however, that an arbitrary member of 3-1[012] would necessarily be a subset of an arbitrary member of 4-1[0123].
Proposition 1. If A and B are (strongly) isographic pitch-class sets, then TnA and TnB are also (strongly) isographic pitch-class sets and so are TnIA and TnIB.

Corollary 1. Let A and B be (strongly) isographic pitch-class sets that belong to set classes SC(A) and SC(B), respectively. If A' is an arbitrary pitch-class set belonging to SC(A) then there exists a pitch-class set B' belonging to SC(B) such that A' and B' are (strongly) isographic.

Pitch-class sets related by transposition are isographic but not necessarily strongly isographic. Pitch-class sets related by inversion are negatively isographic but not necessarily positively isographic. For instance, transpositionally related pitch-class sets {0, 1, 2} and {1, 2, 3} are positively isographic but not strongly isographic; inversionally related pitch-class sets {0, 1, 2, 3, 4, 6} and {0, 2, 3, 4, 5, 6} are negatively isographic but not positively isographic.
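Definition 1 can be checked by brute force for trichords. The following Python sketch is an illustration only, not the author's procedure: it assumes the 2+1 configuration (one T-arrow between two nodes, one I-arrow from the third node to each of them) and the usual ⟨T_j⟩ convention for positive isography, under which T-arrows are preserved and every I-arrow index is shifted by the same j (with j = 0 giving strong isography). The names `knet_arrows` and `isography_shifts` are hypothetical.

```python
from itertools import permutations

def knet_arrows(a, b, c):
    """Arrow labels of a 2+1 K-net: T-arrow a -> b, I-arrows c <-> a and c <-> b."""
    return ((b - a) % 12, (a + c) % 12, (b + c) % 12)

def isography_shifts(pcs_a, pcs_b):
    """All j such that some arrangements of the two trichords are related by <T_j>."""
    shifts = set()
    for x in permutations(sorted(pcs_a)):
        t_a, i1_a, i2_a = knet_arrows(*x)
        for y in permutations(sorted(pcs_b)):
            t_b, i1_b, i2_b = knet_arrows(*y)
            # T-arrows must agree; both I-arrows must shift by the same amount j.
            if t_a == t_b and (i1_b - i1_a) % 12 == (i2_b - i2_a) % 12:
                shifts.add((i1_b - i1_a) % 12)
    return shifts

def positively_isographic(pcs_a, pcs_b):
    return bool(isography_shifts(pcs_a, pcs_b))

def strongly_isographic(pcs_a, pcs_b):
    return 0 in isography_shifts(pcs_a, pcs_b)

print(positively_isographic({0, 1, 6}, {0, 5, 8}), strongly_isographic({0, 1, 6}, {0, 5, 8}))
print(positively_isographic({0, 1, 2}, {1, 2, 3}), strongly_isographic({0, 1, 2}, {1, 2, 3}))
```

Under these assumptions both pairs come out positively but not strongly isographic, in line with the examples in the text. The search over all arrangements reflects the point that isography of pitch-class sets only requires that some configuration of the pitch classes works.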
3 Tonality and Whole-Tone Scale Proportion

Pitch-class sets can be classified using two concepts: tonality and whole-tone scale proportion. Originally, George Perle defined the tonality of tetrachords as the sum of the four pitch classes modulo 4 (Perle 1977, 138–141). In addition, sums 1 and 3 are considered equivalent.3 The following definitions generalize tonality to all cardinalities 1 ≤ n ≤ 12. For the sake of coherence, the definition must be stated in terms of the greatest common divisor of the cardinality n and 12. Hence, tonality is a useful concept only in cardinalities n such that gcd(n, 12) > 1.

Definition 3. The tonalities of cardinality n are the equivalence classes that the automorphisms of the group $\mathbb{Z}_{\gcd(n, 12)}$ induce.

Definition 4. The tonality of a pitch-class set of cardinality n is the equivalence class of the sum of its pitch classes mod gcd(n, 12).

For example, the tonalities of cardinality 4 are {0}, {1, 3} and {2}, and the tonalities of cardinality 6 are {0}, {1, 5}, {2, 4} and {3}. The tonality of pitch-class set {0, 1, 2, 3, 4, 5} is {3}.

Proposition 2. Pitch-class sets A, TnA and TnIA have the same tonality.

Since, according to Proposition 2, all pitch-class sets that belong to the same set class have the same tonality, the concept of tonality can be inherited from sets to set classes. This motivates the following definition:

Definition 5. The tonality of a set class is the tonality of any of its constituent pitch-class sets.

An interval between two pitch classes is even if and only if the two pitch classes belong to the same whole-tone collection. Therefore, the proportion of odd and even intervals in a pitch-class set depends only upon the distribution of its pitch classes between the two whole-tone scales. As isography of set classes involves analogous behaviour with respect to odd and even intervals, the two whole-tone scales play a decisive role in the determination of isography. The proportion between even and odd intervals is invariant under transpositions and inversions (as well as under the M-operation). Hence, the number of odd and even intervals in pitch-class sets is constant for all sets belonging to the same set class. This is also evinced by the fact that pitch-class sets in the same set class have identical interval-class vectors.

3 I use sums of pitch classes as notational shorthand; I do not posit a group structure on the set of pitch classes.
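The two classifiers of this section are easy to compute. The Python sketch below is a minimal illustration under stated assumptions: the sum of pitch classes is reduced mod gcd(n, 12), as in footnote 4, and the equivalence class is formed by multiplying with the units of that modulus, which are exactly the automorphisms of the cyclic group. The function names are hypothetical. The final loop spot-checks Proposition 2 and the invariance of the whole-tone scale proportion for one sample tetrachord.

```python
from math import gcd

def tonality(pcs):
    """Equivalence class of (sum of pcs mod g) under the automorphisms of Z_g, g = gcd(n, 12)."""
    pcs = set(pcs)
    g = gcd(len(pcs), 12)
    k = sum(pcs) % g
    # The automorphisms of Z_g are the multiplications by units u with gcd(u, g) = 1.
    units = [u for u in range(1, g + 1) if gcd(u, g) == 1]
    return frozenset((u * k) % g for u in units)

def whole_tone_proportion(pcs):
    """Unordered split of the pitch classes between the two whole-tone scales."""
    pcs = set(pcs)
    even = sum(1 for p in pcs if p % 2 == 0)
    return tuple(sorted((even, len(pcs) - even)))

def transpose(pcs, n):
    return {(p + n) % 12 for p in pcs}

def invert(pcs, n):
    return {(n - p) % 12 for p in pcs}

print(sorted(tonality({0, 1, 2, 3, 4, 5})))   # the hexachord example from the text
print(whole_tone_proportion({0, 1, 4, 6}))

# Spot check: both classifiers are unchanged under every Tn and TnI.
sample = {0, 1, 4, 6}
for n in range(12):
    for image in (transpose(sample, n), invert(sample, n)):
        assert tonality(image) == tonality(sample)
        assert whole_tone_proportion(image) == whole_tone_proportion(sample)
```

Returning the whole equivalence class rather than a single representative keeps sums such as 1 and 3 (for tetrachords) indistinguishable by construction.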
4 Relations of Set Classes

The relations that isography defines on the set of pitch-class sets and the set of set classes are reflexive and symmetric but not transitive. Hence, the resulting relations are not equivalence relations. Set classes are grouped into categories with two layers, however, since strong isography exists only within each category and positive isography (without strong isography) exists only between certain categories. In all cardinalities, the whole-tone scale proportion defines a necessary condition of isography. In the even cardinalities tonality imposes an additional necessary condition. In cardinalities 2 and 10, however, tonality and whole-tone scale proportion coincide. Furthermore, in cardinality 3 set classes with the same whole-tone scale proportion are either strongly isographic or not isographic. Hence, positive isography without strong isography appears only between set classes with different whole-tone scale proportions. Correspondingly, in cardinalities 4 and 6 set classes of the same tonality are either strongly isographic or not isographic. Hence, positive isography without strong isography appears only between set classes of different tonalities. These conditions do not apply to the complement cardinalities 8 and 9.
[Figure: graph of strong isographies among the trichordal set classes 012, 013, 014, 015, 016, 025, 027, 036, 037 and 024, 026, 048.]
Fig. 2. Strong isographies of trichords
Figure 2 depicts the strong isographies of the trichords. The set classes are divided into those that contain pitch classes from only one whole-tone scale and those that contain pitch classes from both whole-tone scales. Figure 3 depicts the strong isographies of the tetrachords. Whole-tone scale proportion and tonality divide the set classes into five distinct categories. Hence, the relations defined by strong isography are subsets of the equivalence relations that the whole-tone scale proportion induces on the set of trichords and the whole-tone scale proportion and tonality induce on the set of tetrachords.

[Figure: graph of strong isographies among the tetrachordal set classes, arranged in columns by tonality (Tonality 0, Tonality 1, Tonality 2) and in rows by whole-tone scale proportion (0+4, 1+3, 2+2). Set classes shown: 0246, 0268, 0248, 0236, 0258, 0124, 0135, 0148, 0247, 0126, 0137, 0134, 0136, 0358, 0156, 0125, 0237, 0146, 0157, 0123, 0145, 0127, 0158, 0257, 0147, 0167, 0347, 0369, 0235.]
Fig. 3. Strong isographies of tetrachords

Propositions 3 and 4 involve complementary pitch-class sets and set classes of complementary cardinality. Their proofs follow immediately from the observation that gcd(n, 12) = gcd(12 - n, 12) for all 1 ≤ n ≤ 12 and that there exists an automorphism of the group $\mathbb{Z}_{\gcd(n, 12)}$ that maps k mod gcd(n, 12) into 66 - k mod gcd(n, 12).4

Proposition 3. The tonalities in cardinalities n and 12 - n are identical.

Proposition 4. Complementary pitch-class sets have the same tonality.

Propositions 3 and 4 show that complementary set classes group together correspondingly. Octachords, for instance, divide into five categories and three tonalities like tetrachords. The complementary set classes have the same tonality. The larger cardinalities are more flexible, however: there are more ways to arrange eight pitch classes than four pitch classes into the vertices of a K-net. Consequently, the larger cardinalities permit more strong isographies than the smaller ones. Even if the corresponding set classes in complementary cardinalities divide into corresponding tonalities, we cannot deduce the isography of two set classes from the (strong) isography of the corresponding set classes of the complement cardinality. For instance, the all-interval tetrachord set classes 4-Z15[0146] and 4-Z29[0137] are strongly isographic, but the corresponding complementary set classes 8-Z15[01234689] and 8-Z29[02346789] are not even positively isographic. Similar examples can be found between pentachords and septachords (5-1[01234] and 5-Z12[01356] versus 7-1[0123456] and 7-Z12[0123479]) and trichords and nonachords (3-1[012] and 3-11[037] versus 9-1[012345678] and 9-11[01235679A]).

4 The sum of all pitch classes 0, 1, 2, ..., 11 is 66. Consequently, if the tonality of a pitch-class set of cardinality n is k mod gcd(n, 12) then the tonality of the complementary pitch-class set is 66 - k mod gcd(12 - n, 12) = 66 - k mod gcd(n, 12).
The Z-related set classes provide a convenient test on whether the necessary conditions of the isography of set classes depend only on the interval-class content of the set classes. The answer is negative. The specific pattern of pitch classes in the pitch-class sets forming the set classes must be taken into account. For example, set class 6-Z4[012456] is strongly isographic with seven hexachords, whereas its Z-related set class 6-Z37[012348] is strongly isographic with four hexachords. Indeed, these two Z-related set classes are not even isographic.
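The division of the tetrachords into five categories, described in connection with Figure 3, can be reproduced mechanically. The sketch below is an illustration under assumptions rather than the author's method: it takes the tetrachordal prime forms listed in Figure 3, sums their pitch classes mod 4, treats sums 1 and 3 as one tonality (labelled 1), and counts pitch classes per whole-tone scale. The names `TETRACHORDS`, `tonality_label` and `proportion` are hypothetical.

```python
from collections import defaultdict

# Prime forms of the tetrachordal set classes named in Figure 3.
TETRACHORDS = [
    "0246", "0268", "0248", "0236", "0258", "0124", "0135", "0148", "0247",
    "0126", "0137", "0134", "0136", "0358", "0156", "0125", "0237", "0146",
    "0157", "0123", "0145", "0127", "0158", "0257", "0147", "0167", "0347",
    "0369", "0235",
]

def tonality_label(pcs):
    """Sum of the four pitch classes mod 4, with 1 and 3 identified (labelled 1)."""
    k = sum(pcs) % 4
    return min(k, (-k) % 4)

def proportion(pcs):
    """Whole-tone scale proportion written as in Figure 3, e.g. '1+3'."""
    odd = sum(p % 2 for p in pcs)
    lo, hi = sorted((odd, len(pcs) - odd))
    return f"{lo}+{hi}"

categories = defaultdict(list)
for name in TETRACHORDS:
    pcs = [int(ch) for ch in name]
    categories[(proportion(pcs), tonality_label(pcs))].append(name)

for key in sorted(categories):
    print(key, categories[key])
```

Only the category memberships are computed here; the strong isographies drawn inside each category in Figure 3 would still have to be established separately, for instance by searching K-net arrangements.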
References
Lambert, P.: Isographies and Some Klumpenhouwer Networks They Involve. Music Theory Spectrum 24(2), 165–195 (2002)
Lewin, D.: Klumpenhouwer Networks and Some Isographies That Involve Them. Music Theory Spectrum 12(1), 83–120 (1990)
Lewin, D.: Thoughts on Klumpenhouwer Networks and Perle-Lansky Cycles. Music Theory Spectrum 24(2), 196–230 (2002)
Perle, G.: Twelve-Tone Tonality. University of California Press, Berkeley (1977)
The Transmission of Pythagorean Arithmetic in the Context of the Ancient Musical Tradition from the Greek to the Latin Orbits During the Renaissance: A Computational Approach of Identifying and Analyzing the Formation of Scales in the De Harmonia Musicorum Instrumentorum Opus (Milan, 1518) of Franchino Gaffurio (1451–1522)*
Herbert Kreyszig1 and Walter Kreyszig2
1 New York, New York, U.S.A.
2 Saskatoon, Saskatchewan, Canada, and Vienna, Austria
Abstract. On the occasion of the three hundredth anniversary of Leonhard Euler (1707-1783) and the focus on his contribution to the mathematical explanation of scales, the question arises of examining similar contributions of music theorists who lived prior to Euler. In this paper we selected from the trilogy (Theorica musicae, 1492; Practica musicae, 1496; De harmonia musicorum instrumentorum opus, 1518) of the eminent Renaissance scholar, musician, theorist and composer Franchino Gaffurio (1451-1522) his third work of 1518. In particular, using a modern elementary number theoretic approach, we discovered a number generating function in closed form, which we called a Gaffurio number generating function, that generates the tones of the Pythagorean scale. The number generating function is of the form $f(n, m) = p^n \cdot q^m$, where p and q are primes, and n and m are elements of $\mathbb{N}_0$. Here, Gaffurio chooses the smallest primes p = 2 and q = 3, with this ratio p / q denoting the interval of the diapente, and he varies the integral exponents n and m over 0, 1, 2, 3, …, in the process constructing a series of numbers signifying the precise placement of the tones within the gamut. His approach is in line with medieval mathematics, which focuses on elementary number theory and can be seen as a precursor to Euler’s Potenzennetz (grid of proportions). This paper embeds our results in Gaffurio’s oeuvre and his times and includes a concise overview of his biography with an emphasis on his pedagogical and literary contributions. Our approach fills a gap in research, because although Latin music theorists from the era of musical humanism frequently include mathematical approaches to the Greek systema teleion and Guidonian system, as projected onto the monochord, they do not disclose the nature and specific details of the mathematical calculations which inform the natural numbers included in both the manuscript and early printed editions.

* The authors gratefully acknowledge most valuable comments from the editor Timour Klouche, Staatliches Institut für Musikforschung (National Institute for Music Research), Berlin, Germany; the latter and Thomas Noll, Department of Theory and Composition, Escola Superior de Musica de Catalunya, Barcelona, Spain for organizing and leading the First International Conference of the Society for Mathematics and Computation in Music, Berlin, May 18-20, 2007; technical help from Ana Miranda Kreyszig, LLM, JD, Attorney, New York, NY, David G. Olson, Ph.D., Political Scientist, New York, NY and some valuable discussions with the late Paul V. Reichelderfer (1913-1996), first as Professor of Mathematics at Ohio State University, Columbus, Ohio, USA and since 1968 at Ohio University, Athens, Ohio, U.S.A. See www.math.ohio-state.edu/history/biographies/PaulReichelderfer for biographical information.

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 392–405, 2009. © Springer-Verlag Berlin Heidelberg 2009
In 2007, on the occasion of the three hundredth anniversary of the birth of the mathematician and philosopher Leonhard Euler (1707-1783), our attention in mathematical and music theoretical circles is squarely focused on Euler’s mathematical explanation of musical intervals,1 with recourse to arithmetic proportions in the calculation of the intervals of the scale, beginning with the unison and continuing systematically through the octave and beyond. Though Euler’s important contribution, his systematization of the calculations in the so-called Potenzennetz (grid of proportions),2 was widely reviewed and partly adopted, beginning in the eighteenth century with Denis Diderot (1713-1784) and continuing in the nineteenth century with François-Joseph Fétis (1784-1871),3 Euler’s arithmetic approach to the explanation of the scale had its precedent in the era of the Renaissance, especially in the music-theoretical thought of the late fifteenth and early sixteenth centuries. Latin music theorists from the era of musical humanism frequently include mathematical approaches to the Greek systema teleion4 and the Guidonian system,5 as projected onto the monochord,6 without disclosing the nature and specific details of the mathematical calculations which inform the natural numbers included in both the manuscripts and early printed editions.
The music-theoretical discourse of Franchino Gaffurio provides a case in point.7 One of the key musical humanists to emerge during the last decades of the fifteenth century, Franchino Gaffurio, well rounded as a musicus in his embracing of both the practica musicae and the theorica musicae interests first kindled in his youth and continued in his adult years, concentrated on researching the history of the disciplina musicae, thereby following a mode of inquiry common among the musical humanists of his era.8 Gaffurio’s numerous appointments at court, in church and in the gymnasium as well as his manifold acquaintances with scholars, artists and poets in Italy9 offered an ideal platform as those contacts together with his quest for the retrieval of the ancient tradition, including the commissioning of Latin translations of
1 Euler.
2 For an overview of Euler’s calculations, see Hesse.
3 Ibid.
4 On the systema teleion, see Dreyer; see also Neubecker; Michaelides.
5 On the Guidonian system, see Oesch; Allaire.
6 For an overview of the monochord tradition, see Adkins; Smith.
7 For an overview of Gaffurio’s writings, see Young (1954).
8 See, for example, Palisca (1985).
9 For a summary of activities, see Figure 1.
Greek documents,10 and the collecting of the books on the artes liberales with a focus on the origin of the disciplina musicae11 paved the way for an exceptionally broad inquiry situated within the artes liberales (comprising grammar, dialectic, rhetoric, geometry, arithmetic, astronomy, and harmonics). While his pre-Milan years showed rather modest literary activity, with a focus largely on the study of earlier and contemporary humanist writers, some attention to original writing, and above all, the careful planning of his three principal treatises, namely, the Theorica musice (Milan, 1492),12 the Practica musicae (Milan, 1496),13 and the De harmonia musicorum instrumentorum opus (Milan, 1518),14 as gleaned from a number of specific references to these volumes surfacing as early as the Theoricum opus musice discipline (Naples, 1480),15 the immediate precursor of the Theorica musice, the prestigious appointment as maestro di cappella at the Cathedral of Milan in 1484, as well as, the subsequent intimate association with the Court of the Sforzas offered Gaffurio the necessary infrastructure, that is, a balance between practical experience in his working as organist, choir director and composer of predominantly sacred repertoires destined by-and-large for the performance at the Cathedral of Milan16 and the exposure to music-theoretical thought, in his sharing of the ancient tradition of music as a theorist, with important access to the holdings of manuscripts and early printed materials at the Court of the Sforzas17 which proved to be an indispensable resource in terms of his own scholarship. 
Central to Gaffurio’s trilogy is the detailed discussion of Pythagorean arithmetic,18 necessary for a full comprehension of the scales, including both the systema teleion with its tonoi and octave species of the Greeks19 and the modes of the Latin West.20 Undoubtedly inspired by the lengthy and venerable tradition of Greek arithmetic, beginning with Pythagoras whose calculations sparked interest in a number of important writers, including Euclid (flourished ca. 300 BCE)21 and Nicomachus of Gerasa (flourished late 1st – early 2nd century CE),22 Gaffurio, in his wholehearted

10 Barbaro; Burana; Ficino. For an overview of these documents, see Kreyszig (1993); see also Gallo; Kreyszig (1998).
11 For an inventory of Gaffurio’s library, see Caretta.
12 For the early and facsimile editions, see Gaffurio (1492). For an English translation, see Kreyszig (1993). For an Italian translation, see Illuminati.
13 For the early and facsimile editions, see Gaffurio (1496). For an English translation, see Miller (1968a); Young (1969). For an overview of the Practica musicae, see Miller (1968b).
14 For the early and facsimile editions, see Gaffurio (1518). For an English translation, see Miller (1977).
15 For the early and facsimile editions, see Gaffurio (1480).
16 For an overview of his repertory, see Blackburn; Finscher and Kreyszig; Kreyszig (2002); Merkley.
17 Pellegrin.
18 For a survey of Pythagorean arithmetic, see Barbera (1980); Münxelhaus; see also Zaminer. On the application of Pythagorean arithmetic within the realm of musica practica, see, for example, Busse-Berger (1990), Busse-Berger (1993).
19 For an overview of the Greek tonoi within the Greek double octave systema teleion as an integral part of the Greek harmoniai, see Palisca (1984).
20 For an overview of the Latin modes, see Markovits.
21 Mathiesen; see also Barbera (1991); Bowen (1991); Busch.
22 Levin (1975); Levin (1994).
emulation and embracing of the Greek tradition, much of which received ample vetting through the eminent auctoritas of Boethius (born ca. 480; died ca. 524),23 focuses on the significance, definition, classification, principle and progression of numbers, discrete and continuous quantity, genera (simple and composite), and species of proportions, as well as the concepts of equality and inequality, geometric and arithmetic proportions, proportionality and mean, topics which lead to a consideration of the intervals on two planes, namely, as logos (proportionate relationship) and diastemata (distance between two pitches).24 At the forefront of the studia humanitatis lay the inquiry into the origin of the disciplina musicae. For Gaffurio, the agenda of his trilogy is one of exceptional breadth, as he engages in a self-imposed juxtaposition of the Greek tonoi as part of the systema teleion and of the Western modes as part of the Guidonian system of hexachords and solmization. Perhaps inspired by the opinion of Vitruvius, who in Chapter 3 of Book 5 of his De architectura indicated that “Harmonics is an obscure and difficult subject to read and write about particularly for those who do not know Greek letters,”25 Gaffurio regards the similar terminology accorded to both Greek and Latin scalar systems as an ideal point of departure for his examination, obviously unaware of the vastly different meaning assigned to the respective systems of scales in the Greek and Latin orbits. Gaffurio’s self-tailored manner of inquiry culminates in the confusion of the terms octave species,26 modes,27 and tonoi,28 as the result of his misreading of the De institutione musica of Boethius, with the ultimate resolution of Gaffurio’s misreading occurring only in his De Harmonia.
Here, Gaffurio's study of two important treatises in Latin translation, namely the Introduction to Harmonics of Cleonides in a 1497 Latin translation by Giorgio Valla (1447-1500)29 and the Harmonics of Ptolemy in a 1499 Latin translation by Nicolo Leoniceno (1428-1524),30 helped clarify the scalar systems of the Latin and Greek traditions with the Pythagorean tuning at the core of the examination in the De harmonia. Notwithstanding the authors of the primary sources completed during the era of musical humanism, modern scholars, including editors and translators of Latin music theoretical texts from the era of musical humanism as well as twentieth-century commentators on the scientia musicae,31 generally proceed little beyond the original documents in clarifying the arithmetic squarely placed at the root of the construction of the scales and our understanding thereof.
23 For an edition of Boethius’s two principal treatises, namely, the De institutione arithmetica and De institutione musica, see Friedlein. For English translations of these treatises, see Masi; Bower (1989). The pre-eminent status of Boethius has received copious acknowledgment in the secondary literature; see, for example, Heller; Bower (1981); Caldwell; Palisca (1990); also Palisca (1993, 168-188). 24 Riethmüller; see also Stenzel. 25 Vitruvius; as cited in Kreyszig (1993, 172). 26 For an overview of the octave species, see Barbera (1984). 27 For an overview of the modes, see Bower (1983). 28 For an overview of the tonoi, see Solomon (1984). 29 Palisca (1985, 67-87). For an English translation of Cleonides’s Harmonics, see Solomon (1980). 30 Palisca (1985, 117-122). For an English translation of Ptolemy’s Harmonics, see Solomon (2000). 31 See, for example, Haase; Bowen, William; Sachs.
H. Kreyszig and W. Kreyszig
In his calculations of the tones and intervals of the Pythagorean scale, Gaffurio uses a number generating function of the form $f(n, m) = p^{n} \cdot q^{m}$, where p and q are primes, and n and m are elements of $\mathbb{N}_0$. Here, Gaffurio chooses the smallest primes p = 2 and q = 3, with the ratio p / q denoting the interval of the diapente, and he varies the integral exponents n and m over 0, 1, 2, 3, …, thereby constructing a series of numbers signifying the precise placement of the tones within the gamut. His approach is in line with medieval mathematics, which focuses on elementary number theory; details follow below. Within Gaffurio’s trilogy, we shall now focus our attention on his calculations of tones and intervals as most comprehensively and succinctly shown in his De harmonia musicorum instrumentorum opus, which reveals the following mathematical constructs. Elementary number theory gives us the unique factorization theorem:32
Theorem (The Unique Factorization Theorem). Every positive integer n > 1 can be written as a product of primes in a unique way, apart from the order of the factors.
Expressed as an equation, this means that any such integer n can be written as
$$n = p_1 \, p_2 \, p_3 \cdots p_r,$$
a product of r primes $p_i$, where r is any positive integer. This equation can be written more compactly by grouping the same primes together and using exponentiation. If we denote the integral exponents by $e_i$ and the unique primes by $q_i$, we have
$$n = q_1^{e_1} \, q_2^{e_2} \, q_3^{e_3} \cdots q_s^{e_s}.$$
This equation allows us to represent n in a standard form by grouping the primes by size, that is, $q_1 < q_2 < q_3 < \cdots < q_s$, where s is a positive integer. Since these equations are for any positive integer n, where n > 1, they lend themselves to generating infinitely many such numbers by varying either the exponents or the prime numbers or both. This is in tune with Medieval mathematics in which generating numbers and more importantly discovering number generating functions are of great importance to the mathematicians of the times who want to discover prime number generating functions.33 In a most general form such a number
32 See Dudley (1969, 14-15). 33 To further look into this line of thought see Dickson. This seminal work is an attempt to give a short description of every work published in number theory from Antiquity to 1918. Further valuable references include Ore; Shanks; Crandall and Pomerance.
generating function34 based on the ideas derived from the unique factorization theorem would look like
$$f(q_1, q_2, q_3, \ldots, q_s, e_1, e_2, e_3, \ldots, e_s) = q_1^{e_1} \, q_2^{e_2} \, q_3^{e_3} \cdots q_s^{e_s}.$$
Examining the tones and intervals of the Pythagorean scale that Gaffurio calculated, we discover that a number generating function Gaffurio uses is of the form
$$f(e_1, e_2) = 2^{e_1} * 3^{e_2}.$$
To simplify the notation, and since we henceforth concentrate only on what we shall call the Gaffurio Number Generating Function, we rewrite the last equation in a simpler form, denoting the exponents by n and m in place of the more cumbersome $e_1$, $e_2$. We thus obtain the following function, a product of exponentiated primes:
$$f(n, m) = 2^{n} * 3^{m}.$$
Here Gaffurio chooses the smallest prime numbers p = 2 and q = 3, with the ratio p / q denoting the interval of the diapente.35 He varies the integral exponents n and m over 0, 1, 2, 3, …, thereby constructing a series of numbers signifying the precise placement of the tones within the gamut.36 This is demonstrated most significantly in a table created by Gaffurio himself, and explained in terms of the number generating function we discovered, as shown in Figure 2, directly taken from Gaffurio37 and then interpreted. The centerpiece of this figure is the Gaffurio Number Generating Function in the third column. It is linked to the first column listing the frequencies of the tones. Here frequency denotes a number of proportionality which allocates each tone to a particular position within the Pythagorean scale. The system in itself is closed but movable. It extends over two octaves and represents the complete systema teleion. As is the case with medieval musicology, the scales and frequencies are in inverse
34 This number generating function is strictly in line with elementary number theory based on factorization, originating with Euclid; see Shanks, 4-7. One of the first theorems on primes is by Euclid in his Elements, Book IX, Proposition 20, and deals with the existence of infinitely many primes. The next important result is a method of finding primes by Eratosthenes (276–194 BCE), called the Sieve of Eratosthenes. But the systematic study of finding primes is only invigorated with the studies by Pierre de Fermat (1601-1665) and Marin Mersenne (1588-1648), more than two generations after the age of Gaffurio. For details on this historical development, see Ore (1948, 50-85). 35 For simplicity we denote the two primes present by p and q, independent of the notation $p_i$ and $q_i$ used prior in the formula for the most general number generating function. Indeed, our Gaffurio number generating function is a specific instance of such a general function. 36 Note that we allow the exponents n and m to take on the value of zero in this Gaffurio number generating function. This is less restrictive than in our more general theory but sacrifices in compactness of presentation. 37 Franchino Gaffurio, De harmonia musicorum instrumentorum opus, Book 1, Chapter 10.
movements, that is, increasing scale corresponds to decreasing frequency, and conversely. Thus it differs from our modern (Hertzian) understanding of (fixed) frequency. However, the Gaffurian system with its proportions can be seen as an early precursor to the Hertzian system.
Date Beginning of ? until September 1473 16 September 1473
Place Monastery of San Pietro in Lodi Lodi
late 1473 or 1474 1474
Lodi
14 May 1474 May 1474 – 1475 ? ca. 1474
Lodi Mantua
1475-1476
?
1477
Genoa
1477 November 1478
Genoa Naples
1479
Naples
8 October 1480 1480
Naples ?
?
Lodi
1480-1483
Monticelli d’Ongino (territory of Cremona) Monticelli
ca. 1482
Lodi
Verona ?
Activity Theological study
Completion of the copying of the Lucidarium of Marchetto of Padua (flourished 1305-1319); this manuscript also comprising Pomerium of Marchetto of Padua and the Ars cantus mensurabilis of Franco of Cologne (flourished mid to late 13th century) Ordination to priesthood Studies of composition with Johannes Bonadies also known as Godendach (flourished in the 15th century) Participation as singer in Cathedral of Lodi Research in music Employment as public teacher Completion of Extractus parvus musicae [Manuscript Parma, Biblioteca Palatina, sezione Musicale 1158], consisting of extracts from Marchetto of Padua and Ugolino of Orvieto, with references to Johannes de Muris (born ca. 12901295; died after 1344), the Ars nova of Philippe de Vitry (1291-1361) , the Tractatus figurarum of Philippus de Caserta (flourished ca. 1370); several anonymous treatises, and unknown treatise by Guillaume Dufay (1397-1474); Compilation of Tractatus brevis cantus plani [Manuscript Parma, Biblioteca Palatina, sezione Musicale 1158] Completion of Flos musice [Manuscript lost], with reference in Theorica musice, Book 5, Chapter 8 and of Musice institutionis collocutiones [Manuscript lost] Appointment by Doge Prospero Adorno; teaching of music theory and composition (sacred and secular repertories) Performance of some of his motets in the Church of San Lorenzo Move to exile with Adorno to Court of King Ferdinand of Aragon; studies of musica speculativa, upon the recommendation of Phylippinus Bononius (Secretary of King Ferdinand) discussions with Johannes Tinctoris (ca. 14351511), Guglielmo Guarneri, Bernhard Ycart (flourished ca. 1470-1480); also directing of music at Church of San S. 
Annunziata Completion of Theorice musice tractatus [Manuscript London, British Library, Hirsch IV.1441] as early version of Theoricum opus musice discipline Publication of Theoricum opus musice discipline Musices practicabilis libellum [Manuscript Cambridge, Massachusetts, Harvard University, Houghton Library Mus. 142], with materials incorporated into Book 2 of Practica musicae. Return at invitation of Bishop Carlo Pallavicino of Lodi (1456-1497), owing to plague and Turkish invasion of Puglia Appointment as teacher of singers; preparation of Practica musicae
Completion of Micrologus vulgaris cantus plani [Manuscript Bologna, Biblioteca Civico Museo Bibliografico Musicale A 90] Completion of Tractatus practicabilium proportionum [Manuscript Bologna, Biblioteca, Civico Museo Bibliografico musicale A 69] with the material incorporated into Book 4 of Practica musicae
Fig. 1. The Biography of Franchino Gaffurio (born Lodi, 14 January 1451; died Milan, 25 June 1522) with Emphasis on His Pedagogical and Literary Contributions: An Overview
Appointment as choirmaster of Church of San Maria Maggiore; supervision of installation of organ pedal of Battista da Martinengo Return from Bergamo because of war in Ferrara Appointment as choirmaster at Cathedral of Milan and as teacher and composer (after appointment of the theologian, lawyer and vicar of Lodi, Romanus Barnus, as Archbishop of Milan); publication of revision of Theorica musice (1492) and Practica musicae (1496) Completion of Liber primus musices practicabilis [Manuscript Bergamo, Biblioteca Civica Angelo Mai S 4.37] with material incorporated into Book 1 of Practica musicae Contact with Bartholomeo Ramos de Pareia (ca. 1440-after 1490) Persuasion of architect Luca Paperio to work on Cathedral of Milan Appointment to Gymnasium founded by Ludovico d’Este with thirty-eight-year tenure, first music theorist to call himself “professor musices” Publication of Theorica musice
19 May 1483
Bergamo
27 October 1483 22 January 1484
Lodi Milan
1487
Milan
since 1489 1490 1492
Milan Mantua Milan
15. December 1492 1493
Milan
July 1494 22 April 1495 1495 30 September 1496 14 December 1496 1497
Milan Milan Bergamo Milan
Series of written exchanges over dispute concerning music theory with Giovanni Spataro (1458-1541) Appointment as Rector of the Church of San Marcellino Letter to Ludovico Sforza (1452-1508), requesting a benefice Appointment as Cleric of the Church of Pontirolo Publication of Practica musicae
Milan
Letter to Marco Sanudo, accompanying copy of Practica musicae
Milan
1499
Milan
1500
Milan
1504 1506 1508
? Varese Milan
1509
Milan
24 March 1517
Milan
27 November 1518 1518 22 August 1520 4 October 1520
Milan
Appointment as Professor of Music at University founded by Ludovico Sforza in Milan Completion of Glossemata quaedam super nonnullas partes theoricae Johannis de Muris [Manuscript Milan, Biblioteca Ambrosiana H 165 inf.] Capture of Milan by French; Gaffurio’s stay as Regius musicus; completion of De harmonia musicorum instrumentorum opus Visit of Spas [places unknown] Organizing of Chapel of Santa Maria al Monte Publication of Angelicum ac divinum opus musice, largely based on Practica musicae Gaffurio’s publication of oration by Jacopo Antiquario of Perugia (1474? – 1512?), extending welcome to Louis XII after victory over Venice Criticism of the Libri tres de institutione harmonica of Pietro Aaron (ca. 1480after 1545) Publication of De harmonia musicorum instrumentorum opus
1520
Turin
1521 1521
Milan Milan
Milan
Milan Milan Milan
Donation of his materials to the Church of the Incoronato in Lodi Letter to Deputies of the Church of the Incoronato in Lodi, recommending a cleric Letter to Deputies of the Church of the Incoronato in Lodi, acknowledging the hiring of cleric Apologia Franchini Gafurii Musici adversus Joannem Spatarium et complices musicos bononienses Epistola primo in solutions obiectorum Io. Vaginarii Bononiensis Epistola secunda apologetica, addressed to Antonio Alberti in Florence
The biographical information has been compiled from: Pantaleone Melegulo of Lodi, [Biographical Sketch of Franchino Gaffurio], included at the end of Franchino Gaffurio, De harmonia; see English translation of biographical sketch in Miller (1977, 212-214); Blackburn; Finscher and Kreyszig; Kreyszig (2002). Fig. 1. (continued)
Frequency | Intervals (to the next row) | A Gaffurio Number Generating Function f (n, m) | Greek Terminology | Helmholtz Nomenclature
2304 | tonus | = $2^{8} * 3^{2}$ | Nete hyperboleon | a
2592 | tonus | = $2^{5} * 3^{4}$ | Paranete hyperboleon | g
2916 | semi[tonium] [minus] | = $2^{2} * 3^{6}$ | Trite hyperboleon | f
3072 | tonus | = $2^{10} * 3^{1}$ | Nete diezeugmenon | e
3456 | tonus | = $2^{7} * 3^{3}$ | Paranete diezeugmenon | d
3888 | semi[tonium] [minus] | = $2^{4} * 3^{5}$ | Trite diezeugmenon or Paranete sinemenon | c
4096 | semi[tonium] maius | = $2^{12} * 3^{0}$ | Paramese | b
4374 | semi[tonium] [minus] | = $2^{1} * 3^{7}$ | Trite sinemenon | bb
4608 | tonus | = $2^{9} * 3^{2}$ | Mese | a
5184 | tonus | = $2^{6} * 3^{4}$ | Lychanos meson | g
5832 | semi[tonium] [minus] | = $2^{3} * 3^{6}$ | Parhypate meson | f
6144 | tonus | = $2^{11} * 3^{1}$ | Hypate meson | e
6912 | tonus | = $2^{8} * 3^{3}$ | Lychanos hypaton | d
7776 | semi[tonium] minus | = $2^{5} * 3^{5}$ | Parhypate hypaton | c
8192 | semi[tonium] maius | = $2^{13} * 3^{0}$ | Hypate hypaton | B
8748 | semi[tonium] minus | = $2^{2} * 3^{7}$ | ? | Bb
9216 | | = $2^{10} * 3^{2}$ | Proslambanomenos | A
(For explanations and sample calculation see text.) Fig. 2. Integrum Systema Harmonicum Diatonici Generis
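The numerical columns of Figure 2 can be regenerated from the exponent pairs alone. The following Python sketch is an illustration added here, not part of Gaffurio's or the authors' own apparatus: the names `gaffurio`, `EXPONENTS`, `INTERVAL_NAMES` and `table_rows` are hypothetical, and the three interval ratios (9:8, 256:243, 2187:2048) are the standard Pythagorean values assumed for tonus, semitonium minus and semitonium maius.

```python
from fractions import Fraction


def gaffurio(n: int, m: int) -> int:
    """Return 2**n * 3**m for non-negative integer exponents n and m."""
    if n < 0 or m < 0:
        raise ValueError("exponents must be non-negative")
    return 2**n * 3**m


# Exponent pairs (n, m) read off Figure 2, from Nete hyperboleon
# down to Proslambanomenos.
EXPONENTS = [
    (8, 2), (5, 4), (2, 6), (10, 1), (7, 3), (4, 5), (12, 0), (1, 7), (9, 2),
    (6, 4), (3, 6), (11, 1), (8, 3), (5, 5), (13, 0), (2, 7), (10, 2),
]

# Assumed standard Pythagorean ratios for the interval names in Figure 2.
INTERVAL_NAMES = {
    Fraction(9, 8): "tonus",
    Fraction(256, 243): "semitonium minus",
    Fraction(2187, 2048): "semitonium maius",
}


def table_rows():
    """Return (number, interval to the next row) pairs for all 17 tones."""
    numbers = [gaffurio(n, m) for n, m in EXPONENTS]
    rows = []
    for current, following in zip(numbers, numbers[1:] + [None]):
        if following is None:
            name = ""  # the last row has no successor
        else:
            name = INTERVAL_NAMES[Fraction(following, current)]
        rows.append((current, name))
    return rows


if __name__ == "__main__":
    for number, interval in table_rows():
        print(number, interval)
```

Exact rational arithmetic with `Fraction` is used so that each ratio between adjacent numbers can be looked up by equality rather than compared within a floating-point tolerance.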
For example, in the third shaded row one computes the frequency of a particular tone
$$\text{frequency of tone} = 2916 = f(n, m) = f(2, 6) = 2^{2} * 3^{6}.$$
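The sample calculation can also be run in the opposite direction, recovering the exponents n and m from a given number by repeated division. The sketch below is a minimal illustration under the assumption that only the primes 2 and 3 are admitted; the function name `exponents_of_2_and_3` is hypothetical and does not come from the source.

```python
from typing import Tuple


def exponents_of_2_and_3(number: int) -> Tuple[int, int]:
    """Return (n, m) such that number == 2**n * 3**m, by repeated division."""
    if number < 1:
        raise ValueError("number must be a positive integer")
    n = 0
    while number % 2 == 0:  # strip every factor of 2
        number //= 2
        n += 1
    m = 0
    while number % 3 == 0:  # strip every factor of 3
        number //= 3
        m += 1
    if number != 1:
        # Something other than 2 and 3 divides the input.
        raise ValueError("number has a prime factor other than 2 and 3")
    return n, m


if __name__ == "__main__":
    print(exponents_of_2_and_3(2916))
```

For 2916 the function returns the pair (2, 6), in agreement with f(2, 6) above.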
Furthermore, Column 4 shows the correspondence of this generated number38 to Trite hyperboleon. Column 5 converts the Pythagorean tone to the modern tone f’. This correspondence between the Pythagorean tone and the modern tone lends itself to modern computational tools, in that a matching algorithm can be developed which determines the modern tone that best matches the frequency of an ancient tone.39 This tone-by-tone matching is naturally imperfect, requiring that one subdivide the frequency range into intervals, one for each modern tone, and examine into which of these intervals the ancient tone falls.
We have shown a novel approach to the problem of examining a Latin music theoretical text by using modern elementary number theory. We have discovered a Gaffurio Number Generating Function that generates a complete Pythagorean scale, as shown in Figure 2. This modern approach to medieval problems would have undoubtedly satisfied the ancient and medieval music theorists. Indeed, according to Calvin M. Bower, “the means of coming to know music is uncompromisingly Pythagorean: number [emphasis by Kreyszig]. Boethius states and restates the Pythagorean creed, insisting on the quantitative nature of sound.”40 His view has elegantly paved the future of the analytical and computational investigation concerning the systema teleion, which is at the forefront of Gaffurio’s own deliberations and at the center of the research presented in this paper.
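The tone-by-tone matching mentioned above is not specified further in the text. One possible reading, offered purely as a sketch, is the following: the modern tones are assumed to be those of twelve-tone equal temperament, each owning a range of 50 cents on either side of its pitch; Proslambanomenos (9216) is assumed as the reference tone A; and, because the numbers of Figure 2 run inversely to pitch, the pitch of a tone above the reference is measured as 1200 · log2(9216 / number) cents. The names `NOTE_NAMES`, `REFERENCE` and `match_modern_tone` are hypothetical.

```python
import math

# Pitch-class names counted in semitones upward from the assumed reference A.
NOTE_NAMES = ["A", "Bb", "B", "C", "C#", "D", "Eb", "E", "F", "F#", "G", "G#"]

REFERENCE = 9216  # Proslambanomenos in Figure 2, assumed here to be the tone A


def match_modern_tone(number, reference=REFERENCE):
    """Return (note name, octaves above the reference, deviation in cents).

    The numbers of Figure 2 are inverse to pitch, so the pitch above the
    reference is measured as 1200 * log2(reference / number) cents. The
    equal-tempered semitone whose 100-cent range contains that value is
    taken as the match (an assumption, not the authors' stated method).
    """
    cents = 1200 * math.log2(reference / number)
    steps = round(cents / 100)       # nearest equal-tempered semitone
    deviation = cents - 100 * steps  # signed distance from that semitone
    return NOTE_NAMES[steps % 12], steps // 12, deviation


if __name__ == "__main__":
    for number in (2916, 4096, 4374):
        name, octave, deviation = match_modern_tone(number)
        print(number, name, octave, round(deviation, 2))
```

Under these assumptions 2916 falls into the range of F one octave above the reference octave, which agrees with the f’ given for Trite hyperboleon; the deviation returned alongside the name makes the imperfection of the match explicit.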
Bibliography Adkins, C.D.: The Theory and Practice of the Monochord. Ph.D. Dissertation, State University of Iowa (unpublished, 1963) Allaire, G.G.: The Theory of Hexachords, Solmization and the Modal System: A Practical Application. In: Carapetyan, A. (ed.) Musicological Studies and Documents, vol. 24. American Institute of Musicology (1972) Barbaro, E.: Translated Paraphrasis in Aristotelem. Treviso: Bartolomeo Confalomeri and Morello Gerardino (February 15, 1481); in modern edn. Barbaro, E.: translated, Paraphraseos de anima libri tres, Heinze, R. (ed.) Berlin (1899) Barbera, C.A.: The Persistence of Pythagorean Mathematics in Ancient Musical Thought. Ph.D. Dissertation, University of North Carolina at Chapel Hill (unpublished, 1980) Barbera, C.A.: Octave Species. The Journal of Musicology: A Quarterly Review of Music History, Criticism, Analysis, and Performance Practice 3, 229–241 (1984) Barbera, C.A.: The Euclidean Division of the Canon: Greek and Latin Sources. In: Mathiesen, T.J., Solomon, J. (eds.) Greek and Latin Music Theory. University of Nebraska Press, Lincoln, Nebraska and London (1991)
38 An actual factorization of 2916 is obtained by successive divisions by prime numbers as follows: 2916 : 2 = 1458; 1458 : 2 = 729; 729 : 3 = 243; 243 : 3 = 81; 81 : 3 = 27; 27 : 3 = 9; 9 : 3 = 3. Here the prime numbers (factors) are 2 and 3, and counting the number of divisions gives two divisions by 2 and five divisions by 3 with a final quotient of 3 (together six 3s), hence $2916 = 2^{2} * 3^{6}$. 39 Details for developing such types of algorithms are given in the seminal work by Knuth. 40 Bower (1989, xx-xxi). Further on Boethius’s preoccupation with the concept of the number, see Illmer; Snider, 5-33 (Chapter 2).
Blackburn, B.J.: Gaffurius, Franchinus. In: Sadie, S. (ed.) The New Grove Dictionary of Music and Musicians, 29 vols., vol. 9, pp. 410–414. Macmillan, London (2001) Bowen, A.C.: Euclid’s Sectio canonis and the History of Pythagoreanism. In: Bowen, A.C. (ed.) Science and Philosophy in Classical Greece. Sources and Studies in the History and Philosophy of Science, vol. 2, pp. 164–187. Garland, New York and London (1991) Bowen, W.R.: Music and Number: An Introduction to Renaissance Harmonic Science. Ph.D. Dissertation, University of Toronto (unpublished, 1984) Bower, C.M.: The Role of Boethius’s De institutione musica in the Speculative Tradition of Western Musical Thought. In: Masi, M. (ed.) Boethius and the Liberal Arts: A Collection of Essays. Utah Studies in Literature and Linguistics, vol. 18, pp. 157–174. Peter Lang, Berne and Frankfurt am Main (1981) Bower, C.M.: The Modes of Boethius. The Journal of Musicology: A Quarterly Review of Music History, Criticism, Analysis, and Performance Practice 3, 252–263 (1984) Bower, C.M.: Translated. Anicius Manlius Severinus Boethius: Fundamentals of Music. In: Palisca, C.V. (ed.). Music Theory Translation Series. Yale University Press, New Haven, Connecticut and London (1989) Burana, G.F.: Translated. Bacchius the Elder, Introductio artis musicae, preserved in Manuscript Verona, Bibliotheca Capitolare CCXL (201) (April 15, 1494); also in modern edn. In: Bellermann, F. (ed.) Anonymi scriptio de musica Bacchi senioris introduction artis musicae, Berlin (1841); also in Greek-Latin edn. In: Meibom, M. (ed. & translator) Bacchii senioris introduction artis musicae, Marcus Meibom, Antiquae musices auctores septem graece et latine, 2 vols. Monuments of Music and Music Literature in Facsimile: Second Series – Music Literature, vol. 51. Broude Brothers, New York. facsimile of Amsterdam (1977); In English translation. Steinmeyer, O.: Bacchius Geron’s Introduction to the Art of Music (1652). 
Journal of Music Theory 29, 271–298 (1985) Busch, O.: Logos syntheseos: Die euklidische Sectio canonis, Aristoxenos, und die Rolle der Mathematik in the antiken Musiktheorie, im Anhang (in the appendix): KATATOMH KANONOȈ / Die Teilung des Kann: Die euklidische Sectio canonis in deutscher Übersetzung. In: Ertelt, T., von Loesch, H. (eds.) Studien zur Geschichte der Musiktheorie, vol. 3; and Veröffentlichungen des Staatlichen Instituts für Musikforschung, vol. 10; Olms, G.: Hildesheim und New York (1998). 2nd unchanged edn. Staatliches Institut für Musikforschung Preuischer Kulturbesitz, [Berlin] (2004) Busse-Berger, A.M.: Musical Proportions and Arithmetic in the Late Middle Ages and Renaissance. Musica Disciplina: A Yearbook of the History of Music 44, 89–118 (1990) Busse-Berger, A.M.: Mensuration and Proportion Signs: Origins and Evolution. Clarendon Press, Oxford (1993) Caldwell, J.: The De institutione arithmetica and the De institutione musica. In: Gibson, M. (ed.) Boethius: His Life, Thought and Influence, pp. 135–154. Basil Blackwell, Oxford (1981) Caretta, Alessandro, et al.: Franchino Gaffurio. Edizioni Dell’Archivio Storico Lodigiano, Lodi (1951) Crandall, R., Pomerance, C.: Prime Numbers: A Computational Perspective, 2nd edn. of 2001. Springer, New York (2005) Dickson, L.E.: History of the Theory of Numbers, vol. 3. AMS [American Mathematical Society]. Chelsea Publishing, Providence, Rhode Island (1919); Carnegie Institution of Washington, Washington, DC (reprint, 1999) Dreyer, E.-J.: Das Tonsystem der Griechen. Musiktheorie 3, 3–25 (1988) Dudley, U.: Elementary Number Theory. W.H. Freeman, San Francisco, California (1969) Euler, L.: Tentamen novae theoriae musicae. In: Leonhard Euler, Opera omnia, vol. 3/1, St. Petersburg (1739)
Ficino, M.: Translated. Plato: Opera omnia. Officina of Heinrich Petri, Basel (1576) Finscher, L., Kreyszig, W.: Gaffurio, Franchino. In: Die Musik in Geschichte und Gegenwart: Allgemeine Enzyklopädie der Musik, begründet von (founded by) Blume, F.: zweite neubearbeitete Ausgabe von (second revised edition by) Finscher, L.: 29 vols., Bärenreiter, Kassel; Metzler, Stuttgart (1994–2008), vol. 7 (Personenteil, 2002) cols. 393–403 (2002) Friedlein, G. (ed.): Anicii Manlii Torquati Severinii Boetii: Institutione arithmetica libri duo e Institutione musica libri quinque accedit Geometria quae fertur Boetii. Frankfurt am Main, Minerva (1867); Teubner, B.G.: Leipzig (reprint, 1966) Gaffurio, F.: Theoricum opus musice discipline. Francesco di Dino, Naples (1480); also In: Ruini C. (ed.) vol. 15, Musurgiana: Collana di trattati di teoria musicale, storiografia e organologia in facsimile, ed. by Istituto di Bibliografia Musicale di Roma, Libreria Musicale Italiana, Lucca (1996) Gaffurio, F.: Theorica musice. Filippo Mantegazza, Milan (1492); also In: Cesari, G. (ed.) facsimile reprint, Presso l’Accademia, Rome (1934); also as facsimile reprint as part of Monuments of Music and Music Literature in Facsimile – 2nd series: Music Literature, vol. 21. Broude Brothers, New York (1967); also In: Vecchi, G. (ed.) facsimile reprint. Bibliotheca musica Bononiensis, vol. II/5. Forni, Bologna (1969) Gaffurio, F.: Practica musicae. Guglielmo Signerre, Milan (1496); also In: facsimile reprint. Gregg Press, Farnborough (1967); also In: Vecchi, G. (ed.) facsimile reprint, Bibliotheca musica Bononiensis, vol. II/6. Forni, Bologna (1972) Gaffurio, F.: De harmonia musicorum instrumentorum opus. Gotardus Pontanus Calographus, Milan (1518); also In: Vecchi, G. (ed.) facsimile reprint. Bibliotheca musica Bononiensis, vol. II/7. Forni, Bologna (1972) Gallo, F.A.: Le traduzioni dal Greco per Franchino Gaffurio. 
Acta Musicologica 35, 172–174 (1963) Haase, R.: Geschichte des harmonikalen Pythagoreismus. Publikationen der Wiener Musikakademie, vol. 3. Elisabeth Lafite, Vienna (1969) Heller, B.: Boethius im Lichte der frühmittelalterlichen Musiktheorie. Ph.D. Dissertation, Universität Wien (unpublished, 1939) Hesse, H.-P.: Intervall. In: Die Musik in Geschichte und Gegenwart: Allgemeine Enzyklopädie der Musik, begründet von (founded by) Blume, F.; zweite neubearbeitete Ausgabe von (second revised edition by) Finscher, L., 29 vols., Bärenreiter, Kassel; Metzler, Stuttgart (1994–2008), vol. 4 (Sachteil, 1996), cols. 1069–1097, especially cols. 1083–1084 (1996) Illmer, D.: Die Zahlenlehre des Boethius. In: Zaminer, F. (ed.) Rezeption des antiken Fachs im Mittelalter. Geschichte der Musiktheorie, vol. 3, pp. 220–252. Wissenschaftliche Buchgesellschaft, Darmstadt (1990) Illuminati, I.: Translated with commentary, and introduction by Illuminati, I., Ruini, C. Franchino Gaffurio: Theorica musicae. In: La tradizione musicale, vol. 11 and Le regole della musica, vol. 2. SISMEL Edizioni del Galluzzo, Florence (2005) Knuth, D.E.: The Art of Computer Programming. Sorting and Searching, vol. 3. AddisonWesley, Reading, Massachusetts (1973); 2nd edn. of 1973 (1998) Kreyszig, W.K.: Translated with introduction and notes. Franchino Gaffurio: The Theory of Music. In: Palisca, C.V. (ed.) Music Theory Translation Series. Yale University Press, New Haven, Connecticut and London (1993) Kreyszig, W.K.: Franchino Gaffurio und seine Übersetzer der griechischen Musiktheorie in der Theorica musice (1492); Ermolao Barbaro, Giovanni Francesco Burana und Marsilio Ficino. Musik als Text: Bericht über den Internationalen Kongreß der Gesellschaft für Musikforschung, Freiburg im Breisgau (1993); Danuser, H., Plebuch, T. (eds.) vol. 1, pp. 164–171. Bärenreiter, Kassel (1998)
Kreyszig, W.K.: Research and Teaching During the Era of Musical Humanism: Defending the Scholar-Teacher in Response to the Principles of Creation and Dissemination of Knowledge in the Italian University Curriculum and Cultural Milieu of the Court of the Sforzas, with Special Reference to Franchino Gaffurio (1451–1522). In: Marken, R. (ed.), What is a Teacher-Scholar?: Symposium Proceedings, November 9-10, 2001, pp. 97–132. University of Saskatchewan, Saskatoon, Saskatchewan (2002) Levin, F.R.: The Harmonics of Nicomachus and the Pythagorean Tradition. American Classical Studies, vol. 1. The American Philological Association, University Park, Pennsylvania (1975) Levin, F.R. (ed.): The Manual of Harmonics of Nicomachus the Pythagorean: Translation and Commentary. Phanes Press, Grand Rapids, Michigan (1994) Markovits, M.: Das Tonsystem der abendländischen Musik im frühen Mittelalter. Series II, Publikationen der Schweizerischen Musikforschenden Gesellschaft, vol. 30. Verlag Paul Haupt, Berne and Stuttgart (1977) Masi, M.: Translated, Boethian Number Theory: A Translation of the “De institutione arithmetica”. Studies in Classical Antiquity, vol. 6. Rodopi, Amsterdam (1983) Mathiesen, T.J.: An Annotated Translation of Euclid’s Division of a Monochord. Journal of Music Theory 19, 236–258 (1975) Merkley, P.A., Merkley, L.L.M.: Music and Patronage in the Sforza Court. In: Dunning, A. (ed.) Studi sulla storia della musica in Lombardia: Collana di testi musicologici, vol. 3. Turnhout, Brepols (1999) Michaelides, S.: The Music of Ancient Greece: An Encyclopaedia. Faber and Faber, London (1978) Miller, C.A.: Translated, Franchinus Gaffurius: Practica musicae. In: Carapetyan, A. (ed.) Musicological Studies and Documents, vol. 20. American Institute of Musicology (1968a) Miller, C.A.: Gaffurius’s Practica musicae: Origin and Contents. 
Musica Disciplina: A Yearbook of the History of Music 22, 105–128 (1968b) Miller, C.A.: Translated, Franchinus Gaffurius: De harmonia musicorum instrumentorum opus. In: Carapetyan, A. (ed.) Musicological Studies and Documents, vol. 33. American Institute of Musicology (1977) Münxelhaus, B.: Pythagoras musicus: Zur Rezeption der pythagorischen Musiktheorie als quadrivialer Wissenschaft im lateinischen Mittelalter. In: Vogel, M. (ed.) OrpheusSchriftenreihe zu Grundfragen der Musik, vol. 19. Verlag für Systematische Musikwissenschaft, Bonn-Bad Godesberg (1976) Neubecker, A.J.: Altgriechische Musik: Eine Einführung, part of Die Altertumswissenschaft; Einführungen in Gegenstand, Methoden und Ergebnisse ihrer Teildisziplinen und Hilfswissenschaften. Wissenschaftliche Buchgesellschaft, Darmstadt (1977) Oesch, H.: Guido von Arezzo: Biographisches und Theoretisches unter besonderer Berücksichtigung der sogenannten odonischen Traktate. Verlag Paul Haupt, Berne (1954) Ore, Ø.: Number Theory and Its History. Dover Publications, New York (1948); reprint of McGraw-Hill, New York (1988) Palisca, C.V.: Round Table The Ancient Greek Harmoniai, Tonoi and Octave Species, organized and chaired by Claude V. Palisca at the National Meeting of the American Musicological Society in Louisville, Kentucky (October 27, 1983). The Journal of Musicology: A Quarterly Review of Music History, Criticism, Analysis, and Performance Practice 3, 221 (1984) Palisca, C.V.: Humanism in Italian Renaissance Musical Thought. Yale University Press, New Haven Connecticut and London (1985)
Palisca, C.V.: Boethius in the Renaissance. In: Barbera, A. (ed.) Musical Theory and Its Sources: Antiquity and the Middle Ages. Notre Dame Conferences in Medieval Studies, vol. 1, pp. 259–280. University of Notre Dame Press, Notre Dame (1990) Palisca, C.V.: Studies in the History of Italian Music and Music Theory, pp. 168–188. Clarendon Press, Oxford (1993) Pellegrin, E.: La bibliothèque des Visconti et des Sforzas ducs de Milan, auc XVe siècle. Publications de l’institut de recherche et d’histoire des textes, vol. 5. C.N.R.S., Paris (1955) Riethmüller, A.: Logos und diastemata in der griechischen Musiktheorie. Archiv für Musikwissenschaft 42, 18–36 (1985) Sachs, K.-J.: Musikalische Elementarlehre im Mittelalter. In: Zaminer, F. (ed.) Rezeption des antiken Fachs im Mittelalter. Geschichte der Musiktheorie, vol. 3, pp. 106–161. Wissenschaftliche Buchgesellschaft, Darmstadt (1990) Shanks, D.: Solved and Unsolved Problems in Number Theory, vol. 1. Spartan Books, Washington (1962) Smith, F.J.: The Medieval Monochord. Journal of Musicological Research 5, 1–33 (1984) Snider, G.C.F.: In Defense of Music’s Eternal Nature: On the Pre-eminence of musica theorica Over musica practica. Master of Arts Thesis in Philosophy, University of Saskatchewan January 2005 (unpublished, 2005), http://library2.usask.ca/theses/available/etd-01282005-145722 Solomon, J.: Cleonides ISAYOYI ARMONIKI: Critical Edition, Translation and Commentary. Ph.D. Dissertation, University of North Carolina at Chapel Hill (unpublished, 1980) Solomon, J.: Towards a History of Tonoi. The Journal of Musicology: A Quarterly Review of Music History, Criticism, Analysis, and Performance Practice 3, 242–251 (1984) Solomon, J.: Ptolemy Harmonics: Translation and Commentary. Mnemosyne: Bibliotheca Classica Batava – Supplementum, vol. 203. Brill, Leiden and Boston, Massachusetts (2000) Stenzel, J.: Zur Theorie des Logos bei Aristoteles. 
Quellen und Studien zur Geschichte der Mathematik, Astronomie und Physik 1, 34–66 (1931) Vitruvius Pollio, M.: De architectura; in modern edn. and translation, Vitruvius, On architecture, 2 vols., Granger, F. (ed. and translator). The Loeb Classical Library. W. Heinemann, London; Putnam, New York (1931–1934) Young, I.: Franchinus Gaffurius: Renaissance Theorist and Composer (1451–1522). Ph.D. Dissertation, University of Southern California (unpublished, 1954) Young, I.: Translated, The Practica musicae of Franchinus Gaffurius. Madison. University of Wisconsin Press, Madison, Wisconsin (1969) Zaminer, F.: Pythagoras und die Anfänge des musiktheoretischen Denken bei den Griechen. Jahrbuch des Staatlichen Instituts für Musikforschung Preußischer Kulturbesitz, 203–211 (1979–1980)
Combinatorial and Transformational Aspects of Euler's Speculum Musicum
Edward Gollin
Williams College
Abstract. The paper examines the structural and conceptual differences between the Speculum musicum from Euler's 1774 De Harmoniae, and the nineteenth-century Tonnetz, but also examines how Euler's conception of intervals as paths within the Speculum anticipates a combinatorial conception of interval that underlies contemporary transformational theories.
Example 1 presents the diagram from Euler’s 1774 De Harmoniae Veris Principiis per Speculum Musicum Repraesentatis, known as the Speculum Musicum (Speculum), which has gained prominence in recent years for certain obvious structural similarities to the now well-known Tonnetz or Tonverwandtschaftstabelle, popularized in the nineteenth-century writings of Hugo Riemann. The similarity, first noted by Martin Vogel, has since been observed in a number of writings that also suggest that the Speculum was a forerunner or precursor to the nineteenth-century Tonnetz.¹

Example 1. Euler’s Speculum Musicum.
The Speculum represents twelve tones in a 3 by 4 network, a graph of tone-classes in pure intonation, organized by pure major thirds vertically and by perfect fifths horizontally. This organization of tones in Cartesian orthography with a major-third axis and a perfect-fifth axis is what Vogel and others have seized upon in claiming precedent for the nineteenth-century Tonnetz. The present paper examines why I believe this claim is not quite true, but also explores a less discussed aspect of Euler’s thought, reflected in his writing about the Speculum, that bears not only upon Riemann’s conception of harmonic relations, but also upon aspects of contemporary transformational theory.

¹ Vogel (1960) observes the relation of the Speculum to the Tonnetz of Oettingen and Riemann. The idea that the Tonnetz evolved from the Speculum has been suggested by, among others, Mooney (1996) and Cohn (1997, 7).
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 406–411, 2009. © Springer-Verlag Berlin Heidelberg 2009
Euler uses the Speculum in De Harmoniae to demonstrate and encapsulate the total pitch-class and interval content of what he calls the diatonic-chromatic genus in music. We shall discuss Euler’s genera shortly, but for the moment, it is enough to know that Euler conceives of the Speculum as a mathematical graph: the nodes represent tones, the edges represent the fundamental intervals of the perfect fifth and major third. Euler represents motion along each horizontal edge with the Roman numeral V for fifth; ascending motion by fifth (that is, motion to the right) is labeled +V; descending motion by fifth, motion to the left, is labeled -V. Similarly, downward motion along a third-edge, equivalent to ascent by a major third, is labeled +III; upward motion on the diagram, equivalent to descent by major third, is labeled -III. This symbolic language allows Euler to represent the interval between any two tones by the shortest pathway of connected edges between those tones. In some cases, several equally short paths can reify the interval between two tones, and the same symbol can sometimes reify the interval between several different tone pairs.

Example 2. The interval of the augmented second, represented by specific pathways on the Speculum.
For instance, the top of Example 2 shows how Euler symbolically represents the interval from F to G# using three different pathways: one can ascend by two major thirds then ascend by one perfect fifth (+III +III +V); one can ascend by a major third, then perfect fifth, then major third (+III +V +III); or one can ascend by perfect fifth, then by two major thirds (+V +III +III). Each different reckoning of the interval corresponds to a different path through the Speculum. Euler’s term for each such pathway is transitus. I call the class of all shortest paths between two tones a transitus class; in this case, the class defines the interval of the augmented second. On Example 2, Euler also shows how the transitus class of the augmented second defines not only the interval from F to G#, but also the intervals from C to D# and from G to B-flat.² Table 1 lists the fifteen distinct composite transitus classes Euler discusses in De harmoniae.

² Euler, out of deference to the conventions of German tablature notation still prevalent in the eighteenth century, labels the tone in the lower right corner B (= B-flat). He nonetheless conceives the tone as A# (i.e. as the pure fifth above D# and the pure major third above F#). In Tentamen novae, Euler goes so far as to present his system as an improvement over an otherwise similar tuning system presented in Johann Mattheson’s Grosse General-Bass Schule (Hamburg, 1731), in which a true B-flat is derived as the perfect fifth below F (Euler 1739, 133). Euler provides tuning instructions in Tentamen novae, which make explicit the A# derivation of the nominal B (Euler 1739, 146–47).
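The path reckoning described above can be made concrete with a small sketch. It assumes a layout reconstructed from the prose rather than taken from Example 1 itself: fifths ascend to the right, major thirds ascend downward, F sits in the upper left corner, and the lower right tone is spelled A# (Euler's nominal B). The names GRID, transitus and starting_tones are hypothetical and used only for illustration.

```python
from itertools import permutations

# Assumed layout of the Speculum, reconstructed from the text:
# columns ascend by perfect fifth (+V), rows descend on the page by major third (+III).
GRID = [
    ["F", "C", "G", "D"],
    ["A", "E", "B", "F#"],
    ["C#", "G#", "D#", "A#"],  # A# is Euler's nominal "B"
]
POS = {name: (row, col)
       for row, line in enumerate(GRID)
       for col, name in enumerate(line)}


def transitus(source, target):
    """Return every shortest path from source to target as a tuple of step labels."""
    (r0, c0), (r1, c1) = POS[source], POS[target]
    steps = (["+III" if r1 > r0 else "-III"] * abs(r1 - r0)
             + ["+V" if c1 > c0 else "-V"] * abs(c1 - c0))
    # On a rectangular grid every ordering of the required steps is a shortest path.
    return sorted(set(permutations(steps)))


def starting_tones(steps):
    """Return the tones from which the given steps stay inside the 3-by-4 graph."""
    d_row = steps.count("+III") - steps.count("-III")
    d_col = steps.count("+V") - steps.count("-V")
    return [name for name, (row, col) in POS.items()
            if 0 <= row + d_row < 3 and 0 <= col + d_col < 4]


augmented_second = transitus("F", "G#")
for path in augmented_second:
    print(" ".join(path))
print(starting_tones(augmented_second[0]))
```

Under these assumptions the sketch lists the three pathways named in the text for F to G#, and reports F, C and G as the only tones from which the augmented second can be walked without leaving the graph.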
Table 1. Composite transitus classes on the Speculum

Transitus          Ratio               Name
+V+V               9:8                 major whole tone
+V+III             16:15               major semitone
+V-III             5:6 or 5:3          minor third/major sixth
+III+III           16:25               augmented fifth
+V+V+V             32:27 or 16:27      Pythagorean minor third
+V+V+III           32:45 or 45:64      augmented fourth/diminished fifth
+V+V-III           5:9 or 10:9         greater minor seventh/minor whole tone
+III+III+V         64:75               augmented second
+III+III-V         24:25               minor chroma
+V+V+V+III         135:128 (*)         major chroma
+V+V+V-III         27:20 (*)           greater fourth
+V+V+III+III       225:128 (*)         augmented sixth
+V+V-III-III       36:25 (*)           greater diminished fifth
+V+V+V+III+III     675:512 (*)         augmented third
+V+V+V-III-III     27:25 (*)           -
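The ratios in Table 1 can be checked mechanically. The following sketch assumes that +V multiplies a frequency by 3/2, that +III multiplies it by 5/4, and that octaves are disregarded; the function name transitus_ratio is hypothetical. It returns the ascending ratio reduced to the range from 1 up to (but excluding) 2, so a table entry written as lower:higher, or as the octave complement, has to be compared up to inversion.

```python
import re
from fractions import Fraction

# Assumed pure intervals for the two edge types of the Speculum.
STEP = {"V": Fraction(3, 2), "III": Fraction(5, 4)}


def transitus_ratio(word):
    """Return the octave-reduced ratio of a transitus such as '+III+III+V'."""
    ratio = Fraction(1)
    for sign, name in re.findall(r"([+-])(III|V)", word):
        ratio *= STEP[name] if sign == "+" else 1 / STEP[name]
    # Octave equivalence: bring the ratio into the range [1, 2).
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


for word in ["+V+V", "+V-III", "+III+III+V", "+III+III-V", "+V+V+V+III+III"]:
    print(word, transitus_ratio(word))
```

For example, +V+III comes out as 15/8, the octave complement of the 16:15 major semitone listed in the table.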
One reason I am reluctant to believe that Euler’s Speculum was a precursor to the nineteenth-century Tonnetz is that there is no evidence that the nineteenth-century theorists involved in issues of tuning or tone relations (Helmholtz, Drobisch, Naumann, Oettingen, or Riemann) knew the particular article, published in the Proceedings of the St. Petersburg Royal Academy of Sciences in 1774, in which the Speculum appears. Euler was certainly well known to these theorists, but primarily through his 1739 treatise, Tentamen novae. Euler’s legacy in the nineteenth century, which derives from Tentamen novae, was his observation that the frequency ratios of all intervals in just intonation can be exclusively represented as combined powers of 2, 3, and 5. It was this observation, I would argue, that underlies the structural convergence between the Speculum and the later nineteenth-century Tonnetze.
But there are a number of structural and conceptual differences that separate Euler’s diagram from those of the next century. The nineteenth-century Tonnetz is a finite representation of an infinite array of tones in just intonation. Euler’s Speculum, in contrast, is a bounded, finite graph: there are no further tones beyond the twelve he shows. Moreover, unlike a nineteenth-century Tonnetz, in which one can conceive, at least potentially, any available interval of the system extending from any tone in the system, intervals of the Speculum are not extensible beyond the borders of the graph; the transitus underlying the augmented second, for example, can only be ‘walked’ from the three tones F, C and G; from any other tone, there exists no destination tone in potentia or in actu. Euler’s finite conception of the Speculum, and, consequently, its differences from the Tonnetz, proceed from Euler’s conception of a genus itself. For Euler, a genus is a collection of musically-distinct, octave-equivalent tones. Every genus can be represented by a composite number that is composed exclusively of the prime factors 3 and 5. In Euler’s view, a genus is a manifestation, in a Platonic sense, of that composite number, sharing its fundamental properties with that number; just as every composite number has a unique prime factorization, every genus has a unique factorization into distinct tone classes. Example 3 illustrates how the diatonic-chromatic genus manifests the number 675: reckoning F as one, each tone of the Speculum represents a unique factor of 675. Euler’s numerical conception of the Speculum represents a fundamental inversion of relations from those of the Tonnetz. Whereas in the Tonnetz, a set of fundamental intervals generates a collection of tones, in the Speculum, a set of tones defines a collection of intervallic relationships. The tone F in Euler’s Speculum, notably, is not a generator, but rather a reference point; it, like the other tones, is generated by the genus of which it is a factor.

Example 3. Tones on the Speculum represented by the factors of 675.
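The correspondence between tones and factors of 675 can be spelled out in a few lines. The sketch assumes the same reconstructed layout as before (F in the upper left corner, fifths to the right as powers of 3, major thirds downward as powers of 5, the lower right tone spelled A#); the names NAMES and factor_of are hypothetical, and the numbers follow from 675 = 3^3 · 5^2 rather than from Example 3 itself.

```python
# Assumed layout: NAMES[t][f] is reached from F by f fifths and t major thirds.
NAMES = [
    ["F", "C", "G", "D"],
    ["A", "E", "B", "F#"],
    ["C#", "G#", "D#", "A#"],
]

# Each tone corresponds to the factor 3**f * 5**t of 675, with F reckoned as 1.
factor_of = {NAMES[t][f]: 3 ** f * 5 ** t
             for t in range(3)
             for f in range(4)}

divisors_of_675 = [d for d in range(1, 676) if 675 % d == 0]

for name, value in sorted(factor_of.items(), key=lambda item: item[1]):
    print(f"{name:2} {value}")
print(sorted(factor_of.values()) == divisors_of_675)
```

Because 675 has exactly (3 + 1)(2 + 1) = 12 divisors, the twelve tones exhaust the genus, which is one way to see why the graph has no tones beyond its borders.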
But it is Euler’s combinatorial conception of intervals on the Speculum (his recognition of intervals, and interpretations of intervals, as pathways therein), I would argue, that constitutes a more subtle connection between Euler, the nineteenth-century tradition, and contemporary transformational theories. Hugo Riemann, for example, came to understand composite intervals as particular pathway classes on the Tonnetz, and even adopted a symbolic language using combinations of the symbols Q and T (for Quint and Terz, respectively) to identify those different pathways. For instance, Riemann (1914/15) symbolized the interval of the ascending whole tone as
2Q (= QQ), a pathway comprising two ascending fifths; he symbolized the minor third as Q/T (= QT⁻¹), a pathway comprising an ascending fifth and descending major third, and so on.³ I call Riemann’s intervals “pathway classes” because, unlike Euler, he does not make distinctions between differently-ordered steps or symbols (i.e. Riemann’s symbols commute). Riemann’s symbolic language and his path conception allowed him to make functional distinctions among tones, even when he later abandoned the premise of just intonation on the Tonnetz: to derive a functionally subdominant tone D in C major Riemann goes down two fifths and up a major third from the tonic (Q⁻²T); a functionally dominant tone D, by contrast, lies two fifths above the tonic (Q²). I would argue that this path-combinatorial conception (consciously or unconsciously) underlies our contemporary interpretations of transformations as gestures within music-transformational spaces.⁴ For example, when we use the label PL to describe a harmonic gesture from E major to C major in neo-Riemannian theory, the label reflects our understanding of that gesture as a particular composite pathway within a space defined by fundamental relations such as L, P and R.⁵ Although Euler lacks the group-theoretical foundation of these later theories, his Speculum and his symbology for paths therein capture their essential idea.
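Both kinds of "word" mentioned in this paragraph can be sketched side by side. The first half treats Riemann's Q and T as commuting symbols, so a word reduces to two exponents and, in just intonation, to an octave-reduced ratio; the second half composes the neo-Riemannian operations P, L and R from left to right on triads encoded as (root pitch class, quality). The encodings and all function names are assumptions made for illustration, not notation from the sources.

```python
from collections import Counter
from fractions import Fraction


def octave_reduce(ratio):
    """Bring a frequency ratio into the range [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def qt_ratio(word):
    """word is a list of (symbol, exponent) pairs; because the symbols commute,
    only the total exponent of Q (fifth) and of T (major third) matters."""
    exponents = Counter()
    for symbol, exponent in word:
        exponents[symbol] += exponent
    return octave_reduce(Fraction(3, 2) ** exponents["Q"]
                         * Fraction(5, 4) ** exponents["T"])


subdominant_d = qt_ratio([("Q", -2), ("T", 1)])  # down two fifths, up a third
dominant_d = qt_ratio([("Q", 2)])                # up two fifths
print(subdominant_d, dominant_d, dominant_d / subdominant_d)


# Neo-Riemannian operations on triads encoded as (root pitch class, quality).
def P(triad):
    root, quality = triad
    return (root, "min" if quality == "maj" else "maj")


def L(triad):
    root, quality = triad
    return ((root + 4) % 12, "min") if quality == "maj" else ((root + 8) % 12, "maj")


def R(triad):
    root, quality = triad
    return ((root + 9) % 12, "min") if quality == "maj" else ((root + 3) % 12, "maj")


OPERATIONS = {"P": P, "L": L, "R": R}


def apply_word(word, triad):
    """Apply the operations named in word from left to right."""
    for symbol in word:
        triad = OPERATIONS[symbol](triad)
    return triad


E_MAJOR = (4, "maj")
print(apply_word("PL", E_MAJOR))  # E major -> e minor -> C major
print(apply_word("LP", E_MAJOR))  # a different pathway and a different goal
```

In this reading the two tones D differ by the ratio 81:80 in just intonation, which is what lets the path symbols carry a functional distinction, and the word PL leads from E major through e minor to C major while LP does not.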
³ Strings of letters such as QQ or QT⁻¹ are known mathematically as “words” in the symbols Q and T. I explore words and their application to transformational music theory in “Form, Transformation and Climax in Ruth Crawford Seeger’s String Quartet, Mvmt. 3,” in the present volume, as well as in Gollin (2000).
⁴ I examine this idea at length in Gollin (2000).
⁵ L, P and R stand respectively for Leittonwechsel, Parallel and Relative, transformations that correspond to the German Leittonwechsel, Variante and Parallel relations respectively (relations that underlie the Scheinkonsonanzen in Hugo Riemann’s Funktionstheorie). In neo-Riemannian theory, L, P and R are privileged because they represent ways of progressing between triads using minimal (or parsimonious) voice leading. For example, L maps a C-major triad to an e-minor triad (E and G are common tones and C moves by semitone to B) or vice versa; P maps an e-minor triad to an E-major triad (E and B are common tones and G moves by semitone to G#) or vice versa. PL implies a composite motion in a space defined by parsimonious relations (i.e. the progression from E major to C major is implicitly mediated by e minor). Voice-leading parsimony and the centrality of the transformations L, P, and R in neo-Riemannian theory are discussed in Cohn (1997).

References
Cohn, R.: Neo-Riemannian Operations, Parsimonious Trichords, and Their Tonnetz Representations. Journal of Music Theory 41(1), 1–66 (1997)
Euler, L.: Tentamen novae theoriae musicae ex certissimis harmoniae principiis dilucide expositae. St. Petersburg (1739)
Euler, L.: De harmoniae veris principiis per speculum musicum repraesentatis. Novi commentarii academiae scientiarum Petropolitanae 18, 330–353 (1774)
Gollin, E.: Representations of Space and Conceptions of Distance in Transformational Music Theories. Ph.D. diss., Harvard University (2000)
Mooney, M.K.: The Table of Relations and Music Psychology in Hugo Riemann’s Harmonic Theory. Ph.D. diss., Columbia University (1996)
Riemann, H.: Ideen zu einer Lehre von den Tonvorstellungen. Jahrbuch der Musikbibliothek Peters 21–22, 1–26 (1914/1915)
Vogel, M.: Die Musikschriften Leonhard Eulers. In: Speiser, A. (ed.) Leonhardi Euleri Opera omnia sub auspiciis Societas scientiarum naturalium Helveticae. Ser. 3, pp. 11–12. Orell Füssli, Zurich (1960)
Structures Ia Pour Deux Pianos by Boulez: Towards Creative Analysis Using OpenMusic and Rubato
Yun-Kang Ahn, Carlos Agon, and Moreno Andreatta
Institut de Recherche et Coordination Acoustique/Musique, 1 place Igor Stravinsky, 75004 Paris, France
{yun-kang.ahn,agonc,moreno.andreatta}@ircam.fr
Abstract. Pierre Boulez introduced the concept of creative analysis (Boissière 2002) in the late 1980s, suggesting that the aim of analysis should be the production of new pieces. Marcel Mesnage and André Riotte followed this path in their work on computer-aided analysis and composition (Mesnage and Riotte 1993). Our current study focuses on Ligeti’s analysis of Structures pour deux pianos Ia (first book, 1951–1952) by Boulez, where the compositional process is described in detail and then set up as a model. In Structures, Boulez uses series consisting of 12 pitches, 12 durations, 12 attacks and 12 dynamics borrowed from Messiaen’s Mode de valeurs et d’intensités (1949). Following Boulez’s analytic concept, our task is to start from his compositional model and consider how it might be used to produce another piece. Our study uses the software OpenMusic, which was developed at IRCAM and is built on a functional paradigm. This graphical environment can be exploited to imitate the model used in Structures, stressing a functional point of view. Parallel to this, a second approach will be implemented using Rubato, a universal music software environment that has been developed at the University of Zurich. The keystone of this application is a categorical point of view, theorized by Guerino Mazzola. Since category theory and functional languages are strongly linked, the two software applications are complementary. However, Rubato brings a different level of abstraction and therefore offers new possibilities that have yet to be developed, for instance creating metapieces that could give different Structures.
1 Introduction
Musicology traditionally distinguishes between analysis and composition. Indeed these represent two different disciplines, although they cannot be completely separated. Boulez aims at reuniting them into what he calls creative analysis (Boissière 2002; Boulez 1989). Thus, a musical work may be considered as containing potentiality, i.e. the possibility of opening perspectives to new compositions.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 412–418, 2009. © Springer-Verlag Berlin Heidelberg 2009
The evolution of the link between mathematics and music (Andreatta 2003), combined with the development of music through computer science, has stimulated both composers and analysts to change their habits and has fostered interest in the formalisation of music. We begin by introducing the compositional process used by Pierre Boulez, and then present its implementation in OpenMusic. Afterwards we discuss the proposed implementation in Rubato, and how it would guide the future development of this work.
2 Compositional Process in Structures Ia
2.1 Analysis of Constructional and Serial Principles: Decision and Automatism
As a basis for our study, we mainly use Ligeti’s famous analysis of this piece (Ligeti 1975; see also Eimert 1959; Feldman et al. 1952). He describes the respective roles played by compositional decision-making and a kind of automatism resulting from the serial techniques. Boulez was aiming to extend serial methods of composition (for details on his musical technique, see Boulez 1963 and 2002). He chose a twelve-tone series, which is arranged to follow the note-succession of “Division” from Messiaen’s Mode de valeurs et d’intensités as a homage to his teacher (Fig. 1).
Fig. 1. The original series
From this series, Boulez generated two 12 × 12 matrices: one with the 12 transpositions of the original series and a second one with the 12 transpositions of the inverted (by the first note) original series (Fig. 2). Paths through these matrices are chosen, organising serial threads defined by 12 pitches, durations, attacks, and dynamics, and thus extending usual serialist practice.
2.2 How to Create from an Analysis
This analysis of the compositional procedure leads us to consider how it can serve as a basis for composing. Ligeti approaches this problem by examining the balance between automatism and decision, i.e. what is a matter of the composer’s choice and what is not. From this model we can undertake a kind of metacomposition with two software packages that propose different approaches: OpenMusic and Rubato. In a comparable way, Guerino Mazzola realised a piano sonata from an analysis he made of Beethoven’s Hammerklavier sonata (op. 106).
Fig. 2. One matrix and some paths used for the composition
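The construction of the two matrices described in Section 2.1 can be sketched as follows. Several things here are assumptions not stated in this text: the pitch-class content of the series (given as it is commonly cited, E-flat D A A-flat G F-sharp E C-sharp C B-flat F B, with C = 0), the ordering of the rows (each row starts on the successive notes of the series being transposed), and the convention of replacing each pitch class by its order number in the original series. All function names are hypothetical.

```python
# Assumed series (C = 0); the text shows it only in Fig. 1.
SERIES = [3, 2, 9, 8, 7, 6, 4, 1, 0, 10, 5, 11]


def transpose(row, start):
    """Transpose row so that it begins on the pitch class start."""
    shift = (start - row[0]) % 12
    return [(pc + shift) % 12 for pc in row]


def invert(row):
    """Invert row around its first note."""
    axis = row[0]
    return [(2 * axis - pc) % 12 for pc in row]


def transposition_matrix(row):
    """One transposition per row, starting in turn on each note of row (assumed ordering)."""
    return [transpose(row, start) for start in row]


def as_order_numbers(matrix, reference):
    """Replace each pitch class by its position (1 to 12) in the reference series."""
    number = {pc: index + 1 for index, pc in enumerate(reference)}
    return [[number[pc] for pc in row] for row in matrix]


original_matrix = as_order_numbers(transposition_matrix(SERIES), SERIES)
inversion_matrix = as_order_numbers(transposition_matrix(invert(SERIES)), SERIES)

for row in original_matrix:
    print(" ".join(f"{n:2}" for n in row))
```

Once the matrices hold order numbers rather than pitches, the same numbers can index durations, attacks and dynamics, which is the automatic part of the model; the choice of paths through the matrices remains a decision.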
3 An Implementation in OpenMusic: A Visual and Functional Environment
3.1 Patches and Circularity
OpenMusic offers a human-machine interface which was originally explicitly dedicated to composition. Yet, as we will see, it may also be used in the context of compositionally-oriented musical analysis (Agon et al. 2004). Furthermore, since theorists do not yet commonly use computers in their work, there are few accessible software packages specifically designed for musical analysis. The basic unit of programming in OpenMusic is a patch. A patch may consist of functions linked up to create a flowchart representing a task. Based on a functional paradigm (LISP/CLOS), OpenMusic offers the advantage of recursion: a patch can contain other patches, including copies of itself. Figure 3 shows the patch allowing the creation, given the original series as input, of the two sections of the piece.
3.2 Composing Following the Model with the Benefit of a Graphical Composition Environment
Given a twelve-note series, the patch applies the same process that Boulez defined. The patch produces a score where the bundles defined by Boulez are displayed in tracks (Fig. 4). Each track can then be modified directly. According to Ligeti’s analysis, register (i.e. which octave is selected for each pitch class) is an important parameter which may be modified in the process of composition. We choose not to control dynamics as they are difficult to set up in OpenMusic. This emphasizes the distinction between automatism (the procedures contained in the patch) and decision (the modifications allowed on the result). Thus the user is able to create “errors” as Boulez did in his piece (according to Ligeti 1975).
Fig. 3. The patch in OpenMusic reproducing the compositional process of Structures Ia
Fig. 4. The produced score in OpenMusic
4 Rubato: A Higher Level of Abstraction with a Categorical View
4.1 Different Perspectives Delivered by Rubato
Rubato is based on a music-theoretical paradigm introduced by Mazzola (2003) and on mathematical category theory (especially topos theory), which emphasises transformations rather than objects (a concept also found in the methods of transformational analysis developed by David Lewin (1982)). Figure 5 shows the Rubato environment.
Fig. 5. The Rubato environment and an example of network
4.2 Possibilities Brought by Rubato
This software involves higher levels of abstraction. For instance, it deals with generalized musical objects. Indeed, objects like melody are defined, allowing new perspectives. This framework fits the combinatorial point of view of Boulez’s compositional process, and opens the way to a kind of metacomposition able to generate different pieces.
4.3 Scheme of the Construction
The aforementioned theory allows us to speak of generalized points in space (i.e. addressed points instead of the traditional points known from standard mathematical approaches). This means that we are interested in morphisms (here module morphisms) rather than in objects. The key concept is to rely on the module Z^11, which is strongly related to the structure of the piece, with its twelve-element series and the 12 × 12 matrices. The idea is then to associate the 12 basis vectors of Z^11 with the notes (i.e. to create a module morphism) (Fig. 6), and to generalize this process by associating them with each row of the matrices. Permutations of the basis vectors then make it possible to generate the piece. The result is displayed in Figure 7, which is a piano-roll-like display of the score produced by Rubato.
Fig. 6. Association of the 12 basis vectors of Z^11 with 12 notes
Fig. 7. The Rubato “score” of Structure Ia
5 Conclusion
OpenMusic and Rubato offer tools to plan compositional models and consequently allow composers to modify their vision and enlarge their possibilities. The respective general models brought about by these analyses may be subjected to further modifications which subsequently generate new compositions. This approach may initially be generalized to other parameters (such as instrumentation). Furthermore, the process used by Boulez may be thought of in terms of permutations (and permutation groups), which could lead in new directions, i.e. a metacomposition realised by another group of permutations, which may create a set of new Structures.
References
Agon, C.: OpenMusic: Un langage visuel pour la composition musicale assistée par ordinateur. PhD diss., Paris VIII University (1998)
Agon, C., Andreatta, M., Assayag, G., Schaub, S.: Formal Aspects of Iannis Xenakis’ Symbolic Music: A Computer-Aided Exploration of Compositional Processes. Journal of New Music Research 33(2) (2004)
Andreatta, M.: Méthodes algébriques dans la musique et musicologie du XXe siècle: aspects théoriques, analytiques et compositionnels. PhD diss., EHESS/Ircam (2003)
Babbitt, M.: The Function of Set Structure in the Twelve-Tone System. PhD diss., Princeton University (1946/1992)
Boissière, A.: Geste, interprétation, invention selon Pierre Boulez. Revue DEMéter. Lille-3 University (2002)
Boulez, P.: Jalons (dix ans d’enseignement au Collège de France). Bourgois (1989)
Boulez, P.: L’écriture du geste. Bourgois (2002)
Boulez, P.: Penser la musique aujourd’hui. Ed. Gonthier (1963)
Eimert, H.: The Composer’s Freedom of Choice. Die Reihe III. Universal Edition London and Theodore Presser Company (1959)
Feldman, M., et al.: 4 Musicians at Work. Transformation 1(3) (1952)
Forte, A.: The Structure of Atonal Music. Yale University Press (1973)
Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press (1982)
Ligeti, G.: Pierre Boulez - Decision and Automatism in Structure IA. Die Reihe 4, 36–62 (1975)
Mesnage, M., Riotte, A.: Modélisation informatique de partitions, analyse et composition assistées. Cahiers de l’Ircam 3 (1993)
Mazzola, G.: The Topos of Music. Birkhäuser, Basel (2003)
The Sieves of Iannis Xenakis
Dimitris Exarchos
Goldsmiths, University of London, United Kingdom
Abstract. Xenakis’s article ‘Sieves’ was published in 1990, but the first extended reference to Sieve Theory is found in the final section of ‘Towards a Metamusic’, published in 1967. These two writings mark two periods, between which there is a progression from the decomposed formula to the simplified one. This progression reflects Xenakis’s aesthetic of sieves as timbres and is explored here in the light of the idea of inner periodicities.
1 Introduction
A sieve refers to a selection of points on a straight line and involves the combination of two or more modules. A module is notated by an ordered pair (M, I) that indicates a modulus (period) and a residue (an integer between zero and M – 1). These modules can be combined by the logical operations of union (+), intersection (·), and complementation (–). In this paper I will demonstrate that, in his later period, Xenakis used only the operation of union and that he abandoned the other two. The diatonic major scale (with period 12 semitones in twelve-tone equal temperament) can be expressed as follows (where C = 0 and the unit distance is the semitone):
(4, 0)·[(3, 0) + (3, 1)] + (4, 1)·[(3, 0) + (3, 2)] + (4, 3)·[(3, 1) + (3, 2)] + (4, 2)·(3, 2).
In this expression, M = 12 is decomposed into elementary moduli M1 = 4 and M2 = 3. The Lowest Common Multiple (LCM) of the elementary moduli is equal to the period (12). There might be several equivalent formulae for a given sieve, as we can choose among several alternative decompositions of the modulus. This redundancy of formulae is overcome by prime factorisation, as this is aimed at rendering a decomposed form of a composite number. According to the Unique Factorisation Theorem any integer $\alpha > 1$ can be uniquely written as
$$\alpha = p_1^{l} \cdot p_2^{m} \cdot \ldots \cdot p_k^{n},$$
where $p_1, p_2, \ldots, p_k$ are prime numbers, $p_1 < p_2 < \ldots < p_k$ and $l, m, n \in \mathbb{N}^*$; this is called the canonical form of $\alpha$. It is easily inferred from the theorem that if two numbers $\alpha$, $\beta$ are coprime then $\alpha^m$, $\beta^n$ (where $\alpha, \beta, m, n \in \mathbb{N}^*$) are coprime as well. The canonical form of 12 is $2^2 \cdot 3$. When these factors correspond to moduli, the literal intersection would have to be $(2, I_1) \cdot (2, I_2) \cdot (3, I_3)$. It is obvious that $(2, I_1) \cdot (2, I_2)$ is not a valid option. In the case that $I_1 = I_2$ the intersection implies that $(2, I_1) = (2, I_2)$.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 419–429, 2009. © Springer-Verlag Berlin Heidelberg 2009
In the case they are different the intersection is empty. We should therefore resolve any exponentials before treating prime factors as the elementary moduli of a period:
$$12 = 2^2 \cdot 3 = 4 \cdot 3.$$
Consequently, there is one practical limitation in this process: the resolution of all powers prior to the decomposition of the moduli is not possible for all composite moduli. A modulus cannot be decomposed if it is a prime power. Thus, both primes and prime powers represent moduli that are non-decomposable.
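The operations introduced in the Introduction can be tried out with ordinary sets. The sketch below represents a module (M, I) by the set of points congruent to I modulo M within one period, using Python's set union and intersection for + and ·; the helper name mod is hypothetical. It evaluates the decomposed formula for the major scale and shows why two modules with the same prime modulus cannot be intersected usefully.

```python
from math import lcm

PERIOD = lcm(4, 3)  # the LCM of the elementary moduli, here 12


def mod(modulus, residue, span=PERIOD):
    """Points x in range(span) with x congruent to residue modulo modulus."""
    return {x for x in range(span) if x % modulus == residue % modulus}


# (4,0)·[(3,0)+(3,1)] + (4,1)·[(3,0)+(3,2)] + (4,3)·[(3,1)+(3,2)] + (4,2)·(3,2)
major_scale = (
    mod(4, 0) & (mod(3, 0) | mod(3, 1))
    | mod(4, 1) & (mod(3, 0) | mod(3, 2))
    | mod(4, 3) & (mod(3, 1) | mod(3, 2))
    | mod(4, 2) & mod(3, 2)
)
print(sorted(major_scale))

# Intersecting two modules of modulus 2 is either empty or redundant,
# which is why 12 is decomposed as 4 · 3 and not as 2 · 2 · 3.
print(mod(2, 0) & mod(2, 1))
print(mod(2, 1) & mod(2, 1) == mod(2, 1))
```

With C = 0 the result is the set of pitch classes 0, 2, 4, 5, 7, 9, 11, i.e. the major scale on C.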
2 Types of Formulae
When the period of a sieve is taken into account its formula can have two forms: decomposed or simplified. The decomposed formula is the one that employs only moduli that are primes or prime powers. These are the prime factors (or the prime powers that derive from them) of the sieve’s period. Of this type is the formula that uses moduli 4 and 3 to express a sieve whose period is 12. The decomposition of a module is expressed by intersection; thus, a decomposed formula frequently includes intersections. Less frequently, a decomposed formula might involve only union of modules, such as sieve (4, 0) + (3, 0), with period 12.¹ A simplified formula that expresses the period of the sieve consists only of unions of single modules and includes composite moduli.² In the example of the diatonic major scale it is the formula that is based on the octave:
(12, 0) + (12, 2) + (12, 4) + (12, 5) + (12, 7) + (12, 9) + (12, 11).
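Passing from a union of modules to a simplified formula based on the period can be sketched as follows. The function name simplified_formula is hypothetical, and the sketch takes the LCM of the moduli as the period, which is always a period of the union though not necessarily the smallest one (an assumption worth checking for any particular sieve).

```python
from math import lcm


def simplified_formula(modules):
    """modules is a list of (M, I) pairs joined by union.
    Returns the period (taken as the LCM of the moduli) and the equivalent
    union of single modules (period, residue)."""
    period = lcm(*(modulus for modulus, _ in modules))
    residues = sorted({x for x in range(period)
                       for modulus, residue in modules
                       if x % modulus == residue % modulus})
    return period, [(period, residue) for residue in residues]


period, formula = simplified_formula([(4, 0), (3, 0)])
print(period)
print(" + ".join(f"({m}, {i})" for m, i in formula))
```

For the sieve (4, 0) + (3, 0) mentioned above this gives period 12 and the six modules (12, 0), (12, 3), (12, 4), (12, 6), (12, 8) and (12, 9).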
3 Symmetries/Periodicities
A formula might not necessarily represent a sieve according to a single modulus, i.e. its period. There is another distinction between two types of formulae: the one that is based on the external period and the one that ignores it. Xenakis constructed an algorithm (and a computer program based on it) that produces a formula by assigning the smallest possible modulus to each point of the sieve. I will examine the construction of such a formula in Section 4. This distinction is parallel to the decomposed or simplified one: we can start with a simplified formula that ignores the external period and decompose each modulus, if it is a composite number. The result is a decomposed formula that is not based on the external period of the sieve. The information that a formula reveals when it is not based on the external period is a very important issue which is pertinent both to the aesthetics of sieve-construction and to sieve-analysis.

¹ Note that one can always construct a formula with elementary moduli that are not primes or prime powers; e.g. 60 can be decomposed to 5·12 instead of 4·3·5 (60 = 2²·3·5). This decision depends on whether one wishes to express the period as the LCM of two factors instead of three. This particular decomposition (i.e. 60 = 5·12) was used by Xenakis for the rhythmic sieves of Persephassa (1969, for six percussionists) (see Gibson 2003, 58ff).
² Ariza refers to ‘sieve models’ instead. What I refer to as ‘decomposed formula’ and ‘simplified formula’, he refers to as ‘complex sieve’ and ‘simple sieve’. He argues that the latter model ‘fails to incorporate aspects of the original [the former]’ (2005, 44).
In Xenakis’s sieves the intervallic structure is highly irregular and asymmetric, while the period of the sieve was rarely intended to be audible. Sieve Theory was developed in order to study hidden symmetries. Xenakis demonstrated that an elementary modulus is a kind of tempered chromatic scale, with unit set to an interval other than the semitone. He pointed at an observation Bertrand Russell made in relation to the axiomatics of numbers and he showed that the tempered chromatic ‘has no unitary displacement that is either predetermined or related to an absolute size’ (Xenakis 1992, 195). We can have chromatic (or equidistant) scales of quarter-tones or semitones, but also of whole tones, perfect 4ths, etc. (see Varga 1996, 94-5). The chromatic scale of the perfect 4th is a module with M = 5. The sieves following Jonchaies (1977, for orchestra) share a certain aesthetic: they are characterised by an irregular distribution of a set of intervals, between the semitone and the major 3rd. The construction of these sieves is inspired by the Javanese pelog with its interlocking 4ths (see Varga 1996, 144-5). In the following quotation Xenakis talks about the rhythmic structures of Jonchaies, but the same principle was applied to the construction of his sieves:

We can illustrate regular events by points an equal distance apart. On a second, lower parallel line, more points represent other regular patterns with a different time unit, so they are shifted with respect to the first line’s points even if they start together. This procedure can be repeated with regular points on other lines. When we hear all these lines together, we obtain a flow of events which consists of a regular intervallic series, but which as a whole is impossible to grasp (Xenakis 1996, 148).

The modules in a simplified formula represent ‘chromatic’ scales (with different units and starting points), periodicities, or symmetries (cf. Xenakis 1992, 270). Here symmetry has a broad meaning: it stands for regularity in general. In this sense, every irregular scale can be broken down into a multiplicity of regularities.
4 Inner-Periodic Formula

4.1 Inner Periodicities and Formulae Redundancy

The inner periodicities of the sieve are shown by a simplified formula in the form of modules. The multiples of each modulus produce some of the points of the sieve. When the period is not known (or not taken into account), a simplified formula is based on the inner periodicities and is inner-periodic. The redundancy of inner-periodic formulae can be overcome by checking every single point of the sieve and assigning to it the smallest possible modulus. We can find, for each point, the module with the smallest M that either starts on this point or produces it later. These are in fact two different approaches. The former method was for Xenakis an earlier stage before arriving at the latter, implemented in his final analytical algorithm.

4.2 Construction of the Inner-Periodic Simplified Formula

In order to demonstrate the progression to this inner-periodic conception of sieves, let us take for example the sieve of Akea (1986, for piano and string quartet) as found in
D. Exarchos
Xenakis’s pre-compositional sketches (see Figure 1).3 It consists of 37 points, in a range of 80 semitones, and its intervallic structure is asymmetric (non-palindromic). In Akea the sieve is not used in any way that might reveal the existence of a periodicity; this is confirmed by the fact that in the sketches Xenakis used only an inner-periodic simplified formula (where the bottom pitch C1 = 0 and C4 = middle C): (5, 40, 8) + (14, 9, 5) + (14, 52, 2) + (15, 10, 4) + (15, 36, 2) + (17, 28, 3) + (18, 5, 4) + (19, 18, 3) + (19, 22, 3) + (19, 27, 2) + (19, 32, 2) + (20, 31, 2) + (23, 14, 2) + (24, 7, 3) + (25, 0, 3) + (25, 16, 2).
Fig. 1. Sieve of Akea
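As a reading aid, the formula quoted above can be expanded mechanically into points. The following Python sketch assumes that a module (M, I, R) stands for the points I, I + M, ..., I + R·M, which is how the text reads the notation; the names `SKETCH_FORMULA`, `expand` and `points_of` are illustrative and do not come from Xenakis's sketches or program.

```python
# Modules (M, I, R) as quoted from the sketches of Akea.
# Assumption: a module covers the R + 1 points I, I + M, ..., I + R*M.
SKETCH_FORMULA = [
    (5, 40, 8), (14, 9, 5), (14, 52, 2), (15, 10, 4), (15, 36, 2),
    (17, 28, 3), (18, 5, 4), (19, 18, 3), (19, 22, 3), (19, 27, 2),
    (19, 32, 2), (20, 31, 2), (23, 14, 2), (24, 7, 3), (25, 0, 3),
    (25, 16, 2),
]


def expand(module):
    """Return the points produced by one module (hypothetical helper)."""
    M, I, R = module
    return [I + k * M for k in range(R + 1)]


def points_of(formula):
    """Union of the points of all modules, sorted from lowest to highest."""
    return sorted({p for module in formula for p in expand(module)})


akea = points_of(SKETCH_FORMULA)
print(len(akea), akea[0], akea[-1])
```

Under this reading the union contains 37 distinct points between 0 and 80, which agrees with the description of the sieve given above.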
There are 16 modules in the sieve and they share 10 moduli. Here the modules are notated as (M, I, R), which stands for: Modulus, Initial point, Reprises of the Modulus. We see that Xenakis did not reduce the residues according to the moduli. Thus, with (5, 40) Xenakis indicates that a periodicity of 5 semitones starts at point 40 (here E4). The third entry in the brackets (R) shows the number of repetitions of each module.4

4.3 Analytical Algorithm: Early Stage

The method Xenakis used to arrive at the above formula is not identical to the algorithm he presented in his article ‘Sieves’ (1990, 64-5; 1992, 274-5). It is a precursor of this algorithm. This earlier algorithm goes through the following steps:

(a) Each point is considered as a point of departure (I) of a modulus (M). We start testing the first point ($I_n$) with M = 2 and check if: (i) its multiples produce only points that belong to the given sieve, and (ii) it produces at least one of the not-yet-covered points of the given sieve.
(b) If (i) is not satisfied we pass on to M + 1. If it is satisfied we keep the module and check if (ii) is satisfied: if yes, we keep the module and pass onto the next point ($I_{n+1}$); if not, we ignore the module and pass onto $I_{n+1}$.
(c) We stop when every point of the sieve has been covered by a module.
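Steps (a) to (c) can be sketched in Python as follows. This is one possible reading of the description, not a reconstruction of Xenakis's hand calculation: the function name `early_stage` is hypothetical, the point list `AKEA` is obtained by expanding the sketch formula quoted above, and the sketch does not enforce any minimum number of reprises, so it may return modules that a hand calculation would have discarded.

```python
# Points of the sieve of Akea, derived by expanding the formula quoted above.
AKEA = [0, 5, 7, 9, 10, 14, 16, 18, 22, 23, 25, 27, 28, 31, 32, 36, 37, 40, 41,
        45, 46, 50, 51, 52, 55, 56, 59, 60, 62, 65, 66, 70, 71, 75, 77, 79, 80]


def early_stage(sieve):
    """Return modules (M, I, R) found by one reading of steps (a)-(c)."""
    points = sorted(sieve)
    members = set(points)
    n = points[-1]                      # highest point of the sieve
    covered = set()
    modules = []
    for start in points:                # step (a): every point is a candidate I
        if covered == members:          # step (c): stop once everything is covered
            break
        M = 2
        while True:
            produced = range(start, n + 1, M)
            if all(p in members for p in produced):            # condition (i)
                if any(p not in covered for p in produced):    # condition (ii)
                    modules.append((M, start, len(produced) - 1))
                    covered.update(produced)
                break                   # step (b): move to the next point either way
            M += 1                      # step (b): (i) failed, try the next modulus
    return modules


for module in early_stage(AKEA):
    print(module)
```

The inner loop always terminates, because a modulus larger than the distance to the top of the sieve produces only the starting point itself.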
3 This sieve was also used in Ata (1987, for orchestra).
4 This is slightly different from Xenakis’s program, which uses R to denote the number of points covered. Following Xenakis’s own practice in the sketches, I will use R to denote the occurrences of a modulus, instead of the number of points covered. This indicates more effectively the contribution of each modulus to the inner-periodic structure of a sieve.
The Sieves of Iannis Xenakis
4.4 The Condition of Inner Periodicity

The modules in the simplified formula of a sieve can be considered as inner periodicities only if they repeat at least twice. Thus, a module must cover at least three points. Three equally distant elements produce two equal intervals (modulus) that make it possible to compare them and identify them as (two occurrences of) a periodicity. This is precisely Xenakis’s view of sieves as outside-time structures. In his article ‘Symbolic Music’ he referred to the stages of temporal perception: three successive events a, b, c, ‘divide time into two sections [that] may be compared and then expressed in multiples of a unit’ (Xenakis 1992, 160). Although he talks about time here, he stressed that the temporal algebra time-intervals require is identical to that of outside-time structures (such as sieves). In order for R ≥ 2, the size of the modulus must be no more than half the distance between the starting point of the module and the highest point of the sieve. If n is the highest point of the sieve, this condition of inner periodicity is formulated as follows: for each module in a simplified formula it must be true that

$$M \le \frac{n - I}{2}$$

in order for R ≥ 2, and for the sieve to be inner-periodic. I will refer to modules with R ≥ 2 as periodic modules. The formula in the sketches of Akea includes only periodic modules. This corresponds to an ‘inner-periodic’ conception of sieves that Xenakis maintained throughout his application of Sieve Theory in his later music.

4.5 Analytical Algorithm: Final Stage

The method Xenakis used to produce the formula of the sieve of Akea differs from the final algorithm of 1990 in one significant aspect: the starting point of a module is not necessarily smaller than the size of the modulus. In other words, if it is enough for a modulus to end at a distance from n smaller than its own size, it is not necessary to depart as close to the lowest point. Such is the case of module (5, 40): modulus 5 is present only for the second half of the sieve. This means simply that the perfect 4th is characteristic for only a part of the sieve and it is a consequence of not limiting the size of I to be smaller than M. On the contrary, module (25, 0), whose range is 75 semitones, is present from the bottom almost to the top. More specifically, both the initial and the final points of (25, 0) lie at a distance from the edges of the sieve smaller than the size of M. As the algorithmic process goes through the higher points, the possibility of finding smaller periodicities naturally increases. A sieve is conceived as a multiplicity of periodicities and analysis should not depend on an unbalanced favouring of smaller moduli when progressing towards higher points. This is the reason why Xenakis added one more step in the final version of his analytical algorithm. The additional condition in the final algorithm reintroduces the idea of the residue: in all modules it must be true that M > I.
This is shown in the final step of the algorithm as published in the 1990 article: ‘we ignore all the [modules] which, while producing some of the not-yet-encountered points of the given series, also produce, upstream of the index I, some parasitical points other than those of the given series’ (Xenakis 1990, 65; 1992, 275; italics added). Therefore, the algorithm
does not look merely for the smallest modulus that departs from the point under consideration, but for the smallest modulus that starts at a point smaller than its own size and that produces the point under consideration (unless this point is located early enough in the sieve so that it is itself the starting point). This is the algorithm that Xenakis’s analytical computer program is based on (see Xenakis 1990, 76ff; 1992, 285-8).5 The program checks for the residue class with the smallest modulus, whose members belong to the sieve and include the point under consideration. The formula that the program suggests for the sieve of Akea has 17 modules that share 13 moduli: (14, 9, 5) + (15, 10, 4) + (18, 5, 4) + (19, 18, 3) + (20, 5, 3) + (23, 14, 2) + (24, 7, 3) + (24, 22, 2) + (25, 0, 3) + (25, 16, 2) + (26, 10, 2) + (27, 5, 2) + (27, 25, 2) + (28, 0, 2) + (28, 27, 1) + (29, 22, 2) + (31, 9, 2). We see that the moduli are now on average greater than the ones in the formula Xenakis used. As I increases, the final version (contrary to the earlier) finds gradually greater moduli.

4.6 The Condition of Inner Symmetry

If it is true that M > I for all modules in a simplified formula, then this formula can be used to produce the inversion of the sieve. To construct the formula of the inversion, we keep the moduli of the original and replace every I by n − (R ⋅ M + I). When all the inner periodicities extend as far as possible to both directions in a sieve, and therefore could also produce its inversion, then the sieve bears a certain kind of symmetry. But this symmetry is still a hidden one: the intervallic structure of such a sieve might very well be asymmetric, i.e. non-palindromic. Thus another level of symmetry is revealed through the distribution of the inner periodicities.
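Both the residue-class search of Section 4.5 and the inversion rule of Section 4.6 can be sketched in a few lines of Python. The sketch rests on assumptions that the text does not settle: the function names are hypothetical, points already covered by an earlier module are skipped, testing starts at M = 1, and the lowest point of the sieve is 0. It is therefore not claimed to reproduce the output of Xenakis's program module for module.

```python
# Points of the sieve of Akea (expansion of the formulas quoted in the text).
AKEA = [0, 5, 7, 9, 10, 14, 16, 18, 22, 23, 25, 27, 28, 31, 32, 36, 37, 40, 41,
        45, 46, 50, 51, 52, 55, 56, 59, 60, 62, 65, 66, 70, 71, 75, 77, 79, 80]

# Formula suggested by the program, as quoted above.
PROGRAM_FORMULA = [
    (14, 9, 5), (15, 10, 4), (18, 5, 4), (19, 18, 3), (20, 5, 3), (23, 14, 2),
    (24, 7, 3), (24, 22, 2), (25, 0, 3), (25, 16, 2), (26, 10, 2), (27, 5, 2),
    (27, 25, 2), (28, 0, 2), (28, 27, 1), (29, 22, 2), (31, 9, 2),
]


def points_of(formula):
    """Sorted union of the points I, I + M, ..., I + R*M of every module."""
    return sorted({I + k * M for (M, I, R) in formula for k in range(R + 1)})


def final_stage(sieve):
    """For each uncovered point, keep the smallest residue class inside the sieve."""
    points = sorted(sieve)
    members = set(points)
    n = points[-1]
    covered, modules = set(), []
    for p in points:
        if p in covered:                 # assumption: skip points already covered
            continue
        for M in range(1, n + 2):        # M = n + 1 always succeeds (single point)
            I = p % M                    # reduced starting point, so M > I holds
            residue_class = range(I, n + 1, M)
            if all(q in members for q in residue_class):
                modules.append((M, I, len(residue_class) - 1))
                covered.update(residue_class)
                break
    return modules


def invert_formula(formula, n):
    """Keep every modulus and replace I by n - (R*M + I), as stated in 4.6."""
    return [(M, n - (R * M + I), R) for (M, I, R) in formula]


print(final_stage(AKEA))
print(invert_formula(PROGRAM_FORMULA, 80))
```

The inverted formula produces the mirror image of the sieve, that is, the points n − p for every point p of the original.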
In analogy with the inner-periodic nature of Xenakis’s sieves of the later music, I will refer to this kind of symmetry as inner symmetry and to sieves that exhibit inner symmetry as inner-symmetric.6 An inner-symmetric sieve is necessarily inner-periodic. This is because the algorithm always finds a modulus that meets at least two points when I < n/2.7 But with the condition of inner periodicity a module must meet at least three points, i.e. R ≥ 2, which affects the maximum size of M. With the additional limiting of the size of M to values greater than I, M depends on the varying values of I for both its minimum and maximum values. Since we operate in discrete space, I < M is equivalently expressed as I + 1 ≤ M. Therefore, the condition of inner symmetry, which incorporates that of inner periodicity, is formulated as follows:

$$I + 1 \le M \le \frac{n - I}{2}. \qquad (1)$$
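Condition (1) is easy to test module by module. The Python sketch below applies it to the two formulas of the sieve of Akea quoted earlier; the function names are illustrative, and the inequality is checked in the integer form 2M ≤ n − I to avoid fractions.

```python
N = 80  # highest point of the sieve of Akea

SKETCH_FORMULA = [
    (5, 40, 8), (14, 9, 5), (14, 52, 2), (15, 10, 4), (15, 36, 2),
    (17, 28, 3), (18, 5, 4), (19, 18, 3), (19, 22, 3), (19, 27, 2),
    (19, 32, 2), (20, 31, 2), (23, 14, 2), (24, 7, 3), (25, 0, 3),
    (25, 16, 2),
]
PROGRAM_FORMULA = [
    (14, 9, 5), (15, 10, 4), (18, 5, 4), (19, 18, 3), (20, 5, 3), (23, 14, 2),
    (24, 7, 3), (24, 22, 2), (25, 0, 3), (25, 16, 2), (26, 10, 2), (27, 5, 2),
    (27, 25, 2), (28, 0, 2), (28, 27, 1), (29, 22, 2), (31, 9, 2),
]


def is_periodic(M, I, n):
    """Right-hand part of (1): M <= (n - I) / 2, i.e. R >= 2."""
    return 2 * M <= n - I


def has_reduced_start(M, I):
    """Left-hand part of (1): I + 1 <= M, i.e. M > I."""
    return I + 1 <= M


def is_inner_symmetric(M, I, n):
    """Both parts of condition (1)."""
    return has_reduced_start(M, I) and is_periodic(M, I, n)


failing = [m for m in PROGRAM_FORMULA if not is_inner_symmetric(m[0], m[1], N)]
print(failing)
```

For the program's formula the only module that fails the test is (28, 27, 1); in the sketch formula every module is periodic, but modules such as (5, 40, 8) do not satisfy M > I.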
5 This program is provided in Squibbs (1996, 291-303) with no typographical errors.
6 In (1996, 149) Xenakis refers to ‘inner symmetries’; this is what I here refer to as ‘inner periodicities’.
7 Since the absolute value of I affects the size of M, we should calculate the formula with the sieve’s lowest point set to 0.
Inner periodicity was defined only as a condition for inner symmetry. It is important to stress here that both of them are integrated into one. I used the former to demonstrate the earlier stage of Xenakis’s method, which was extended by his final algorithm. Xenakis’s prompt to study the hidden symmetry of a sieve referred to both symmetries and periodicities, in the form of moduli. These two notions are then themselves integrated; inner symmetry is achieved by the analysis and synthesis of inner periodicities. Thus, an outside-time characteristic (symmetry) is achieved through the treatment of an inside-time one (periodicity). In this sense, an inner-symmetric sieve is necessarily inner-periodic.

4.7 Inner-Symmetric Analysis

In fact, most of Xenakis’s sieves of the later period include one or two non-periodic modules. In the sieve of Akea this is the case with module (28, 27). The points that module (28, 27) covers are 27 and 55 (D#3 and G5); of these points, 27 is covered only by this module, but 55 is also covered by two additional periodic modules. Therefore, D# is the only element of the sieve of Akea that does not belong to an inner periodicity. This slight deviation from complete inner symmetry is perhaps one of Xenakis’s typical aesthetic criteria; I will examine this further after I formulate in more detail the consequences of the condition of inner symmetry. The condition of inner symmetry also implies that the minimal and maximal permissible values of M for a given I increase and decrease respectively as I increases. This allows for the determination of the absolute maximum permissible value for I, for inner-symmetric sieves, given a constant n. The smallest permissible I is naturally 0. The maximum permissible value of I in inner-symmetric sieves is found at the point of convergence of the two tendencies of the value of M as I increases. Let us indicate the value of M at this point as Mc. Then from (1) it follows that
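$$I + 1 = M_c = \frac{n - I}{2}.$$

Consequently,

$$I + 1 = \frac{n - I}{2} \;\Rightarrow\; 2I + 2 = n - I \;\Rightarrow\; 3I = n - 2 \;\Rightarrow\; I = \frac{n - 2}{3}.$$

Therefore, the maximum value of I for any inner-symmetric sieve is

$$I = \frac{n - 2}{3}.$$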
For the sieve of Akea, or for any inner-symmetric sieve with n = 80, the greatest possible I is (80 − 2)/3 = 26. Since the value of I when M = Mc is the maximum and since it depends only on n, we can locate the value of M at this point (Mc), according to (1), as follows:

$$I + 1 = M_c \;\Rightarrow\; I = M_c - 1$$

and

$$M_c = \frac{n - I}{2} \;\Rightarrow\; M_c = \frac{n - (M_c - 1)}{2} \;\Rightarrow\; 2M_c = n - M_c + 1 \;\Rightarrow\; 3M_c = n + 1 \;\Rightarrow\; M_c = \frac{n + 1}{3}.$$

That is, in inner-symmetric sieves, Mc (i.e. the M-value when I is maximal) is8

$$M_c = \frac{n + 1}{3}.$$

For any inner-symmetric sieve with n = 80, the inner-periodicity starting at the maximum I is Mc = (80 + 1)/3 = 27. In the sieve of Akea, module (28, 27) has the smallest M that could be assigned to I = 27, but this value of I exceeds by one semitone the maximum limit: I = (n − 2)/3 = 26. The values M can take range9 between 1 and n/2. The critical point Mc is the value that determines, not merely the highest permissible I, but the behaviour of the values of I as M increases. In fact, Mc is the value of M that determines which part of the inequality in (1) defines the maximum values for I: when

$$M \le M_c, \qquad I + 1 \le M \;\Rightarrow\; I \le M - 1;$$

when

$$M_c \le M, \qquad M \le \frac{n - I}{2} \;\Rightarrow\; 2M \le n - I \;\Rightarrow\; I \le n - 2M.$$
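The two bounds can be combined into a single expression, since the largest admissible I for a given M is the smaller of M − 1 and n − 2M. The Python sketch below tabulates this limit for n = 80; the function names are illustrative, and the use of `min` is a restatement of the two cases above rather than a formula taken from the text.

```python
def critical_modulus(n):
    """M_c = (n + 1) / 3, rounded down when the result is not a whole number."""
    return (n + 1) // 3


def max_start(M, n):
    """Largest I allowed by condition (1) for a given modulus M."""
    return min(M - 1, n - 2 * M)


N = 80
upper_curve = {M: max_start(M, N) for M in range(1, N // 2 + 1)}

print(critical_modulus(N))          # the modulus at which the limit peaks
print(max(upper_curve.values()))    # the greatest admissible I
print(upper_curve[28])              # the limit that module (28, 27) exceeds
```

For n = 80 the limit peaks at M = 27 with I = 26 and has already dropped to 24 at M = 28, which is why the starting point 27 of module (28, 27) lies beyond it.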
As the value of M increases towards Mc, the maximum value of I also increases; as the value of M increases beyond Mc, the maximum value of I decreases. The upper curve in the chart of Figure 2 shows the maximal values of I for every possible M in any inner-symmetric sieve of n = 80. The x-axis represents all the consecutive values of M between 1 and 80/2 = 40; the y-axis represents the values of I. We see that the values of I increase linearly up to Mc: for the left part of the chart I ≤ M − 1. As the value of M increases beyond Mc, the values of I decrease linearly: for the right part of the chart I ≤ n − 2M. The chart demonstrates a general property of inner-symmetric sieves; for a given n, the extreme values of M and I are related as follows: when M is minimal (M = 1), I is minimal (I = 0); when M is maximal (M = n/2), I is minimal; when I is minimal, M can take any permissible value; when I is maximal (I = (n − 2)/3), M = Mc. The graph of Figure 2 is a chart for the values of M and I that accounts for the inner symmetry of a sieve. Since these values depend only on the size of n, the above relations of the values and their limits enable one to construct sieves that satisfy the condition of inner symmetry. If a module appears below the upper curve it repeats at least twice. Therefore, when all modules appear under this limit, the sieve is inner-symmetric. The modules in the formula of the sieve of Akea are shown in this chart by dots followed by the pair (M, I). The chart offers a synoptic view of the inner symmetry of the sieve: as I have shown, module (28, 27) repeats only once and therefore lies outside the upper curve.

8 When the result is not a whole number, Mc is equal to the lower integer.
9 Although Xenakis (1992, 274-5) defines the first step of his algorithm to start at M = 2, his program starts testing at M = 1. Module (1,0) is then by extension the total chromatic.
[Chart: x-axis M, from 1 to 40; y-axis I, from 0 to 30. Plotted modules (M, I): (28, 27), (27, 25), (24, 22), (29, 22), (19, 18), (25, 16), (23, 14), (15, 10), (14, 9), (26, 10), (31, 9), (24, 7), (18, 5), (20, 5), (27, 5), (25, 0), (28, 0).]
Fig. 2. Inner Symmetry Chart for the Sieve of Akea
In inner-symmetric sieves the number of reprises of the modulus (R) depends also on the value of I. Whereas the condition of inner symmetry determines the minimal R (R ≥ 2), the value of I determines whether R will take its maximal or its sub-maximal value: when

$$I \le n \bmod M, \qquad R = \frac{n - (n \bmod M)}{M};$$

when

$$n \bmod M < I < M, \qquad R = \frac{n - (n \bmod M)}{M} - 1.$$
The value of R for a given module can be seen easily in the inner symmetry chart. The values of n(modM) for every M are shown by the dotted zigzag curve. When a module lies on or below the n(modM) curve [which implies that I ≤ n(mod M)] R is maximal; when a module lies above the n(modM) curve [which implies that n(mod M) < I < M] R is sub-maximal. When a module is located towards the right part of the chart, its M is higher and its R-value is lower. The additional n(modM) curve helps one to locate, at a glance, the most periodic modules in the sieve: these would be located towards the left and under the dotted curve. When the modules of a simplified formula tend to concentrate in this area they are more periodic (their R-value is higher). The area of the chart where 21 ≤ M ≤ 26 between the two curves and M > 26 below the upper curve would accommodate modules with R = 2. The area below the dotted curve for 21 ≤ M ≤ 26 and above the dotted curve for 17 ≤ M ≤ 20 is for R = 3. The R-values increase as we move to the previous peaks of the n(modM) curve. I will refer to this type of chart as inner symmetry chart. Seven modules of the sieve of Akea lie beneath (or on) the n(modM) curve and therefore their R is maximal.
4.8 Modules and Degree of Symmetry
As I have shown, the greater the size of M, the smaller the value of R. The value of R also depends on the value of I. Specifically, the value of I determines whether R will have its maximal or its sub-maximal value for a given M. Finally, the value of R depends on the total number of modules in a formula. In the sketches of Akea Xenakis noted that there are many modules in the formula and this means that only a few of them repeat continuously.10 In other words, only a few will have a high R-value. The characteristic set of intervals of Xenakis’s sieves of the 1980s affects their density, which remains at a constant average value for most of the sieves of that period. This density can be denoted by the ratio of the number of elements to the range of the sieve. The sieve of Akea has density D = 37/80 ≈ 0.46. Note that the number of modules in the simplified formula does not reflect this density, since a point might be covered by more than one module. For this reason, the relationship between density and the number of modules cannot be formulated rigorously. In general though, the larger the number of the modules in a formula, the lower the average value of R; and the lower the average value of R, the larger the average size of M. Consequently, the average size of M depends on the number of modules: for a given density, the large number of modules is compensated by large moduli. Sieves with similar density to the one of Akea are expected to appear either similarly in the chart of Figure 2, or with fewer modules but concentrated to the left. The large number of modules in a formula accounts for the ‘hidden’ symmetry. The leftmost part of the chart indicates a more superficial symmetry and the rightmost part a deeper one. When the inner-periodic simplified formula includes only periodic modules, a large number of modules indicates a low degree of symmetry, in the sense that absolutely symmetric sieves have only one module (such as a chromatic scale).
But the sieves of Xenakis’s later music usually include one or two non-periodic modules: the irregularity of their intervallic structure reaches the limit of inner-symmetry. Analysing the inner symmetry of a sieve is not aimed merely at classifying sieves in one of the two categories: (inner-)symmetric and asymmetric. Both notions of symmetry and asymmetry are crucial to the aesthetics of Xenakis’s music: they are ‘the two poles between which music goes back and forth, and the first suggestion of a solution comes from distributing points on a line’ (Xenakis 1996, 147). With the help of the inner symmetry chart, we can see not only whether a module is periodic or not (for this the R-value would suffice), but also how distant a module is from the limits of inner symmetry; and with the inclusion of the n(modM) curve, the inner symmetry chart shows both a general and a more detailed picture of the play between symmetry and asymmetry.

10 ‘Beaucoup de (M, I, R) car #S = 17 vue[?] 37 points du crible. Donc peu de périodes d’une seule traite’. In my translation: ‘Many (M, I, R), since #S = 17, given the 37 points of the sieve. Therefore, few periods without stopping once’ (Xenakis, Iannis. Pre-compositional sketches of Akea. Bibliothèque Nationale de France, Musique). Here Xenakis uses ‘S’ to denote the number of modules. When he calculated the formula by hand, he apparently made a mistake and kept module (19, 46) which is included in (19, 27); this is not shown in the formula as reproduced in this paper (therefore the number of modules is shown to be 16). The question mark denotes illegible handwriting.
References

Ariza, C.: The Xenakis Sieve as Object: A New Model and a Complete Implementation. Computer Music Journal 29(2), 40–60 (2005)
Gibson, B.: Théorie et pratique dans la musique de Iannis Xenakis: À propos du montage. PhD diss., École des hautes études en sciences sociales (2003)
Squibbs, R.: An Analytical Approach to the Music of Iannis Xenakis: Studies of Recent Works. PhD diss., Yale University (1996)
Varga, B.A.: Conversations with Iannis Xenakis. Faber & Faber, London (1996)
Xenakis, I.: Sieves. Perspectives of New Music 28(1), 58–78 (1990)
Xenakis, I.: Formalized Music: Thought and Mathematics in Composition, edited by Sharon Kanach. Pendragon Press, Stuyvesant (1992)
Xenakis, I.: Determinacy and Indeterminacy. Organised Sound 1(3), 143–155 (1996)
Tonal, Atonal and Microtonal Pitch-Class Categories

Fernando Gualda

Sonic Arts Research Centre, Queen's University Belfast, UK
Abstract. This paper reviews and generalizes Pitch-Class Set Theory using Group Theory (groups acting on pc-sets) and Category Theory, which provides methods for mapping the structure of an n-tone system onto another m-tone system. This paper also suggests a new implementation approach that represents pitch-class sets as bit-sequences, which are equivalent to integer values. Forte’s best normal order is shown to be equivalent to the smallest integer among the cyclically permuted pc-sets. Further, transposition of pc-sets is shown to be equivalent to bit-shifts; and their inversion, to bit-reversal. The tonal (diatonic) pc-category is presented as a subset of the atonal (12-tone) pc-category, which, similarly, can also be contained in a microtonal pc-category. Functors between those categories present properties that preserve relationships while still using the same operations: tonal relationships are preserved even though atonal music operations, such as transposition and inversion, are applied, allowing motives to be mapped into different modes, scales, or even microtonal scales. The appendix offers an implementation of this new approach to calculate and represent pc-sets with an arbitrary number of pitch-classes.
1 Introduction

Pitch-classes are generally defined as musical notes assigned to fixed integer values given by the remainder of octave division by 12, as depicted in the top row of figure 1:

Fig. 1. Two representations of pitch-classes: 12-tone (top) and a diatonic major subset (bottom)

T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 430–440, 2009. © Springer-Verlag Berlin Heidelberg 2009
This paper suggests different representations of pitch-classes depending on their context, such as the subset of white piano keys (bottom row of figure 1), which could be seen as a diatonic major scale starting at the pitch C. Pitch-classes are generally represented as mathematical sets. Nevertheless, agreements on a standard notation for sets of pitch-classes have not yet been reached. This paper uses the following notation conventions: [2, 0, (n-1), …]n representing unordered sets, and [0123…(n-1)]n meaning partially ordered sets. In both cases n represents the total number of pitch-classes in context, therefore not the cardinality of proper subsets. Whenever n is not given, it follows the traditional context of 12 pitch-classes. For the sake of clarity, nomenclature related to pitch-class set theory is henceforth italicized. Representation of notes as pitch-classes assumes octave and enharmonic equivalences. Within tonal music, however, enharmonic equivalence is not assumed: besides rules of voice-leading, pitch spelling depends on musical key or tonal center, according to which pitches are contextualized. Further, octave equivalence is not assumed in case of extended chords such as ninth or eleventh chords that are common in jazz and other musical styles.
2 Applying Pitch-Class Set Theory on Sets with Cardinality (Pitch-Classes) Other Than 12

The term Pitch-Class Set (pc-set) as used in musical analysis and composition was introduced by Babbitt (Forte 1973). Pitch-Class Set theory and terminology were developed by Forte (1973), Rahn (1980), and Straus (1990), among others. Equivalence of pc-sets has been extensively discussed. Morris (1982) compared four distinct approaches to partitioning the universe1 (power set) of pc-sets. The main operations on pc-sets are transposition and inversion. Forte defined the terminology for classification of pitch-class sets: pc-sets can be reduced to a normal order, in which pc-sets are ordered according to their pitch-classes. Normal orders are cyclic permutations (transpositions) of the best normal order, which is the permutation that presents the smallest, ordered intervals, starting at pitch class zero. Finally, the prime form is the ‘best’ (smallest) between the best normal order of a given pc-set and the best normal order of its inversion. Transposition equivalence of a pc-set within the universe (power set) of a set of twelve pitch-classes is isomorphic to chromatic transposition, in which all (chromatic) intervals between pitch-classes are maintained. Considering, however, a power set of a set of seven pitch-classes that form a diatonic major scale, diatonic triads can be seen as circular permutations of the same pc-set7, preserving their tonal relationships, while still being a result of the same operations used on pc-set12:

1 “U*, the universal set of 4096 pcsets, has been partitioned into collections of "equivalent" sets in a number of different ways. In the four systems we shall examine, U* is initially divided into collections of sets of equal size. Pcset equivalence is further defined by interval-class content or by identity under certain twelve-tone operators, which may be any combination of Tn, I, and/or M.” (Morris 1982, 102) The ‘universal set’ described by Morris is a power set. Classification of pc-sets in this paper is based on group action and is in accordance with Forte (1973). The discussion on classification regarding interval classes is beyond the scope of this paper.
Fig. 2. Permutations of diatonic triad chords: as pc-set7 (top) and pc-set12 (bottom)
Inversion equivalence of 12-tone pc-sets does not necessarily preserve tonal relationships. The prime form of major triads [047]12 is [037]12, which is a minor triad. Mappings that preserve tonal relationships under transposition and inversion are presented in section 3 of this paper.
3 Pitch-Class Set Theory within a Bit-Sequence

A power set is defined as the set of all subsets of a given set. Its cardinality is thus given by:

$$|P(S)| = 2^{n}, \qquad (2.1)$$

in which n is the number of elements (cardinality) of the underlying set. Conveniently, power sets are isomorphic to binary representation, in which each element (of the underlying set) corresponds to a binary digit. So, if n elements of a set are assigned to values ranging from 0 to n-1, the kth element of the power set is given by:

$$P_{n,k} = \sum_{p} 2^{p}, \qquad (2.2)$$
where k ranges from 0 to 2^n − 1, and p are integers assigned to the elements of the underlying set, ranging from 0 to n − 1. Thus, the universe of pc-sets with n pitch-classes is isomorphic to the set of all bit-sequences with n bits.

Table 1. Some examples of equivalences between transposed pc-sets and their binary and integer representations

Set Class2   T0          Binary (T0)     Integer   T5          Binary (T5)     Integer
0-1          ∅           000000000000    0         ∅           000000000000    0
1-1          [0]         000000000001    1         [5]         000000100000    32
3-11         [037]       000010001001    137       [058]       000100100001    289
3-11B        [047]       000010010001    145       [059]       001000100001    545
4-27         [0258]      000100100101    293       [157A]      010010100010    1186
4-27B        [0368]      000101001001    329       [158B]      100100100010    2338
6-35         [02468A]    010101010101    1365      [13579B]    101010101010    2730
7-35         [013568A]   010101101011    1387      [13568AB]   110101101010    3434

2 Forte’s set name (set-class).
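The correspondence between pc-sets, bit-sequences and integers can be checked with a short Python sketch. The function names below are illustrative and are not taken from the paper's appendix; bit p (counted from the right) is assumed to stand for pitch-class p, as in equation (2.2), and transposition is implemented as a circular shift within n bits.

```python
def to_int(pitch_classes):
    """Equation (2.2): sum of 2**p over the pitch-classes of the set."""
    return sum(1 << p for p in set(pitch_classes))


def to_pcs(value):
    """Inverse mapping: the positions of the set bits, lowest first."""
    return [p for p in range(value.bit_length()) if (value >> p) & 1]


def to_bits(value, n=12):
    """Binary string of width n, most significant bit first."""
    return format(value, "0{}b".format(n))


def transpose(value, k, n=12):
    """T_k as a circular left shift by k positions within n bits."""
    k %= n
    mask = (1 << n) - 1
    return ((value << k) | (value >> (n - k))) & mask


minor = to_int([0, 3, 7])
print(minor, to_bits(minor))              # integer and bit-sequence of [037]
print(to_pcs(transpose(minor, 5)))        # pitch-classes of T5([037])
```

The same functions work for other universes by passing a different `n`, for example `n=7` for a diatonic pc-set.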
Table 2. Some examples of equivalences between inverted pc-sets and their binary representations

Set Class3   T0          Binary (T0)     Integer   I           Binary (I)      Integer
1-1          [0]         000000000001    1         [0]         000000000001    1
3-11         [037]       000010001001    137       [059]       001000100001    545
3-11B        [047]       000010010001    145       [058]       000100100001    289
4-27         [0258]      000100100101    293       [047A]      010010010001    1169
4-27B        [0368]      000101001001    329       [0469]      001001010001    593
6-35         [02468A]    010101010101    1365      [02468A]    010101010101    1365
7-35         [013568A]   010101101011    1387      [024679B]   101011010101    2773
Table 3. Nomenclature correlations between pc-set theory, group theory and the new, suggested implementation

PC-Set Theory        Group Theory (group action)                    Binary representation
PC-Set               Element                                        Bit-sequence
Total pitch-classes  Group order                                    Number of bits
Transposition        Circular permutation (cyclic group)            Circular bit-shift
Inversion            Reflection (dihedral group)                    Bit-reversal
Best Normal Order    Equivalence-class representative (cyclic)      Smallest integer (Tn)
Prime Form           Equivalence-class representative (dihedral)    Smallest integer (Tn / I)
Transpositionally equivalent pc-sets share the same best normal order: there is a group action that partitions the power set of pc-sets into equivalence classes (or set-classes) under an equivalence relation. The equivalence class created by the transpositions of a pc-set is isomorphic to a cyclic group. The best normal order is defined as the representative of an equivalence class under the binary relation4 smallest intervals5 among all circular permutations of a pc-set. Similarly, this binary relation is isomorphic to the smallest powers of two within a bit-sequence. Hence, considering two natural numbers represented as bit-sequences, this binary relation is isomorphic to if and only if both natural numbers are transpositions of the same pc-set. Although this relation does not linearly order6 the transpositions of a pc-set, there is a unique element within the equivalence class that is minimal under this binary relation ( ).

Inversion equivalence is isomorphic to reflection (flip) of a 2D dihedral group. In pc-set theory, inversion is given by subtracting7 the value of each pitch-class in the pc-set from the total number of pitch-classes.

$$P_{n,k} = \sum_{p} 2^{(n - p) \bmod n} \qquad (2.3)$$

3 Forte’s set name (set-class).
4 Although a binary relation that well-orders all elements within an equivalence class does not exist for all equivalence classes, any element of a given equivalence class could be chosen to be its representative under a binary relation.
5 Forte (1973, 3-5).
6 With exception of pc-set 1-1 (or, more generally [0]n) which is linearly ordered under the binary relation ( ).
7 a’ = 12-a (mod 12) (Forte 1973, 8).
Inversion of pc-sets is thus equivalent to bit-reversal. In the example below, [059] is I([037]), which is also the T5([047]); similarly [058] is I([047]), which is T5([037]), as expected. The total number of best normal orders can be seen as the quotient of a group action; and prime forms, the quotient of a dihedral group action.
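The relations stated above can be verified with a small Python sketch. All function names are illustrative. Inversion is implemented from equation (2.3), which sends pitch-class p to (n − p) mod n; on this reading it coincides with a full reversal of the n-bit sequence followed by one circular shift, and that is how the sketch relates it to bit-reversal. The best normal order and the prime form are taken, as in Table 3, to be the smallest integers under Tn and under Tn / I respectively.

```python
def rotate(value, k, n=12):
    """T_k as a circular left shift within n bits."""
    k %= n
    return ((value << k) | (value >> (n - k))) & ((1 << n) - 1)


def invert(value, n=12):
    """Equation (2.3): every pitch-class p is replaced by (n - p) mod n."""
    return sum(1 << ((n - p) % n) for p in range(n) if (value >> p) & 1)


def reverse_bits(value, n=12):
    """Plain reversal of the n-bit sequence (pitch-class p goes to n - 1 - p)."""
    return int(format(value, "0{}b".format(n))[::-1], 2)


def best_normal_order(value, n=12):
    """Smallest integer among all circular shifts (representative under Tn)."""
    return min(rotate(value, k, n) for k in range(n))


def prime_form(value, n=12):
    """Smallest integer among the shifts of the set and of its inversion."""
    return min(best_normal_order(value, n),
               best_normal_order(invert(value, n), n))


MINOR, MAJOR = 137, 145      # [037] and [047] as integers
print(invert(MINOR), rotate(MAJOR, 5))    # I([037]) and T5([047])
print(invert(MAJOR), rotate(MINOR, 5))    # I([047]) and T5([037])
print(prime_form(MAJOR))                  # prime form of the major triad
```

Each of the first two lines prints the same integer twice, and the last line prints the integer of [037], in agreement with the statements made about the triads.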
4 Pitch-Class Categories Let Pitch-Class Category (PCCn) be a category consisting of the power set of pc-sets as objects, and transposition and inversion of pc-sets as morphisms, with n being the total number of pitch-classes. In every PCCn exists a unique initial object (the empty set) and a unique terminal object (a set of all n pitch-classes), and they complement each other. Besides transposition and inversion, PCCn presents two other operations: sum, given by the union of objects (composition of inclusion functions); and product, which is the intersection of objects (pitch-classes shared by two pc-sets). The sum and the product obey commutative and associative laws. For any non-terminal object, there are several identity-preserving products. In addition, the most important features of the theory of pc-category are the functors between categories and their natural transformations. The internal structure (morphisms) of PCCn is completely preserved in PCCm provided8 that m n. Functors map every object9 and morphisms of PCCn onto objects and morphisms of PCCm according to PCCn. Table 4. Objects and morphisms of a pitch-class category
Objects and Morphisms    Symbol   Binary implementation
Initial                  0        0000...000000 (all zeros)
Terminal                 1        1111...111111 (all ones)
Identity                 id       (pc-set bit-sequence itself)
Sum                      ∪        A | B (bitwise or)
Product                  ∩        A & B (bitwise and)
Id preserving product    ⊆        (A & B) == A (Boolean equal)
Transposition            Tn       circular bit-shift
Inversion                I        bit-reversal
⁸ In the case of functors from PCCn to PCCm with n > m, only a subset of the domain is mapped onto the codomain, thus the functor becomes a ‘partial function’. Musically, notes that do not belong to the diatonic scale cannot be mapped as diatonic, even though they might have some relationship with the scale.
⁹ Objects being the sets in the power set of a set with cardinality (total number of pitch-classes) n and m respectively.
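The object-level rows of Table 4 translate directly into integer operations. The following is a small sketch under the assumption that a pc-set in PCC12 is stored as a 12-bit integer; the constant and function names are hypothetical and chosen only for this illustration.

```python
N = 12                      # total number of pitch-classes (assumed here)
INITIAL = 0                 # empty set: all zeros
TERMINAL = (1 << N) - 1     # full set: all ones


def pc_set(*pitch_classes):
    """Build the bit-sequence of a pc-set."""
    mask = 0
    for pc in pitch_classes:
        mask |= 1 << pc
    return mask


def pc_sum(a, b):
    """Sum: union of objects (bitwise or)."""
    return a | b


def pc_product(a, b):
    """Product: intersection of objects (bitwise and)."""
    return a & b


def preserves_identity(a, b):
    """Identity-preserving product: a is contained in b."""
    return (a & b) == a


def complement(a):
    """Pitch-classes of the universe that are not in a."""
    return TERMINAL & ~a


triad = pc_set(0, 4, 7)
scale = pc_set(0, 2, 4, 5, 7, 9, 11)
```

For example, `preserves_identity(triad, scale)` holds because the C major triad lies inside the C major scale, while the sum of any object with its complement is the terminal object and their product is the initial object.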
Tonal, Atonal and Microtonal Pitch-Class Categories
In figure 2, diatonic triads were represented as all seven transpositions (T0 to T6) of [024]₇, whereas the same triads mapped onto 12 pitch-classes are some of the transpositions of [047], [037], and even [036]. Therefore, the cyclic permutations of [024]₇ are isomorphic to diatonic triads as represented in pc-set₁₂. Figure 3 exemplifies functors between two pitch-class categories:
Fig. 3. Tonal mapping from PCC-12 to PCC-7, shift and subsequent mapping from PCC-7 to PCC-12
In the first row, a major triad is represented as pc-set₁₂; a functor maps from twelve (first row) onto seven pitch-classes (second row); the third row is a transposition of the second row within PCC7; finally, another functor maps from seven onto twelve pitch-classes. In the binary implementation, both functors are represented as the bit-sequence 101010110101, which is isomorphic to [024579B]₁₂, [0123456]₇, and the integer 2741. Natural transformations guarantee structural equivalence. In the next example, the scale on which the triads are transposed is changed from diatonic major to harmonic minor.
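The four rows described for figure 3 can be traced with plain lists of pitch-classes. This is a hedged sketch rather than the author's software: the function names are hypothetical, and the functor is modelled simply as a lookup between the seven pitch-classes of the major scale and the scale degrees 0 to 6.

```python
MAJOR = [0, 2, 4, 5, 7, 9, 11]   # [024579B] in 12 pitch-classes


def scale_to_int(scale):
    """Integer whose set bits are the pitch-classes of the scale."""
    return sum(1 << pc for pc in scale)


def to_degrees(pcs12, scale):
    """Map from 12 onto 7 pitch-classes: each note becomes its scale degree.

    Raises ValueError for notes outside the scale (the mapping is partial).
    """
    index = {pc: degree for degree, pc in enumerate(scale)}
    missing = [pc for pc in pcs12 if pc not in index]
    if missing:
        raise ValueError(f"not in scale: {missing}")
    return sorted(index[pc] for pc in pcs12)


def from_degrees(degrees, scale):
    """Map from 7 back onto 12 pitch-classes."""
    return sorted(scale[d] for d in degrees)


def transpose(pcs, t, n):
    """Transposition within a universe of n pitch-classes."""
    return sorted((pc + t) % n for pc in pcs)


row1 = [0, 4, 7]                      # C major triad in PCC12
row2 = to_degrees(row1, MAJOR)        # [0, 2, 4] in PCC7
row3 = transpose(row2, 1, 7)          # [1, 3, 5] in PCC7
row4 = from_degrees(row3, MAJOR)      # [2, 5, 9] in PCC12 (D minor)
```

The integer form of the scale can be compared with the bit-sequence quoted in the text via `format(scale_to_int(MAJOR), "012b")`.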
Fig. 4. Change of scale as natural transformation
In figure 4, the functor that maps from PCC12 to PCC7 is based¹⁰ on [024579B]₁₂ (row one to row two). The functor that maps from PCC7 to PCC12, however, is based on [023578B]₁₂ (row three to row four). Now, permutations of [024]₇ will generate triads of the harmonic minor scale. Both [024579B]₁₂ and [023578B]₁₂ are isomorphic to [0123456]₇, which serves as a ‘basis’ for this natural transformation. In fact, any seven-note scale is isomorphic to PCC7.
¹⁰ A category is built on the subset of pc-set₁₂ given by the seven pitch-classes 0,2,4,5,7,9,B: a power set evolves from [024579B]; to it all morphisms (operations) in pc-set theory are added. The functor is then a mapping from PCC7 based on the original pitch-classes 0,2,4,5,7,9,B to another PCC7 (based on [0123456]₇) and is therefore a bijective function.
Let F and G be functors from category C to category D, let A and B be objects of C, let f be a morphism from A to B, and let there be a natural transformation from F to G. Then the following diagram commutes:
Fig. 5. Natural transformation
A musical analogy: the objects of a category can be seen as all subsets¹¹ of a scale; morphisms, as transpositions and inversions of those subsets; a scale can be seen as a functor (from a category with seven pitch-classes to another category with twelve pitch-classes). Then, the change of scale would be a natural transformation.
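The commuting square behind this analogy can be tested numerically. The sketch below is an illustration under stated assumptions, with hypothetical names: the two functors are modelled as embeddings of scale degrees into the major and the harmonic minor scale, the morphism is a transposition by t degrees, and the change of scale relabels each major-scale note with the harmonic-minor note of the same degree.

```python
MAJOR = [0, 2, 4, 5, 7, 9, 11]            # [024579B]
HARMONIC_MINOR = [0, 2, 3, 5, 7, 8, 11]   # [023578B]


def embed(degrees, scale):
    """Scale degrees (0..6) -> pitch-classes (0..11) of the given scale."""
    return sorted(scale[d] for d in degrees)


def degrees_of(pcs12, scale):
    """Pitch-classes of the scale -> their scale degrees."""
    return sorted(scale.index(pc) for pc in pcs12)


def induced_transposition(pcs12, t, scale):
    """Transposition by t scale degrees, carried out inside the given scale."""
    shifted = [(d + t) % 7 for d in degrees_of(pcs12, scale)]
    return embed(shifted, scale)


def change_scale(pcs12, source, target):
    """Relabel every note of `source` with the same-degree note of `target`."""
    relabel = dict(zip(source, target))
    return sorted(relabel[pc] for pc in pcs12)


def square_commutes(pcs12, t):
    """Transpose then change scale == change scale then transpose."""
    one_way = change_scale(induced_transposition(pcs12, t, MAJOR),
                           MAJOR, HARMONIC_MINOR)
    other_way = induced_transposition(
        change_scale(pcs12, MAJOR, HARMONIC_MINOR), t, HARMONIC_MINOR)
    return one_way == other_way
```

For the C major triad and a shift of one degree, both paths arrive at the pitch-classes 2, 5, 8, the diminished triad on the second degree of the harmonic minor scale.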
5 Discussion and Future Work
“What happens when we change the “12” in [the] twelve-tone system? How does this affect the Z-relation or the possible properties of entities?” (Morris, this volume). This paper addressed the first question, since it provided methods for mapping the structure of any n-tone system onto another m-tone system. It did not, however, explore Z-relations or even the implications of changing the number of interval classes. A universe of seven pitch-classes presents only three interval classes: seconds/sevenths, thirds/sixths, and fourths/fifths.
Another important direction for further work is the tonal implications of pitch-class categories. Such work would most likely correlate the underlying structure (PCCn and functor to PCCm) with tonal weights for each Tn-set. Parncutt (this volume) discusses tonal implications of Tn-sets (with cardinality three) within the context of harmonic profiles.
Assuming octave similarity, pitch-classes can be seen as a quantization of the division of one octave. In this case, microtonal relationships can be described and analyzed by selecting a higher number of pitch-classes within one octave. Conversely, a higher number of pitch-classes could also be used to represent chords and motives that span more than one octave.
Since this work focused on a general theory of pitch-class categories, it did not offer musical examples. Tonal and serial-music examples, along with microtonal scales and ragas, could be explored in future work.
¹¹ Combination of pitch-classes of a scale.
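The observation in the discussion that seven pitch-classes yield only three interval classes follows from folding each interval d onto min(d, n − d), which gives n // 2 classes for a universe of n pitch-classes. A brief sketch, with hypothetical function names, that counts interval classes for any n:

```python
from itertools import combinations


def interval_class(a, b, n):
    """Fold the interval between two pitch-classes onto 1 .. n // 2."""
    d = (a - b) % n
    return min(d, n - d)


def interval_vector(pcs, n):
    """Count how often each interval class occurs within a pc-set mod n."""
    vector = [0] * (n // 2)
    for a, b in combinations(sorted(set(pcs)), 2):
        vector[interval_class(a, b, n) - 1] += 1
    return vector
```

For the diatonic triad [024] in seven pitch-classes this gives two thirds/sixths and one fourth/fifth, whereas the major triad [047] in twelve pitch-classes has one entry each for interval classes 3, 4 and 5.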
6 Conclusion
This paper presented an alternative approach to pitch-class set theory. It was demonstrated that pitch-class categories not only contain all features of pc-set theory, but also generalize its operations to any number of pitch-classes. The theory offers new possibilities for musical analysis and composition. Additionally, a binary representation of the theory has been implemented, in which objects (pc-sets) and functors are both represented as bit-sequences, and morphisms as bitwise operations. Computer software for studying aspects of the theory described in this paper has been developed and is offered here for academic research. For new versions of the software and source code, please feel free to contact the author.
References
Babbitt, M.: Twelve-Tone Invariants as Compositional Determinants. Musical Quarterly 46(2), 246–259 (1960)
Babbitt, M.: Set Structure as a Compositional Determinant. Journal of Music Theory 5(1), 72–94 (1961)
Barr, M., Wells, C.: Category Theory for Computer Science, 2nd edn. Prentice Hall, London (1995)
Forte, A.: The Structure of Atonal Music. Yale University Press, New Haven (1973)
Huron, D.: Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, Cambridge (2006)
Kopp, D.: Chromatic Transformations in Nineteenth-Century Music. Cambridge University Press, Cambridge (2002)
Lawvere, F.W., Schanuel, S.H.: Conceptual Mathematics. Cambridge University Press, Cambridge (1997)
Mazzola, G.: The Topos of Music. Birkhäuser, Basel (2006)
Mazzola, G., Milmeister, G., Weissmann, J.: Comprehensive Mathematics for Computer Scientists 1, 2nd edn. Springer, Berlin (2006)
Morris, R.: Set Groups, Complementation, and Mappings among Pitch-Class Sets. Journal of Music Theory 26(1), 101–144 (1982)
Morris, R.: Mathematics and the twelve-tone system: past, present, and future. In: Klouche, T., Noll, T. (eds.) MCM 2007. CCIS, vol. 37, pp. 427–437. Springer, Heidelberg (2009)
Parncutt, R.: Tonal implications of harmonic and melodic Tn-types. In: Klouche, T., Noll, T. (eds.) MCM 2007. CCIS, vol. 37, pp. 427–437. Springer, Heidelberg (2009)
Rahn, J.: Basic Atonal Theory, pp. 31–39. Longman, New York (1980)
Schoenberg, A.: Theory of Harmony, 3rd edn. (1922). Translated by Roy E. Carter. University of California Press, Berkeley and Los Angeles (1978)
Straus, J.N.: Introduction to Post-Tonal Theory. Prentice-Hall, Englewood Cliffs (1990)
Walters, R.F.C.: Categories and Computer Science. Cambridge University Press, Cambridge (1991)
Appendix
C++ source code for calculating pc-sets with an arbitrary number of pitch-classes (up to 31 or 63)
//---------------------------------------------------------------------
//-- TPitchClassSet ------------------------------------- PitchClass.h
//---------------------------------------------------------------------
//-- Copyright (C) 2006 --- Fernando Gualda ---------------------------
//---------------------------------------------------------------------
class TPitchClassSet
{
public:
    TPitchClassSet(int = 12, bool = true);
    ~TPitchClassSet() {}

    const int SetSize;
    inline void Reset(void)
    {LastShiftIndex = 0; primeset = normalset = invertset = pitchset = 0;}

    // Accessor Members
    inline long PrimeForm(void)       {return primeset;}
    inline long BestNormalOrder(void) {return normalset;}
    inline long InvertedForm(void)    {return invertset;}
    inline int  Transposition(void)   {return LastShiftIndex;}

    // Stores a pc-set and calculates its prime form (defined in PitchClass.cpp)
    long PrimeForm(long pcs);

    // Independent Members
    static long BitReverse(long, int = 0, int = 0);
    static long BitShift(long, int, int);   // (pc-set, set size, shift)

    // Operations
    long operator * (long pcs);
    long operator * (TPitchClassSet *);  // 'Multiplication' stands for intersection
    bool operator == (TPitchClassSet *);
    bool operator != (TPitchClassSet *);
    inline bool Isomorphism(long pcs) {return Best(pcs) == normalset;}
    inline long operator = (long pcs) {PrimeForm(pcs); return pitchset;}
    inline long operator() (long pcs) {PrimeForm(pcs); return pitchset;}
    inline long operator() (void)     {return pitchset;}

    // ++, --, <<= and >>= store the shifted pc-set; << and >> only return it
    inline long operator++(void)      {return pitchset = BitShift(pitchset, SetSize, 1);}
    inline long operator--(void)      {return pitchset = BitShift(pitchset, SetSize, -1);}
    inline long operator<<(int s)     {return BitShift(pitchset, SetSize, s);}
    inline long operator>>(int s)     {return BitShift(pitchset, SetSize, -s);}
    inline long operator<<=(int s)    {return pitchset = BitShift(pitchset, SetSize, s);}
    inline long operator>>=(int s)    {return pitchset = BitShift(pitchset, SetSize, -s);}

private:
    long pitchset;        // a pcs (pitch class set)
    long normalset;       // Best Normal Order
    long invertset;       // bit-reversal of pitchset
    long primeset;        // prime form
    bool invertible;      // Inversion Equivalence
    int  LastShiftIndex;  // last shift index of pitchset
    long Best(long);      // Calculates the Best Normal Order
};
//---------------------------------------------------------------------
//-- TPitchClassSet ----------------------------------- PitchClass.cpp
//---------------------------------------------------------------------
//-- Copyright (C) 2006 --- Fernando Gualda ---------------------------
//---------------------------------------------------------------------
#include "PitchClass.h"

//---------------------------------------------------------------------
// TPitchClassSet – Constructor
//---------------------------------------------------------------------
TPitchClassSet::TPitchClassSet(int s, bool inv)
    : SetSize(s), pitchset(0), normalset(0), invertset(0), primeset(0),
      invertible(inv), LastShiftIndex(0)
{}

//---------------------------------------------------------------------
// PrimeForm – Calculates the prime form of a pc-set
//---------------------------------------------------------------------
long TPitchClassSet::PrimeForm(long pcs)
{
    if (pcs == 0)
        return pitchset = normalset = invertset = primeset = 0;
    pitchset = pcs;

    // Best Normal Order
    normalset = Best(pcs);

    // Inversion (bit-reversal) of a Pitch Class Set
    if (invertible)
    {   // Inversion Equivalence (dihedral group)
        invertset = BitReverse(pcs, SetSize);

        // Prime Set: the smaller of the two best normal orders
        pcs = Best(invertset);
        primeset = (pcs < normalset) ? pcs : normalset;
        return primeset;
    }
    invertset = primeset = 0;
    return normalset;
}

//---------------------------------------------------------------------
// BitReverse - Reverses and shifts bits of a pc-set
//---------------------------------------------------------------------
long TPitchClassSet::BitReverse(long p, int size, int shift)
{
    // size must be smaller than the number of bits in a long
    if (size <= 0 || size >= (int)(8 * sizeof(long))) return 0;
    while (shift < 0) shift += size;
    long q = 0;
    for (int i = shift, k = size + shift; i < k; i++)
    {
        if ((1L << (i % size)) & p)
            q |= (1L << (((i + 1) * (size - 1)) % size));
    }
    return q;
}
//---------------------------------------------------------------------
// Best - Calculates the best normal order
//---------------------------------------------------------------------
long TPitchClassSet::Best(long p)
{
    unsigned long pcs = p & ((1UL << SetSize) - 1);
    unsigned long min = pcs;
    unsigned long lastbit = (1UL << (SetSize - 1));
    int key = SetSize - 1;
    for (int i = 0; i < SetSize; i++)
    {   // Best Normal Order
        if (pcs & 1)                        // bitwise test
            pcs = ((pcs >> 1) | lastbit);   // cyclic permutation
        else
            pcs >>= 1;                      // (shift only)
        if (min > pcs)
        {
            min = pcs;
            key = i;
        }
    }
    LastShiftIndex = (key + 1) % SetSize;
    return min;
}

//---------------------------------------------------------------------
// BitShift - Shifts a pc-set
//---------------------------------------------------------------------
long TPitchClassSet::BitShift(long p, int size, int shift)
{
    // size must be smaller than the number of bits in a long
    if (size <= 0 || size >= (int)(8 * sizeof(long))) return 0;
    while (shift < 0) shift += size;
    shift %= size;
    if (shift == 0) return p;

    // Fast Circular Bit-Shift (up to 31 or 63 bits)
    long one = ((1L << size) - 1);
    long pcs = p & one;
    long lowmask = ((1L << (size - shift)) - 1);
    long highmask = one - lowmask;
    return (((pcs & lowmask) << shift) | ((pcs & highmask) >> (size - shift)));
}
Using Mathematica to Compose Music and Analyze Music with Information Theory
Christopher W. Kulp¹,* and Dirk Schlingmann²
¹ Department of Physics and Astronomy, ² Department of Mathematics and Statistics
Eastern Kentucky University, Richmond, KY USA 40475
[email protected], [email protected]
Abstract. In this paper we present two case studies for the application of the technical computing software Mathematica in the domain of music creation and music research. The first section describes an experimental interface for the usage of random points, parametric curves and other mathematical objects in the role of three-dimensional musical scores. Similar to the technology of the old-fashioned player piano rolls, which encode any arbitrary piece for mechanical player piano in three basic dimensions (onset, pitch, duration), we provide an interface where a three-dimensional score is created, visualized and played. With this software the scores can be created with the assistance of a rich arsenal of mathematical functions, and the sound of each single note can also be controlled in terms of mathematical functions. The aim of this software is the creation of experimental musical pieces which explore the musical potential of certain mathematical functions. In this paper we restrict ourselves to sketching the interface. The more interesting aspects, namely the ‘musicality’ of concrete sonifications of certain mathematical objects, are the subject of our live demonstrations in Berlin. In the second section we show how information theory may be used in the analysis of musical scores and how specialized packages of the software Mathematica may assist such investigations. As a particularly interesting topic we describe the calculation of the transfer entropy between selected instrumental parts in Beethoven symphonies.
* Permanent address: Department of Physics, Lycoming College, Williamsport, PA USA 17701.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 441–448, 2009. © Springer-Verlag Berlin Heidelberg 2009
1 Composition of Music Using Mathematica
With the technical computing software Mathematica, one can produce high-quality sounds. The Mathematica command Play[f, {t, 0, tmax}] plays a sound with amplitude f as a function of time t in seconds. For example, Play[Sin[440*2*π*t], {t, 0, 1}] plays a pure tone with a frequency of 440 Hz (A4) for 1 second. Please consult Wolfram (2003) for all the information on Mathematica’s sound capabilities. We will use Mathematica’s high-level programming language, graphics capabilities, and numerical and symbolic calculation tools to generate potentially exciting music. We will present programs that create musical compositions based on mathematical objects. The possibilities are endless for creating interesting and intriguing music and sounds.
In a simplified way, music in the traditional sense is a finite collection of pitched tones (frequencies) that are played at certain times for certain durations. A three-dimensional point {x, y, z} could be interpreted as a tone that is played at time x (measured in seconds), with frequency y (measured in Hz), for a duration z (measured in seconds). In this fashion, a list of three-dimensional points represents a musical composition. For example, the list of two three-dimensional points, {{0.6, 440.0, 1.5},{1.1, 554.365, 0.3}}, consists of two tones where the first tone is played after 0.6 seconds, with frequency 440.0 Hz, for 1.5 seconds, and the second tone is played after 1.1 seconds, with frequency 554.365 Hz, for 0.3 seconds. The Mathematica command to play such a sound is:
Play[Piecewise[{{Sin[440.0*2*π*t], (t >= 0.6) && (t <= 0.6+1.5)}}]+Piecewise[{{Sin[554.365*2*π*t], (t >= 1.1) && (t <= 1.1+0.3)}}], {t, 0, 2.1}],
where Piecewise[{{val1, cond1},{val2, cond2}, …}] represents a piecewise function with values vali in the regions defined by the conditions condi (Wolfram 2003). If none of the conditions condi apply, the default value is 0.
Based on the above Mathematica command, we will introduce a program called createMusic, written by Schlingmann (see Kulp, Machado and Schlingmann 2007), that takes any finite list of three-dimensional points, like the one mentioned in the previous paragraph, as input and turns it into a musical composition as described above.
Of course, we might need to scale the coordinates of the three-dimensional points to reasonable values so that times (when tones are played), frequencies, and durations are in acceptable ranges. For example, we do not want to include negative durations or frequencies outside the range of human hearing. In the process of using the program createMusic, a three-dimensional graph of the scaled three-dimensional points will also be displayed (see Figure 1). This graph can serve as a visualization of the musical composition. We apply createMusic to data that originates from random points, parametric curves, sequences, or a variety of other mathematical objects.
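The interpretation of three-dimensional points as tones, and the scaling step mentioned above, can be sketched outside Mathematica as well. The following is not the createMusic program, whose source is not shown here; it is a minimal stand-in with hypothetical function names that mirrors the Piecewise construction (a tone contributes only while it is sounding) and rescales a coordinate linearly into a chosen range.

```python
import math


def amplitude(points, t):
    """Sum of sine tones active at time t; each point is (onset, frequency, duration)."""
    return sum(math.sin(2 * math.pi * freq * t)
               for onset, freq, duration in points
               if onset <= t <= onset + duration)


def render(points, tmax, rate=8000):
    """Sample the composition from 0 to tmax seconds at `rate` samples per second."""
    return [amplitude(points, i / rate) for i in range(int(tmax * rate))]


def rescale(values, low, high):
    """Linearly map values onto [low, high]; a constant list maps to the midpoint."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [(low + high) / 2 for _ in values]
    return [low + (v - smallest) * (high - low) / (largest - smallest)
            for v in values]


points = [(0.6, 440.0, 1.5), (1.1, 554.365, 0.3)]
```

For instance, `rescale([0, 5, 10], 220.0, 880.0)` maps arbitrary data onto frequencies between 220 Hz and 880 Hz, and `amplitude(points, 0.0)` is zero because no tone has started yet. The choice of a linear rescaling is an assumption; other mappings (for example, logarithmic in frequency) would work equally well.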
Fig. 1. An example of a graph created by the program createMusic
Beyond imitating and enhancing traditional music, Mathematica is also capable of playing continuously changing sounds that stem from mathematical functions. For example, Play[Sin[f[t]*2*π*t]+Sin[g[t]*2*π*t], {t, start, end}] plays continuously changing frequencies that are controlled by the functions f and g. Of course, the functions f and g might again need scaling to guarantee reasonable frequencies. We then explore these possibilities with a variety of mathematical functions. Besides using Mathematica to create music, we can also use it to develop time series analysis algorithms which will allow us to analyze musical compositions.
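The expression passed to Play in this example can be sampled directly. The sketch below uses hypothetical function names and a sample rate chosen only for illustration. One point worth keeping in mind when choosing f and g: because the phase is 2π·f(t)·t, the frequency actually heard at time t is f(t) + t·f′(t) rather than f(t) itself, so a linear ramp f(t) = 440 + 110t sounds like a glide of 440 + 220t Hz.

```python
import math


def gliding_amplitude(f, g, t):
    """Value of Sin[f[t]*2*Pi*t] + Sin[g[t]*2*Pi*t] at time t."""
    return (math.sin(2 * math.pi * f(t) * t)
            + math.sin(2 * math.pi * g(t) * t))


def sample(f, g, start, end, rate=8000):
    """Sample the two-voice sound between start and end (seconds)."""
    count = int((end - start) * rate)
    return [gliding_amplitude(f, g, start + i / rate) for i in range(count)]


def rising(t):
    """Frequency-control function: linear ramp starting at 440 Hz."""
    return 440.0 + 110.0 * t


def falling(t):
    """Frequency-control function: linear ramp starting at 660 Hz."""
    return 660.0 - 55.0 * t


samples = sample(rising, falling, 0, 1, rate=100)
```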
2 Nonlinear Time Series Analysis of Musical Compositions
In this section, we will use time series analysis to study the relationships between string instruments in several of Beethoven’s symphonies. Classical music compositions contain a lot of very interesting structure which can be adequately described and analyzed using mathematics. One way of analyzing music using mathematics is to create a time series from the music. A time series is a list of numbers which represent the measurement of a physical quantity. The measurements are usually (but not always) taken at a regular time interval, Δt. For example, one can measure the temperature of a room every five minutes. In general we write a time series in the following format:
$$S = \{s_1, s_2, s_3, \ldots, s_N\},$$
where the element s_i represents the measurement of the physical quantity, S, taken at time (i – 1)Δt. There exists a large literature on extracting information about a system by performing calculations on time series measurements taken from the system (Kantz and Schreiber 2004; Abarbanel 1996; and references therein). By applying time series analysis methods to musical compositions, we hope to be able to learn more about specific compositions as well as demonstrate quantitatively the differences and similarities between compositions and composers. One of the long-term goals of this project is to be able to use time series analysis to classify compositions by composer without having to hear the piece.
The application of time series calculations to musical compositions is not new. In Monro and Pressing (1998) and Reiss and Sandler (2003), time series were generated from digital recordings of various compositions. In particular, the authors of Monro and Pressing (1998) and Reiss and Sandler (2003) focused on phase space reconstruction and the estimation of Lyapunov exponents. Estimation of the Lyapunov exponent is typically done to determine whether or not a system is chaotic. In this paper, we will not use recordings of Beethoven’s symphonies to generate our time series; instead we will use the sheet music of the symphony. Furthermore, we will focus on estimating the rate of information transfer between instruments as a means of illustrating the relationships between string instruments in Beethoven’s symphonies.
2.1 Creating Time Series from Sheet Music
We begin by demonstrating how to create a time series from sheet music. Consider the measure shown in Figure 2.
Fig. 2. A sample measure which was used to create the time series that appears below
The measure in Figure 2 contains an A4 eighth note, a C5 eighth note, and two quarter rests. Each note in Figure 2 represents a sound with a specific frequency. For example, the A4 note corresponds to a sound with a frequency of 440 Hz, while the C5 note has a frequency of 523.25 Hz in twelve-tone equal temperament. A rest corresponds to a sound of zero amplitude and zero frequency. The elements of the time series will be the frequencies of each note sampled at sixteenth-note time intervals. Hence, an eighth note is represented by two adjacent elements of the time series. Following this scheme, the time series represented by the measure in Figure 2 is:
{440, 440, 523.25, 523.25, 0, 0, 0, 0, 0, 0, 0, 0}.
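The sampling scheme just described is simple enough to write down. The sketch below is not the Mathematica algorithm of Kulp, Machado and Schlingmann (2007); it is a small stand-in with hypothetical names, in which each event is a frequency together with its length in sixteenth notes. Frequencies are derived from MIDI note numbers under the usual equal-temperament convention A4 = 440 Hz (note 69), which gives about 523.25 Hz for C5.

```python
REST = 0.0


def note_frequency(midi_number):
    """Equal-tempered frequency in Hz, with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((midi_number - 69) / 12)


def measure_to_series(events):
    """Expand (frequency, length in sixteenths) events into a time series."""
    series = []
    for frequency, sixteenths in events:
        series.extend([frequency] * sixteenths)
    return series


A4 = round(note_frequency(69), 2)   # 440.0
C5 = round(note_frequency(72), 2)   # 523.25

# Eighth note A4, eighth note C5, two quarter rests
measure = [(A4, 2), (C5, 2), (REST, 4), (REST, 4)]
series = measure_to_series(measure)
```

The resulting list has twelve elements: two for each eighth note and four for each quarter rest.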
Please consult Kulp, Machado and Schlingmann (2007) for a Mathematica algorithm which aids in the process of creating time series in the above fashion. The time series in Monro and Pressing (1998) and Reiss and Sandler (2003) were generated from digital recordings of the music. This can lead to multiple time series representations for the same composition. In our work we need a unique time series representation for each piece. However, the above process for creating time series is a slow and tedious one. In the near future, we hope to automate the above process by using a Mathematica program to read a MIDI recording of the composition and produce a time series. The advantage of the MIDI format, of course, is that the recording contains a lot of symbolically preprocessed information about the piece (note played, duration, velocity, etc.) and will lead to a unique time series representation of the piece.
Once we have generated a time series from sheet music, we then convert it into a binary series. Sometimes this is referred to as symbolizing the series (Daw, Finney and Tracy 2003). Using a binary representation of a series is a powerful tool which can be used to speed up time series computations. When done carefully, the symbolization retains much of the important temporal information. The symbolization is performed by choosing a threshold, a number calculated from the time series. If an element of the time series is greater than or equal to the threshold, then it is reassigned the value 1. If an element of the time series is less than the threshold, then that element is reassigned the value 0. For the work presented in this paper, we have chosen our threshold so that it maximizes the binary series’ Shannon entropy (Shannon 1948). This allows us to retain as much of the information from the original system as possible. We used the process described above to generate a binary time series for each string instrument in several of Beethoven’s symphonies.
Next, we will show how to use a set of time series to demonstrate relationships between physical systems. This will allow us to quantify the relationship between string instruments in each symphony.
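The symbolization step of Section 2.1 can be illustrated as follows. The text does not say how the entropy-maximizing threshold is searched for, so this sketch makes an assumption: it tries every distinct value of the series as a candidate threshold and keeps the one whose binary series has the largest Shannon entropy. All function names are hypothetical.

```python
import math


def symbolize(series, threshold):
    """Binary series: 1 where the element is >= threshold, otherwise 0."""
    return [1 if value >= threshold else 0 for value in series]


def shannon_entropy(bits):
    """Shannon entropy (in bits) of the distribution of 0s and 1s."""
    if not bits:
        return 0.0
    p_one = sum(bits) / len(bits)
    return -sum(p * math.log2(p) for p in (p_one, 1 - p_one) if p > 0)


def best_threshold(series):
    """Candidate threshold (a value of the series) that maximizes the entropy."""
    candidates = sorted(set(series))
    return max(candidates,
               key=lambda th: shannon_entropy(symbolize(series, th)))


series = [440.0, 440.0, 523.25, 523.25] + [0.0] * 8
threshold = best_threshold(series)
binary = symbolize(series, threshold)
```

For the one-measure series used earlier, the candidates 0, 440 and 523.25 produce binary series with twelve, four and two ones respectively; the split with four ones and eight zeros has the highest entropy, so the threshold chosen is 440.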
2.2 Transfer Entropy and the Relationship between Physical Systems
We begin this section with a non-musical example. Suppose one were to measure the amount of rainfall in the Gulf of Mexico and the average price of gasoline in the United States each week for a period of ten years. By doing this, the observer has produced two time series, one for rainfall and one for the price of gasoline. A natural question to ask is, based on our data, does the amount of rainfall in the Gulf of Mexico influence the price of gasoline in the United States? Time series analysis provides a means of answering this question through the transfer entropy calculation (Schreiber 2000). The transfer entropy measures the amount of information exchanged between two systems based on their time series measurements. Consider two time series, X and Y, each measured from a different (but possibly related) system. The transfer entropy between the two systems can be computed using the equation below.
$$T(Y \to X) = \sum p(x_{n+1}, x_n, y_n)\,\log_2\!\left(\frac{p(x_{n+1} \mid x_n, y_n)}{p(x_{n+1} \mid x_n)}\right)$$
$$T(X \to Y) = \sum p(y_{n+1}, x_n, y_n)\,\log_2\!\left(\frac{p(y_{n+1} \mid x_n, y_n)}{p(y_{n+1} \mid y_n)}\right)$$
The summation is over all possible elements in the time series. The base-two logarithm gives a unit of bits for the transfer entropy. Hence, if T(X → Y) = 0.01 and T(Y → X) = 0.1, then we say that Y more strongly influences X than vice versa. The function, p, is the probability that an element or set of elements occurs in the time series. For example, the function p(x_{n+1} = 1, x_n = 1, y_n = 0) would give the probability of a 1 occurring in X one time-step after a 1 is measured in X and a 0 is measured in Y. For a Mathematica algorithm which computes the transfer entropy, the interested reader should consult Kulp, Machado and Schlingmann (2007).
2.3 The Application of the Transfer Entropy to a Symphony
When applied to string instruments in a symphony, the transfer entropy gives the amount of information exchanged between instruments in the symphony. In other words, the transfer entropy gives a way of quantifying relationships between instruments in a symphony. As mentioned previously, in this paper we focus on the string section of some of Beethoven’s symphonies. For example, Figure 3 shows the result of the transfer entropy analysis of the first 30 measures of the first movement of Beethoven’s First Symphony. We see that for any string instrument, X, T(V1 → X) > T(X → V1). This tells us that the amount of information exchanged from the first violin to any other string instrument is larger than the amount of information exchanged in the opposite direction. Hence, we can say that the first violin influences the other string instruments during these 30 measures. Note that T(B → C) = T(C → B) = 0; this is due to the fact that the parts for the bass and the cello are the same for the first 30 measures of the First Symphony.
Extending this analysis further, one could then generate the necessary time series for a transfer entropy analysis on a later movement (or even a later time in the first movement) and see how these relationships change. Note that it is not important to restrict the analysis to 30-measure blocks.
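The transfer entropy defined in Section 2.2 can be estimated from two symbol sequences by counting how often each triple (x_{n+1}, x_n, y_n) occurs. The sketch below is not the Mathematica algorithm referenced in the text; it is a plain frequency-count estimator with a hypothetical function name, written to make the formula concrete.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Estimate T(source -> target) in bits from two equal-length sequences."""
    # Count triples (x_{n+1}, x_n, y_n) with x = target and y = source.
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    total = sum(triples.values())

    pair_xy = Counter()      # counts of (x_n, y_n)
    pair_next_x = Counter()  # counts of (x_{n+1}, x_n)
    single_x = Counter()     # counts of x_n
    for (x_next, x, y), count in triples.items():
        pair_xy[(x, y)] += count
        pair_next_x[(x_next, x)] += count
        single_x[x] += count

    entropy = 0.0
    for (x_next, x, y), count in triples.items():
        p_joint = count / total                             # p(x_{n+1}, x_n, y_n)
        p_given_both = count / pair_xy[(x, y)]              # p(x_{n+1} | x_n, y_n)
        p_given_self = pair_next_x[(x_next, x)] / single_x[x]  # p(x_{n+1} | x_n)
        entropy += p_joint * log2(p_given_both / p_given_self)
    return entropy


driver = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
follower = [0] + driver[:-1]   # copies the driver one time-step later
```

Two properties discussed in the text can be checked with this estimator: identical parts exchange no information, so `transfer_entropy(driver, driver)` is zero (as for the bass and cello in the First Symphony), and a part that copies another with a one-step delay receives a positive amount of information from it.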
Fig. 3. The transfer entropy analysis of the string section of Beethoven’s First Symphony. Note the change in notation, T(A,B) = T(A → B). Further note that: V1 = first violin, V2 = second violin, V = viola, C = cello, and B = bass.
Fig. 4. The transfer entropy analysis of the string section of Beethoven’s Fifth Symphony. Note the change in notation, T(A,B) = T(A → B). Further note that: V1 = first violin, V2 = second violin, V = viola, C = cello, and B = bass.
Besides looking at the relationships between instruments in one symphony, we can also compare results from several symphonies. Figure 4 shows the results from the transfer entropy analysis of the first 30 measures of Beethoven’s Fifth Symphony. Notice that the relationship between the first violin and the other string instruments has changed. In the Fifth Symphony, the most influential instrument appears to be the bass (with the exception of when the bass is paired with the second violin). Figure 5 shows the results from the transfer entropy analysis for the first 30 measures of Beethoven’s Sixth Symphony. Here we see that the viola tends to be the most influential string instrument (with the exception of when the viola is paired with the second violin). The point here is that we are seeing different relationships between the string instruments in different symphonies.
The relationship between instruments in a piece of music is sometimes a hallmark of the style of the composer. Using the transfer entropy, we can quantify the relationship between instruments in a piece of music. By sampling different parts of the same symphony, we can quantitatively study how the relationships between instruments change throughout the duration of the symphony. Furthermore, if we see that a particular instrument, such as the bass, is influential in one symphony but not in a later symphony, it may suggest a change in the composer’s style of composition. In this case, the transfer entropy analysis can serve as one method of quantifying the change in the style of the composer from one symphony to another. By sampling other pieces written by the composer during different stages of his or her life, we may be able to quantify how the composer’s style changed during his or her life. However, to do such an analysis with Beethoven, we would need to do a complete transfer entropy analysis of each of his symphonies. Completing such an analysis is a goal of this project. It is also important to mention that any attempt to quantify a composer’s style using time series will most likely include several other analyses. However, we believe that the transfer entropy will be an important piece of the overall quantitative analysis of a composer’s style of composition.
Fig. 5. The transfer entropy analysis of the string section of Beethoven’s Sixth Symphony. Note the change in notation, T(A,B) = T(A → B). Further note that: V1 = first violin, V2 = second violin, V = viola, C = cello, and B = bass.
3 Conclusions
Mathematica can be used to generate interesting music based on data sets and mathematical curves as shown above. Time series analysis provides a powerful tool with which we can learn more about musical compositions. By using the transfer entropy we can identify relationships between instruments in a symphony. By studying how these relationships change from one symphony to the next, we may be able to quantify how the style of a composer changes during his or her lifetime. With more extensive studies, it may even be possible to identify the composer of a piece of music of which little is known.
References
Abarbanel, H.D.I.: Analysis of Observed Chaotic Data. Springer, New York (1996)
Daw, C.S., Finney, C., Tracy, E.: A review of symbolic analysis of experimental data. Review of Scientific Instruments 74, 915–930 (2003)
Kantz, H., Schreiber, T.: Nonlinear Time Series Analysis, 2nd edn. Cambridge University Press, Cambridge (2004)
Kulp, C., Machado, M., Schlingmann, D.: Composition and Analysis of Music Using Mathematica. Mathematica in Education and Research 12, 1–20 (2007)
Monro, G., Pressing, J.: Sound visualization using embedding: The art and science of auditory correlation. Computer Music Journal 22, 20–34 (1998)
Reiss, J.D., Sandler, M.B.: Nonlinear Time Series Analysis of Musical Signals. In: Proc. of the 6th Intl. Conference on Digital Audio Effects (DAFX 2003), London, United Kingdom, September 8-11 (2003)
Schreiber, T.: Measuring Information Transfer. Physical Review Letters 85, 461–464 (2000)
Shannon, C.E.: A Mathematical Theory of Communication. The Bell System Technical Journal 27, 379–423 (1948)
Wolfram, S.: The Mathematica Book, 5th edn. Wolfram Media, Inc., Champaign (2003)
A Diatonic Chord with Unusual Voice-Leading Capabilities
Norman Carey
City University New York
Diatonic set theory, as established by John Clough and others (see Clough 1979, 1980, Carey and Clampitt 1989, Carey 1998), applies the tools of standard set theory of 12-tone ET to the heptatonic set of seven tones of the diatonic scale. The two universes differ from each other in a number of ways other than simple cardinality. Although 12 is nearly twice as big as 7, the fact that 7 is prime and 12 composite contributes to a number of subtle differences between them. Every positive integer less than 7 is a unit mod 7; thus every diatonic interval generates the entire set. In the set of 12 tones, there are only four units, 1, 5, 7, and 11. Further, because of tuning, the geographies, if you will, of the sets also differ. In the equal-tempered 12-tone landscape, every place looks like every place else. In the diatonic scale, each generic span is inhabited by several different specific intervals. Because of this, the terrain is everywhere distinct, contributing to the phenomenon of gravitational asymmetry and of tonality.
In this paper, I would like to call attention to a trichord in the ordinary diatonic that is endowed with unusual voice-leading capabilities. Recently, neo-Riemannian theory has uncovered voice-leading properties of the common chord, by positioning the triad on the cusp between standard set theory and diatonic set theory. This fluidity of approach gives the theory considerable explanatory power, as it accords with the repertoire for which it has proven particularly successful, namely, highly chromatic music that is on or near the disputed border between tonal and atonal repertories. It has led, at the same time, to some misunderstanding. Thus, in considering “parsimonious voice leading,” it has struck some as arbitrary that the L and P operations move by ic 1, but that the R transform involves ic 2. This discrepancy is a problem when viewed solely through the lens of standard set theory, where ic 1 and ic 2 are distinct objects.
In diatonic set theory, these objects belong to the same class, namely the class of diatonic steps. Indeed, one of the most profound insights of diatonic theory is its recognition of the generic interval. Blurring distinctions is one of the most important things the theory does. Specific information becomes encoded within the generic. Here is an idea of how powerful this may be. Select any diatonic scale. Choose any note from the scale. Go up a third higher in that scale from your starting note. Now go up another third from there. Certainly you’re a fifth higher than where you started, but most likely you’re a perfect fifth higher than where you started. Thus, if we know how to use it, the code allows us to extract specific intervallic information – the perfect fifth – from the generic information we put in – the two non-specific thirds. (Unless you began on the seventh degree!)
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 449–463, 2009. © Springer-Verlag Berlin Heidelberg 2009
It will be within the realm of diatonic set theory that this paper unfolds; however, it will be useful to work into the main topic with a fuller examination of the ordinary triad. I would like to separate out the information conveyed by both set theories, standard and diatonic, regarding the common chord. It is easy to gloss over the fact that both theories assign the major triad and the minor triad to a single one of their classes. This is almost, you might say, a lucky accident. Example 1 should help to sort out the issues.
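The thought experiment of stacking two generic thirds can be checked numerically. The following is only a sketch under stated assumptions: the scale is taken to be C major in 12-tone equal temperament, and the names `MAJOR_SCALE`, `specific_size`, and `fifth_from_two_thirds` are illustrative, not part of the paper.

```python
# Semitone positions of the C major scale degrees (C D E F G A B).
# Assumption: 12-tone equal temperament, used only to measure specific sizes.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
NAMES = "CDEFGAB"

def specific_size(start_degree, generic_span):
    """Semitones covered by rising `generic_span` scale steps from a degree."""
    octaves, end_index = divmod(start_degree + generic_span, 7)
    return MAJOR_SCALE[end_index] + 12 * octaves - MAJOR_SCALE[start_degree]

def fifth_from_two_thirds(start_degree):
    """Stack two generic thirds (two steps each); return the total in semitones."""
    first = specific_size(start_degree, 2)
    second = specific_size((start_degree + 2) % 7, 2)
    return first + second

# Size of the resulting fifth for every starting degree.
fifths = {NAMES[d]: fifth_from_two_thirds(d) for d in range(7)}
print(fifths)
```

Six of the seven starting notes give 7 semitones (a perfect fifth); only the seventh degree, B, gives 6 (a diminished fifth), which is the exception noted in the text.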
Example 1. In standard set theory, major triads and minor triads belong to the same set class (Forte’s 3-11) because major and minor triads may be converted into one another through inversion. (See the top part of the Example.) In defining diatonic set classes, John Clough recognizes transpositional equivalence only. Thus, they belong to the same diatonic set class not because of their inversional relationship, but because diatonic transposition converts these triad types into one another. (See the very bottom of
the Example.) As we diatonically transpose a triad by step, we visit three major triads, three minor triads, and one diminished triad, in some order. The question marks below the triads in the example indicate the fact that, purely within the context of diatonic theory, we do not know which types these triads are without assigning actual pitch class names. The situation here is analogous to one in which we try to find our way around the piano keyboard with the black keys covered up.
Both of these set theories, then, place major and minor triads into a single class but for different reasons. As we just saw, in the case of atonal set classes, major and minor triads belong to the same class because they are inversions of each other. This scheme emphasizes a Riemannian or dualist perspective on the triads. Major and minor triads belong to the same class because each is the reflection of the other. In Clough’s diatonic set classes, major and minor belong to the same class because of rotation. This scheme emphasizes a Stufen theory view of the triads. The difference between the two schemes is apparent with respect to their categorization of the diminished triad. In atonal set theory, although major and minor triads are members of 3-11, the diminished triad is not. In the 12-tone universe, neither transposition nor inversion will transform a major or minor triad into a diminished triad, which is associated with Forte’s set class 3-10. In the diatonic, on the other hand, the major and minor triads are related to diminished triads by both transposition and inversion, and all three belong to Clough’s diatonic set class 223. The top part of Example 1 makes it clear that the inversion in standard set theory of some 037 is always some 047; in diatonic theory, however, the inversion of some triad is, simply, some triad. In standard set theory, major and minor triads are only related by inversion.
In diatonic set theory, they are related by both transposition and inversion, but only transposition matters in the definition of the diatonic set classes. At least part of the reason for this choice, on Clough’s part, is that the great majority of diatonic sets exhibit (diatonic) inversional symmetry. Clough’s decision not to employ inversional equivalence in defining diatonic sets has significance for only a few asymmetric diatonic sets, of which there are four: two trichords and their tetrachordal complements. These “non-retrogradable” sets – particularly the trichords – are worthy of new consideration for their unique properties.
Example 2.
As we see in Example 2, the triad, diatonic class 223, is one of five diatonic trichords. Three of them are, in a sense, generated. Diatonic thirds, for example, generate the ordinary triad. Clough’s descriptor for the triad is 223, where each of the 2’s represents a diatonic third, and the 3 represents a diatonic fourth. The set CDE instantiates another generated diatonic trichord, labeled 115. The third generated trichord type is 133, an example of which is provided by the chord CDG. Each of these set classes includes the seven transpositions of its sample set.
The three generated trichords have similar diatonic interval vectors. This is a three-place vector, counting the number of steps, thirds, and fifths; that is, it counts the number of unordered diatonic interval classes. Because seven is not a multiple of 3, there are no diatonic trichords with just one interval type: all seconds, or all thirds, or all fifths. Each of the generated set classes has two intervals of one type (the one we are considering as its generator), one of another type, and none of the third. Thus, the 115, or CDE, type has two seconds, one third, and no fifths. The usual triad, CEG, has two thirds, one fifth, and no seconds. The third type, such as CDG, has two fifths, one second, and no thirds. Inversional symmetry in these sets is guaranteed by the presence of generators. Two out of three intervals are alike, which ensures that the transpositions of each of the generated trichords include its own inversions. (Each may be abstracted as abb. Inverting yields bba, which is a rotation of abb.)
In short, the issue of whether or not to include inversional equivalence in the definition of diatonic set classes is moot with respect to these three types.
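The census of diatonic trichords described above can be reproduced by brute force. This is a minimal sketch, assuming scale degrees are numbered 0 to 6 and that each class is labeled by the smallest rotation of its cyclic step pattern (so Clough's 223 appears as `(2, 2, 3)`); the function names are illustrative only.

```python
from itertools import combinations

def step_pattern(chord, n=7):
    """Cyclic pattern of scale-step distances between adjacent chord notes."""
    notes = sorted(chord)
    return tuple((notes[(i + 1) % len(notes)] - notes[i]) % n
                 for i in range(len(notes)))

def canonical(pattern):
    """Smallest rotation of a pattern, so all transpositions share one label."""
    return min(pattern[i:] + pattern[:i] for i in range(len(pattern)))

def interval_vector(chord, n=7):
    """Count unordered diatonic interval classes as (steps, thirds, fifths)."""
    counts = [0, 0, 0]
    for a, b in combinations(chord, 2):
        d = (b - a) % n
        counts[min(d, n - d) - 1] += 1   # class 1 = step, 2 = third, 3 = fourth/fifth
    return tuple(counts)

# Group all 35 three-note subsets of the scale by transposition class.
classes = {}
for chord in combinations(range(7), 3):
    classes.setdefault(canonical(step_pattern(chord)), []).append(chord)

summary = {label: (len(members), interval_vector(members[0]))
           for label, members in classes.items()}
# Inversion reverses the step pattern; a class is symmetric if that changes nothing.
self_inverse = {label: canonical(label[::-1]) == label for label in classes}

for label in sorted(summary):
    print(label, summary[label], self_inverse[label])
```

Under these assumptions the enumeration yields five classes of seven chords each. The three generated classes have two intervals of one kind and are their own inversions, while `(1, 2, 4)` and `(1, 4, 2)` each contain one interval of every kind and are exchanged by inversion.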
Example 3. The two other types of diatonic trichords are my main focus today. They exhibit fewer symmetries than the generated trichords because these sets are, in fact, all-interval diatonic trichords. That is, each contains exactly one step, one third, and one fifth. The chord CDF is a member of one such set class, namely 124. Note that it contains exactly one step, CD, one third, DF, and one fifth, FC. If we invert this set around the C-F axis it becomes CEF. Thus, inversions of the 124 set are NOT found among its transpositions. Because inversion is not invoked in the definition of diatonic set classes, the set class that includes CEF belongs to the fifth trichord type: 142, as Clough calls it. The seven transpositions of trichords 124 and 142 are shown in Example 3. Although these two set classes, 124 and 142, are not generated in the additive sense of the three generated trichords, 115, 223 and 133 (the triad and its two cousins), they can be recursively derived through multiplication.¹ Example 4 shows how this works.
Example 4. If we begin with any non-zero element modulo 7 and double it, a cycle is generated that repeats after three elements. Beginning with 1 we multiply by 2 to get 2. Multiply by 2 again to get 4. Multiply by 2 once more, and get 8, which again equals 1, modulo 7. The construction yields the set 1, 2, and 4, one of our trichords. That is, powers of 2 modulo 7 belong to the set (1,2,4). Musically, we can put it this way (shown in Example 5a):
Example 5a. Begin with the interval of a second. Double it by adding on another second. This forms a third. Doubling the third yields a fifth, and doubling the fifth, yields a ninth, which, reduced modulo the octave is once again a step, returning us to our starting point. This three-element multiplicative cycle is highly significant with respect to the structure of diatonic sequences.
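The doubling cycle can be traced in a few lines, together with the affine map f(x) = 2x + k mod 7 that the paper's footnote attributes to Mazzola's circle chords. This is a sketch only; the function names and the numbering of scale degrees from C = 0 are assumptions made for illustration.

```python
def doubling_cycle(start=1, n=7):
    """Repeatedly double modulo n until the starting value returns."""
    cycle, x = [], start
    while True:
        cycle.append(x)
        x = (2 * x) % n
        if x == start:
            return cycle

def affine_orbits(k, n=7):
    """Orbits of f(x) = 2x + k (mod n) on the scale degrees 0..n-1."""
    seen, orbits = set(), []
    for x in range(n):
        if x in seen:
            continue
        orbit, y = [], x
        while y not in orbit:      # f is a bijection mod 7, so orbits are cycles
            orbit.append(y)
            y = (2 * y + k) % n
        seen.update(orbit)
        orbits.append(orbit)
    return orbits

print(doubling_cycle(1))   # step, third, fifth
print(affine_orbits(1))    # with C = 0: C D F, E A G, and the fixed point B
```

With these assumptions each value of k gives one fixed point and two three-note orbits, one of the 124 type and one of the 142 type, so the seven maps account for the fourteen chords.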
¹ The 14 elements of the two set classes 124 and 142 are orbits of a single pitch class under an affine transformation f(x) = 2x + k mod 7. Mazzola (1990, 121) calls such orbits circle chords.
These same numbers, 1, 2, and 4, form the set of quadratic residues modulo 7, and the numbers 3, 5, and 6 are the quadratic non-residues. The square of any integer not divisible by 7 is congruent to 1, 2, or 4 modulo 7, as we see in Example 4b. Example 5b provides a musical realization of this information:
Example 5b. One step is – a step (1 × ic1 = ic1). Two thirds is a fifth (2 × ic2 = ic4). Three fourths is a third (3 × ic3 = ic2). Four fifths is also a third. Five sixths makes a fifth, and six sevenths make a step. While fascinating, I don’t know where or if this data finds its place in diatonic practice.
Each set of seven trichords (the 124s and the 142s) constitutes an example of a Steiner triple system. A Steiner triple system is a simple type of block design.² In a Steiner triple system, a collection of three-element subsets of a set of N elements is chosen such that each pair of elements belongs to exactly one subset. When N is 7, Steiner triple systems require seven three-element subsets. In fact, the transpositions of CDF (or CEF) constitute a Steiner triple system, as a glance back at Example 3 will verify: each interval appears in exactly one chord, satisfying the definition of a Steiner triple system. This latter point is interesting, and shows how the 124 and 142 chords are different in their voice-leading potential compared to the triad. If you choose an arbitrary interval in the C Major scale, it will appear in two different triads if it is a third, in exactly one triad if you chose a fifth, and in no triad if you chose a second. In the 124 chord, the situation is much more democratic. As a consequence of its status as an all-interval trichord, each of the 21 specific intervals of the scale appears in exactly one of the 124 chords. Steiner triple systems are not uncommon, but require a set whose cardinality is at least 3 and is congruent to 1 or 3 modulo 6. Thus, scales that support Steiner triple systems are those with cardinalities 3, 7, 9, 13, 15, 19, 21, and so on. Note that 12 does not support a Steiner triple system. The Steiner triple system in the case of 7 is of particular interest to mathematicians in that it is also the finite projective plane of order 2.
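The Steiner property and the contrast with the triad can be verified by counting, for every pair of scale degrees, how many chords of a family contain it. The sketch below assumes degrees numbered from C = 0; the helper names are hypothetical.

```python
from itertools import combinations
from collections import Counter

def transpositions(chord, n=7):
    """All n diatonic transpositions of a chord given as scale degrees."""
    return [frozenset((x + t) % n for x in chord) for t in range(n)]

def pair_coverage(chords, n=7):
    """For each unordered pair of degrees, count the chords containing it."""
    counts = Counter({pair: 0 for pair in combinations(range(n), 2)})
    for chord in chords:
        for pair in combinations(sorted(chord), 2):
            counts[pair] += 1
    return counts

def is_steiner_triple_system(chords, n=7):
    """True when every pair of degrees lies in exactly one chord."""
    return all(c == 1 for c in pair_coverage(chords, n).values())

cdf_chords = transpositions((0, 1, 3))   # C D F and its transpositions (124)
cef_chords = transpositions((0, 2, 3))   # C E F and its transpositions (142)
triads = transpositions((0, 2, 4))       # C E G and its transpositions (223)
squares = sorted({x * x % 7 for x in range(1, 7)})   # quadratic residues mod 7

print(is_steiner_triple_system(cdf_chords), is_steiner_triple_system(cef_chords))
print(is_steiner_triple_system(triads), squares)
```

For the triads the coverage is uneven, as the text says: a third such as C-E lies in two triads, a fifth such as C-G in one, and a second such as C-D in none.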
A projective plane of order n is a figure with n² + n + 1 points containing n + 1 points on each line with particular properties. The projective plane of order 2 may be represented by the figure known as the Fano plane. The particular projective plane of order 2 contains 7 points, because 2² + 2 + 1 = 7. We find the following conditions on the Fano plane:
² For more musical block designs see Tom Johnson’s article Networks in this volume.
1. Any two points determine a line.
2. Any two lines determine a point.
3. Every point has 3 lines on it.
4. Every line contains 3 points.
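The four conditions listed above can be tested directly when the seven scale degrees are read as points and the seven 124 chords as lines. This is an illustrative sketch; taking the transpositions of C D F (degrees 0, 1, 3) as the lines is an assumption that matches the labeling described for Example 6a only up to the choice of drawing.

```python
from itertools import combinations

# Lines of the plane: the seven transpositions of C D F as scale degrees 0..6.
LINES = [frozenset((x + t) % 7 for x in (0, 1, 3)) for t in range(7)]
POINTS = range(7)

def lines_through(*points):
    """All lines that contain every one of the given points."""
    return [line for line in LINES if all(p in line for p in points)]

two_points_one_line = all(len(lines_through(p, q)) == 1
                          for p, q in combinations(POINTS, 2))
two_lines_one_point = all(len(a & b) == 1 for a, b in combinations(LINES, 2))
three_lines_per_point = all(len(lines_through(p)) == 3 for p in POINTS)
three_points_per_line = all(len(line) == 3 for line in LINES)

print(two_points_one_line, two_lines_one_point,
      three_lines_per_point, three_points_per_line)
```

Replacing `(0, 1, 3)` by `(0, 2, 3)` runs the same check for the 142 chords.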
Example 6a. The points in the Fano plane of Example 6a are labeled with pitch class names in such a way as to demonstrate the diatonic set class 124 as a Steiner triple system. Example 6b does the same thing for the 142 chords. Each figure contains seven lines and seven points. The seven lines consist of the three making up the outer equilateral triangle, the three lines that connect vertices with their opposite sides, and the inner, smaller equilateral triangle. Within the context of the projective plane, this smaller triangle counts as a single line. (Sometimes it is represented as a circle, rather than as a triangle.) What would the contrapuntal possibilities be for music in which the referential sonority were either of these set classes, or both used in conjunction? In three-voice counterpoint, no restrictions apply to the content of any pair of voices. (Any two points determine a line.) Restricting to a single set class, any pair of notes completely
determines the third voice. If your counterpoint allows both 124 and 142 types, then any given pair of notes allows a choice of two notes for the third voice, one of which completes the unique 124 chord for that pair, the other yielding 142.
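The way a pair of notes fixes the third voice can be sketched as a lookup. The function `third_voice` and the chord tables below are hypothetical helpers, assuming the C major scale with note names C through B.

```python
NAMES = "CDEFGAB"

def chords_of_type(shape):
    """The seven diatonic transpositions of a chord shape (scale degrees)."""
    return [frozenset((x + t) % 7 for x in shape) for t in range(7)]

CHORDS_124 = chords_of_type((0, 1, 3))   # C D F and transpositions
CHORDS_142 = chords_of_type((0, 2, 3))   # C E F and transpositions

def third_voice(note_a, note_b, chords):
    """Return the single note completing two distinct notes within a chord family."""
    a, b = NAMES.index(note_a), NAMES.index(note_b)
    matches = [chord for chord in chords if {a, b} <= chord]
    (chord,) = matches            # exactly one chord contains any given pair
    (missing,) = chord - {a, b}
    return NAMES[missing]

print(third_voice("C", "D", CHORDS_124))   # completes C D as a 124 chord
print(third_voice("C", "D", CHORDS_142))   # completes C D as a 142 chord
```

If both chord types are allowed, the two calls give the two candidates for the third voice.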
Example 6b. Consider 124 chords with regard to common tones: each 124 chord shares exactly one common tone with each of the others. (Any two lines determine a point.) Example 7a shows the common tones between CDF and each of the other 124 transpositions.
Example 7. The consistency of common tones does not imply the presence of parsimonious voice leading among the moving, non-common tones. Although a set of minimally perturbed motions can be established for each of the connections, I simply prefer to celebrate the demise of voice-leading parsimony and the rise of extravagance. Happy times are here again. Go ahead, little tones, go and leap. Common tones between 124 and 142 types behave in a more complex, but highly regular manner. Example 7b
shows the common tones between CDF and the transpositions of the 142 set class. Any given 124 chord shares 2 common tones with three of the 142s, 1 common tone with three others, and has no common tones with one. In Example 8, the 124 chords are labeled as type A, the 142s as B. CF is arbitrarily chosen as the fourth to serve as the starting point in the transpositions of each chord.
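The common-tone counts stated above can be tallied directly. A minimal sketch, assuming A1 = C D F and B1 = C E F as scale degrees from C = 0 (the variable names are illustrative):

```python
from collections import Counter

def family(shape):
    """The seven diatonic transpositions of a chord shape."""
    return [frozenset((x + t) % 7 for x in shape) for t in range(7)]

A_CHORDS = family((0, 1, 3))   # 124 type, A1 = C D F
B_CHORDS = family((0, 2, 3))   # 142 type, B1 = C E F

cdf = A_CHORDS[0]
# Common tones between C D F and each of the other six 124 chords.
within_124 = [len(cdf & other) for other in A_CHORDS[1:]]
# How many 142 chords share 0, 1, or 2 tones with C D F.
across_types = Counter(len(cdf & other) for other in B_CHORDS)

print(within_124)
print(dict(across_types))
```

By transposition the same tallies hold for any starting 124 chord, not only C D F.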
Example 8. The example also shows more detail regarding common tones between the A and B types. The multiplicities of the specific types of 124 and 142 chords also follow the pattern 1, 2, and 4. Consider the A types. The chord A4 (FGB) is the only 124 to contain a tritone. Let us call this specific type ‘z.’ The A3 and A7 chords are the only two to contain a minor second. This is specific type ‘y.’ The other four, A1, A2, A5, and A6, exemplify type ‘x.’ There are four xs, two ys, and one z. The same parsing can be done for the 142 chords, but this is not shown in the example. The types x’, y’, and z’ would
be found in reverse order on the 142 transpositions. For the 124 chords, the x type, which contains a major second and a perfect fourth, is the most common, and is also the least dissonant from the point of view of traditional consonance and dissonance values. (I am assuming the minor second to be more dissonant than the major second, the augmented fourth more than the perfect fourth, and no significant difference in thirds.) The y type is rarer, occurring twice, and contains the more dissonant minor second, while the unique z type contains the even more dissonant tritone. The correspondence between rarity and dissonance is, in a sense, more clearly articulated here than it is for triads. While it is true that the rarest triad, the diminished triad BDF, is the only one to contain a tritone, the major and minor triads each occur three times, and so are not distinct with respect to multiplicity. This is one of the factors that has allowed for the major and minor triads to be treated as functionally equivalent in diatonic settings; however, the three-way distinction evident in the A and B chords presents novel compositional opportunities.
I composed the song “Amarantha” in order to explore the possibility of employing these chords as the referential sonorities in a diatonic setting, replacing the common triad. “Amarantha” contains chords of other types as well as the 124 and 142 types; however, these two provide the predominant sonority. The 124 and 142 chords are also very effective in melodic contexts, and it is possible to combine the melodic and harmonic behaviors of these sonorities as we do with ordinary triads. Each vertical appearance of one of our A or B sonorities is marked below each system of the score. The annotated score of “Amarantha” is found in the Appendix. The analysis reveals some important features of the song.
There is a 3-note motto found throughout the piece which places an A or B chord, starting on a strong beat, in the rhythmic pattern long-short-long, where the first two of these are a half note and a quarter note. The contour is always a rise and a fall, which can manifest as either low-high-middle or middle-high-low. The first instance of the motto is found in the tenor in measure 2. At the beginning of the measure, the voices form a vertical A6, while the tenor extends the sonority into the melodic realm. The motto next appears as an A2 in the soprano in the beginning of measures 5 and 6. The A3 chord is featured in measure 8. It sounds at the initiation of both strong beats, and appears as the first note of each syllable in the lower voices. The soprano carries the motto with the B3 chord, sharing pitch classes A and E with the A3 chord. A bit of word painting underlies the setting in measure 18. Beginning with the last eighth of the previous measure, the vertical sonorities interweave the A and B chords in sequence: A4, B6, A5, B7, A6, B1. At the same time, interlocking A and B chords are found in the soprano and alto, together with the pitch contour of the motto. In the soprano, the eighth note (end of m. 17) G initiates an A5 (G C A); the C on the downbeat begins a B6 (C A D), the A an A6, the D a B7. The alto line, F D G E A F, imbricates B2, A2, B3, A3. The bass forms voice exchanges with pairs of soprano notes. This pseudo canon, together with the vertical sonorities, serves to highlight the text, “But neatly tangled at the best.” Another example of sequential treatment is found in measures 23-24, beginning on the third beat of measure 23. The rhythmic aspect of these sets is not greatly exploited here, although one may find rhythmic instances of the 124 (A) set quite frequently, in the form eighth, quarter, half. The first such instance begins on the eighth note C in the fifth measure of the soprano line.
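The imbrication of A and B chords in a melodic line can be sketched as a sliding three-note window. The labeling below assumes A1 = C D F and B1 = C E F, with the index rising by scale step (consistent with A4 = F G B in the text); `label_trichord` and `imbricate` are illustrative names, not the author's analytical procedure.

```python
NAMES = "CDEFGAB"

def label_trichord(notes):
    """Label three note names as A1..A7 (124 type) or B1..B7 (142 type), else None."""
    degrees = {NAMES.index(n) for n in notes}
    for t in range(7):
        if degrees == {t, (t + 1) % 7, (t + 3) % 7}:
            return f"A{t + 1}"
        if degrees == {t, (t + 2) % 7, (t + 3) % 7}:
            return f"B{t + 1}"
    return None

def imbricate(line):
    """Label every window of three consecutive notes in a melodic line."""
    return [label_trichord(line[i:i + 3]) for i in range(len(line) - 2)]

print(imbricate("FDGEAF"))   # the alto line of measure 18
```

Applied to the alto line F D G E A F, the windows come out as B2, A2, B3, A3, matching the reading given above.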
Until fairly recently, scale theory was focused on acoustical and psycho-acoustical aspects of musical systems. This has produced a long line of studies from Pythagoras through the dualists that aim to consider musical scales from the perspective of the purity – or lack thereof – of their foundational objects, such as the octaves, fifths, and thirds of the overtone series. Scale theories were concerned with reconciling the irreconcilable postulates of these systems. Recent scale theory has turned to the purely abstract properties of scales: to the patterns that the step intervals make; to the cyclic properties inherent in prime versus composite cardinality; to the even distribution of their elements in a larger pitch-class universe. This paper might help to show how the turn to the abstract reveals ways in which a very old scale might still hold surprising and new compositional possibilities.
References
Carey, N., Clampitt, D.: Aspects of well-formed scales. Music Theory Spectrum 11(2), 62–87 (1989)
Carey, N.: Distribution modulo 1 and musical scales. Ph.D. Dissertation, University of Rochester (1998)
Clough, J.: Aspects of diatonic sets. Journal of Music Theory 23, 45–61 (1979)
Clough, J.: Diatonic interval sets and transformation. Perspectives of New Music 18(1-2), 461–482 (1980)
Mazzola, G.: Die Geometrie der Töne. Birkhäuser, Basel (1990)
Appendix
Mathematical and Musical Properties of Pairwise Well-Formed Scales
David Clampitt
Ohio State University
The short paper below presents the definition of the pairwise well-formed scale concept, and a few of the significant mathematical and musical features entailed by that definition. The verifications that are easily available are supplied here; for the more difficult proofs which are here omitted, the reader is directed to my dissertation (Clampitt 1997). While the definition itself is quite abstract, the body of implications and equivalences that constitute the theory includes several musically attractive properties. With the significant exception of one structural subcategory, all pairwise well-formed scales participate in modulating cycles that generalize the maximally smooth cycles defined in Cohn 1996 and intersect with the Cohn functions defined in Lewin 1996.
I take this opportunity to celebrate the recent discovery of a strong relationship between musical scale theory and a well-developed branch of mathematics, algebraic combinatorics on words. Some translation between these domains will be done here; the paper by Manuel Domínguez (2008) (together with Thomas Noll and myself) provides more details. In particular, I will place the discussion of pairwise well-formed scales in a word-theory context. As far as I can tell, this study has not been pursued by the word combinatorialists, but the study is perhaps sufficiently interesting and mathematics is so vast as to warrant caution in any such assertion. Such a statement cannot be made about the first-order concept of well-formed scales, upon which the notion to be considered is built. The theory of well-formed scales may be mapped into the theory of Christoffel words and Sturmian words. The question is open as to whether parts of the existing mathematical theory may be mapped back to music theory to create new music-theoretical meaning. There are strong suggestions that this may be the case.
The objects being considered here have interpretations in the domain of musical scales, and might also be interpreted as rhythmic patterns. Pairwise well-formed scales abound in world music, including the Japanese In-scale or hemitonic pentatonic, so-called gypsy or Hungarian minor, the octatonic-minus-one, and, in the microtonal world, the diatonic scale under just intonation. For musical purposes, one may wish to preserve this level of concreteness. For example, two distinct concrete musical instantiations of the same word are:
Just scale: C 9/8 D 10/9 E 16/15 F 9/8 G 10/9 A 9/8 B 16/15 (C) <a b c a b a c>
ma-grama: 4 sa 3 ri 2 ga 4 ma 3 pa 4 dha 2 ni <a b c a b a c>
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 464–468, 2009. © Springer-Verlag Berlin Heidelberg 2009
The intervals for the just scale are given as frequency ratios, and for ma-grama as a number of srutis, understood as the band of sound below the given scale step, which are identified for our purposes as units of an underlying division of the octave into 22 parts. (See the discussion of early Indian heptatonic scales in Clough, Douthett, Ramanathan, and Rowell 1993.) The mathematical properties of pairwise well-formed scales emerge most clearly, however, at the level of abstraction which is the domain of combinatorics on words. By words are meant, for now, strings of symbols, of finite length, over a finite alphabet A. Subwords are called factors, and words are combined via concatenation, so if z, q, and r are words, with z = qr, then q and r are factors of z. We define a set C of circular words. Let C = { w | w : Z_N → A, where N is the length of w }. (It will turn out that A = {a, b, c} for our purposes.) The dihedral group D_N acts on C by permuting the arguments of w cyclically or by retrogression: Rot_m(w)(j) := w(j + m), and Retro_m(w)(j) := w(m − j), for m from 0 to N − 1. The word classes of C will be the orbits under this action. In word theory (where words are not assumed to be circular), two words x and y are said to be conjugate if there exist words u and v such that x = uv and y = vu. It should be clear that x and y are conjugate if and only if they are cyclic shifts of each other, and the conjugacy class of a word w is the orbit of w under the rotations Rot_m. (See Lothaire 2002 for further discussion of these definitions.)
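The passage from concrete scales to words, and the rotation and retrogression classes just defined, can be sketched as follows. The encoding of step sizes by order of first appearance and the helper names (`step_word`, `word_class`, `are_conjugate`) are assumptions chosen for illustration.

```python
from fractions import Fraction

def step_word(steps):
    """Replace each distinct step size by a letter, in order of first appearance."""
    letters = {}
    for s in steps:
        letters.setdefault(s, "abcdefghijklmnopqrstuvwxyz"[len(letters)])
    return "".join(letters[s] for s in steps)

def rotations(w):
    """All cyclic shifts of w (its conjugacy class)."""
    return {w[m:] + w[:m] for m in range(len(w))}

def word_class(w):
    """Orbit of w under rotation and retrogression (the dihedral action)."""
    return rotations(w) | rotations(w[::-1])

def are_conjugate(x, y):
    """x = uv and y = vu for some u, v, i.e. y is a cyclic shift of x."""
    return len(x) == len(y) and y in rotations(x)

# Step sizes of the two instantiations: frequency ratios and sruti counts.
just = [Fraction(9, 8), Fraction(10, 9), Fraction(16, 15), Fraction(9, 8),
        Fraction(10, 9), Fraction(9, 8), Fraction(16, 15)]
ma_grama = [4, 3, 2, 4, 3, 4, 2]

print(step_word(just), step_word(ma_grama))
print(len(rotations("abcabac")), len(word_class("abcabac")))
```

Both scales map to the same word, and for this word the retrograde is not a rotation, so the dihedral class is twice the size of the conjugacy class.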
1 Pairwise Well-Formed and Well-Formed Scales
The pairwise well-formed scale concept depends on that of the well-formed scale. At the concrete level, a scale is well-formed if it is generated by an interval of constant size and span, that is, if all the notes of the scale may be linked together in a chain where the links are intervals of the same size that span the same number of scale step intervals (see Domínguez, Clampitt, and Noll 2008 for further information). Here is a definition of well-formedness in terms of multiples of a real number modulo 1, in other words, as fractional parts of multiples of a real number. For θ real and an integer N>1, let S = {nθ − ⌊nθ⌋ | 0
cardinality N, and u is the multiplicity of one of its step intervals, then N − u is the multiplicity of the other step interval, and the spans of the generating intervals are u⁻¹ mod N and −u⁻¹ mod N. For example, in the usual diatonic, the multiplicity of the semitone is 2, the span of the generating perfect fifth is 4 ≡ 2⁻¹ mod 7; the multiplicity of the whole step is 5, the span of the generating perfect fourth is 3 ≡ 5⁻¹ mod 7. According to this theorem, the structure of a non-degenerate well-formed scale is determined by a matrix of integers, u, u⁻¹ mod N, N − u, and −u⁻¹ mod N. As these four numbers are themselves already determined by N and u, we may compactly represent a well-formed scale class as wfs(N,u). By convention we choose u to be the multiplicity of the less frequent step interval in a scale of cardinality N. (There must be smaller and larger multiplicities, because gcd(N, u) = 1.) Each wfs(N,u) corresponds to a circular word class on two letters. For example, wfs(7,2) is represented by , while corresponds to wfs(12, 5).
Let w be a circular word on some alphabet A with k letters. Define the pairwise projections of w to be the words that result from identifying pairs of letters in A. w is pairwise well-formed if all the pairwise projections of w correspond to (non-degenerate) well-formed scale classes (following the definition in Clampitt 1997). If w is a word on k letters, its pairwise projections are words on k−1 letters. Since all words corresponding to non-degenerate well-formed scale classes must be on 2 letters, if w is pairwise well-formed, it must necessarily be a word on 3 letters. For example the circular word (exemplified by the Japanese In-scale or hirajoshi, E F A B C (E) ) has as its three pairwise projections (a = c), (a = b), and (b = c). Each is well-formed, as the reader may verify, so is pairwise well-formed.
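The generator spans and the pairwise projections can be explored with a short sketch. Two assumptions are made loudly here: the In-scale E F A B C (E), with semitone steps 1 4 2 1 4, is written as the word `abcab` (a = 1, b = 4, c = 2, a lettering chosen for illustration), and well-formedness of a two-letter circular word is tested by requiring exactly two contents for every span (Myhill's property), used as a stand-in criterion that the text itself does not state.

```python
from itertools import combinations

def generator_spans(n, u):
    """Spans u^-1 and -u^-1 (mod n) of the two generating intervals."""
    inv = pow(u, -1, n)            # modular inverse; requires Python 3.8+
    return inv, (-inv) % n

def pairwise_projections(word):
    """Identify each pair of letters in turn, keeping the first letter of the pair."""
    letters = sorted(set(word))
    return {(x, y): word.replace(y, x) for x, y in combinations(letters, 2)}

def has_two_sizes_per_span(word):
    """Assumed criterion: every span 1..N-1 has exactly two letter contents."""
    n = len(word)
    doubled = word + word          # lets windows wrap around the circular word
    for span in range(1, n):
        contents = {tuple(sorted(doubled[i:i + span])) for i in range(n)}
        if len(contents) != 2:
            return False
    return True

def is_pairwise_well_formed(word):
    """Three letters, and every pairwise projection passes the two-size test."""
    projections = pairwise_projections(word)
    return len(set(word)) == 3 and all(
        has_two_sizes_per_span(p) for p in projections.values())

print(generator_spans(7, 2), generator_spans(7, 5))
print(pairwise_projections("abcab"))
print(is_pairwise_well_formed("abcab"))
```

Under these assumptions the diatonic values 4 and 3 from the text are recovered, and all three projections of `abcab` pass the test.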
2 Some Properties of Pairwise Well-Formed Scales
Let ga, gb, and gc be the multiplicities of the letters a, b, and c (i.e., the step interval multiplicities) in a pairwise well-formed scale of cardinality N. Then gcd(ga, N) = 1, gcd(gb, N) = 1, and gcd(gc, N) = 1. This follows immediately from the requirement that each must serve in turn as a step interval multiplicity in a well-formed scale; recall from above that such multiplicities are necessarily units mod N. Pairwise well-formed scales are of odd cardinality. This is an immediate consequence of the previous result. Since all three multiplicities must be coprime with N, if N were even, ga, gb, gc would all be odd. But the sum of 3 odd integers is odd, so N = ga + gb + gc would be odd, a contradiction. Pairwise well-formed scales exhibit trivalence: In a pairwise well-formed scale of cardinality N, intervals of a given non-zero generic span mod N come in 3 specific sizes (Clampitt 1997). (Trivalence does not characterize pairwise well-formedness: e.g., C D-flat E G A-flat (C), <13314>, is trivalent, but not pairwise well-formed.)
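The coprimality, odd-cardinality, and trivalence statements can be checked on step sequences measured in semitones. This is a sketch under the assumption of 12-tone equal temperament; the function names are illustrative.

```python
from collections import Counter
from math import gcd

def multiplicities(steps):
    """How often each step size occurs in the scale."""
    return Counter(steps)

def multiplicities_coprime(steps):
    """True when every step multiplicity is coprime with the cardinality."""
    n = len(steps)
    return all(gcd(m, n) == 1 for m in multiplicities(steps).values())

def specific_sizes(steps, span):
    """Distinct sizes of intervals spanning `span` consecutive steps, cyclically."""
    n = len(steps)
    return {sum(steps[(i + j) % n] for j in range(span)) for i in range(n)}

def is_trivalent(steps):
    """Every non-zero generic span comes in exactly three specific sizes."""
    return all(len(specific_sizes(steps, span)) == 3
               for span in range(1, len(steps)))

in_scale = [1, 4, 2, 1, 4]         # E F A B C (E)
counterexample = [1, 3, 3, 1, 4]   # C Db E G Ab (C)

print(multiplicities_coprime(in_scale), is_trivalent(in_scale))
print(multiplicities_coprime(counterexample), is_trivalent(counterexample))
```

Both scales pass these tests, which illustrates the parenthetical remark: trivalence (and coprime multiplicities) are necessary for pairwise well-formedness but do not by themselves establish it.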
3 Classification of Pairwise Well-Formed Scales
The heptatonic scale C D E-flat F-sharp G A-flat B (C), so-called Hungarian minor or gypsy minor, modes of which appear in Indian, Arabic, and Jewish music, is a pairwise well-formed scale with a unique structure, represented by the word. The concrete instance of this word as Hungarian minor is not the only one available within the 12-pitch-class system—C D-flat E-flat E F G A-flat (C) also has this structure—but perhaps because it may be understood as a deformation of the usual diatonic, or because of the related fact that it supports a number of harmonic triads, the so-called Hungarian minor is the most prominent representative of this word-class as a scale. All pairwise well-formed scales are either of the so-called Hungarian or gypsy minor type, that is, of the class expressed by w = , or else they have two step intervals of the same multiplicity. We call pairwise well-formed scales of the type singular and all others non-singular. Only singular pairwise well-formed scales have 3 distinct step multiplicities (1, 2, 4); in non-singular pairwise well-formed scales the step interval multiplicities are (m, m, n), with m ≠ n. Singular pairwise well-formed scales are self-similar in that all interval cycles of span d, d nonzero modulo 7, have the same structure (that is, correspond to the word ). Considered as scales at the concrete level, singular pairwise well-formed scales may be generated (in the case where the unique interval c = a+b). Non-singular pairwise well-formed scales cannot be generated. Non-singular pairwise well-formed scales participate in cycles that generalize Cohn’s maximally smooth cycles. That is, there exist cycles of length at least 3 where adjacent elements are inversionally related and differ by a single note. Singular pairwise well-formed scales cannot participate in such cycles; they are transformationally frozen in a way that non-singular pairwise well-formed scales are not. Parallel to the definition of a maximally smooth cycle in Cohn 1996, I define a generalization, the Q-relation and Q-cycle.
Two pitch-class sets are in a Q-relation if there exists a transposition or inversion mapping one set onto the other that leaves all but one pitch class invariant and moves the remaining pitch class by any interval class, where the moving pitch class slides between fixed pitch classes, rather than leaping over them. For example, {01267} and its inversion (about the 3/4 axis) {01567} are Q-related (2 slides to 5), whereas {01267} and its inversion (about the 0/1 axis) {0167e} are not (2 leaps over stationary pitch classes to 11). For Q-cycles, let us first stipulate sets of cardinality greater than 3, because trichords with three distinct step sizes are trivially pairwise well-formed and support cycles of Q-related sets, but require some special attention precisely because it is so easy to construct such cycles. For sets of cardinality greater than 3, a Q-cycle is a cycle of length greater than 2 where adjacent sets in the cycle are Q-related.
As an example of a Q-cycle for pairwise well-formed scales, consider the Japanese hemitonic pentatonic, the In-scale, which may be notated as E F A B C (E). Like the usual pentatonic, it can be considered a connected segment of the diatonic cycle of fifths, but unlike the usual pentatonic, this subset embraces the diminished fifth as well: A E B F C. Its cyclic step-interval sequence modulo 12 is <1 4 2 1 4>. It is possible to modulate to an inverted form of the scale either by moving one note down a (chromatic) semitone (exchanging the positions of adjacent 2 and 1), or by moving one note up a tone (exchanging the positions of adjacent 4 and 2), and, if one assumes 12-note equal temperament, to pass in this way through all 24 members of the Forte set class 5-20: E F A B C → E F A Bb C → E F A Bb D → . . . → B C E F# G → B C E F G → (B C E F A). The
product of two successive inversions is a T5-transposition of the original set. All non-singular pairwise well-formed scales exhibit either such cycles (within equal divisions of the octave) or infinite chains, where adjacent sets in the chain are Q-related.
As an example of an infinite Q-chain, consider the diatonic scale in just intonation. Here the intervals of motion are alternately syntonic commas and larger limmas. The just major scale has step intervals (in frequency ratios) as follows:
do  9/8  re  10/9  mi  16/15  fa  9/8  sol  10/9  la  9/8  ti  16/15  (do)
If re is lowered by a syntonic comma (81/80), this produces an inverted form of the scale: <10/9, 9/8, 16/15, 9/8, 10/9, 9/8, 16/15>. (This is because 9/8 × 80/81 = 10/9. Since the intervals are expressed here in frequency ratios, “subtracting a syntonic comma” is expressed as division by 81/80, or multiplication by its inverse.) If we follow this operation by lowering ti by a larger limma (multiplication by 128/135), the result is an inversion again. The composition of the two operations is a transposition by a just (Pythagorean) perfect fourth. Alternating these two operations forms an infinite Q-chain.
The distinction between cycles and chains only arises for concrete instantiations of the scale. At the word-theory level of description, however, the transformations take place within the class of conjugates of the word and of its reversal. The Japanese pentatonic, for example, is represented by the word a b c a b. The Q-cycle exchanges adjacent letters c a and b c (or c b), always yielding rotations or retrogressions of the original word: a b c a b → a b a c b → a b a b c → c b a b a → b c a b a → b a c b a → b a b c a → b a b a c → c a b a b → a c b a b → (a b c a b), a cycle containing all ten elements of the word class, under the action of D5.
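The two concrete examples of this section, the In-scale Q-cycle in 12-note equal temperament and the just-intonation Q-chain, can be checked with a short sketch. The helper names (`set_class`, `q_moves`, `lower_degree`, `rotations`) are illustrative assumptions and do not come from the text; the sketch also does not test the "no leaping" condition separately, since the semitone and whole-tone moves used here never cross another pitch class of the set.

```python
from fractions import Fraction
from math import prod

C = 12  # assumed chromatic cardinality (12-note equal temperament)

def set_class(pcs):
    """All transpositions and inversions of a pc-set modulo C."""
    return {frozenset((n + sign * p) % C for p in pcs)
            for n in range(C) for sign in (1, -1)}

def q_moves(pcs, delta, family):
    """Sets in `family` reached by moving exactly one pc of `pcs` by `delta`."""
    found = []
    for p in pcs:
        moved = (pcs - {p}) | {(p + delta) % C}
        if len(moved) == len(pcs) and moved in family:
            found.append(moved)
    return found

# In-scale E F A B C as pitch classes (C = 0).
in_scale = frozenset({4, 5, 9, 11, 0})
family = set_class(in_scale)

# Alternate "one note down a semitone" and "one note up a tone".
q_cycle = [in_scale]
for step in range(2 * C):
    delta = -1 if step % 2 == 0 else 2
    (next_set,) = q_moves(q_cycle[-1], delta, family)  # fails unless the move is unique
    q_cycle.append(next_set)

# Just major scale as step ratios: do-re, re-mi, ..., ti-do.
T, t, s = Fraction(9, 8), Fraction(10, 9), Fraction(16, 15)
just_major = [T, t, s, T, t, T, s]

def lower_degree(steps, degree, comma):
    """Lower scale degree `degree` (1 = re, ..., 6 = ti) by the ratio `comma`."""
    out = list(steps)
    out[degree - 1] /= comma   # the step below the degree shrinks
    out[degree] *= comma       # the step above the degree grows
    return out

def rotations(seq):
    return [seq[i:] + seq[:i] for i in range(len(seq))]

after_re = lower_degree(just_major, 1, Fraction(81, 80))    # syntonic comma
after_ti = lower_degree(after_re, 6, Fraction(135, 128))    # larger limma
new_tonic = prod(after_ti[:3])  # ratio from do up to the tonic of the new scale
```

Under these assumptions the walk returns to the starting set after 24 steps, `after_re` is a rotation of the reversed step sequence (an inverted form), and `after_ti` is a rotation of the original sequence whose tonic lies a 4/3 above the old do.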
References
Carey, N., Clampitt, D.: Self-Similar Pitch Structures, Their Duals and Rhythmic Analogues. Perspectives of New Music 34(2), 62–87 (1996)
Clampitt, D.: Pairwise Well-Formed Scales: Structural and Transformational Properties. Ph.D. diss., State University of New York at Buffalo (1997)
Clough, J., Douthett, J., Ramanathan, N., Rowell, L.: Early Indian Heptatonic Scales and Recent Diatonic Theory. Music Theory Spectrum 15, 36–58 (1993)
Cohn, R.: Maximally Smooth Cycles, Hexatonic Systems, and the Analysis of Late-Romantic Triadic Progressions. Music Analysis 15, 9–40 (1996)
Domínguez, M., Clampitt, D., Noll, T.: Well-formed Scales, Maximally Even Sets, and Christoffel Words. In: Klouche, T., Noll, T. (eds.) MCM 2007. CCIS, vol. 37. Springer, Heidelberg (2008)
Lewin, D.: Cohn Functions. Journal of Music Theory 40, 181–216 (1996)
Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press, Cambridge (2002)
Eine Kleine Fourier Musik
Emmanuel Amiot
CPGE Perpignan
Introduction
The discrete Fourier transform, or DFT, of a pc-set was first introduced, for musical purposes, by David Lewin in his very first paper (Lewin 1959). His aim was the characterization of a pc-set by its intervallic relationship with another. This approach fails for some specific cases which exhibit interesting symmetries. It was by and large forgotten until Ian Quinn exhumed it in his dissertation (Quinn 2004), this time as an efficient tool for pinpointing, in the landscape of all chords, the most salient ones (‘prototypes’). One of several fascinating by-products of Quinn’s study is that such prototypes appear as solutions of an optimization problem on the DFT. At this conference I showed that the DFT is not only a good way to introduce and indeed define some of these prototypes (the famous Maximally Even Sets) but is also essential or handy for many other musical notions, such as interval content and Z-relation, tilings and rhythmic canons, and the very recent Flat Interval Distribution of pc-sets first introduced by Jon Wild on the same day.
1 DFT of a pc Set
We will follow Lewin’s choice of definition:
Definition 1. The DFT of a pc-set $A \subset \mathbb{Z}_c$ is the map $F_A$ from $\mathbb{Z}_c$ to $\mathbb{C}$ defined as
$$F_A = \mathcal{F}(1_A) : t \mapsto \sum_{k \in A} e^{-2i\pi kt/c}$$
It is the Fourier transform of the characteristic function $1_A$ of the set A. Several properties follow easily from general features of Fourier transforms. To begin with, periodic pc-sets (meaning a period dividing c) are easily recognized by looking at their Fourier transforms, as from standard harmonic analysis a k-periodic phenomenon is completely described by k Fourier coefficients. Moreover:
– $F_A(0) = \#A$, the cardinality of A.
– The DFT of the whole chromatic scale is
$$F_{\mathbb{Z}_c}(t) = \sum_{k=0}^{c-1} e^{-2i\pi kt/c} = 0 \quad \text{for all } t \in \mathbb{Z}_c \text{ except } t = 0.$$
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 469–476, 2009. © Springer-Verlag Berlin Heidelberg 2009
– The Fourier transforms of a pc-set A and of its complement $\mathbb{Z}_c \setminus A$ have opposite values, except when t = 0:
$$\forall t \in \mathbb{Z}_c,\ t \neq 0, \quad F_{\mathbb{Z}_c \setminus A}(t) = -F_A(t)$$
– The DFT $F_A$ characterizes the pc-set A, by inverse Fourier transform.
– The absolute values of the Fourier coefficients, abbreviated as $|F_A|$, are invariant under (musical) transposition or inversion of the pc-set A.
This last property, which shows $|F_A|$ as an invariant under the T/I group of musical transformations, is enough to vindicate the interest of the DFT of pc-sets¹. Moreover, the relationship with the other well-known invariant under such transforms is straightforward (a more general form is already stated in Lewin 1959):
Theorem 1. Let us define the [oriented] intervallic content of a subset $A \subset \mathbb{Z}_c$ as
$$IC_A(k) = \#\{(i, j) \in A^2,\ j - i = k\}$$
Then the DFT of the intervallic content is equal to the square of the length of the DFT of the set:
$$\mathcal{F}(IC_A) = |F_A|^2$$
This follows from the famous result on the Fourier transform of a convolution product, as $IC_A$ is the convolution product of the characteristic functions of A and −A (see again Lewin 1959): in general, for two subsets A, B,
$$1_A * 1_B : t \mapsto \sum_{k \in \mathbb{Z}_c} 1_A(k)\,1_B(t - k) = \sum_{k \in \mathbb{Z}_c} 1_A(k)\,1_{-B}(k - t)$$
So the famous and vexing Z-relation (equality of interval contents) can be simply stated as the equality of the absolute values of the DFTs of two pc-sets. This is reminiscent of the definition of pitch following Helmholtz’s experiments: one can (mostly) drop the phase, the argument of the complex numbers $F_A(t)$, as only the modulus $|F_A(t)|$ is musically significant, in the most familiar musicological context anyway; just as the pitch of a sound is mostly its (fundamental’s) frequency, phase being (usually) irrelevant.
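As a hedged numerical illustration of Definition 1, Theorem 1 and the Z-relation, the following sketch computes the DFT directly from its definition. The function names are illustrative assumptions, and the pair of all-interval tetrachords {0,1,4,6} and {0,1,3,7} in $\mathbb{Z}_{12}$ is a standard Z-related pair chosen here as an example; it does not appear in the text.

```python
import cmath

def dft(values, c):
    """DFT of a map Z_c -> C given as a list of c values."""
    return [sum(values[k] * cmath.exp(-2j * cmath.pi * k * t / c) for k in range(c))
            for t in range(c)]

def characteristic(pcs, c):
    """Characteristic function 1_A as a list of 0/1 values."""
    return [1 if k in pcs else 0 for k in range(c)]

def interval_content(pcs, c):
    """Oriented interval content IC_A(k) = #{(i, j) in A x A : j - i = k}."""
    return [sum(1 for i in pcs for j in pcs if (j - i) % c == k) for k in range(c)]

c = 12
A = {0, 1, 4, 6}   # two all-interval tetrachords,
B = {0, 1, 3, 7}   # a classic Z-related pair (example chosen for this sketch)

F_A = dft(characteristic(A, c), c)
F_B = dft(characteristic(B, c), c)
F_IC = dft(interval_content(A, c), c)                       # DFT of the interval content
F_comp = dft(characteristic(set(range(c)) - A, c), c)       # DFT of the complement
```

With these values, `F_IC[t]` agrees with `abs(F_A[t]) ** 2` up to rounding, the moduli of `F_A` and `F_B` coincide, and `F_comp[t]` is the opposite of `F_A[t]` for every nonzero `t`.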
2 Maximal Values
Consideration of the pc sets exhibiting maximal values of their DFT enables a novel definition of Maximally Even Sets (Coven and Meyerowitz 1999, Clough and Douthett 1991). This was discovered by Quinn (2004) and elaborated upon in Amiot 2008a. Details and proofs are to be found in this last reference.
¹ It is also essentially invariant under complementation (see Amiot 2008a).
2.1 Regular Polygons
These are sometimes referred to as ‘degenerate’ ME sets. The following is an easy consequence of the definition and of Minkowski’s inequality:
Proposition 1. A is a regular polygon in $\mathbb{Z}_c$, i.e. it is a translate (for some divisor r of c) of $\{r, 2r, \dots, dr = c\}$, iff $|F_A(d)| = d$.
Proof. All exponentials in $F_A(d) = \sum_{k \in A} e^{-2i\pi dk/c}$ are of length 1. The sum is hence of length $\le 1 + 1 + \dots + 1 = d$, with equality iff all exponentials are equal. This happens only when the angular differences between the $e^{-2i\pi dk/c}$, $k \in A$, are integer multiples of 2π, and this means that the k ∈ A are, up to a translation, the multiples of c/d = r; hence A is a regular polygon.
2.2 The General Case
This admittedly takes a little longer to prove properly, and I refer the critical reader to Amiot 2008a, only sketching the ideas for lack of space. As Quinn had noticed, when the cardinality d of A is no longer a divisor of c, the exponentials $e^{-2i\pi dk/c}$ can no longer be equal; but maximizing their sum means having them as close as possible to one another. According to the value of (c, d), these points may be distinct or not, involving multisets in the latter case. Making this informal idea precise, with a little trigonometry, makes it possible to distinguish three cases, well known since Clough and Douthett 1991:
Theorem 2. The pc-set $A \subset \mathbb{Z}_c$ maximizes the value $|F_A(d)|$ among all subsets of cardinality d if and only if:
1. When (c, d) = 1, A is some translate of $\{f, 2f, \dots, df\}$ where f is the multiplicative inverse of d modulo c [−f works as well].
2. When d | c, A is a regular polygon [seen above].
3. When (c, d) = m with 1 < m < d, A is periodic with period $c' = c/m$: $A = B \oplus c'\mathbb{Z}_c$, where [the projection modulo $c'$ of] B itself is maximizing $|F_B(d')|$ among all subsets of $\mathbb{Z}_{c'}$ with $d' = d/m$ elements.
This last case, with the heredity of the optimization condition, is particularly obvious on the graphs of the Fourier transforms of A and B.
These three cases make up the Maximally Even Sets defined in Coven and Meyerowitz 1999 and later generalized in Clough and Douthett 1991. So we have proven that the following definition is acceptable:
Definition 2. The pc-set $A \subset \mathbb{Z}_c$, with cardinality d, is a ME set if the number $|F_A(d)|$ is maximal among all pc-sets with cardinality d:
$$\forall A' \subset \mathbb{Z}_c, \quad \#A' = d \ \Rightarrow\ |F_A(d)| \ge |F_{A'}(d)|$$
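Definition 2 lends itself to an exhaustive check on a small example. The sketch below, assuming c = 12 and d = 7 (a case with (c, d) = 1 chosen for illustration), searches all 7-subsets for the maximizers of $|F_A(d)|$ and compares them with the translates of {f, 2f, …, df} from case 1 of Theorem 2. The function names and the numerical tolerance are assumptions of this sketch.

```python
import cmath
from itertools import combinations

def fourier_coefficient(pcs, t, c):
    """F_A(t) computed directly from the definition."""
    return sum(cmath.exp(-2j * cmath.pi * k * t / c) for k in pcs)

def maximizers(c, d, tol=1e-9):
    """Maximal value of |F_A(d)| over d-subsets of Z_c, and the subsets reaching it."""
    scored = [(abs(fourier_coefficient(subset, d, c)), frozenset(subset))
              for subset in combinations(range(c), d)]
    best = max(value for value, _ in scored)
    return best, {subset for value, subset in scored if best - value < tol}

def translates(pcs, c):
    """All transpositions of a pc-set in Z_c."""
    return {frozenset((p + n) % c for p in pcs) for n in range(c)}

c, d = 12, 7
best, me_sets = maximizers(c, d)
f = pow(d, -1, c)                                    # inverse of d modulo c (Python 3.8+)
generated = {(k * f) % c for k in range(1, d + 1)}   # {f, 2f, ..., df}
floor_set = {(k * c) // d for k in range(d)}         # floor(k*c/d), k = 0..d-1
```

In this case the maximizers are exactly the twelve diatonic collections, which are also the translates of `floor_set`; an exhaustive search of this kind is only practical for small c.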
[Figure: graphs of the Fourier transforms, panels ME7,3 and ME28,12]
Fig. 1. Maximizing for B is maximizing for A
In Amiot 2008a I develop all usual properties of ME sets, up to and including the famous generating formula $A = \{\lfloor (kc + \alpha)/d \rfloor,\ k = 1 \dots d\}$, starting from this definition, which features among others the advantage that the category of ME sets is immediately and obviously invariant under transposition, inversion, and complementation. Together with the musical relevance of the DFT and the fact that it addresses sets (i.e. chords), not scales as in Coven and Meyerowitz 1999 and Clough and Douthett 1991, this makes a strong case for this definition to become the natural one for ME sets.
2.3 Other Maximal Values
Some other maximal values are interesting. For instance, as a step in the proofs of the results in the last paragraph, we obtain a characterization of ‘chromatic clusters’, which is interestingly a dynamic property (consecutiveness) of unordered sets.
Theorem 3. A is a chromatic cluster, i.e. $A \subset \mathbb{Z}_c$ is some translate of {1, 2, …, d}, if and only if $|F_A(1)|$ has maximal value among all d-subsets of $\mathbb{Z}_c$.
Fig. 2. Maximizing the sum
A similar characterization stands for generated scales (well-formed scales in the terminology of Carey and Clampitt 1989), at least when a coprimality condition is satisfied:
Theorem 4. A pc-set A with d elements is generated by some f coprime with c, i.e. A is a translate of {f, 2f, …, df}, if and only if $\|F_A\|_* = \max_{t \in \mathbb{Z}_c^*} |F_A(t)|$ is maximal among d-element subsets ($\mathbb{Z}_c^*$ is the group of invertible elements in the ring $\mathbb{Z}_c$).
This widens a little further the usefulness of maximal values of the DFT to the category of generated scales, invariant under affine transforms which permute the Fourier coefficients.
3 Minimal Values
Of course the minimal possible value of a $|F_A(t)|$ is 0. This was the annoying case in Lewin 1959. It turns out that these values are of the utmost importance in tiling problems. A notable exception to the oblivion around Lewin’s original idea is to be found in the formidable work of D.T. Vuza on rhythmic canons (Vuza 1991–1992), which are instances of tilings of $\mathbb{Z}_c$.
Definition 3. A tiles $\mathbb{Z}_c$ iff there exists $B \subset \mathbb{Z}_c$ with $A \oplus B = \mathbb{Z}_c$.
As Vuza observed (lemma 9.1 in Vuza 1991–1992), this is equivalent to
$$1_A * 1_B = 1_{\mathbb{Z}_c} = 1$$
(the constant function equal to 1 on $\mathbb{Z}_c$), and the Fourier transform turns this into
$$F_A \times F_B(t) = \begin{cases} c & \text{if } t = 0 \\ 0 & \text{else} \end{cases}$$
The usual way to express the condition $A \oplus B = \mathbb{Z}_c$ is to turn it into
$$A(X) \times B(X) = 1 + X + \dots + X^{n-1} \pmod{X^n - 1} \quad \text{where } A(X) = \sum_{a \in A} X^a$$
This is the same thing as the above equation if one puts $X = e^{2i\pi t/c}$ (with n = c).² Moreover, it turns out (see Amiot 2004 for a survey) that the conditions for A to tile address only the zero-set of $F_A$, $Z_A = \{t \in \mathbb{Z}_c,\ F_A(t) = 0\}$. This set is highly organized, as for instance (rewording Vuza’s thm. 2.4):
Proposition 2. If $F_A(t) = 0$ then $F_A(t') = 0$ for all values of $t'$ with the same order in the group $(\mathbb{Z}_c, +)$, that is to say the multiples of t by any integer invertible modulo c.
This is a non-obvious fact of Galois theory (essentially the irreducibility of cyclotomic polynomials in $\mathbb{Z}[X]$). The structure of $Z_A$ is very closely related to the question of tiling (whether of chords or of rhythms). For instance,
Proposition 3. A tiles in $\mathbb{Z}_c$ whenever there exists a subset B such that $Z_A \cup Z_B = \{1, 2, \dots, c - 1\}$ and $\#A \times \#B = c$.
The tiling conditions of Coven and Meyerowitz make it possible to predict whether a subset A will tile, without trying all possible B’s (probably an NP problem). They can be reworded in terms of these common orders of roots of $F_A$: setting $R_A$ as the set of orders of all the $t \in \mathbb{Z}_c$ with $F_A(t) = 0$, and $S_A$ the subset of these orders that are prime powers,
$$(T_1) \quad d = \#A = \prod_{p^\alpha \in S_A} p$$
$$(T_2) \quad p^\alpha \in S_A,\ q^\beta \in S_A,\ \dots \ \Rightarrow\ p^\alpha \times q^\beta \cdots \in R_A$$
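The tiling condition and its Fourier counterpart can be illustrated on a toy example. The sketch below assumes c = 12 with A = {0, 1, 2} and B = {0, 3, 6, 9}; these sets, the helper names and the tolerance are illustrative assumptions, not taken from the text. It checks the direct sum, the product of the Fourier transforms, the zero sets of Proposition 3, the closure property of Proposition 2, and condition (T1).

```python
import cmath
from math import gcd

def dft(pcs, c):
    return [sum(cmath.exp(-2j * cmath.pi * k * t / c) for k in pcs) for t in range(c)]

def tiles_with(A, B, c):
    """True when every element of Z_c is a + b for exactly one pair (a, b)."""
    return sorted((a + b) % c for a in A for b in B) == list(range(c))

def zero_set(pcs, c, tol=1e-9):
    """Z_A = {t : F_A(t) = 0}, detected numerically up to `tol`."""
    return {t for t, value in enumerate(dft(pcs, c)) if abs(value) < tol}

def prime_base(n):
    """Return p if n = p**alpha for a prime p and alpha >= 1, else None."""
    if n < 2:
        return None
    p = next(q for q in range(2, n + 1) if n % q == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return p if n == 1 else None

def t1_product(pcs, c):
    """Product of the primes p over the prime-power orders p**alpha in S_A."""
    orders = {c // gcd(t, c) for t in zero_set(pcs, c)}   # the set R_A
    result = 1
    for order in orders:
        p = prime_base(order)
        if p is not None:
            result *= p
    return result

c = 12
A = {0, 1, 2}        # illustrative tile (an assumption of this sketch)
B = {0, 3, 6, 9}     # illustrative set of translations

product = [fa * fb for fa, fb in zip(dft(A, c), dft(B, c))]
units = [u for u in range(1, c) if gcd(u, c) == 1]
```

Here `zero_set(A, c)` is {4, 8} and `zero_set(B, c)` contains every other nonzero residue, so the two zero sets together cover {1, …, 11}.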
These conditions appear (Laba 2002) to be related to the famous conjecture of Fuglede, recently disproved in high dimension by Fields medalist Terence Tao, and still open in dimension 1. Fuglede stated that a set tiles iff it is spectral. In this context, ‘A is spectral’ means that $Z_A$ is part of a difference set, i.e. there exists some set Λ with #A elements and $\{0\} \cup Z_A \subset \Lambda - \Lambda$.
4 Mean Value(s)
Jon Wild has introduced the notion of FLID (Flat Interval Distribution [pc set]) as a generalization of all-interval sets (Wild 2008). With the notation of section 1, A has FLID iff $IC_A$ takes values d, m, m, m, m, … (m is some integer value). This is a fascinating notion, especially musically, quite the reverse of well-formed scales in terms of intervallic polarization, but I will only state a last illustration of the power of the DFT:
Theorem 5. A has FLID ⟺ $|F_A|$ is flat, meaning that all values $|F_A(t)|$, t = 1, 2, …, c − 1, are equal.
² As a polynomial with degree < c is determined by c particular values.
[Figure: DFT of {0, 2, 3, 4, 8} mod 11]
Fig. 3. Each possible interval occurs twice, the DFT is flat
For instance, if c = 11 and A is the all-interval set {0, 2, 3, 4, 8} (and hence $IC_A$ takes values 5, 2, 2, 2, 2, …) then the values of $|F_A|$ are 5, and then ten times $\sqrt{3}$. The direct implication is easy; the converse required yet another representation of the pc-sets, another disguise of their DFT:
Definition 4. The matrix of a map $f : \mathbb{Z}_c \to \mathbb{C}$ is $M_f = \big(f(i - j)\big)_{1 \le i,j \le c}$.
The matrix product corresponds to the convolution product of maps. As an example, if we write M(A) for the matrix of $1_A$, then A tiles with B iff M(A) × M(B) is the matrix with only ones. There is an isomorphism between two algebras: maps from $\mathbb{Z}_c$ to $\mathbb{C}$ (with convolution product, not ordinary product), and circulant matrices. The link with the DFT is sweet: the $F_A(t)$ are the eigenvalues of M(A)! So up to a change of basis, this is just another way of looking at the DFT. This makes it possible to prove the above characterization of FLIDs with one page of simple matrix calculations (as yet unpublished), relying on three simple facts:
– All these circulant matrices are simultaneously diagonalizable.
– The change-of-basis matrix can be taken unitary: $\Omega = \frac{1}{\sqrt{c}} \big(\omega^{(i-1)(j-1)}\big)_{1 \le i,j \le c} = (\overline{\Omega}^T)^{-1}$, with $\omega = e^{2i\pi/c}$.
– The matrices M(A) are real valued.
Of course, the proof can be expressed directly in terms of the DFT, but it is not at all natural in that context. This is perhaps another instance of the Yoneda philosophy, wherein an object is determined by looking at it from all possible angles.
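The example with c = 11 and the eigenvalue remark can be checked numerically. The following is a minimal sketch in plain Python; it uses 0-based indices instead of the 1-based indices of Definition 4 (an assumption that does not change the matrix), and the helper name `eigen_residual` is illustrative.

```python
import cmath

c = 11
A = {0, 2, 3, 4, 8}
omega = cmath.exp(2j * cmath.pi / c)

# Fourier coefficients F_A(t) and oriented interval content IC_A(k).
F_A = [sum(omega ** (-k * t) for k in A) for t in range(c)]
IC_A = [sum(1 for i in A for j in A if (j - i) % c == k) for k in range(c)]

# Matrix of the characteristic function: M[i][j] = 1_A(i - j).
M = [[1 if (i - j) % c in A else 0 for j in range(c)] for i in range(c)]

def eigen_residual(t):
    """Largest entry of M v - F_A(t) v for the vector v = (omega**(j*t))_j."""
    v = [omega ** (j * t) for j in range(c)]
    Mv = [sum(M[i][j] * v[j] for j in range(c)) for i in range(c)]
    return max(abs(Mv[i] - F_A[t] * v[i]) for i in range(c))

moduli = [abs(value) for value in F_A]
```

The residuals vanish up to rounding for every t, which is one way to see that each $F_A(t)$ is an eigenvalue of M(A), with the columns of Ω (up to scaling) as eigenvectors.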
5 Coda
Let us summarize what the DFT makes it possible to characterize:
– $|F_A| = |F_B|$: Z-relation.
– $\|F_A\|_*$ maximal: A is generated.
– $|F_A(\#A)|$ maximal: A is M.E.
– $|F_A|$ flat: A has FLID.
– $|F_A| = 0$: crossroads between both tiling and spectral conditions.
There are also other ways to compute Fourier coefficients; some apply outside the field of tempered scales. One such computation vindicates Lehman’s ‘discovery’ of the way Bach tuned his harpsichord (Amiot 2008b). It is high time to fulfil D.T. Vuza’s prediction (Vuza 1991–1992, T. 4, section 9):
It is therefore my conviction that in the near future music theory will integrate convolution and Fourier transform as effective investigation tools, music theorists being able to use them in the same way as presently they make use of groups, homomorphisms, group actions, and so forth.
References
Amiot, E.: Why Rhythmic Canons are Interesting. In: Lluis-Puebla, E., Mazzola, G., Noll, T. (eds.) Perspectives of Mathematical and Computer-Aided Music Theory, pp. 190–209. Verlag epOs-Music, Osnabrück (2004)
Amiot, E.: David Lewin and ME Sets. Journal of Mathematics and Music 1(3), 157–172 (2008a)
Amiot, E.: Discrete Fourier Transform and Bach’s Good Temperament. Music Theory Online (2008b) (submitted)
Carey, N., Clampitt, D.: Aspects of Well-formed Scales. Music Theory Spectrum 11(2), 187–206 (1989)
Clough, J., Douthett, J.: Maximally Even Sets. Journal of Music Theory 35, 93–173 (1991)
Coven, E.M., Meyerowitz, A.: Tiling the integers with one finite set. Journal of Algebra 212, 161–174 (1999)
Laba, I.: The spectral set conjecture and multiplicative properties of roots of polynomials. Journal of the London Mathematical Society 65, 661–671 (2002)
Lewin, D.: Re: Intervalic Relations between two collections of notes. Journal of Music Theory 3, 298–301 (1959)
Quinn, I.: A Unified Theory of Chord Quality in Equal Temperaments. Ph.D. Dissertation, Eastman School of Music (2004)
Vuza, D.T.: Supplementary Sets and Regular Complementary Unending Canons. Perspectives of New Music (in four parts) 29(2), 22–49, 30(1), 184–207, 30(2), 102–125, 31(1), 270–305 (1991)
Wild, J.: Flat Interval Distribution and Other Generalisations of the All-Interval Property. In: Lecture at the MCM 2007 in Berlin (2008)
WF Scales, ME Sets, and Christoffel Words
Manuel Domínguez¹, David Clampitt², and Thomas Noll³
¹ Universidad Complutense de Madrid
² Ohio State University
³ Escola Superior di Musica di Catalunya
Abstract. With a few exceptions (Chemillier and Truchet 2003), (Chemillier 2004), musical scale theory and combinatorial word theory have remained unaware of each other, despite having an intersection in methods and results that by now is considerable. The theory of words has a long history, with many developments coming in the last few decades; see Lothaire 2002 for an account. The authors thank Franck Jedrzejewski for an initial reference in word theory. The purpose of this paper is to translate between the language of two closely related scale theories and that of the theory of words.
1 Well-Formed Scales
A scale of N notes is a subset $\Sigma \subset \mathbb{R}/\mathbb{Z}$ of cardinality N. Geometrically, one can represent a scale Σ as a set of N points around the circle. One can associate to every scale Σ a bijection (natural order of Σ) $\sigma : \mathbb{Z}_N \to \Sigma$ so that σ(k) is the kth note of the scale ($0 \le \sigma(0) < \sigma(1) < \dots < \sigma(N-1) < 1$). We say that a scale Σ is generated if $\Sigma = \{k\theta \bmod 1,\ k = 0, \dots, N-1\}$. The number θ is called a generator of Σ and determines a bijection γ (generation order of Σ)
$$\gamma : \mathbb{Z}_N \to \Sigma, \quad k \mapsto \gamma(k) = k\theta \bmod 1$$
Definition 1. A generated scale is well-formed if the natural order is a multiplicative permutation of the generation order (formally, a scale Σ is WF ⇔ $\gamma^{-1} \circ \sigma \in \mathrm{Aut}(\mathbb{Z}_N)$ ⇔ $\sigma(k) = \{(gk)_{\bmod N}\,\theta\}$ with $g \in \mathbb{Z}_N^*$).
Well-formed scales were introduced in Carey and Clampitt 1989 as the scales that share the main properties of the diatonic scale, with equal-interval scales as limiting scales (degenerate well-formed scales). A scale Σ is non-degenerate well-formed if and only if it satisfies Myhill’s property, i.e., there are exactly two specific interval types of each non-zero generic length (two seconds or steps, two thirds, etc.), as demonstrated in Carey and Clampitt 1996.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 477–488, 2009. © Springer-Verlag Berlin Heidelberg 2009
Let WF(N, g) be the class of WF scales of N notes in which g is the multiplier that transforms generation order into natural order. N. Carey and D. Clampitt proved that the scales of WF(N, g) share the following properties:
– Given Σ as defined above, if it is a member of WF(N, g), then the first generated step is $a = \{g\theta\}$ and the last one is $b = 1 - \{(N-g)\theta\}$. They appear N − g and g times, respectively, within the octave.
– g and N are coprime, and the numbers $\{g^{-1} \bmod N,\ (N-g)^{-1} \bmod N\}$ are the generic lengths (number of steps) of the generator θ, and of 1 − θ.
– The generator contains $y_g = g^{-1}_{\bmod N} - x_g$ steps of length a and $x_g = \left\lfloor \frac{g^{-1}_{\bmod N} \cdot g}{N} \right\rfloor$ steps of length b.
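The definition of well-formedness and the listed properties can be tried out on a concrete generator. The sketch below assumes the Pythagorean fifth, θ = log2(3/2), as an example generator; the function names and the floating-point tolerance are likewise assumptions. Step counts are obtained by enumeration rather than from a closed formula.

```python
from math import log2

def generated_scale(theta, n):
    """Generation order gamma(k) = k*theta mod 1 for k = 0..n-1."""
    return [(k * theta) % 1.0 for k in range(n)]

def wf_multiplier(theta, n):
    """Return g if sigma(k) = gamma(g*k mod n) for all k (well-formed), else None."""
    gamma = generated_scale(theta, n)
    # order[k] is the generation index of the k-th note in natural (pitch) order
    order = sorted(range(n), key=lambda k: gamma[k])
    g = order[1] if n > 1 else 0
    return g if all(order[k] == (g * k) % n for k in range(n)) else None

def step_sizes(theta, n):
    """Sizes of the n steps of the scale, including the step closing the octave."""
    pitches = sorted(generated_scale(theta, n)) + [1.0]
    return [hi - lo for lo, hi in zip(pitches, pitches[1:])]

fifth = log2(3 / 2)          # assumed example generator, as a fraction of the octave

N = 7
g = wf_multiplier(fifth, N)  # multiplier of the 7-note scale generated by fifths
steps = step_sizes(fifth, N)
a = (g * fifth) % 1.0        # first generated step a = {g*theta}
span = pow(g, -1, N)         # generic length of the generator (Python 3.8+)
a_in_generator = sum(abs(s - a) < 1e-9 for s in steps[:span])
```

With this generator the 5-, 7- and 12-note scales come out well-formed while the 6-note scale does not, and in the 7-note case the step a occurs N − g = 5 times.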
The scales which share the same number of notes and the same multiplier therefore have all discrete parameters in common. The real lengths of both steps a, b and of the generator θ may vary within limits given by the following:
Proposition 1. If (Σ, σ) ∈ WF(N, g) is a scale with steps a = σ(1), b = 1 − σ(N − 1) and generator θ, then a, b and θ satisfy the following restrictions:
$$\left.\begin{aligned} y_g a + x_g b &= \theta \\ (N - g)a + gb &= 1 \end{aligned}\right\} \quad (1)$$
$$\left.\begin{aligned} 0 < a &< \frac{1}{N-g} \\ 0 < b &< \frac{1}{g} \\ \frac{x_g}{g} < \theta &< \frac{y_g}{N-g} \end{aligned}\right\} \quad (2)$$
Proof. The equations (1) are clear. For the inequalities, we have the following limiting cases:
1. If we take θ big enough so that b becomes 0, we lose g notes, obtaining a scale of pattern $\underbrace{aa \dots a}_{N-g}$. This is a well-formed degenerate scale of length N − g, hence $a = \frac{1}{N-g}$ and the generator has generic length $y_g$: $\theta = \frac{y_g}{N-g}$.
2. If we reduce θ so that a = 0, we lose N − g steps and we get a well-formed degenerate scale of length g. In this scale $b = \frac{1}{g}$ and $\theta = \frac{x_g}{g}$.
In accordance with proposition 1, the set of scales WF(N, g) can be represented geometrically as the segment of the line given by the equations (1) contained in the parallelepiped defined by the inequalities (2):
Fig. 1.
2 Christoffel Words
In this section we introduce a pair of geometrical definitions for the word corresponding to the step patterns of well-formed scales. These words are the rational degenerate cases of Sturmian words (infinite aperiodic words with minimal complexity). Given a real number α such that 0 ≤ α < 1, we define the (lower) mechanical word as the infinite word:
$$s_\alpha : \mathbb{N} \to \mathbb{Z}_2, \quad n \mapsto \lfloor \alpha(n + 1) \rfloor - \lfloor \alpha n \rfloor$$
In figure 2 we observe the geometrical interpretation of mechanical words. The line y = αx determines a succession of points of coordinates $P_n = (n, \lfloor \alpha n \rfloor)$ (the succession of points with integer coordinates which are closest to the line and below it). $s_\alpha(n) = 0$ (resp. $s_\alpha(n) = 1$) iff the segment $P_n P_{n+1}$ is horizontal (resp. diagonal). If $\alpha = \frac{p}{n} \in \mathbb{Q}$, the mechanical word $s_\alpha$ is periodic and its minimal period is n letters long. In the following lines we will identify each periodic word with its minimal period.
Definition 2. Christoffel words are rational mechanical words $s_{p/n}$.
Remark 1. A second way of generating Christoffel words is known as cutting sequences. Let $\{Q_n\}_{n \in \mathbb{N}}$ be the intersection of the line y = αx with the integer grid (the set of lines $\{x = n,\ y = m\}_{n,m \in \mathbb{Z}}$). We will say that $Q_n = (x_n, y_n)$ is horizontal (resp. vertical) if $y_n$ ($x_n$) is an integer (if both coordinates are integer numbers, we replace $Q_n$ by two points, the first one horizontal, the second one vertical). We define the cutting sequence of the line y = αx as the word
$$C_\alpha(n) = \begin{cases} 0 & \text{if } Q_n \text{ is horizontal} \\ 1 & \text{if } Q_n \text{ is vertical} \end{cases}$$
It can be proven (see Lothaire 2002) that $C_\alpha = s_{\alpha/(\alpha+1)}$, and therefore Christoffel words can be generated also as cutting sequences.
Fig. 2. Cutting sequence $C_{2/5}$
Fig. 3. Mechanical word $S_{2/7,0}$
We define the slope of a Christoffel word as the slope of the line that generates it as a cutting sequence. If w is a Christoffel word of length n and slope $\frac{p}{q}$, then $p = |w|_1$, $q = |w|_0$ and w can be generated also as a mechanical word with a line of slope $\frac{|w|_1}{|w|_0 + |w|_1} = \frac{|w|_1}{|w|} = \frac{p}{n}$.
3 Well-Formed Classes and Christoffel Words, Duality
In this section we identify the step pattern of well-formed scales with Christoffel words. An identification between dualities obtaining in both domains can also be established. Let Σ ∈ WF(N, g) be a well-formed scale with steps a, b, and generator θ. We call the step pattern of Σ the finite binary word w given by the following characteristic function:
$$w_\Sigma(i) = \begin{cases} 0 & \text{if the step } \sigma(i+1) - \sigma(i) = \{g\theta\} = a \\ 1 & \text{if the step } \sigma(i+1) - \sigma(i) = 1 - \{(N-g)\theta\} = b \end{cases}$$
Proposition 2. A scale Σ is non-degenerate well-formed of the class WF(N, g) if and only if its step pattern is a Christoffel word.
Proof. Let us compute the step pattern of Σ:
$$\sigma(i+1) - \sigma(i) = \{((i+1)g)_{\bmod N}\,\theta\} - \{(ig)_{\bmod N}\,\theta\} = \{(((i+1)g)_{\bmod N} - (ig)_{\bmod N})\,\theta\} = \begin{cases} \{g\theta\} \\ \{(g-N)\theta\} = 1 - \{(N-g)\theta\} \end{cases}$$
and therefore
$$w(i) = \begin{cases} 0 \\ 1 \end{cases} \Leftrightarrow ((i+1)g)_{\bmod N} - (ig)_{\bmod N} = \begin{cases} g \\ g - N \end{cases} \Leftrightarrow \left\lfloor \frac{(i+1)g}{N} \right\rfloor - \left\lfloor \frac{ig}{N} \right\rfloor = \begin{cases} 0 \\ 1 \end{cases}$$
Thus the step pattern of Σ is:
$$w(i) = \left\lfloor \frac{(i+1)g}{N} \right\rfloor - \left\lfloor \frac{ig}{N} \right\rfloor$$
This is exactly the definition of a Christoffel word of slope $\frac{g}{N-g}$.
By definition, the Christoffel word of a given slope is unique. Therefore, two scales have the same pattern if and only if they are in the same class WF(N, g). Hence, these classes can be interpreted as equivalence classes modulo scale patterns. The only exceptional case is the degenerate scale; it is the unique scale of WF(N, g) (the only point of the segment defined in proposition 1) with a different pattern (steps a and b are indistinguishable).
In Carey and Clampitt 1996 the notion of duality between classes of WF scales was established as follows. Given a class of WF scales with parameters N and g, the multiplicative inverse of g mod N is the span (generic length) of the generating interval. In the dual class, the notions "generic lengths of the generators" and "number of steps of each type" exchange roles.
Definition 3. Given a well-formed class WF(N, g), its dual class is $WF(N, g^{-1} \bmod N)$.
Every Christoffel word w of length n and slope $\frac{p}{q}$ decomposes in a unique way as w = x · y, with (x, y) a Christoffel pair (see proposition 6). The lengths of the subwords x and y are, respectively, $p^{-1} \bmod n$ and $q^{-1} \bmod n$.
Definition 4. The dual word of w is the only Christoffel word $w^*$ of length n and slope $\frac{p^{-1} \bmod n}{q^{-1} \bmod n}$. The lengths of the subwords that comprise the Christoffel pair associated with $w^*$ are given by (p, q). Thus, with duality the notions "frequencies of 1’s and 0’s" and "length of the associated Christoffel pair" exchange roles.
The Christoffel word of the dual class $WF(N, g^{-1} \bmod N)$ is the Christoffel word of slope $\frac{g^{-1} \bmod N}{N - g^{-1} \bmod N} = \frac{g^{-1} \bmod N}{(N-g)^{-1} \bmod N}$. This word coincides, by definition, with the dual word $w^*$ of the step pattern of the original class WF(N, g).
Conclusion 1. Duality between WF scale classes and duality between their related step patterns (viewed as Christoffel words) coincide.
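The identification of step patterns with Christoffel words can be tested on the class WF(7, 2). The sketch below compares the word given by the floor formula with the pattern read off a concrete scale generated by the Pythagorean fifth, and then builds the word of the dual class. The generator, the function names and the tolerance are assumptions made for illustration.

```python
from math import log2

def christoffel_word(p, n):
    """Mechanical word s_{p/n}: w(i) = floor((i+1)p/n) - floor(i*p/n), i = 0..n-1."""
    return [((i + 1) * p) // n - (i * p) // n for i in range(n)]

def step_pattern(theta, n, g, tol=1e-9):
    """Pattern of a concrete scale: 0 for a step a = {g*theta}, 1 for the other step."""
    pitches = sorted((k * theta) % 1.0 for k in range(n)) + [1.0]
    a = (g * theta) % 1.0
    return [0 if abs((hi - lo) - a) < tol else 1
            for lo, hi in zip(pitches, pitches[1:])]

N, g = 7, 2
w = christoffel_word(g, N)                  # word of slope g/(N-g) = 2/5
observed = step_pattern(log2(3 / 2), N, g)  # pattern of the scale generated by fifths

g_dual = pow(g, -1, N)                      # multiplier of the dual class (Python 3.8+)
w_dual = christoffel_word(g_dual, N)        # word of slope 4/3
```

The number of 1's in `w_dual` equals the generic length of the generator of the original class, which is the exchange of roles described in the text.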
4 Christoffel Words, Maximally Even Sets and Musical Modes
In this section we relate Christoffel words with the patterns of maximally even sets. These sets were introduced in Clough and Douthett 1991 to describe the scales in which notes are distributed as evenly as possible within a chromatic universe. (Geometrically, one can consider the problem of distributing d black points and c − d white ones around the circle, so that both colors are as mixed as possible.) A d-note scale in a universe of c notes is a maximally even set if it is equivalent modulo rotations to the subset $J_{d,c} = \{\lfloor k\frac{c}{d} \rfloor;\ k = 0, \dots, d-1\} \subset \mathbb{Z}/c\mathbb{Z}$. The set of maximally even sets of d notes in a universe of size c is denoted by ME(c, d). The characteristic set of a well-formed scale can be defined as:
$$X_\Sigma = \{i \in \mathbb{Z}/N\mathbb{Z} \text{ so that } w_\Sigma(i) = 1\} = \{\text{places of the } b \text{ steps of } \Sigma \text{ in scale order}\}$$
N. Carey and D. Clampitt have computed the characteristic set of a WF scale Σ of N notes and multiplier g and have shown that it is conjugate to the set $J^0_{g,N} = \{\lfloor k\frac{N}{g} \rfloor;\ k = 0, \dots, g-1\}$ (see Carey 1998). If we denote by $T_k$ the translation of pc-sets (that is, $T_k X = \{(x + k) \bmod N \text{ with } x \in X\}$) and by I the inversion ($IX = \{-x \bmod N,\ x \in X\}$), we can reformulate their results as follows:
Proposition 3. Let Σ ∈ WF(N, g) be a well-formed scale. The characteristic set of Σ is given by the expression:
$$X_\Sigma = T_{-1}\left\{ \frac{kN + (k(-N))_{\bmod g}}{g};\ k = 0, \dots, g-1 \right\}$$
Proposition 4. The characteristic set $X_\Sigma$ of the scale Σ ∈ WF(N, g) satisfies
$$T_1 X_\Sigma = I J^0_{g,N}$$
Corollary 1. The characteristic set of a class WF(N, g) is maximally even.
Proof. It is a consequence of the properties of ME sets that are proved in (Clough and Douthett 1991): ME sets are invariant under inversion I and rotations $T_k$.
As noted above, one can consider a maximally even set as a distribution of points around a circle. In this context it is convenient to introduce cyclic words:
Definition 5. Two words u and v are conjugate if and only if u = xy, v = yx, for some words x, y. If one understands words as written circularly, conjugation can be thought of as an equivalence relation (via circular rotations) whose classes are called cyclic words.
Fig. 4. Conjugation can be viewed as a rotation congruence
Definition 6. Given a scale Σ ∈ W F (N, g) with pattern wΣ each of the representatives of the cyclic word [wΣ ] is called a mode of the scale Σ. In the next table we present the traditional modes of the diatonic scale, which are rotations of the scale pattern associated to W F (2, 7). Mode Pattern ME set Lydian 0001001 XC = {3, 6} Mixolydian 0010010 T6 XC = {2, 5} Aeolian 0100100 T5 XC = {1, 4} ’ Locrian 1001000 T4 XC = {0, 3} Ionian 0010001 T3 XC = {2, 6} Dorian 0100010 T2 XC = {1, 5} Phrygian 1000100 T1 XC = {0, 4} Proposition 5. Given g < N, two coprime integers, there is a natural bijection between the set M E(N, g) and the modes of non-degenerate scales Σ in W F (N, g). Proof. One just has to associate, to every set X in M E(N, g), its characteristic word wX . Every ME set of M E(N, g) is a rotation of X. Hence, by corollary 1, wX is a rotation of the scale pattern of W F (N, g). Remark 2. Maximal evenness is one of the modern topics in mathematical music theory with influence and applications in other fields: 1. A new formulation of maximal evenness has been given by E. Amiot in (Amiot 2007). The author uses the language of complex analysis, in particular Discrete Fourier Transforms, and shows in a very elegant way the basic properties of ME sets and a equivalence between this formulation and the classic definition. 2. G. Toussaint, F. Gomez-Mart´ın and other authors have studied recently geometric aspects of musical rhythm. In particular they translated the idea of maximizing the evenness between onsets (d) and time-span (c) of the desired rhythm. The resulting rhythms, called Euclidean rhythms, turned out to be particularly attractive when c and d are coprime. For more details see (Damiane and Gomez-Mart´ın 2006), (Toussaint 2005a) and (Toussaint 2005b). For a connection between maximally even sets and many other topics of science, such as calendar design, euclidean algorithm for
M. Dom´ınguez, D. Clampitt, and T. Noll
computing the g.c.d., the spallation neutron source in nuclear physics, etc., see (Damiane and Gómez-Martín 2006).
3. T. Noll introduced the term Clough word in (Noll 2007) for the cyclic words whose characteristic set is maximally even with coprime frequencies. In this paper he also established a recursive construction of this kind of word. Considering our previous discussion, it is easy to set a canonical bijection between Christoffel words of frequencies |w|0 = N − g, |w|1 = g and Clough words of diatonic length g and chromatic length N.
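Definition 6 and the table of modes can be checked mechanically. The following is a minimal sketch, not part of the original paper: it assumes that Tk denotes transposition by k modulo 7 on the set XC = {3, 6}, and that the characteristic word has a 1 exactly at the positions belonging to the set. The function names are hypothetical.

```python
# Sketch: the seven modes as rotations of one pattern (assumed conventions).
N_STEPS = 7
X_C = frozenset({3, 6})  # ME set of the Lydian pattern, taken from the table
MODE_NAMES = ["Lydian", "Mixolydian", "Aeolian", "Locrian",
              "Ionian", "Dorian", "Phrygian"]


def characteristic_word(subset, n):
    """Word of length n with '1' at the positions contained in subset."""
    return "".join("1" if i in subset else "0" for i in range(n))


def transpose(subset, k, n):
    """T_k applied to a subset of Z_n."""
    return frozenset((i + k) % n for i in subset)


def modes_table():
    """Rows (name, pattern, ME set) in the order used in the table."""
    rows = []
    for idx, name in enumerate(MODE_NAMES):
        k = (N_STEPS - idx) % N_STEPS  # 0, 6, 5, 4, 3, 2, 1
        subset = transpose(X_C, k, N_STEPS)
        rows.append((name, characteristic_word(subset, N_STEPS), sorted(subset)))
    return rows


def is_rotation(u, v):
    """True if v is a conjugate (rotation) of u."""
    return len(u) == len(v) and v in u + u


for row in modes_table():
    print(row)
```

Every pattern produced this way is a rotation of the Lydian pattern, which is the content of Proposition 5 in this small case.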
5 Christoffel Tree and the Monoid SL(2, N)
In this section we analyze the algebraic structure of Christoffel words and its consequences in well-formed scale theory. The decomposition of Christoffel words is taken from (Lothaire 2002, Chap. 2), which is the central reference for algebraic combinatorics on words. In that book, the proofs are based on standard words, which are conjugate to Christoffel words (if 0c1 is a Christoffel word, c10 is standard). For a detailed adaptation of the proofs in terms of Christoffel words see (Domínguez 2007). Furthermore, standard words and Sturmian morphisms play very special roles in the description of the modes of a scale (see (Noll, Clampitt and Domínguez 2007)).
Let G and D be two maps that transform the set {0, 1}∗ × {0, 1}∗ (pairs of finite words on the alphabet {0, 1}) into itself, defined by:
\[ G, D : \{0,1\}^* \times \{0,1\}^* \longrightarrow \{0,1\}^* \times \{0,1\}^*, \qquad G(u, v) = (u, uv), \qquad D(u, v) = (uv, v). \]
Definition 7. The set of Christoffel pairs is the smallest set of pairs of words containing the pair (0, 1) and closed under {G, D}.
By construction, the set of Christoffel pairs can be represented in a tree diagram with root (0, 1) and nodes the Christoffel pairs. The following proposition identifies the set of Christoffel words with the pairs of words in the Christoffel tree:
Proposition 6. A pair of words is a Christoffel pair if and only if it has one of the following shapes:
1. (0, 0^n 1) with n ∈ N (these are the pairs G^n(0, 1))
2. (01^n, 1) with n ∈ N (these are the pairs D^n(0, 1))
3. (x, y) with x and y Christoffel words.
Furthermore, the next map is a bijection that identifies the Christoffel words with the pairs generated by G and D:
\[ \text{Christoffel pairs} \xrightarrow{\ \sim\ } \text{Christoffel words}, \qquad (x, y) \longmapsto x \cdot y \]
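The construction of the Christoffel tree can be sketched in a few lines. This is an illustrative sketch only: words are represented as Python strings, and a word over {G, D} is read as a composition of maps, so that its rightmost letter acts first. That reading convention and the helper names are assumptions made here, not taken from the paper.

```python
# Sketch of the maps G and D and of the tree of Christoffel pairs.
ROOT = ("0", "1")


def G(pair):
    """G(u, v) = (u, uv)."""
    u, v = pair
    return (u, u + v)


def D(pair):
    """D(u, v) = (uv, v)."""
    u, v = pair
    return (u + v, v)


def apply_word(word, pair=ROOT):
    """Apply a word over {G, D} as a composition (rightmost letter first)."""
    for letter in reversed(word):
        pair = G(pair) if letter == "G" else D(pair)
    return pair


def tree_levels(depth):
    """Levels 0..depth of the tree of Christoffel pairs rooted at (0, 1)."""
    levels = [[ROOT]]
    for _ in range(depth):
        levels.append([f(p) for p in levels[-1] for f in (G, D)])
    return levels


def christoffel_word(pair):
    """The bijection (x, y) -> x . y of Proposition 6."""
    x, y = pair
    return x + y


print(christoffel_word(apply_word("DGG")))
```

Under these conventions the generating word DGG yields the pair (0001, 001), whose concatenation 0001001 is the pattern listed for the Lydian mode in the table of modes.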
Fig. 5. The tree of Christoffel pairs
Let us now consider the matrices
\[ R = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad L = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \]
and let SL(N, 2) be the monoid of 2 × 2 matrices with natural entries and determinant 1. It is well known (see Noll 2007 for a proof) that:
Proposition 7. SL(N, 2) is freely generated by {R, L}.
Corollary 2. There is a canonical identification between the set of finite words {G, D}∗ and the monoid SL(N, 2).
For every Christoffel word w = x · y there is a unique path that leads from (0, 1) to (x, y), or equivalently, there is a word W ∈ {G, D}∗ such that W(0, 1) = (x, y). This word W is called the generating word of w. The associated matrix A_w ∈ SL(N, 2), obtained by exchanging G for L, D for R, and concatenation for the product of matrices, is called the incidence matrix of the word w.
Proposition 8. The incidence matrix of a Christoffel word w = x · y satisfies:
\[ A_w = \begin{pmatrix} |x|_0 & |x|_1 \\ |y|_0 & |y|_1 \end{pmatrix} \]
Proof. The matrix A_{(0,1)} = Id satisfies the formula. We prove by induction that A_{G(x,y)} = L · A_{(x,y)} (the assertion A_{D(x,y)} = R · A_{(x,y)} can be proven in a similar way).
\[ L \cdot A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} |x|_0 & |x|_1 \\ |y|_0 & |y|_1 \end{pmatrix} = \begin{pmatrix} |x|_0 & |x|_1 \\ |x|_0 + |y|_0 & |x|_1 + |y|_1 \end{pmatrix} = \begin{pmatrix} |x|_0 & |x|_1 \\ |xy|_0 & |xy|_1 \end{pmatrix} = A_{(x,xy)} = A_{G(x,y)} \]
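Proposition 8 lends itself to a brute-force check on short generating words. The sketch below is an illustration under stated assumptions: matrices are plain nested tuples, the matrices L and R are those used in the proof (L adds the first row to the second, R adds the second row to the first), and a generating word is read as a composition with its rightmost letter acting first. All function names are hypothetical.

```python
# Sketch: incidence matrix of a generating word versus letter counts.
from itertools import product

R = ((1, 1), (0, 1))
L = ((1, 0), (1, 1))
IDENTITY = ((1, 0), (0, 1))


def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def incidence_matrix(word):
    """Replace G by L, D by R, and concatenation by the matrix product."""
    m = IDENTITY
    for letter in word:
        m = matmul(m, L if letter == "G" else R)
    return m


def apply_word(word, pair=("0", "1")):
    """Apply G(u, v) = (u, uv) and D(u, v) = (uv, v), rightmost letter first."""
    for letter in reversed(word):
        u, v = pair
        pair = (u, u + v) if letter == "G" else (u + v, v)
    return pair


def count_matrix(pair):
    """The matrix of letter counts (|x|_0, |x|_1; |y|_0, |y|_1)."""
    x, y = pair
    return ((x.count("0"), x.count("1")), (y.count("0"), y.count("1")))


def check_proposition_8(max_length):
    """Compare both matrices for every word over {G, D} up to max_length."""
    for n in range(max_length + 1):
        for letters in product("GD", repeat=n):
            word = "".join(letters)
            if incidence_matrix(word) != count_matrix(apply_word(word)):
                return False
    return True


print(incidence_matrix("DGG"), check_proposition_8(6))
```

For the generating word DGG the product R · L · L equals the matrix with rows (3, 1) and (2, 1), which are the letter counts of the pair (0001, 001).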
We conclude by determining the relationship between the step patterns of dual scales. Consider the map
\[ SL(N, 2) \longrightarrow \mathbb{Q}^+, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \longmapsto \frac{a+b}{c+d} \]
This map (called the mediant ratio by T. Noll in Noll 2007) transforms the incidence matrix A_w of a Christoffel word w = x · y into |x|/|y|, that is, the slope of the dual word w∗. Denoting by A∗_w the matrix \begin{pmatrix} d & b \\ c & a \end{pmatrix} (with the main diagonal flipped), the image of A∗_w under this map is |w|_1/|w|_0, the slope of the original word w. In conclusion:
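Proposition 9. A∗_w is the matrix of the dual word of w, and the transformation
\[ {}^* : SL(N, 2) \longrightarrow SL(N, 2), \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \longmapsto \begin{pmatrix} d & b \\ c & a \end{pmatrix} \]
is an anti-automorphism which induces the duality in Christoffel words, and therefore in the step patterns of well-formed scales modulo conjugation (Clough words).
Proposition 10. The step patterns of dual scales are related to retrograde paths in the Christoffel tree. In other words, if w = x · y is the scale pattern of a scale and we have W(0, 1) = (x, y) for some word W ∈ {G, D}∗, then w∗ = y∗ · x∗ is the pattern of the dual scale, where (y∗, x∗) = \widetilde{W}(0, 1) and \widetilde{W} denotes the retrograde of W.
Proof. Let A_w = \begin{pmatrix} a & b \\ c & d \end{pmatrix} ∈ SL(N, 2) and let us suppose we have the decomposition A_w = Λ_0 · … · Λ_k with Λ_i ∈ {R, L}. One has that
\[ A^* = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \cdot A^t \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
and therefore:
\[ A_w^* = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \cdot (\Lambda_0 \cdots \Lambda_k)^t \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \cdot \Lambda_k^t \cdots \Lambda_0^t \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
Observe that L^t = R, that \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \cdot L \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = R, and that \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^2 = Id, thus:
\[ A_w^* = \left( \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \Lambda_k^t \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right) \cdots \left( \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \Lambda_0^t \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right) = \Lambda_k \cdots \Lambda_0 = A_{\widetilde{W}} \]
6 Final Remarks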
1. Throughout the paper we have considered Christoffel words to be positive, that is, the path over the integer grid that lies under the semiline. If we
want to generate the negative Christoffel words (the path under the integer grid), we just have to change the root of the Christoffel tree and consider (1, 0) instead.
2. This paper should be understood as a point of departure in which we have identified the study of scale patterns with the algebraic theory of words. Therefore, one should continue research into related topics which potentially offer music-theoretical interpretations. For example, Sturmian words, which are obtained geometrically in the same way as Christoffel words but with rays of irrational slope, were discussed in Carey and Clampitt 1996, where they were referred to as quasi-periodic sequences.
3. The transformational theory for well-formed scales as proposed in (Noll 2007) is mainly covered by the theory of Sturmian morphisms. It is therefore challenging to review the music-theoretical interpretations in (Noll 2007) within the full algebraic context of Sturmian morphisms.
References
Amiot, E.: David Lewin and Maximally Even Sets. Journal of Mathematics and Music 2, 157–172 (2007)
Berstel, J., de Luca, A.: Sturmian words, Lyndon words and trees. Theoretical Computer Science 178, 171–203 (1997)
Berthé, V., de Luca, A., Reutenauer, C.: On an involution of Christoffel Words and Sturmian morphisms, Publications du LIRMM (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier), no. 06044 (2006)
Carey, N.: Distribution modulo 1 and musical scales, Ph.D. diss., University of Rochester (1998)
Carey, N., Clampitt, D.: Self-Similar Pitch Structures, Their Duals, and Rhythmic Analogues. Perspectives of New Music 34(2), 62–87 (1996)
Carey, N., Clampitt, D.: Aspects of Well-formed Scales. Music Theory Spectrum 11(2), 187–206 (1989)
Chemillier, M., Truchet, C.: Computation of words satisfying the rhythmic oddity property (after Simha Arom’s works). Information Processing Letters 86(5), 255–261 (2003)
Chemillier, M.: Periodic Sequences and Lyndon Words. In: Assayag, G., Cafagna, V., Chemillier, M. (eds.) Formal Systems and Music, special issue of Soft Computing, vol. 8(9), pp. 611–616 (2004)
Clough, J., Douthett, J.: Maximally Even Sets. Journal of Music Theory 35(1/2), 93–173
Damiane, E.D., Gómez-Martín, F.a.o.: The Distance Geometry in Music. Computational Geometry: Theory and Applications (submitted)
Domínguez, M.: Teoría Matemática de Escalas Bien Construidas, Temas avanzados de Geometría Diferencial. Universidad Complutense de Madrid (2007)
Lothaire, M.: Algebraic combinatorics on words. Cambridge University Press, Cambridge (2002)
Noll, T.: Musical Intervals and Special Linear Transformations. Journal of Mathematics and Music 1(2), 121–137 (2007)
Noll, T., Clampitt, D., Domínguez, M.: What Sturmian Morphisms Reveal about Musical Scales. In: Proceedings of WORDS 2007, Marseille (2007)
Toussaint, G.: The Geometry of Musical Rhythm. In: Akiyama, J., Kano, M., Tan, X. (eds.) JCDCG 2004. LNCS, vol. 3742, pp. 198–212. Springer, Heidelberg (2005)
Toussaint, G.: The Euclidean algorithm generates traditional musical rhythms. In: Proc. of BRIDGES: Mathematical Connections in Art, Music and Science (2005)
Interval Preservation in Group- and Graph-Theoretical Music Theories: A Comparative Study
Robert Peck
Louisiana State University
Interval preservation, wherein intervals remain unchanged among varying musical objects, is among the most basic means of manifesting coherence in musical structures. Music theorists since Milton Babbitt’s (1960) seminal publication of “Twelve-Tone Invariants as Compositional Determinants” have examined and generalized situations in which interval preservation obtains. In the course of this investigation, two theoretical contexts have developed: the group-theoretical, as in David Lewin’s (1987) Generalized Interval Systems; and the graph-theoretical, as in Henry Klumpenhouwer’s (1991) K-net theory. Whereas the two approaches are integrally related (the latter being particularly indebted to the former), they also have essential differences, particularly in regard to the way in which they describe interval preservation. Nevertheless, this point has escaped significant attention in the literature. The present study completes the comparison of these two methods, and, in doing so, reveals further-reaching implications of the theory of interval preservation for recent models of voice-leading and chord spaces (Cohn 2003, Straus 2005, Tymoczko 2005, among others), specifically where the incorporated chords have differing cardinalities and/or symmetrical properties.
In the group-theoretic approach, we associate an interval i with the action of a member of a group on a set. Then, we say that i is preserved if its conjugation by some operation h is trivial; that is, i^h = h^{-1}ih = i. As such, the set of all operations that commute with the members of a group, its centralizer, defines the collection of interval-preserving operations for that group. A canonical example of group-theoretic interval-preserving operations is the action of the neo-Riemannian Schritt/Wechsel group on the set-class of consonant triads, which preserves intervals that derive from the usual T/I group of transposition and inversion operators (Clough 1998).
As the action of the latter group on this set-class is regular, it is isomorphic to its centralizer. In particular, the S/W group may be generated by an order 12 Schritt, which moves major and minor triads equally in opposite directions, and any Wechsel.
We may generalize the structure of a centralizer in the group-theoretical approach as follows. First, we have the case of a group H with a transitive action.
Theorem 1. (Dixon and Mortimer, 1996) Let S be a set, x be a point in S, Sym(S) be the symmetric group on S, H be a transitive subgroup of Sym(S), and C be the centralizer of H in Sym(S). Then:
1) C is semiregular, and C ≅ N_H(H_x)/H_x;
2) C is transitive if and only if H is regular;
3) if C is transitive, then it is conjugate to H in Sym(S) and hence C is regular;
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 489–492, 2009. © Springer-Verlag Berlin Heidelberg 2009
4) C = 1 if and only if H_x is self-normalizing in H (that is, N_H(H_x) = H_x);
5) if H is abelian, then C = H.
Next, we examine the special case of a group D with an intransitive action, where D is a diagonal subgroup of the direct product of its orbit restrictions.
Theorem 2. Put D ≤ Sym(S), and C = C_{Sym(S)}(D). Further, let R be the set of n D-orbits, and assume that D is a diagonal subgroup of D^{R_1} × … × D^{R_n}, where R_i ∈ R. Then,
\[ C = C_{\mathrm{Sym}(R_i)}\bigl(D^{R_i}\bigr) \ \mathrm{wr}\ \mathrm{Sym}(R), \]
for any such R_i ∈ R.
Finally, we have the generalized structure for an intransitive group H.
Theorem 3. Let G = Sym(S). Put H ≤ G, and C = C_G(H). Further, let P = {P_1, ..., P_n} be a partition of H-orbits into unions, such that H^{P_i} is a maximally embedded diagonal subgroup. Then,
\[ C = C_{\mathrm{Sym}(P_1)}\bigl(H^{P_1}\bigr) \times \dots \times C_{\mathrm{Sym}(P_n)}\bigl(H^{P_n}\bigr). \]
In the graph-theoretical approach, we identify musical objects with a graph’s nodes, and intervals with directed arrows that connect those nodes, forming a network. Then, whenever we have some operation h on the nodes of a network, we have also a corresponding conjugation by h on the labels of its arrows; and, as we have observed, h is interval-preserving if i^h = i. For example, Figure 1a presents a network N1, in which the nodes associate with consonant triads (i.e., members of set-class [0,3,7]), and arrows with intervals that derive from the T/I group. Then, as Figures 1b and c are transformations of N1 by members of the S/W group (the Schritt S7 and the Wechsel P [Parallel Exchange], respectively), they preserve the intervals of N1.
[Figure 1: three networks on consonant triads (panels a, b, c), each with arrows labeled T1, I11, and I0.]
Fig. 1. Three networks with identical T/I-intervallic content (E+ = E major, C- = C minor, etc.)
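The statement that the regular T/I action on the consonant triads has a regular centralizer of the same order can be checked by brute force. The sketch below is an illustration, not the author's method: it encodes triads as sets of pitch classes, uses the standard conventions T_n(x) = x + n and I_n(x) = n − x modulo 12, and relies on the fact that a permutation commuting with a transitive group is determined by the image of one base point. The function names are hypothetical.

```python
# Sketch: centralizer of the T/I action on the 24 consonant triads.
def consonant_triads():
    """The 12 major and 12 minor triads as frozensets of pitch classes."""
    major = [frozenset({r, (r + 4) % 12, (r + 7) % 12}) for r in range(12)]
    minor = [frozenset({r, (r + 3) % 12, (r + 7) % 12}) for r in range(12)]
    return major + minor


def ti_operations():
    """T_n(x) = x + n and I_n(x) = n - x (mod 12) as Python functions."""
    ops = []
    for n in range(12):
        ops.append(lambda x, n=n: (x + n) % 12)
        ops.append(lambda x, n=n: (n - x) % 12)
    return ops


def induced_permutation(op, objects):
    """Permutation of object indices induced by a pitch-class operation."""
    index = {obj: i for i, obj in enumerate(objects)}
    return tuple(index[frozenset(op(x) for x in obj)] for obj in objects)


def centralizer(perms, size):
    """Permutations of range(size) commuting with all perms (assumed transitive)."""
    found = []
    for y in range(size):
        c = {}
        consistent = True
        for h in perms:
            # A commuting map c must satisfy c(h(0)) = h(c(0)) = h(y).
            if c.setdefault(h[0], h[y]) != h[y]:
                consistent = False
                break
        if not consistent or len(set(c.values())) != size:
            continue
        cand = tuple(c[i] for i in range(size))
        if all(cand[h[i]] == h[cand[i]] for h in perms for i in range(size)):
            found.append(cand)
    return found


TRIADS = consonant_triads()
TI_ON_TRIADS = [induced_permutation(op, TRIADS) for op in ti_operations()]
CENTRALIZER = centralizer(TI_ON_TRIADS, len(TRIADS))
print(len(CENTRALIZER))
```

Under these conventions the search returns 24 commuting permutations, and only the identity among them fixes any triad, in agreement with items 2) and 3) of Theorem 1.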
Another situation exists for cases in which the action of the group of intervals is merely transitive, and herein lies an important distinction between the two approaches. Whereas a network N and its transform under a member of the centralizer will always have the same intervallic content, these are not the only networks that preserve those intervals. A canonical example appears in the K-classes of K-net theory (O’Donnell 1998), as illustrated in Figure 2. As the action of the T/I group is transitive on pitch-classes—but not simply transitive—its centralizer consists only of the center of the group: the subgroup generated by T6 (Peck 2004). Hence, the K-nets in Figures 2a and b have the same intervals. Nevertheless, the entire K-class (consisting of all networks with identical intervallic content) for Figure 2a contains twelve networks. For instance, whereas Figure 2c does indeed contain the same intervals as
the previous two, it does not obtain from either by a well-formed group-theoretical operation on pitch-classes: we cannot define a permutation on a set that sends all its members in opposite directions simultaneously. It derives rather via some quasi-Schritt, S′1.
[Figure 2: a) N1 on C, B, C#; b) N2 = (N1)^T6 on F#, F, G; c) N3 = (N1)^S′1 on C#, A#, D; each network has arrows labeled T1, I11, and I0.]
Fig. 2. Three strongly isographic K-nets
The K-class structure is possible because the nodes of a network are defined by members of a group of intervals in relation to a given point, not by permutations of a set. Accordingly, each node in the network is unique, even if any two or more associate with the same member of the set, just as the members of the group are unique. Then, as Cayley’s Theorem demonstrates,
Theorem 4. (“Cayley’s Theorem”) (Dixon and Mortimer, 1996, 6). Every group G is isomorphic to a subgroup of the symmetric group on G.
Any group G is isomorphic to a regular representation on itself. Hence, the centralizer of any Cayley representation is isomorphic to the representation, and we may define |G|/|G_x| networks with the same intervals (where |G| is the size of the group, and |G_x| is the size of a point stabilizer in the group). In the example of a K-class, the T/I group contains twenty-four members, and each pitch-class x is stabilized by two members of the group (T_0 and I_{2x}); consequently, the K-class contains 24/2 = 12 members.
We may generalize these methods to groups with intransitive actions (i.e., those with more than one orbit). In the group-theoretical approach, the overall centralizer is a direct product of orbit centralizers, which, in the special case of a group with a diagonal action, may also be a wreath product that permutes the orbit centralizers (Dixon and Mortimer 1996, 109). A canonical example of this latter situation occurs in Uniform Triadic Transformations (Hook 2002), which preserve transpositional intervals among consonant triads. In the graph-theoretical approach, however, the resulting Cayley representation allows us to consider intervals in any network as deriving from a diagonal group, therefore always permitting permutations of constituent orbits. As such, for any intransitive network N with m connected components and n orbits, we may construct n!/(n – m)! ⋅ |G|/|G_x| networks with the same intervallic content.
This structure enables us ultimately to describe interval-preserving operations among all pitch-class sets, regardless of their cardinalities and/or symmetrical properties, thus applying them to recent geometric models of all chords.
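The counting argument for the K-class can be reproduced numerically. The following sketch assumes the usual conventions T_n(x) = x + n and I_n(x) = n − x on the twelve pitch classes; it lists the point stabilizers, computes the centralizer of the T/I action in the symmetric group on pitch classes by the base-point argument for transitive groups, and evaluates |G|/|G_x|. The function names are hypothetical and the code is an illustration only.

```python
# Sketch: stabilizers and centralizer of T/I acting on the 12 pitch classes.
def ti_group():
    """T_n and I_n as permutations of 0..11, keyed by name."""
    group = {}
    for n in range(12):
        group[f"T{n}"] = tuple((x + n) % 12 for x in range(12))
        group[f"I{n}"] = tuple((n - x) % 12 for x in range(12))
    return group


def stabilizer(group, x):
    """Names of the operations that fix the pitch class x."""
    return sorted(name for name, perm in group.items() if perm[x] == x)


def centralizer(perms, size):
    """Permutations of range(size) commuting with all perms (assumed transitive)."""
    found = []
    for y in range(size):
        c = {}
        consistent = True
        for h in perms:
            # A commuting map c must satisfy c(h(0)) = h(c(0)) = h(y).
            if c.setdefault(h[0], h[y]) != h[y]:
                consistent = False
                break
        if not consistent or len(set(c.values())) != size:
            continue
        cand = tuple(c[i] for i in range(size))
        if all(cand[h[i]] == h[cand[i]] for h in perms for i in range(size)):
            found.append(cand)
    return found


GROUP = ti_group()
CENTRALIZER = centralizer(list(GROUP.values()), 12)
K_CLASS_SIZE = len(GROUP) // len(stabilizer(GROUP, 0))
print(stabilizer(GROUP, 5), len(CENTRALIZER), K_CLASS_SIZE)
```

With these conventions each pitch class x is fixed by T0 and I_{2x} only, the centralizer consists of the identity and T6, and the quotient 24/2 gives the twelve networks of the K-class.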
References
Babbitt, M.: Twelve-Tone Invariants as Compositional Determinants. The Musical Quarterly 46, 46–59 (1960)
Clough, J.: A Rudimentary Model for Contextual Transposition and Inversion. Journal of Music Theory 42(2), 297–306 (1998)
Cohn, R.: A Tetrahedral Graph of Tetrachordal Voice-Leading Space. Music Theory Online 9.4 (2003)
Dixon, J.D., Mortimer, B.: Permutation Group Theory. Springer, New York (1996)
Hook, J.: Uniform Triadic Transformations. Journal of Music Theory 46, 57–126 (2002)
Klumpenhouwer, H.: A Generalized Model of Voice-Leading for Atonal Music. Ph.D. diss., Harvard University (1991)
Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press, New Haven (1987)
O’Donnell, S.: Klumpenhouwer Networks, Isography, and the Molecular Metaphor. Intégral 12, 53–79 (1998)
Peck, R.: Centers and Centralizers: Commutativity in Group-Theoretical Music Theory. In: Presentation to the 27th Annual Meeting of the Society for Music Theory, Seattle, Washington (2004)
Straus, J.N.: Atonal Pitch Space. In: Presentation to the 28th Annual Meeting of the Society for Music Theory, Cambridge, Massachusetts (2005)
Tymoczko, D.: A Map of All Chords. In: Presentation to the 28th Annual Meeting of the Society for Music Theory, Cambridge, Massachusetts (2005)
Pseudo-diatonic Scales
Franck Jedrzejewski
Atomic Energy Commission (CEA-INSTN)
Abstract. The generalization of diatonic scales in a given tone system has been investigated by Eytan Agmon (see Agmon 1989, Agmon 1991), John Clough (see Clough 1979, Clough and Myerson 1985), and in relation to microtonality by Gerald Balzano and Mark Gould (see Gould 2000). Recently, Thomas Noll (2006) gave a new synthetic approach to pseudo-diatonic scales (Model A). From our first essay (Jedrzejewski 2002) until the last article (Jedrzejewski 2008), we developed a new model of generalized diatonic scales based on a new arrangement of the Stern-Brocot tree (Model B). With our new definition of diatonicism, we recover Wyschnegradsky’s diatonic scales in the quarter-tone universe, a concept that he called diatonicized chromatism (Wyschnegradsky 1979), studied in (Jedrzejewski 1996) and (Jedrzejewski 2003). In the present article, we point out the differences between the two models (A and B).
1 Shuffled Stern-Brocot Tree
The Stern-Brocot tree is a binary tree (see Brocot 1862, Stern 1858) of positive rational numbers obtained by inserting the mediant of two adjacent fractions. The mediant of two adjacent fractions p1/q1 and p2/q2 is defined by the Farey sum (see (Farey 1816))
\[ \frac{p_1}{q_1} \oplus \frac{p_2}{q_2} = \frac{p_1 + p_2}{q_1 + q_2} \]
Each node of the tree is associated with a word ω of the free monoid L = {S, T}∗ of words on the alphabet of two letters S and T. The word ω describes the path in the Stern-Brocot tree that reaches the node from the root of the tree: S means a left step and T means a right step. In this way, every positive fraction is encoded as a finite sequence of symbols S and T. The letters are also associated with the two functions T(x) = x + 1 and S(x) = x/(x + 1). At each node, a function is defined by replacing, in the word ω of this node, the concatenation of letters by the composition of functions. The fraction of this node is the value of the function at x = 1. For example, the word ω = TS means that, starting from the top of the tree at the value 1, we go a step to the right and then a step downwards to the left. The associated function is TS(x) = T(x/(x + 1)) = (2x + 1)/(x + 1) and the value of this function for x = 1 is the fraction 3/2.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 493–498, 2009. © Springer-Verlag Berlin Heidelberg 2009
The Shuffled Stern-Brocot tree is a different arrangement of the same binary tree. It starts at x = 1, but uses the functions L(x) = T(x) = x + 1 for a left downwards step and R(x) = 1/(x + 1) for a right downwards step. Each node is associated with a word of the free monoid {L, R}∗ of words on the alphabet of two letters L and R. The fraction of the node is the value of the function ω(x) at x = 1. In the Stern-Brocot tree, a chromatic number (a notion introduced in Noll 2006) is a positive rational number x ≠ 1, 1/2, 2/1, whose associated word ω ends by two different letters (ST or TS). Equivalently, in the shuffled Stern-Brocot tree, a rational number is a chromatic number if and only if its word ω ends by R (see (Jedrzejewski 2008)).
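The two encodings of a fraction, and the two characterizations of chromatic numbers, can be compared with a short sketch. It assumes, following the example TS(1) = 3/2, that a word is read as a composition of functions with its rightmost letter applied first, and that the values 1, 1/2 and 2 are excluded from the chromatic numbers. The function names are hypothetical.

```python
# Sketch: words of a fraction in the Stern-Brocot tree and in the shuffled tree.
from fractions import Fraction

STERN_BROCOT = {"S": lambda x: x / (x + 1), "T": lambda x: x + 1}
SHUFFLED = {"L": lambda x: x + 1, "R": lambda x: 1 / (x + 1)}
EXCLUDED = {Fraction(1), Fraction(1, 2), Fraction(2)}


def evaluate(word, letters):
    """Value at x = 1 of the composition encoded by word (rightmost first)."""
    x = Fraction(1)
    for letter in reversed(word):
        x = letters[letter](x)
    return x


def stern_brocot_word(x):
    """Word over {S, T} whose value at 1 is the positive fraction x."""
    word = ""
    while x != 1:
        if x > 1:
            word, x = word + "T", x - 1        # undo T(y) = y + 1
        else:
            word, x = word + "S", x / (1 - x)  # undo S(y) = y / (y + 1)
    return word


def shuffled_word(x):
    """Word over {L, R} whose value at 1 is the positive fraction x."""
    word = ""
    while x != 1:
        if x > 1:
            word, x = word + "L", x - 1        # undo L(y) = y + 1
        else:
            word, x = word + "R", 1 / x - 1    # undo R(y) = 1 / (y + 1)
    return word


def is_chromatic_stern_brocot(x):
    return x not in EXCLUDED and stern_brocot_word(x)[-2:] in ("ST", "TS")


def is_chromatic_shuffled(x):
    return x not in EXCLUDED and shuffled_word(x).endswith("R")


print(stern_brocot_word(Fraction(7, 12)), shuffled_word(Fraction(7, 12)))
```

For 7/12 this gives the words STSST and RRRLR, so both criteria classify 7/12 as chromatic, whereas 4/7 (words STSS and RRRL) is not.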
2 Construction of Pseudo-diatonic Scales
In model A, a pseudo-diatonic scale is the union of two subsystems: the Agmon subsystem and the Balzano subsystem, linked by concordance. For each chromatic number r/(2n + 1) with odd denominator, we construct a cycle of pseudo-thirds generated by r/(2n + 1) mod 1 (or equivalently by r mod 2n + 1) and a cycle of pseudo-fifths generated by 2r/(2n + 1) mod 1 (or equivalently by 2r mod 2n + 1). In this model, a generic pseudo-fifth equals two pseudo-thirds. The generic pseudo-fifth has two successors along the Stern-Brocot tree. The chromatic successor m/N determines the N-chromatic universe into which the Balzano subsystem is embedded as a specific scale.
In model B, a large ME* scale A is a non-degenerate well-formed scale of generator k whose complement A^c has no adjacent pitch classes (k is a nonnegative integer). A large ME* scale A (k ≥ 2) is tightly generated if there is no non-degenerate well-formed k-generated scale W between A^c and A (A^c ⊂ W ⊂ A). In the N-tone equal temperament (N ≥ 12), a generalized diatonic scale (or simply a diatonic scale) is a k-generated large ME* scale with generator k > 2 minimizing the difference of cardinality #A − #A^c (condition of minimality). If there is no such scale (N = 15 is the only known example so far), the trivial 2-generated scale might be chosen as the generalized diatonic scale. So for each N, there is always one generalized diatonic scale.
In order to compare the pseudo-diatonic scales of model A with the generalized diatonic scales of model B, we implement an algorithm for searching the scales. For each chromatic number m/N of the shuffled Stern-Brocot tree, we look at its predecessor m1/N1 and construct the generic cycle of fifths as part of the (generalized) Balzano subsystem
\[ U = \{ m_1 k \bmod N_1,\ k = 0, 1, \dots, N_1 - 1 \} \]
The rational number m2/N2 = (m − m1)/(N − N1) is the Farey complement of m1/N1, satisfying
\[ \frac{m}{N} = \frac{m_1}{N_1} \oplus \frac{m_2}{N_2} \]
With m2/N2, we complete the (generalized) Balzano subsystem in terms of the cycle:
\[ V = \{ m_2 k \bmod N_2,\ k = 0, 1, \dots, N_2 - 1 \} \]
The permutation σ(x) = x mod N induces an embedding of U and V in the chromatic universe Z_N. This algorithm produces the two types of scales A and B. The N1 white keys are the set σ(U) and the N2 black keys are a transposition of σ(V). In model A, the generator m1 = 2r is even, and the number of white keys N1 = 2n + 1 is always odd. Moreover, the rational number m1/2N1 = r/N1 must be a chromatic number. In model B, the generalized diatonic scale is determined by the condition of minimality. In both models, the chromatic number m/N and the number (N − m)/N lead to the same scales.
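The first steps of the search described above (predecessor, Farey complement, and the two generator cycles) can be sketched as follows. The sketch assumes that m/N is given in lowest terms, that the predecessor of a node in the shuffled tree is obtained by dropping the last letter of its word, and that the cycles U and V are listed in generation order. It does not implement the embedding σ or the minimality condition of model B, and all function names are hypothetical.

```python
# Sketch: predecessor, Farey complement and generator cycles for m/N.
from fractions import Fraction


def shuffled_word(x):
    """Word over {L, R} with L(y) = y + 1, R(y) = 1/(y + 1) and value x at 1."""
    word = ""
    while x != 1:
        if x > 1:
            word, x = word + "L", x - 1
        else:
            word, x = word + "R", 1 / x - 1
    return word


def evaluate(word):
    """Value at 1 of the composition encoded by word (rightmost letter first)."""
    x = Fraction(1)
    for letter in reversed(word):
        x = x + 1 if letter == "L" else 1 / (x + 1)
    return x


def farey_decomposition(m, N):
    """Return ((m1, N1), (m2, N2)) with m/N = m1/N1 (+) m2/N2."""
    word = shuffled_word(Fraction(m, N))
    parent = evaluate(word[:-1])  # predecessor: drop the last letter
    m1, N1 = parent.numerator, parent.denominator
    return (m1, N1), (m - m1, N - N1)


def generator_cycle(generator, modulus):
    """The cycle {generator * k mod modulus} listed for k = 0, 1, ..."""
    return [(generator * k) % modulus for k in range(modulus)]


(m1, N1), (m2, N2) = farey_decomposition(7, 12)
print((m1, N1), (m2, N2), generator_cycle(m1, N1), generator_cycle(m2, N2))
```

For m/N = 7/12 the sketch returns the predecessor 4/7 and the Farey complement 3/5, that is, seven white keys and five black keys.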
3 Comparison of the Two Models
Doubles of chromatic numbers are not chromatic. In model A, since the word of the chromatic number m1/2N1 ends by R, the word of the generic pseudo-fifth m1/N1 ends by L. If ω is the word of m2/N2 (the black keys), the word of the chromatic number m/N of the N-chromatic universe ends by LR. In model B, for 12 ≤ N ≤ 42, the generalized diatonic scales have the same associated words ωLR if N ≠ 13, 14, 15, 18, 21. For these values (N = 13, 14, 15, 18, 21), there are no pseudo-diatonic scales in model A. This suggests comparing the two models only in this case (N ≠ 13, 14, 15, 18, 21). A computer program shows the following results (see also the following table).
• The pseudo-diatonic scales of model A are not uniquely determined by the chromatic cardinality N since, for example, there are two pseudo-diatonic scales for N = 42. In model B, for each N, there is by definition only one generalized diatonic scale.
• The pseudo-diatonic scale of model A does not exist for every N (in contrast to model B), even when m1 is even. In the quarter-tone universe (N = 24), there is no pseudo-diatonic scale, because the number m1/2N1 = 3/13 is not a chromatic number. In model B, the pseudo-fifth has 13 quarter tones. It is composed of two different types of pseudo-thirds. In model A, the two pseudo-thirds are always of the same size, determined by the number r/N1 = m1/2N1.
• In model A, the number N1 of white keys is always odd and the generator m1 is always even. In model B, the number of notes N1 of a generalized diatonic scale is sometimes odd and sometimes even. Moreover, contrary to model A, the number N1 can be even and the generator m1 odd (N = 17, 25, 27, 29, 37, 39, 41), or the number N1 can be odd and the generator also odd (N = 36).
• If m1 is even, the scales of the two models are different in two cases: (1) if the number m1/2N1 is not a chromatic number (N = 24, 28, 32, 35, 40) and (2) if the pseudo-diatonic scale does not satisfy the condition of minimality. In this case, m1 is even in model A and odd in model B (N = 29, 37, 39, 41).
• If, for both models, the pseudo-fifths or their complements (v) are evaluated in cents, 1200 log2(v), it turns out that the range of the pseudo-fifth in model
A varies between 660 and 1057 cents and in model B it varies between 630 and 978 cents. This shows that the pseudo-fifth must be understood as a new interval of the N-chromatic universe, and should not literally be compared with the tempered fifth (700 cents).
N    Model  m/N = m1/N1 ⊕ m2/N2       ω            Pseudo-fifth  Pseudo-third
12   A, B   7/12 = 4/7 ⊕ 3/5          R^3          700           2/7 = RL^2R
16   A, B   7/16 = 4/9 ⊕ 3/7          RLRL         675           2/9 = RL^3R
17   B      5/17 = 3/10 ⊕ 2/7         RL^2R        847           −
19   A, B   7/19 = 4/11 ⊕ 3/8         RLR^2        758           2/11 = RL^4R
20   A, B   11/20 = 6/11 ⊕ 5/9        R^3L^2       660           3/11 = RL^2R^2
22   A, B   17/22 = 10/13 ⊕ 7/9       R^2L^2R      927           5/13 = RLR^3
23   A, B   7/23 = 4/13 ⊕ 3/10        RL^2RL       835           2/13 = RL^5R
24   B      11/24 = 6/13 ⊕ 5/11       RLRL^3       650           (3/13 = RL^3RL)
25   B      9/25 = 5/14 ⊕ 4/11        RLR^2L       768           −
26   A, B   7/26 = 4/15 ⊕ 3/11        RL^2R^2      877           2/15 = RL^6R
27   B      5/27 = 3/16 ⊕ 2/11        RL^4R        978           −
28   B      15/28 = 8/15 ⊕ 7/13       R^3L^4       643           (4/15 = RL^2R^2L)
29   A      17/29 = 10/17 ⊕ 7/12      R^3LR        703           5/17 = RL^2RLR
29   B      9/29 = 5/16 ⊕ 4/13        RL^2RL^2     828           −
30   A, B   7/30 = 4/17 ⊕ 3/13        RL^3RL       920           2/17 = RL^7R
31   A, B   11/31 = 6/17 ⊕ 5/14       RLR^2L^2     774           3/17 = RL^4R^2
32   A      27/32 = 16/19 ⊕ 11/13     R^2L^4R      1013          8/19 = RLRLR^2
32   B      15/32 = 8/17 ⊕ 7/15       RLRL^5       638           (4/17 = RL^3RL^2)
33   A, B   7/33 = 4/19 ⊕ 3/14        RL^3R^2      945           2/19 = RL^8R
34   A, B   25/34 = 14/19 ⊕ 11/15     R^2LR^2L     882           7/19 = RLR^2LR
35   B      11/35 = 6/19 ⊕ 5/16       RL^2RL^3     823           (3/19 = RL^5RL)
36   B      17/36 = 9/19 ⊕ 8/17       RLRL^6       633           −
37   A      7/37 = 4/21 ⊕ 3/16        RL^4RL       973           2/21 = RL^9R
37   B      13/37 = 7/20 ⊕ 6/17       RLR^2L^3     778           −
38   A, B   29/38 = 16/21 ⊕ 13/17     R^2L^2RL^2   916           8/21 = RLR^4
39   A      17/39 = 10/23 ⊕ 7/16      RLRL^2R      677           5/23 = RL^3R^3
39   B      16/39 = 9/22 ⊕ 7/17       RLRLRL       708           −
40   A      7/40 = 4/23 ⊕ 3/17        RL^4R^2      990           2/23 = RL^{10}R
40   B      19/40 = 10/21 ⊕ 9/19      RLRL^7       630           (5/21 = RL^3RL^3)
41   A      25/41 = 14/23 ⊕ 11/18     R^5L         732           7/23 = RL^2RL^2R
41   B      13/41 = 7/22 ⊕ 6/19       RL^2RL^4     820           −
42   A, B   11/42 = 6/23 ⊕ 5/19       RL^2R^2L^2   886           3/23 = RL^6R^2
42   A      37/42 = 22/25 ⊕ 15/17     R^2L^6R      1057          11/25 = RLRL^2R^2
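The cent values in the table can be recomputed from m and N. The sketch below rests on an assumption inferred from the tabulated numbers rather than stated in the text: the interval of m steps in N-tone equal temperament has frequency ratio v = 2^(m/N), so that 1200 log2(v) = 1200 m/N, and the table reports whichever of m/N and its complement (N − m)/N is larger. The function name is hypothetical.

```python
# Sketch: size in cents of the pseudo-fifth m/N or of its complement.
from fractions import Fraction


def pseudo_fifth_cents(m, N):
    """1200 * max(m, N - m) / N as an exact fraction (assumed convention)."""
    steps = max(m, N - m)
    return 1200 * Fraction(steps, N)


for m, N in [(7, 12), (7, 16), (5, 17), (11, 24), (37, 42)]:
    print(f"{m}/{N}: {float(pseudo_fifth_cents(m, N)):.1f} cents")
```

For instance, 7/16 is reported through its complement 9/16, which gives 675 cents, and 5/17 through 12/17, which gives about 847 cents.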
The table gives for each N-chromatic universe (first column) the Farey relation (third column) needed to construct the pseudo-diatonic scale of model A or the generalized diatonic scale of model B (given in the second column). The last column provides the chromatic number (m1/2N1) for the pseudo-third of model A. Some numbers (m1/2N1) in parentheses are not chromatic numbers. The
word ω associated with the generic pseudo-fifth is given in the fourth column. The value in cents of the pseudo-fifth is in column 5.
Some scales, available in both models, have a chromatic number 2/(2n + 3) associated with a word of the form RL^nR, n ≥ 2. In this case, it is easy to show that the pseudo-diatonic and the diatonic scale of models A and B are the same. If n = 2k (k = 1, 2, ...) is even, in the (7k + 5)-chromatic universe, this scale has (4k + 3) white keys and (3k + 2) black keys,
\[ \frac{7}{7k+5} = \frac{4}{4k+3} \oplus \frac{3}{3k+2} \]
and if n = 2k + 1 is odd, the scale has, in the (7k + 9)-chromatic universe, (4k + 5) white keys and (3k + 4) black keys:
\[ \frac{7}{7k+9} = \frac{4}{4k+5} \oplus \frac{3}{3k+4} \]
Acknowledgements. I would like to thank Thomas Noll and the organizers of the Conference for helpful information and comments.
References
Agmon, E.: A Mathematical Model of the Diatonic System. Journal of Music Theory 33(1), 1–25 (1989)
Agmon, E.: Linear transformations between cyclically generated chords. Musikometrika 3, 15–40 (1991)
Brocot, A.: Calcul des rouages par approximation, nouvelle méthode. Revue Chronométrique 3, 186–194 (1862)
Carey, N.: Coherence and Sameness in Well-formed and Pairwise Well-Formed Scales. Journal of Mathematics and Music 1(2), 79–98 (2007)
Carey, N., Clampitt, D.: Aspects of Well-Formed Scales. Music Theory Spectrum 11(2), 187–206 (1989)
Clough, J.: Aspects of Diatonic Sets. Journal of Music Theory 23, 45–61 (1979)
Clough, J., Myerson, G.: Variety and Multiplicity in Diatonic Systems. Journal of Music Theory 29(2), 249–270 (1985)
Farey, J.: On a Curious Property of Vulgar Fractions. Philosophical Magazine 47, 385–386 (1816)
Gould, M.: Balzano and Zweifel: Another Look at Generalized Diatonic Scales. Perspectives of New Music 38(2), 88–105 (2000)
Jedrzejewski, F. (ed.): La loi de la Pansonorité. Ivan Wyschnegradsky, Contrechamps, Genève (1996)
Jedrzejewski, F.: Mathématiques des systèmes acoustiques, Tempéraments et modèles contemporains. L’Harmattan, Paris (2002)
Jedrzejewski, F.: Dictionnaire des musiques microtonales. L’Harmattan, Paris (2003)
Jedrzejewski, F.: Mathematical Theory of Music. Editions IRCAM/Delatour, Sampzon (2006)
Jedrzejewski, F.: Generalized Diatonic Scales. Journal of Mathematics and Music 2(1), 21–36 (2008)
Lewin, D.: Generalized Musical Intervals and Transformations. Yale University Press, New Haven (1987)
Noll, T.: Facts and Counterfacts: Mathematical Contributions to Music-Theoretical Knowledge. In: Bab, S., et al. (eds.) Models and Human Reasoning - Bernd Mahr zum 60. Geburtstag. W&T Verlag, Berlin (2006)
Noll, T.: Musical Intervals And Special Linear Transformations. Journal of Mathematics and Music 1(2), 121–137 (2007)
Stern, M.: Über eine zahlentheoretische Funktion. Journal für die reine und angewandte Mathematik 55, 193–220 (1858)
Vicinanza, D.: Paths on the Stern-Brocot Tree and Winding Numbers of Modes. In: Proceedings of the ICMC, Barcelona (2005)
Wyschnegradsky, I.: 24 Préludes in Vierteltonsystem. M.P. Belaieff, Frankfurt (1979)
Affinity Spaces and Their Host Set Classes
José Oliveira Martins
Eastman School of Music, University of Rochester
Abstract. This paper proposes the organization of pitch-class space according to the notion of affinities discussed in medieval scale theory and shows that the resultant arrangement of intervallic affinities establishes a privileged correspondence with certain symmetrical set classes. The paper is divided into three sections. The first section proposes a pitch-class cycle, the Dasian space, which generalizes the periodic pattern of the dasian scale discussed in the ninth-century Enchiriadis treatises (Palisca 1995). The structure of this cycle is primarily derived from pitch relations that correspond to the medieval concepts of transpositio and transformatio.¹ Further examination of the space’s properties shows that the diatonic collection holds a privileged status (host set class) among the embedded segments in the cycle. The second section proposes a generalized construct (affinity spaces) by lifting some of the intervallic constraints on the structure of the Dasian space, while retaining the relations of transpositio and transformatio, and the privileged status of host set classes.² The final section examines some of the properties of host set classes, and in turn proposes “rules” for constructing affinity spaces from their host sets. The study of affinity spaces will give us insights regarding scalar patterning, inter-scale continuity, the combination of interval cycles, voice leading, and harmonic distance.³
1 Affinities in the Medieval Dasian Scale
The idiosyncratic dasian scale discussed in the late ninth-century Enchiriadis treatises is constituted by a series of tone-semitone-tone tetrachords consistently separated by a tone of disjunction (see Example 1a). This arrangement replicates the four modal qualities (protus, deuterus, tritus, and tetrardus) of the finales tetrachord throughout the scale, since the semitone placed in the center of each tetrachord between deuterus and tritus creates equidistant intervallic markers and allows for establishment of intervallic affinities at the perfect fifth throughout the scale. This scale structure allowed the transfer of melodic segments at both the upper and lower fifth to retain their modal identities. Such transfer was referred to as transpositio and preserved the segments’ local intervallic structure, as illustrated in Example 1b, where the arrows point to the preservation of the modal quality deuterus.
Example 1. The dasian scale (Enchiriadis treatises)
1 For a discussion of the interpretative attitudes involving these concepts see Pesce (1986 and 1987).
2 In earlier work, I propose the Dasian and other affinity spaces as suitable analytical frameworks to address harmonic and melodic aspects that result from the combination of different scales in twentieth-century music. For analytical accounts of Béla Bartók’s polymodality and Witold Lutoslawski’s “12-note” music see Martins (2006a); for the analysis of Igor Stravinsky’s harmonic syntax in a neoclassical work, see Martins (2006b); and for the interpretation of Darius Milhaud’s polymodality see Martins (2007).
3 This study intersects in interesting ways with work developed by several theorists: Philip Lambert and Edward Gollin on the combination of interval cycles and multi-sets, see Lambert (1990) and Gollin (2007); Carey and Clampitt (1996); and Tymoczko (2004) on scalar theory.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 499–511, 2009. © Springer-Verlag Berlin Heidelberg 2009
The intervallic arrangement of the dasian scale, however, overrides modal affinity at the octave since the same pitch class is assigned to different modal qualities in different tetrachords. In Example 2, the circled pitch class (pc) A appears as deuterus, protus and tetrardus in the scale.4 The use of B-flat in the Graves tetrachord inflecting B-natural in the Superiores changes the modal quality of the adjacent pc A from protus in the Superiores to deuterus in the Graves tetrachord; similarly, the use of F# in the Excellentes tetrachord instead of F-natural makes pc A sound as tetrardus. In other words, pc A has different modal qualities in each tetrachord given the different distribution of tones and semitones surrounding each appearance in the scale. The same could be said for other pcs in the scale. In the medieval notation of chant, Guido d’Arezzo warned against the change of modal quality created by note alterations to a pure diatonic framework. He referred to such modal change as transformatio. But even though disapproved by Guido, transformatio became a powerful practice extending the flexibility of the tonal system. Transpositio and transformatio thus capture significant pitch relationships: transpositio refers to the transfer of modal quality (local interval pattern) to various positions in the tonal system; whereas transformatio refers to the change of modal quality (local interval pattern) on a given pitch class effected by a note alteration, as well as to the note alteration itself.
Example 2. Different modal qualities for the same pitch-class in the dasian scale
4 Guido d’Arezzo considered affinity (likeness) to be completed only at the octave, which is the reason why Guido and subsequent theorists rejected the dasian scale. This is not to say that the Enchiriadis treatises do not acknowledge the privileged status of the octave relation in their musical system, but rather that affinities at the fifth are of a different consonant “quality” (consonae) than octave “hyper-consonant” relations (aequisonae). See Pesce (1986, 337), and Mengozzi (2006).
2 The Dasian Space
Example 3 proposes a pitch-class generalization of the dasian scale by extending the species of fifth beyond the medieval scale’s boundaries (the disjunct tetrachords of the scale are marked by brackets and Roman Numerals). This generalization closes the space in two ways: by exhausting the 12 transpositions of the tetrachordal interval pattern, and by creating a pitch-class cycle.5 This framework renumbers the modal quality of each note in the dasian space, such that tritus is 0, tetrardus is 1, protus is 2, and deuterus is 3. In this arrangement, semitones always fall between adjacent positions 3 and 0, and each of the 12 semitones occurs only once in the space.
5 Edward Gollin (2007) refers to pitch-class cycles that exhaust the aggregate multiple times as multi-aggregate cycles. Although Gollin does not address specifically the structure of the Dasian space, the space exhausts the aggregate four times.
The affinity of the space is expressed by the interval of transpositio 7, which spans a modular unit of three recurrent whole tones and a semitone, i.e., 7 = 3 • 2 + 1 (mod 12).
Example 3. The Dasian space and the transpositio relation.
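The construction of the cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dasian_space` and the choice of pc 0 (C) for the first station in position 0 are hypothetical and need not match the layout of Example 3. The sketch repeats the modular unit (three whole tones, then a semitone) until the pattern closes, and then checks the counts stated in the text.

```python
from collections import Counter

def dasian_space(start=0):
    """Repeat the modular unit tone-tone-tone-semitone from `start`
    (a hypothetical choice of pc for position 0) until the cycle closes."""
    unit = [2, 2, 2, 1]                  # positions 0->1, 1->2, 2->3, 3->0
    cycle, pc = [], start
    while True:
        for interval in unit:
            cycle.append(pc)
            pc = (pc + interval) % 12
        if pc == start:                  # back on the starting pc in position 0
            return cycle

space = dasian_space()
pc_counts = Counter(space)
# Lower notes of the semitones, which fall between positions 3 and 0.
semitone_roots = [space[i] for i in range(3, len(space), 4)]

print(len(space))                        # 48 stations
print(set(pc_counts.values()))           # {4}: each pc occurs four times
print(len(set(semitone_roots)))          # 12 distinct semitones
```

The 48 stations and the fourfold occurrence of every pitch class agree with the observation that the space exhausts the aggregate four times.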
Example 4 shows that the transformatio relation has two outcomes in the Dasian space: it either produces a note substitution, by ascending a pitch class by a semitone; or it retains the same pitch class. In both cases, transformatio corresponds to moving seven steps in clockwise direction.
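As a rough check of the two outcomes, the following Python sketch measures the interval covered by seven clockwise steps from every station of the cycle. The starting pc (0 in position 0) and the variable names are assumptions made for illustration only.

```python
from itertools import accumulate

# 12 modular units of tone-tone-tone-semitone; station 0 is pc 0 (an assumption).
intervals = [2, 2, 2, 1] * 12
space = [0] + [total % 12 for total in accumulate(intervals[:-1])]
size = len(space)                                    # 48 stations

# Interval (mod 12) covered by moving seven steps clockwise from each station.
moves = [(space[(i + 7) % size] - space[i]) % 12 for i in range(size)]
substitutions = [i for i, m in enumerate(moves) if m == 1]
retentions = [i for i, m in enumerate(moves) if m == 0]

print(set(moves))                                    # {0, 1}
print(all(i % 4 == 0 for i in substitutions))        # True
print(len(substitutions), len(retentions))           # 12 36
```

Under these assumptions the semitone substitution occurs exactly at the stations in position 0, and every other station retains its pitch class.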
Example 4. The transformatio relation in the Dasian space.
3 Four Properties of the Dasian Space
The examination of continuous segments embedded in the Dasian space allows us to establish the following four properties:
(1) Set-class consistency: every continuous 7-note segment is a diatonic collection. Segments of different cardinalities yield different set-classes, but 7-note segments yield a single set class, which I call the host set class of the space.
(2) Set-class completion: The space contains all 12 transpositions of this scale, which are related by perfect fifth, the interval of transpositio (t).
(3) Set locality: Each diatonic scale appears in a single region of the space, extending through a major-10th segment. Regions are defined by Carey and Clampitt (1996, esp. 124-125) and constitute the largest space in which both parallel octaves and fifths can be maintained.
(4) Every single “step” or “shift” of a 7-note segment in clock- or counterclockwise direction either substitutes a pitch class for its replication or for a pitch class a semitone away. Both of these relations are structured by transformatio.
In Example 5a, successive clock- or counterclockwise steps of 7-note segments yield the same collection four times before a new pitch is introduced. Example 5b shows that the transformatio relation that produces note change corresponds to what David Lewin refers to as a Cohn flip, i.e., when inverted around a given axis, a diatonic set produces a near symmetrical mapping, where one of the notes is substituted by another adjacent note in the space (Lewin 1996, 181-216). In the example, the white-note diatonic collection can Cohn-flip in two familiar ways, each retaining six of the pitches and exchanging B for Bb, and F for F#. The transformatio relation thus has an important function in defining the host set of the Dasian space, as the number of steps spanned by the relation coincides with the cardinality of the host set class.
Example 5. “Step” motion and transformatio in Dasian space
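The four properties can be checked mechanically. The Python sketch below is an illustration only: it assumes pc 0 for the first station in position 0, and the names (`segments`, `changes`, `locality_ok`) are hypothetical. It collects every contiguous 7-note segment of the cycle and tests consistency, completion, locality, and the single-note shift.

```python
from itertools import accumulate

DIATONIC = frozenset({0, 2, 4, 5, 7, 9, 11})
ALL_DIATONIC = {frozenset((p + k) % 12 for p in DIATONIC) for k in range(12)}

# Dasian space as 12 units of tone-tone-tone-semitone (starting pc is an assumption).
intervals = [2, 2, 2, 1] * 12
space = [0] + [total % 12 for total in accumulate(intervals[:-1])]
size = len(space)

segments = [frozenset(space[(i + j) % size] for j in range(7)) for i in range(size)]

# (1) consistency and (2) completion
consistent = all(seg in ALL_DIATONIC for seg in segments)
complete = set(segments) == ALL_DIATONIC

# (3) locality: each collection occupies four consecutive starting stations
def is_one_region(collection):
    starts = [i for i, seg in enumerate(segments) if seg == collection]
    return any({(a + j) % size for j in range(4)} == set(starts) for a in starts)

locality_ok = all(is_one_region(c) for c in ALL_DIATONIC)

# (4) a one-step shift introduces at most one new pitch class
changes = [len(segments[(i + 1) % size] - segments[i]) for i in range(size)]

print(consistent, complete, locality_ok)   # True True True
print(set(changes), sum(changes))          # {0, 1} 12
```

Four consecutive 7-note segments span ten stations, which is consistent with the major-10th extent of a region mentioned under property (3).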
The role of the diatonic collection as a host set class, however, is not unique to the Dasian space. Example 6 shows three additional cycles: the Guidonian space, Hauptmann’s line-of-thirds, and the circle-of-fourths.6 In all of these spaces, diatonic collections are maintained as 7-note segments and the transformatio relation spans seven contiguous steps. As with the Dasian space, transformatio either retains the same pitch class seven steps away, or, as shown by the dotted arrows, connects pitch classes related by semitone belonging to adjacent host sets in the space. In addition, the four properties that characterize the pitch structure of the Dasian space also apply to these spaces.
6 The Guidonian space derives its name from maximally overlapping the Guidonian hexachordal arrangement.
Example 6. The diatonic collection as the host set-class of several cycles
4 Affinity Spaces
Example 7 proposes a formula (t = n • r + s (mod 12)) that models the interval relations in the modular unit of a generalized construct I call affinity spaces. Let us call any concrete realization of this construct an affinity space. The formula lifts the constraints on interval size and interval recurrence specific to the Dasian space, but retains the roles performed by the relations of transpositio and transformatio, and continues to grant privileged status to host set classes.7 In this formula, t corresponds to the interval of transpositio, measuring the affinity of the space; r is the interval of recurrence, and generalizes the “whole tone” of the Dasian space; n counts the recurrence of r; and s generalizes the “semitone,” the unique interval of the modular unit. There are two outcomes for transformatio in any affinity space (Example 7b). It either retains the pitch class, in which case the order position of that pitch class descends by one within the modular unit; or, when applied to a pitch class in the zero order position within a modular unit, it substitutes that pitch class for another one in position n, such that the interval between the two pitch classes is given by r – s. When the interval of transpositio t is co-prime with 12 (i.e., when it is 1, 5, 7, or 11), we can then calculate the number y of steps a transformatio spans (Example 7c). y is also the cardinality of the host set class. In the case of the Dasian space as addressed above, a transformatio corresponds to moving seven steps.
7 Although conceived independently, this generalization intersects in a number of ways with the work of Gollin (2007).
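The formula t = n • r + s (mod 12) lends itself to a small computational sketch. The Python code below is not the calculation of Example 7c, which is given only in the figure; it is a brute-force search, written under the assumption that a transformatio of y steps must retain the pitch class from positions 1 to n and add r – s from position 0. The function names and the starting pc 0 are hypothetical.

```python
def affinity_space(n, r, s, start=0):
    """Repeat the modular unit (n copies of r, then s) until the cycle closes."""
    unit = [r] * n + [s]
    cycle, pc = [], start
    while True:
        for interval in unit:
            cycle.append(pc)
            pc = (pc + interval) % 12
        if pc == start:
            return cycle

def transformatio_steps(n, r, s):
    """Clockwise step counts y (searched by brute force) that retain the pc
    from positions 1..n and add r - s (mod 12) from position 0."""
    space = affinity_space(n, r, s)
    size = len(space)

    def expected(i):
        return (r - s) % 12 if i % (n + 1) == 0 else 0

    return [
        y for y in range(1, size)
        if all((space[(i + y) % size] - space[i]) % 12 == expected(i)
               for i in range(size))
    ]

print(transformatio_steps(3, 2, 1))    # Dasian space, 7 = 3*2 + 1
print(transformatio_steps(2, 2, 1))    # 5 = 2*2 + 1
print(transformatio_steps(2, 3, 11))   # 5 = 2*3 + 11
```

For the Dasian space the search returns 7, in agreement with the text. For 5 = 2 • 2 + 1 it returns 29, which in a 36-station cycle reaches the same station as 7 steps counterclockwise.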
Example 7. Affinity spaces
Let us now examine the embedding of host set classes in some affinity spaces. Example 8 shows twelve different affinity spaces, which share the same interval of transpositio (t = 5), the recurrence n = 2, and consequently also the cardinality of the space (mod 36). A circle-of-fourths in each space is highlighted in order to point out the affinity at-the-fourth in all spaces. In each space, arrows indicate the number of STEPs spanned by transformatio.8 Solid arrows indicate pitch-class retention and dotted arrows indicate note substitution. In each space, the cardinality of the host set-class equals the number of STEPs involved in a transformatio. Notice that six of the spaces have positive STEP values, and the other six have negative values. I have represented the spaces this way in order to show the smallest number of STEPs spanned by each space’s transformatio operation.
8 Let us define STEP as the operation that moves clockwise in a cycle, such that STEP1 moves clockwise one station, between adjacent elements; STEP2 moves clockwise 2 stations, etc. STEP-1 moves counterclockwise one station, etc.
Example 8. Twelve affinity spaces structured by 5 = 2 • r + s
The table of Example 9 organizes the twelve affinity spaces presented in Example 8. Going down the list of spaces in the left column shows that the interval r (the recurrence within the modular unit) increases by five semitones (or a perfect fourth) and the interval s (the unique single interval) increases by two semitones. The second column shows that there are only four values for the interval of transformatio that produces note substitution, i.e., 1, 4, 7, and 10 (mod 12). The third column shows that the number of STEPs involved in a transformatio increases by three for positive values and then decreases by three (in absolute value) for negative values. The fourth column is subdivided by the dotted line, pairing complementary host set classes for each space.
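The regularities of the table can be recomputed with a short Python sketch. It is an illustration under stated assumptions: the list starts at r = 5 (the order of rows in Example 9 may differ), the substitution interval is taken to be r – s (mod 12), and the signed STEP value is found by brute force as the candidate of smallest absolute value. All names are hypothetical.

```python
def affinity_space(n, r, s, start=0):
    """Repeat the modular unit (n copies of r, then s) until the cycle closes."""
    unit = [r] * n + [s]
    cycle, pc = [], start
    while True:
        for interval in unit:
            cycle.append(pc)
            pc = (pc + interval) % 12
        if pc == start:
            return cycle

def shortest_transformatio(n, r, s):
    """Signed STEP count of smallest absolute value that retains the pc from
    positions 1..n and adds r - s (mod 12) from position 0."""
    space = affinity_space(n, r, s)
    size = len(space)
    candidates = sorted((y for y in range(-size // 2, size // 2 + 1) if y), key=abs)
    for y in candidates:
        if all((space[(i + y) % size] - space[i]) % 12
               == ((r - s) % 12 if i % (n + 1) == 0 else 0)
               for i in range(size)):
            return y
    return None

rows = []
r = 5                                   # starting row is an assumption
for _ in range(12):
    s = (5 - 2 * r) % 12                # solve 5 = 2*r + s (mod 12)
    rows.append((r, s, (r - s) % 12, shortest_transformatio(2, r, s)))
    r = (r + 5) % 12                    # r rises by a perfect fourth per row

for row in rows:
    print(row)                          # (r, s, substitution interval, STEPs)
```

In this sketch s rises by two semitones from row to row, the substitution interval takes only the values 1, 4, 7, and 10, and six of the twelve STEP values are positive while six are negative.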
Example 9. The table for the twelve affinity spaces under 5 = 2 • r + s
Example 10 examines this complementary relation in two spaces: 5 = 2 • 2 + 1 (mod 12) (the Guidonian space), and 5 = 2 • 3 + 11 (mod 12). Example 10a shows that starting with pc C in position 0, the transformatio relation spans STEP -7 from C to C#. This relation signals the boundaries of two adjacent host sets, which in the case of the Guidonian space is the diatonic collection. Another way of inflecting C to C# in the same positions of the space is to go clockwise by STEP 29. This path produces another set class that is also consistent throughout the space. The host set corresponding to STEP 29 is the pentatonic collection plus two-times-the-aggregate. In other words, the space in fact has two host set classes, which are complements of each other plus twice the exhaustion of the aggregate. Example 10b presents another space, whose transformatio spans STEP 8, and the host set class has cardinality 8. The alternative counterclockwise path of STEP -28 produces a complementary set class (plus twice the aggregate) to the host set produced by STEP 8. In short, the transformatio relations produced by clock- and counterclockwise paths signal complementary host sets (plus a number of aggregates) for a given affinity space.9
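The complementary relation in the Guidonian space can be illustrated with a brief Python sketch. It assumes that C (pc 0) sits at station 0 in position 0, that the counterclockwise path covers the seven stations immediately before C, and that the clockwise path covers the 29 stations beginning on C; these readings and all names are assumptions.

```python
from collections import Counter

def affinity_space(n, r, s, start=0):
    """Repeat the modular unit (n copies of r, then s) until the cycle closes."""
    unit = [r] * n + [s]
    cycle, pc = [], start
    while True:
        for interval in unit:
            cycle.append(pc)
            pc = (pc + interval) % 12
        if pc == start:
            return cycle

guidonian = affinity_space(n=2, r=2, s=1)       # 5 = 2*2 + 1 (mod 12)
size = len(guidonian)                           # 36 stations

# STEP -7 from C lands on station 29; STEP 29 lands on the same station.
same_goal = (0 - 7) % size == 29
goal_pc = guidonian[29]                         # 1, i.e. C#

short_path = frozenset(guidonian[29:])          # seven stations before C
long_path = Counter(guidonian[:29])             # 29 stations starting on C
# Remove two full aggregates from the clockwise multiset.
residue = frozenset(pc for pc, count in long_path.items() if count - 2 > 0)

print(sorted(short_path))                       # [1, 2, 4, 6, 7, 9, 11]
print(sorted(residue))                          # [0, 3, 5, 8, 10]
```

The seven-note path is a diatonic collection and the residue of the long path is its five-note complement, a pentatonic collection.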
9 It is also interesting to notice that those spaces that share the same host set class (plus or minus the aggregate) also share the interval mod 12 of the transformatio.
Example 10. Complementary host set classes for 5 = 2 • 2 + 1 and 5 = 2 • 3 + 11
5 Three Properties of Host Set Classes
We are now in a position to ask the question: What are the properties of host set classes? Can any set class be a host in an affinity space? Generalizing from the host sets addressed so far, (1) all are inversionally symmetrical (i.e., they map into themselves under some axis); and (2) all produce a near self-mapping “flip,” that is, they map into themselves except for one note. The interval between unmatched notes is the interval between substituted and substituting notes in a transformatio. However, unlike what we observed for Cohn-flips in the diatonic collection, the flip that corresponds to a transformatio does not have to substitute adjacent notes in other host sets (i.e., it produces a kind of unrestricted Cohn-flip). Finally, (3) host sets are either interval-cycle segments or a combination of i-cycle segments.10
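Properties (1) and (2) can be tested for any pitch-class set with a few lines of Python. The sketch below is an illustration; it assumes the usual indexing of inversions as x → n – x (mod 12), and the function names are hypothetical.

```python
def invert(pcs, n):
    """Image of a pc set under the inversion x -> n - x (mod 12)."""
    return frozenset((n - p) % 12 for p in pcs)

def symmetry_indices(pcs):
    """Indices n whose inversion maps the set onto itself."""
    pcs = frozenset(pcs)
    return [n for n in range(12) if invert(pcs, n) == pcs]

def flips(pcs):
    """Near self-mappings: (index, note lost, note gained, interval gained - lost)."""
    pcs = frozenset(pcs)
    result = []
    for n in range(12):
        image = invert(pcs, n)
        lost, gained = pcs - image, image - pcs
        if len(lost) == 1:
            (old,), (new,) = lost, gained
            result.append((n, old, new, (new - old) % 12))
    return result

white_notes = {0, 2, 4, 5, 7, 9, 11}
chromatic_segment = {1, 2, 3, 4, 5, 6, 7}

print(symmetry_indices(white_notes))        # [4]
print(flips(white_notes))                   # [(9, 11, 10, 11), (11, 5, 6, 1)]
print(symmetry_indices(chromatic_segment))  # [8]
print(flips(chromatic_segment))             # [(7, 7, 0, 5), (9, 1, 8, 7)]
```

For the white-note collection the two flips exchange B for Bb and F for F#, the two Cohn flips described earlier. For the seven-note chromatic segment the unmatched notes lie a fourth or fifth apart, an instance of the unrestricted flip.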
6 Generating Affinity Spaces
Let us now address the reverse case: Given a set class with the three host-set properties, what are the “rules” for generating an affinity space?11 We first arrange the host set into an ascending i-cycle, and partition it into ascending i-cycle segments of cardinality c, and one segment of cardinality c - 1. Then, we arrange all segments to reflect a constant transpositional relation between their first elements (there might be a few ways of doing this). These transpositional relations correspond to possible values for t (transpositio). Finally, we determine the values for f (transformatio) by producing the two near self-mapping flips.12 For instance, consider the i-cycle segment {1, 2, 3, 4, 5, 6, 7}, which has the three host set properties. Example 11a arranges the set as an i-cycle segment and partitions or de-cycles it in eight different ways. In Example 11b, the two near self-mapping flips give us the values of the two possible transformatio relations. Example 11c shows three spaces that result from the eight possible arrangements of i-cycle segments.13 Several of the arrangements give rise to what amounts to equivalent affinity spaces: if r or s = 0, then t = s or t = r, as in (1) and (2); retrograde cyclic orderings produce the same space when “read” clockwise or counterclockwise (for instance, (3) and (4), or (7) and (8)); and the permutation of r and s in a 2-element modular unit results in the same ordering of elements in the space (for instance, (3) and (5), or (4) and (6)).
Example 11. Generating affinity spaces from the set {1, 2, 3, 4, 5, 6, 7}
10 Host set classes can be arranged into i-cycle segments when t = 1, 5, 7, or 11; or arranged into a combination of i-cycle segments when t = 2, 3, 4, 6, 8, or 10.
11 I’ll address here only the cases for t = 1, 5, 7, or 11.
12 As we’ve seen for the case of the diatonic collection, there’s no one-to-one correspondence between a host set class and an affinity space.
13 The 7-note set is not the only contiguous segment that can serve as host set in the complete i-cycle 1 or “chromatic scale” of Example 11c; rather, any contiguous segment can function as a host set for the i-cycle. In general, complete i-cycles are trivial realizations of affinity spaces and do not exclusively embed a single host set class. A similar condition applies to the circle-of-fourths in Example 6.
7 Conclusion
The transformatio relation enables a gradual harmonic “modulation” across members of the host set class by retaining all but one note in the near self-mapping flip between adjacent host sets of an affinity space. The note change involved in the harmonic modulation can take the form of parsimonious voice leading when the value of (the note change in) transformatio is 1 (or 11), as is the case for diatonic or pentatonic host sets.14 Example 11 demonstrates, however, that harmonic modulation can rely on less parsimonious transformatio values, which nevertheless entail a harmonic gradation across set-class members. The several possibilities for de-cycling a given host set create a relation (one-to-several: between a given host set class and the corresponding affinity spaces) that is theoretically rich and analytically useful.
References
Carey, N., Clampitt, D.: Aspects of Well-Formed Scales. Music Theory Spectrum 11(2), 187–206 (1989)
Carey, N., Clampitt, D.: Regions: A Theory of Tonal Spaces in Early Medieval Treatises. Journal of Music Theory 40(1), 113–147 (1996)
Gollin, E.: Multi-Aggregate Cycles and Multi-Aggregate Serial Techniques in the Music of Béla Bartók. Music Theory Spectrum 29(2), 143–176 (2007)
Lambert, J.P.: Interval Cycles as Compositional Resources in the Music of Charles Ives. Music Theory Spectrum 12(1), 43–82 (1990)
Lewin, D.: Cohn Functions. Journal of Music Theory 40(2), 181–216 (1996)
Martins, J.O.: The Dasian, Guidonian, and Affinity Spaces in Twentieth-century Music. Ph.D. diss., University of Chicago (2006a)
Martins, J.O.: Stravinsky’s Discontinuities, Harmonic Practice and the Guidonian space in the ‘Hymne’ for the Serenade in A. Theory and Practice 31, 39–64 (2006b)
Martins, J.O.: Diatonic reorientation in dual-organization spaces: interpreting polymodality in works of Milhaud. In: Presented at the annual meeting of the Music Theory Society of New York State (2007)
Mengozzi, S.: Virtual Segments: The Hexachordal System in the Late Middle Ages. Journal of Musicology 23(3), 426–467 (2006)
Palisca, C.V. (ed.): Musica enchiriadis and Scolica enchiriadis. Trans. Raymond Erickson. Music Theory Translation Series. Yale University Press, New Haven (1995)
Pesce, D.: B-Flat: Transposition or Transformation? The Journal of Musicology 4(3), 330–349 (1986)
Pesce, D.: The Affinities and Medieval Transposition. Indiana University Press, Bloomington (1987)
Tymoczko, D.: Scale Networks and Debussy. Journal of Music Theory 48(2), 219–294 (2004)
14 The value transformatio f = 1 can take the form of an augmented unison (or chromatic inflection) as in the case of the diatonic collection or of a diatonic semitone as in the case of a pentatonic collection. In both of these cases the transformatio relation corresponds to what Carey and Clampitt (1989, 192–93) refer to as “primary intervals” of well-formed scales.
The Step-Class Automorphism Group in Tonal Analysis
Jason Yust
University of Alabama, Tuscaloosa
Until recently, researchers who have dealt formally with tonal hierarchy (prolongation) have considered only models in which the objects of the hierarchy are musical events (where a musical event might be a chord or a note in a particular voice).1 In contrast, Yust (2006) proposes a concept called “dynamic prolongation” in which the objects of tonal hierarchy are motions between events rather than events themselves. The events in the model of Yust (2006) are chords made up of harmonically related pitches from several voices. In the present study I develop a different approach to dynamic prolongation. Rather than expressing harmonic relations between notes by grouping them into chords, we can treat harmonic relations as intervals and mix them in a hierarchy with melodic motions. This creates a model that can posit long-range harmonic relationships and blur the boundaries between intervals of harmonic and melodic significance. A particularly beautiful possibility opened up by this approach is to consider intervals in terms of step-class values and to view prolongational relationships between intervals in terms of step-class symmetries. A step-class is the residue modulo seven of the number of diatonic steps that separate two notes (referred to as “unison,” “second,” “third,” etc.). The group of step-class intervals is therefore isomorphic to the integers under addition modulo seven (Z7). The automorphisms (i.e. symmetries) of Z7 (isomorphic mappings of Z7 to itself) can be thought of as multiplications modulo 7, and form a group (Aut(Z7)) isomorphic to Z6, which is in turn isomorphic to the direct product Z2 × Z3. Step-class inversion, which I’ll denote I, represents the order-two component of the step-class interval automorphism group. The “halving” operation, H, which takes fifths to thirds, thirds to seconds, seconds to fifths, and so forth, is a representative order-three operation. Together these two operations generate the group, {1, H, H⁻¹, I, HI, H⁻¹I}.
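The group structure described here can be made concrete in a small Python sketch. The numerical encoding is an assumption made for illustration: step-classes are numbered 0 for the unison, 1 for the second, 2 for the third, and 4 for the fifth, so that H, which takes fifths to thirds, thirds to seconds, and seconds to fifths, becomes multiplication by 4 modulo 7, and I becomes multiplication by 6 (that is, by –1).

```python
MOD = 7
NAMES = {0: "unison", 1: "second", 2: "third", 3: "fourth",
         4: "fifth", 5: "sixth", 6: "seventh"}

H = 4                        # "halving": 4 is the inverse of 2 modulo 7
I = 6                        # inversion: multiplication by -1 modulo 7
H_INV = pow(H, -1, MOD)      # modular inverse (Python 3.8+)

def apply(multiplier, interval):
    """Act on a step-class interval by multiplication modulo 7."""
    return (multiplier * interval) % MOD

group = {
    "1": 1, "H": H, "H^-1": H_INV,
    "I": I, "HI": (H * I) % MOD, "H^-1 I": (H_INV * I) % MOD,
}

for start in (4, 2, 1):
    print(NAMES[start], "->", NAMES[apply(H, start)])
# fifth -> third, third -> second, second -> fifth

print(sorted(group.values()))            # [1, 2, 3, 4, 5, 6]
print(pow(H, 3, MOD), pow(I, 2, MOD))    # 1 1: H has order three, I order two
```

The six multipliers exhaust the nonzero residues modulo 7, which matches the statement that the automorphism group has six elements. The unison (0) is fixed by every multiplier.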
The order-three operations of this group are of particular interest as prolongational operations on intervals. This is illustrated in Example 1a, which shows intervals at various voice-leading levels with slurs. Moving from one level to the next, the operation H relates a prolonged interval to two motions that divide it in half. Three types of basic prolongational units result from the operation: the prolongation of a second by the framing fifths of a dominant–tonic progression, the triadic filling-in of a fifth, and the filling-in of a third with a passing motion. The relationships between step-class motions in Example 1a take the form of a binary tree. We can also view this structure as a maximal outerplanar graph, or “MOP,” as in Example 1b, with notes as vertices and intervals of the hierarchy as edges. (Yust 2006 deals extensively with mathematical properties of MOPs and their musical consequences.) The staff above the MOP shows intervals in a modified Schenkerian notation that shows third-motions with slurs and fifth-motions with beams. Using step-class automorphisms as prolongational transformations suggests the following interpretation of the resulting MOP: step-class intervals define a space that can occupy a number of orientations. The different orientations define familiar musical perspectives on step-class space: a melodic space where notes a second apart are close together, a space of thirds that defines harmonic closeness, and a fifths-space that relates harmonies and keys. The presence of a particular step-class motion in a piece of music can be seen as an orientation of musical space in terms of that motion. That is, while the listener experiences that motion, thinking at the prolongational level where it occurs means thinking of a corresponding orientation to step-class space. The listener then moves conceptually between different prolongational levels by means of automorphic transformations of the space. The simplest relationship between levels of musical motion is the automorphism H, which defines a symmetrical binary filling-in of an interval. For example, a diatonic sequence by fifths once initiated implies continuation through further motion by fifth. But after hearing a couple of repetitions of the sequence the listener can mentally move to a higher level of musical organization, hearing a larger sequence by second that suggests continuation by second. By hearing at various such levels at once the sequence by fifths sounds not like a homogeneous motion that could start and end at any time, but at certain points gives a sense of completion of larger-scale motions from earlier tonal events in the piece and initiation of motions that may receive later completion. The interval hierarchy includes not only such melodic motions, however, but also includes intervals that are found in the music as harmonic relationships at various levels.
Example 1. H as a prolongational operation
1 Rahn 1979, Smoliar 1980, Lerdahl and Jackendoff 1983, Cohn and Dempster 1992.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 512–520, 2009. © Springer-Verlag Berlin Heidelberg 2009
We are motivated to mix these interval types because the familiar structures that result from prolongational step-class automorphisms include passing figures, which are best expressed melodically, triads, which can be realized vertically, and fifth-progressions, which can come about through a combination of vertical and melodically articulated intervals. The total ordering of pitches implied by MOPs combined with the interpretation of intervals in the MOP both as melodic and harmonic intervals evokes Heinrich Schenker’s important idea of horizontalization or unfolding—that intervals of harmonic significance can be transformed into melodic intervals and thereby subjected to melodic elaboration. In fact, it takes the idea one step further by projecting a direction onto undirected foreground harmonic intervals so that intervals that remain vertical on the musical surface are horizontalized in the analysis.
The interaction between unordered step-class sets and ordered, prolongationally interpreted sets elevates the musical meaning that we can extract from these prolongational structures. Example 2 classifies the small number of transpositional types of step-class sets of order three and four. (Sets are shown on a generic staff to abstract them from particular pitch-class instantiations.) The trichords symmetrical with respect to inversion function as the basic units of prolongational structures and provide an immediate musical interpretation of the structures in terms of the familiar music-theoretic entities of passing motions, triads, and fifth-progressions. Example 3 shows prolongational interpretations of these trichords and how they are transformed by the step-class automorphisms.
Example 2. A classification of step-class sets
Example 3. Automorphic transformations of step-class trichords
Example 4. Automorphic transformations of step-class tetrachords
On the other hand, the step-class tetrachords that play a significant direct role in prolongational structures are not the more familiar inversionally-symmetrical sets but the H-symmetrical sets. These sets function as minimal tonal regions because they include exactly one instance of each type of inversionally-symmetrical trichord, and because a step-class tetrachord of this sort can prolong itself indefinitely due to its symmetry with respect to H. Example 4 shows prolongational instantiations of these tetrachords and their transformations under the step-class automorphisms. Note that H and H⁻¹ don’t change the tetrachord but reorder it to change the background interval and the types of trichords that make up the prolongational structure. The possibility of such prolongational reorientations allows these tetrachord types to prolong themselves indefinitely. Example 5 shows a few possible prolongational relationships between tonal regions. Example 5a shows how a region can right-prolong itself through prolongational reorientation, and Examples 5b–d show possible relations between a region and its dominant, supertonic, and mediant regions. Note that there are two possible pitch-class manifestations of the step-class set for the basic tonal region: one major and one minor. This is because the fifths occurring in the set are always perfect fifths, and the remaining step-class not related to the others by fifth can be a minor or major third with respect to the set’s triad. The shading of triads in the MOP facilitates reading of the structures and will be used as a convention for the remaining examples. The method of step-class symmetries also generates an interesting approach to the prolongational interpretation of seventh chords (an inversionally-symmetrical type of step-class set).
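The claim about H-symmetrical tetrachords can be explored by enumeration. The Python sketch below rests on two assumptions: H is encoded as multiplication by 4 modulo 7, and a set counts as H-symmetrical (or inversionally symmetrical) when the operation maps it onto a transposition of itself. The function names are hypothetical.

```python
from itertools import combinations

MOD = 7

def t_class(steps):
    """Canonical representative of a step-class set under transposition."""
    return min(tuple(sorted((x - t) % MOD for x in steps)) for t in range(MOD))

def is_inversion_symmetric(steps):
    return t_class(steps) == t_class([(-x) % MOD for x in steps])

def is_h_symmetric(steps):
    return t_class(steps) == t_class([(4 * x) % MOD for x in steps])

tetrachord_types = sorted({t_class(c) for c in combinations(range(MOD), 4)})
h_symmetric = [c for c in tetrachord_types if is_h_symmetric(c)]

# Inversionally symmetrical trichord types embedded in each H-symmetrical tetrachord.
embedded = {
    c: sorted(t_class(sub) for sub in combinations(c, 3)
              if is_inversion_symmetric(sub))
    for c in h_symmetric
}

print(len(tetrachord_types))    # 5 transpositional types
print(h_symmetric)              # [(0, 1, 2, 4), (0, 1, 2, 5)]
for tetrachord, trichords in embedded.items():
    print(tetrachord, trichords)
```

Under these assumptions each H-symmetrical tetrachord contains exactly one trichord of each symmetrical type: (0, 1, 2), a passing motion; (0, 2, 4), a triad; and (0, 1, 4), a chain of two fifths. The fourth trichord in each case is not inversionally symmetrical.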
Example 6a shows the typical configuration of a dominant-tonic progression, and Example 6b shows how the addition of a seventh to the dominant may be prolongationally interpreted in terms of a passing motion. The nature of the dominant seventh chord by itself implies this sort of interpretation, in fact. Consider such a chord presented in isolation, as in Example 6c.
Example 5. Relationships between basic tonal regions
Example 6. Prolongational interpretations of dominants
Though the chord contains two triads, only one is bounded by a perfect fifth, so this major triad stands out as a prolongational unit. Yet, there’s no way for the remaining note, C, to make a complete prolongational structure with this triad, so it implies a continuation that will make a completion of the structure possible. A continuation of the stepwise motion D–C to B incorporates the note C into a prolongational unit, but the entire structure is incomplete and requires further continuation to the note G. In this sense, the dominant seventh chord by itself implies a continuation to its tonic. The (pitch-class) intervallic makeup of the chord is essential to this interpretation; Example 6d shows the simpler prolongational implications of a half-diminished seventh chord. The theory of step-class symmetries also leads to new outlooks on harmonic elaborations of the basic dominant-tonic structure. As Example 7a shows, the supertonic triad is a natural expansion of the stepwise motion from the third scale degree in the basic structure. The subdominant triad, on the other hand, occurs more naturally in an expansion of stepwise descent from the fifth of the tonic mirroring the progression of dominant to tonic on the right, as shown in Example 7b.
Example 7. Prolongational interpretations of pre-dominant harmonies
There is one step-class interval that is not related to others through automorphism: the unison step-class interval. Therefore a theory in which harmonic intervals and melodic motions relate entirely through symmetries of the step-class group excludes the idea of repetition as a kind of motion or unison as a kind of interval. An analysis cannot express a melodic neighboring motion as a prolongation of a melodic repetition as in Example 8a, because the repetition doesn’t relate to the step intervals through any automorphism. This idea can be replaced, however, with the more compelling notion of embedded repetition of a prolongational structure. When we observe a departure and return, we first put the departure into a larger context of continuation. For instance, the motion B–A, a departure from B, implies a continuation to G (which is provided in Example 8a). We then interpret the return as a delay of the continuation by repetition.
J. Yust
Example 8. Embedded repetitions and extensions of prolongational structures
In harmonic context, the best interpretation of the melodic motions shown in Example 8a would be in the form shown in Example 8b, a common type of background structure (as in a sonata first movement, for example). This analysis consists of the (right) embedded repetition of a tonal region, D–B–A–G, which can be accomplished entirely through symmetrical elaborations (i.e. H-prolongations). Here, the motion B–A is part of a larger structure whose resolution to G is interrupted with a return to D–B. However, in certain cases the analyst may want to directly embed a repetition of a simple trichord structure, which one cannot do solely with symmetrical elaborations (H-prolongations). Example 8c accomplishes this with an asymmetrical prolongational unit where one motion is related to the prolonged motion by I (shown with a loop-arrow) and one is related to the prolonged motion by H⁻¹ (shown with a double line). A different application of this sort of asymmetrical prolongation can replace the type of neighboring motion shown in Example 8d with a prolongational extension of a trichord (here a passing motion), as in Example 8e. In this case the motion B–C is not an initiation of a prolongational unit with an eventual continuation, but the reversal of a passing motion to A which then finds continuation.
We can understand prolongational extension better by considering the asymmetrical prolongation as an involution or reflection of the normal symmetrical type. Example 9a shows how an asymmetrical prolongation results from reflecting a normal passing figure along one of its more foreground motions (B–A), pulling the other foreground motion (A–G) in behind what would ordinarily be the background interval (B–G). The loop-arrow indicates the flipped interval while the double line indicates the exposed background interval.
Examples 9b–c show how the idea of involution can be extended to larger prolongational structures, drawing any foreground interval into the background. The advantage of deriving asymmetrical prolongations in this way is that one can base an interpretation of the resulting structure on the symmetrical one that it involutes. For instance, the structure of Example 9c gives a prolongation of a descending minor second by making it the resolving seventh of a V7–I progression.
The Step-Class Automorphism Group in Tonal Analysis
Example 9. I/H⁻¹ prolongations as reflections or involutions of H-prolongations
Another way to think of the I/H⁻¹ prolongations is as an elision from a normal structure, which is illustrated in Example 10. Excising the root of the dominant, D, collapses two levels of structure into one (indicated by the double line from A to F# in the resulting structure), and turns the indirect leading-tone motion F#–G into an element of the prolongational hierarchy.
Example 10. The I/H⁻¹ prolongation as an elision
Example 11 uses the tools developed so far to give a representation of Schenker’s middleground analysis of the Largo of Beethoven’s op. 10 no. 3 (excluding details of the recapitulation) from Der Freie Satz. The analysis begins (at the most background level) from the basic D minor tonal region, elaborated with the standard interruption structure. The resolving tonic of this background structure is pulled back to the left in two places. Bringing the tonic back as the prolongation of A–F provides the foundation for the initial ascent, and as the prolongation of F–E it establishes the initial D-minor tonality of the second theme. The extended A minor region of the exposition occurs in an embedded A minor repetition of the background descent through the dominant (A major). The first part of this descent hosts the C major area of the exposition. (I depart from Schenker here in showing the dominant of C major following the arrival on E, m. 13, as it does in measures 14 and 16.) Schenker shows the startling F major area of the middle section initiating an ascending motion to regain the E of the fundamental line. The step-class symmetry analysis produces this ascending motion as an inversion of the preceding E–A motion. The chord F major is a triadic extension of the overriding A minor.
Example 11. Schenker’s analysis of Beethoven’s op. 10 no. 3, 2nd mvt., and a step-class automorphism analysis
Bibliography
Cohn, R., Dempster, D.: Hierarchical Unity, Plural Unities: Towards a Reconciliation. In: Disciplining Music: Musicology and its Canons, edited by Katherine Bergeron and Philip V. Bohlman, pp. 156–181. University of Chicago Press, Chicago
Lerdahl, F., Jackendoff, R.: A Generative Theory of Tonal Music. MIT Press, Cambridge (1983)
Rahn, J.: Logic, Set Theory, Music Theory. In: College Music Symposium, vol. 19(1), pp. 114–127 (1979)
Schenker, H.: Neue Musikalischen Theorien und Phantasien III: Der Freie Satz. In: Jonas, O. (ed.), 2nd edn. Universal Edition, Vienna (1956)
Schenker, H.: Free Composition. Phantasies, Trans. Ernst Oster, vol. III. Longman, New York (1979)
Smoliar, S.: A Computer Aid for Schenkerian Analysis. Computer Music Journal 4(2), 41–59 (1980)
Yust, J.: Formal Models of Prolongation. Ph.D. diss., University of Washington (2006)
A Linear Algebraic Approach to Pitch-Class Set Genera
Atte Tenkanen
Department of Musicology, University of Turku, Finland
Abstract. The concept of the interval-class vector (ICV) plays an important role in musical set theory. An ICV can be seen as a six-dimensional representative of the intervallic content of a set class (SC), and it often forms the basic tool in the harmonic analysis of twentieth-century music. Interrelations between SCs have been evaluated by means of similarity functions and clustering techniques in many contexts. SCs have also been classified into ’families’ called genera. Among others, six-, seven- and twelve-part systems have been outlined. Using the linear algebraic concept of spanning, the six-part system of genera is unambiguously justified. A group of interval-class vectors, which actually represent points, delimits a finite region in a six-dimensional space. Using the determinant of a matrix, the volume of the region formed by a six-member set-class combination¹ can be calculated. All set groups among TnI-type trichords or hexachords that produce maximum volumes are detected. Both the extreme points on the edges of SC space and the most neutral set classes in the middle of SC space are recognized using three different methods derived from linear algebra, namely the determinant of a matrix, cosine distance and principal component analysis. A short final demonstration concerns the volumes of hexachord combinations used by the Finnish composer Magnus Lindberg in his chaconne chains.
1
‘Corner-Stone Set-Classes’
The harmonic possibilities of our equal-tempered system are, in respect of all possible pitch combinations, endless. However, we can reduce this information and find some primary intervallic characteristics which originate from the mathematical properties of the octave divided into 12 equal intervals. When classifying pitch-class sets into families, music set-theorists talk about the concept of pitch-class set genera (Forte 1988) or simply genera (Quinn 2004). Usually six main sonorities have been distinguished and seen as projections of the six interval classes (ibid., p. 2). This scheme of six-part genera can also be justified by the linear algebraic concept of ’spanning’, which we utilize in this paper. When the 224 TnI-type set classes (SCs) are compared, some of them emerge as somewhat divergent from the others: 3-12, 4-28 and 6-35, to mention just a few.
¹ That is, a matrix of type $M_{6\times 6}(\mathbb{R})$.
T. Klouche and T. Noll (Eds.): MCM 2007, CCIS 37, pp. 521–530, 2009. © Springer-Verlag Berlin Heidelberg 2009
These sets can be seen as a sort of ’corner-stones’ of a six-dimensional pitch-class set space, because they are literally on the corners of the space they form with the set classes of the same cardinality. The reasons for their special position have been well documented (ibid., passim). To form an overall picture of their relative positions among others they have been visualized in two-dimensional space (see e.g. Eriksson 1986). At times this has been done computationally using dimensionality reduction techniques such as multidimensional scaling (Samplaski 2005) and hierarchical clustering (Quinn 2001). Our purpose in this presentation is to find the corner-stone SCs and to provide an unambiguous numerical interpretation of the limits of the spaces generated by combinations of six interval-class vectors (ICVs) by using the determinant of a matrix.² If we thereafter want to outline relationships between the extreme-point SCs and other SCs, this can be done using some similarity or distance measure. However, six-dimensional information cannot be reduced visually to two dimensions, as multidimensional scaling does, without partly odd results, regardless of the method used. This is not a problem with the numerical methods presented here.
2
Applying Cosine Distance and the Determinant of a Matrix with Musical Set Classes
There is one well-known and much-used similarity measure for SCs which has a direct connection to linear algebra, namely the cosine of the angle between two vectors (Rogers 1999). The idea behind this measure is that when the angle between two (interval-class) vectors decreases, the similarity between them increases. If two vectors are parallel to each other (maximally similar), the value of the cosine is 1. At the minimum (0) the two vectors are orthogonal to one another. The other basic linear algebraic method which we apply here, namely the determinant of a matrix, is also connected to orthogonality. However, the two aforementioned methods differ in their scope of application. The determinant is a single number which gives information about certain properties of an n × n (square) matrix. There are many useful applications of the determinant. For example, it gives the volume of the n-dimensional ’box’ (called a parallelepiped) formed by the row or column vectors of a matrix. In this study, the rows are six ICVs. To generate or ’span’ the whole six-dimensional interval-class space, it is sufficient to select six linearly independent vectors. Vectors $v_1, v_2, \dots, v_n$ are linearly independent if and only if no linear combination of them gives the zero vector, except the zero combination in which all coefficients $c_i$ are zero; that is, the equation

$$c_1 v_1 + c_2 v_2 + \dots + c_n v_n = 0$$

holds only for $c_1 = c_2 = \dots = c_n = 0$. As an obvious example of dependency we have chosen two linearly dependent ICVs, [002001] and [004002], of SCs 3-10 and 4-28. Each of them can be obtained from the other by multiplying it by a scalar, in this case 2 or 1/2:

$$2 v_1 - v_2 = 0 \iff 2 \cdot [002001] = [004002]$$

Because they are parallel to each other, the second one doesn’t bring any new information to (or span a new dimension in) the six-dimensional interval-class space. That all vectors are independent in a more complex system of several vectors is not, however, as easy to see as in the case of two vectors. The volume of the six-dimensional box formed by six ICVs can be calculated using the determinant of a matrix. If the vectors in a matrix are dependent, the volume of this set is 0 and the value of the determinant is 0. This effect is easier to understand in three-dimensional space: by shrinking just one dimension of a box to 0, we get a plane, the volume of which is 0. If the ICVs are standardized by dividing them by their Euclidean norm, the maximum volume they are able to form is 1. This is exactly the case with the six interval classes, which form a six-dimensional orthogonal ’cube’. It is pertinent to mention here that the interval classes are an example of a standard basis consisting of six standard vectors. A basis for a vector space is a sequence of vectors $v_1, v_2, \dots, v_n$ with two properties: (i) they are independent ($\det(M_{n\times n}) \neq 0$) and (ii) they span the whole space. The determinant of a matrix is denoted with two vertical lines around the matrix:

$$\begin{vmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1 \end{vmatrix} = 1$$

It is easily proved that, using linear combinations of the standard vectors, the whole $\mathbb{R}^6$ space can be generated. In that space, the 208 ICVs form, however, only a tiny bunch of points.
² Formal mathematical definitions of the determinant can be found in most elementary guides to linear algebra (see e.g. Lipschutz and Lipson 2001, ch. 8).
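The cosine measure and the volume interpretation of the determinant described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names (`cosine`, `determinant`, `volume`) and the six-vector test set `dependent` are assumptions introduced here, and the determinant is computed by ordinary Gaussian elimination.

```python
import math

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def determinant(rows):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular: the rows are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def volume(vectors):
    """|det| of the matrix whose rows are the vectors scaled to unit length."""
    unit = [[x / norm(v) for x in v] for v in vectors]
    return abs(determinant(unit))

ICV_3_10 = [0, 0, 2, 0, 0, 1]
ICV_4_28 = [0, 0, 4, 0, 0, 2]
STANDARD_BASIS = [[1 if i == j else 0 for j in range(6)] for i in range(6)]

parallel = cosine(ICV_3_10, ICV_4_28)                       # parallel vectors
orthogonal = cosine(STANDARD_BASIS[0], STANDARD_BASIS[1])   # orthogonal vectors
unit_cube = volume(STANDARD_BASIS)                          # the six interval classes

# Hypothetical six-vector set containing both dependent ICVs.
dependent = [row[:] for row in STANDARD_BASIS]
dependent[2] = ICV_3_10
dependent[5] = ICV_4_28
flat = volume(dependent)
```

Under these assumptions `parallel` comes out as 1 (up to rounding), `orthogonal` as 0, `unit_cube` as 1, and `flat` as 0, matching the four cases discussed in the text.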
3
Volume Tests with Interval-Class Vectors
Allen Forte (1988) based his system of pitch-class set genera on 12 TnI-type trichords. According to the concept of spanning, only six vectors are needed to generate all the dimensions of ICV space. In order to find the largest space among all six-member TnI-type trichord combinations, we calculated the volumes of all 924 six-vector combinations. The rows of the determinant below are the ICVs of the trichords that form the largest volume. The SCs in question are 3-1, 3-8, 3-10, 3-12, 3-9, and 3-5.³ The determinant value has been calculated using the vectors standardized by their length.
³ All these belong to Eriksson’s (1986) ’maxpoint’ sets, a chord species containing the maximum number for its size of at least one interval class. However, the maxpoint set for ic2, which is 3-6 [020100], is missing.
$$\begin{vmatrix} 2&1&0&0&0&0\\ 0&1&0&1&0&1\\ 0&0&2&0&0&1\\ 0&0&0&3&0&0\\ 0&1&0&0&2&0\\ 1&0&0&0&1&1 \end{vmatrix} \Rightarrow \begin{vmatrix} 0.89&0.45&0&0&0&0\\ 0&0.58&0&0.58&0&0.58\\ 0&0&0.89&0&0&0.45\\ 0&0&0&1&0&0\\ 0&0.45&0&0&0.89&0\\ 0.58&0&0&0&0.58&0.58 \end{vmatrix} \approx 0.48$$

This set of trichords is unequivocal: there are no other alternatives of the same volume. In his system of trichordal ’progenitors’, Forte called these six SCs ’chroma’, ’whole-tone’, ’diminished’, ’augmented’, ’dia’ and ’atonal’ (Forte 1988, p. 201). What is missing here are Forte’s ’semigroups’ (’semichroma’, ’chromadia’ etc.). By using the concept of spanning we rejected Forte’s proposition of twelve trichordal progenitors,⁴ but on the other hand we have thus managed to catch his main trichords. The determinant formed by the trichords is < 1, which means that these ICVs do not form an orthogonal six-dimensional cube: some of them are at right angles to each other, but this is not the case for all pairs. In fact, there are only 18 TnI-type SCs of cardinalities 3–12 whose ICVs form perpendicular pairs with other ICVs. Among them 7 belong to cardinality 4, the remaining 11 to cardinality 3. By constructing six-vector combinations out of these 18 vectors, the largest determinant among the 18564 possible combinations⁵ is ≈ 0.396. There are only four such combinations. If we, moreover, prefer the sets of cardinality 4, the resulting combination includes 3-6, 3-12, 4-1, 4-9, 4-23, and 4-28. Each of them emphasizes a certain interval class (2, 4, 1, 6, 5, 3 respectively). Some further observations can be made about this group of sets. It includes two transpositionally symmetric and maximally even SCs, 3-12 and 4-28, which are complementary to each other. One might say that this pair crystallizes several ’harmonic odours’.
Even though there is only one possible union to be attained with these two sets,⁶ the union set 6-Z28 (0,1,3,5,6,9) contains not only all four triads and the dominant seventh chord like its better-known sister 6-34A (0,1,3,5,7,9),⁷ but also both of the important tetrachords used in enharmonic modulations, namely the diminished seventh chord and the ’Tristan chord’. It is, however, good to remember that in order to span the whole six-dimensional interval-class vector space, six linearly independent vectors are needed.⁸
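The volume test over all six-member trichord combinations can be reproduced roughly as follows. This is a sketch under stated assumptions rather than the author's code: the table `TRICHORD_ICVS` lists the standard ICVs of the twelve TnI-type trichords, the helper names are invented for illustration, and the determinant is again computed by Gaussian elimination on the length-normalized rows.

```python
import math
from itertools import combinations

# Standard interval-class vectors of the 12 TnI-type trichords.
TRICHORD_ICVS = {
    "3-1": (2, 1, 0, 0, 0, 0), "3-2": (1, 1, 1, 0, 0, 0),
    "3-3": (1, 0, 1, 1, 0, 0), "3-4": (1, 0, 0, 1, 1, 0),
    "3-5": (1, 0, 0, 0, 1, 1), "3-6": (0, 2, 0, 1, 0, 0),
    "3-7": (0, 1, 1, 0, 1, 0), "3-8": (0, 1, 0, 1, 0, 1),
    "3-9": (0, 1, 0, 0, 2, 0), "3-10": (0, 0, 2, 0, 0, 1),
    "3-11": (0, 0, 1, 1, 1, 0), "3-12": (0, 0, 0, 3, 0, 0),
}

def determinant(rows):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # linearly dependent rows
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def volume(vectors):
    """Volume spanned by the vectors after scaling each to unit length."""
    unit = []
    for v in vectors:
        length = math.sqrt(sum(x * x for x in v))
        unit.append([x / length for x in v])
    return abs(determinant(unit))

# Every six-member combination of the twelve trichords.
combos = list(combinations(TRICHORD_ICVS, 6))
volumes = {combo: volume([TRICHORD_ICVS[name] for name in combo])
           for combo in combos}
best = max(volumes, key=volumes.get)

# The combination named in the text, in table order.
paper_combo = ("3-1", "3-5", "3-8", "3-9", "3-10", "3-12")
paper_volume = volumes[paper_combo]
```

Here `len(combos)` is 924 and `paper_volume` is 48/(45√5) ≈ 0.477, which agrees with the ≈ 0.48 quoted above; `best` holds whichever combination the search ranks highest, which the text reports to be this same set.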
⁴ If we have a six-dimensional ICV space, we can find the remotest points only in six directions, not more.
⁵ $\binom{18}{6} = 18564$.
⁶ This is the so-called ’complement union property’ defined by Morris (1990).
⁷ I.e. the ’Prometheus chord’.
⁸ Though the ICV [111111] of 4-Z15 and 4-Z29 can be formed by the union of all interval classes, this vector spans only one dimension: we can’t build a box using one side only!
4
‘Strangest’ Hexachords
After the excursion with orthogonal vectors we now focus on hexachords, which have often been of special interest to twentieth-century composers and music theorists. Before using the determinant with hexachords, some general observations can be made about the extreme SCs indirectly by applying a similarity measure. By calculating the similarity values between all hexachord pairs, we get an overview of how ’distant’ or ’strange’ each of them is among all the others. In order to produce distinctions between Z- and Tn-type SCs, we extend the well-known cosine measure, proposed in musical set theory by Rogers (1999), a little by using the subset-class vectors 2-10CV instead of mere ICVs (2CVs).⁹ Such similarity functions, which exploit subset classes, have been called ’total measures’ (Castrén 1994, p. 4). Therefore, we call this measure costotal. If the subset-class vectors of the two SCs $sc_1$ and $sc_2$ are denoted by $subcv_1$ and $subcv_2$, we get the formula

$$\mathrm{costotal}(sc_1, sc_2) = \frac{subcv_1 \cdot subcv_2}{\lVert subcv_1 \rVert \, \lVert subcv_2 \rVert}$$

where the dividend is the dot product of the subset-class vectors and the divisor is the product of the Euclidean norms of the subset-class vectors. After calculating all the costotal values (ranging from 0.167 to 1.0) between all the hexachord pairs and thereafter all the mean values¹⁰ related to each hexachord, the strangest among others with their ’strangeness-values’ appear to be: 6-35, 0.357; 6-20, 0.491; 6-7, 0.525; 6-30A/B, 0.560; 6-1 and 6-32, 0.568; 6-27A/B, 0.591 (see Figure 1). There seems to be a connection between the lower values and the number of symmetrical axes. That ’poor guy 6-35 without friends’ (except Debussy!), or as Forte puts it, ’notoriously antisocial creature’ (Forte 1988, 192), has 6 symmetrical axes, 6-20 three, 6-7 two, 6-30A/B, 6-1 and 6-32 one and 6-27A/B none. On the ’social’ side we meet Z-vectors only: 6-Z11A/B and 6-Z40A/B, 0.696; 6-Z10A/B, 6-Z24A/B, 6-Z39A/B and 6-Z46A/B, 0.697.¹¹ The same ’strange’ sets 6-35, 6-20, 6-7, 6-30, 6-32, 6-1 form the corners of the space with the greatest determinant value 0.131 among all hexachord combinations.¹²
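The costotal formula can be illustrated with a small Python sketch. Several details here are assumptions rather than the paper's specification: subsets are classified by Tn-type (transposition only), a set class is represented by a hypothetical canonical tuple instead of a Forte name, and subsets of every cardinality from 2 up to the size of the set itself (capped at 10) are counted. Numerical values therefore need not match those reported above.

```python
import math
from collections import Counter
from itertools import combinations

def tn_type(pcs):
    """Canonical representative of a pitch-class set under transposition."""
    pcs = sorted(set(p % 12 for p in pcs))
    # Transpose each member to 0 in turn and keep the smallest tuple.
    return min(tuple(sorted((p - t) % 12 for p in pcs)) for t in pcs)

def subset_class_vector(pcs, min_card=2, max_card=10):
    """Count the embedded subsets of each Tn-type, by canonical tuple."""
    pcs = sorted(set(p % 12 for p in pcs))
    counts = Counter()
    for size in range(min_card, min(max_card, len(pcs)) + 1):
        for subset in combinations(pcs, size):
            counts[tn_type(subset)] += 1
    return counts

def costotal(set_a, set_b):
    """Cosine of the angle between two subset-class vectors."""
    va, vb = subset_class_vector(set_a), subset_class_vector(set_b)
    dot = sum(va[key] * vb[key] for key in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b)

CHROMATIC = (0, 1, 2, 3, 4, 5)      # a member of 6-1
WHOLE_TONE = (0, 2, 4, 6, 8, 10)    # a member of 6-35

self_similarity = costotal(CHROMATIC, CHROMATIC)
transposed = costotal(CHROMATIC, tuple(p + 1 for p in CHROMATIC))
cross = costotal(CHROMATIC, WHOLE_TONE)
```

Because the vectors count Tn-types, a set and its transpositions are maximally similar, while the chromatic and whole-tone hexachords share only their ic2 and ic4 dyads and therefore score well below 1.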
⁹ nCV stands for subset-class vectors, which means that all the subset occurrences of cardinality class n, embedded within an SC, are taken into consideration (Castrén 1994, p. 3).
¹⁰ Note that, in the case of the mean values, there are no differences between A- and B-type SCs, because the hexachords themselves form a symmetrical group of sets.
¹¹ Among the trichords the ’middlemost’ sets are 3-3, 3-4 and 3-11 with mean value 0.408.
¹² There are $\binom{35}{6} = 1623160$ combinations of six-member hexachord groups. Note that the determinant is calculated using a square matrix (of type 6×6 here). Thus the ICVs are used instead of 2-10CVs. These 6 SCs also belong to Eriksson’s maxpoint sets. SC 6-27, which is a maxpoint for ic3, is missing. It is replaced by the cyclically symmetric 6-30.
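The determinant value quoted for the six corner hexachords can be checked directly from their ICVs. The sketch below is illustrative only: the prime forms are the standard ones for these set classes, the names `PRIME_FORMS`, `icv` and `det` are introduced here, and the determinant is evaluated exactly on the integer matrix by cofactor expansion before dividing by the product of the row lengths.

```python
import math
from itertools import combinations

# Standard prime forms of the six 'corner' hexachords.
PRIME_FORMS = {
    "6-1": (0, 1, 2, 3, 4, 5),
    "6-7": (0, 1, 2, 6, 7, 8),
    "6-20": (0, 1, 4, 5, 8, 9),
    "6-30": (0, 1, 3, 6, 7, 9),
    "6-32": (0, 2, 4, 5, 7, 9),
    "6-35": (0, 2, 4, 6, 8, 10),
}

def icv(pcs):
    """Interval-class vector: counts of ic1..ic6 among all pitch-class pairs."""
    vector = [0] * 6
    for a, b in combinations(sorted(pcs), 2):
        d = (b - a) % 12
        vector[min(d, 12 - d) - 1] += 1
    return vector

def det(m):
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

rows = [icv(pcs) for pcs in PRIME_FORMS.values()]
raw_det = det(rows)
norm_product = math.prod(math.sqrt(sum(x * x for x in row)) for row in rows)
corner_volume = abs(raw_det) / norm_product
```

With these rows the integer determinant has absolute value 23040 and the product of the row norms is 3465·√2583, so `corner_volume` is about 0.1308, consistent with the 0.131 reported in the text.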
5
Principal Component Analysis: A Flexible Approach for Mapping ICV-Space
There is one more flexible method which finds the corner SCs and, in addition, the most neutral or ’odourless’ SCs in the middle of set-class space, namely principal component analysis (PCA). The central idea of PCA is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set (Jolliffe 2002, p. 1). PCA transforms the data to a new coordinate system with orthogonal axes. Those axis vectors are called principal components (PCs). The first PC is the direction along which the data points have the maximum variance; the second greatest variance lies along the second PC, and so on. These axis vectors are, in fact, real-valued eigenvectors, a special set of vectors associated with a square matrix. However, PCA is a flexible method also in terms of initial requirements: it is not bound to a particular number of dimensions or to square matrices. We next present an example which is a bit tricky but well-founded. We can calculate principal components for, say, all the 2-3CVs of Tn-type hexachords. In this presentation, the notation 2-3CV means that not all the subsets but only the 6 interval classes (2CV) and the 19 Tn-type trichord subsets (3CV) embedded within a hexachord are used as the representative of each hexachord SC in question. These vectors thus belong to $\mathbb{R}^{25}$. As an example, the 2-3CVs of set classes 6-1 and 6-35 are presented in Figure 2. There are only 35 different ICVs (2CVs) but 79 different 2-3CVs among all 80 Tn-type hexachord SCs (6-14A and 6-14B have identical 2-3CVs). Thus, using the 2-3CVs, both A- and B-type as well as Z-type SCs can be distinguished. After calculating all the real-valued principal components, 25 in all, we chose the first six of them and calculated the nearest representatives for them among actual 2-3CVs by using the cosine distance. The best ’discriminator’ vectors among hexachords thus found are again those of the symmetrical 6-35, 6-1, 6-7, 6-20, 6-32, 6-30A/B. Similarly, the middlemost SCs can be found, and they are 6-Z24A/B, 6-Z39A/B, 6-Z40A/B, 6-Z11A/B, 6-Z46A/B and 6-Z10A/B. Principal components can be calculated for any combination of vectors of equal length. Therefore, if we are able to derive, for example, all the ICVs that a musical piece in one way or another includes,¹³ all six PCs specific to that piece can be computed. This information, in turn, can be used, for instance, for classifying musical pieces.
Fig. 1. Costotal means calculated for 80 Tn-type hexachords. The six ‘strangest’ hexachords have been marked with Forte notation.
Fig. 2. The 2-3-class-vectors of set classes 6-1 and 6-35
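As a rough illustration of the PCA step, the sketch below enumerates the 80 Tn-type hexachords, builds their ICVs, and extracts the first principal component. It simplifies the procedure described above in ways that should be read as assumptions: plain ICVs (2CVs) in six dimensions are used instead of the 25-dimensional 2-3CVs, only the first component is computed, and the eigenvector is found by power iteration on the sample covariance matrix rather than by a library routine.

```python
import math
from itertools import combinations

def tn_type(pcs):
    """Canonical representative of a pitch-class set under transposition."""
    pcs = sorted(set(pcs))
    return min(tuple(sorted((p - t) % 12 for p in pcs)) for t in pcs)

def icv(pcs):
    """Interval-class vector of a pitch-class set."""
    vector = [0] * 6
    for a, b in combinations(sorted(pcs), 2):
        d = (b - a) % 12
        vector[min(d, 12 - d) - 1] += 1
    return vector

# All Tn-type hexachords, one canonical tuple per class.
hexachords = sorted({tn_type(s) for s in combinations(range(12), 6)})
data = [icv(h) for h in hexachords]

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centred = [[r[j] - means[j] for j in range(dims)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def first_principal_component(cov, iterations=20000):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    size = len(cov)
    # Deliberately uneven start vector: all ICVs sum to 15, so the
    # all-ones direction carries no variance at all.
    v = [float(i + 1) for i in range(size)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(size)) for i in range(size)]
        length = math.sqrt(sum(x * x for x in w))
        v = [x / length for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v

cov = covariance_matrix(data)
ic_variances = [cov[i][i] for i in range(6)]
eigenvalue, pc1 = first_principal_component(cov)
```

The enumeration yields 80 hexachords with 35 distinct ICVs, as stated in the text. Since every hexachord contains exactly 15 intervals, the ICVs lie in a five-dimensional hyperplane, so at most five components carry variance in this simplified setting.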
6
Using Corner-Stone Vectors for Producing a System of Genera
The goal in different genera systems is to place each set class into a certain pc-set family (genus). This is easier said than done. There are many borderline SCs which do not seem to belong clearly to any group (Samplaski 2004). Several approaches have been proposed for solving this problem. For example, multidimensional scaling can be instrumental in mapping set classes into a more illustrative two-dimensional space. However, this method is problematic: a lot of data is thrown away by reducing six-dimensional information to two dimensions. For instance, SCs such as 6-1, 6-8 and 6-32 with fairly similar ICVs, [543210], [343230] and [143250], may be projected - among all other hexachords - to the same point in two- or three-dimensional space even though they are very different in their chordal nature.¹⁴
¹³ Some kind of automated segmentation is naturally required to produce the interval-class vectors.
¹⁴ This may happen with multidimensional scaling if the two projection vectors are the first two principal components calculated by using the ICVs of all hexachords. In that case, the PC vectors are based on the greatest variance between different interval classes. Among Tn-type hexachords the ic variances are: ic1, 0.82; ic2, 0.91; ic3, 0.87; ic4, 0.94; ic5, 0.82; ic6, 0.50. Ic2 and ic4 have the greatest variance among the ICVs of hexachords and hence these interval classes are emphasized by the projection vectors. But what is more important in this case is that, because the sums of the ic1 and ic5 components with distinctive values are the same (6) in all three set classes with similar ’weights’ (0.82), these set classes project to the same point in two-dimensional space.
One straightforward solution for generating families from the pitch-class sets of the same cardinality is 1) to select the corner-stone set classes as the heads of the families and 2) to connect all the rest of the SCs to the heads according to their distance from them. If, in the case of hexachords, the corner-stone SCs 6-1, 6-7, 6-20, 6-30, 6-32 and 6-35 are accepted as basis vectors and the Euclidean norm is used to evaluate the distances between the normalized 2-10CVs, the resulting genera are as shown in Figure 3. The figure is analogous to what Eriksson (1986, 106) presents about the genera of hexachords.¹⁵
Fig. 3. Corner-stone hexachords and their nearest relatives. The criterion for inclusion is the Euclidean distance.
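The two-step family construction can be sketched as a nearest-corner assignment. The following is an illustration under simplifying assumptions, not a reconstruction of Figure 3: distances are measured between length-normalized ICVs rather than the 2-10CVs used in the text, the corner hexachords are given by their standard prime forms, and the function names are invented for the example.

```python
import math
from itertools import combinations

# Heads of the families: standard prime forms of the corner-stone hexachords.
CORNERS = {
    "6-1": (0, 1, 2, 3, 4, 5),
    "6-7": (0, 1, 2, 6, 7, 8),
    "6-20": (0, 1, 4, 5, 8, 9),
    "6-30": (0, 1, 3, 6, 7, 9),
    "6-32": (0, 2, 4, 5, 7, 9),
    "6-35": (0, 2, 4, 6, 8, 10),
}

def icv(pcs):
    """Interval-class vector of a pitch-class set."""
    vector = [0] * 6
    for a, b in combinations(sorted(pcs), 2):
        d = (b - a) % 12
        vector[min(d, 12 - d) - 1] += 1
    return vector

def unit(v):
    """Scale a vector to Euclidean length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_corner(pcs):
    """Name of the corner hexachord whose normalized ICV is closest."""
    target = unit(icv(pcs))
    return min(CORNERS,
               key=lambda name: euclidean(target, unit(icv(CORNERS[name]))))

genus_of_6_2 = nearest_corner((0, 1, 2, 3, 4, 6))    # prime form of 6-2
genus_of_6_33 = nearest_corner((0, 2, 3, 5, 7, 9))   # prime form of 6-33
```

In this simplified setting 6-2, with ICV [443211], falls into the family headed by 6-1, and 6-33, with ICV [143241], into the family headed by 6-32. Assignments based on 2-10CVs may differ for borderline set classes.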
7
Harmonic Space in Composition
Let us finally make another short experiment with determinants. In discussing the pioneering work of Howard Hanson in the 1960s, Michael Buchler (1998, p. 5) states that Hanson’s work was, in part, ’meant to show composers how they could project certain intervallic structures using particular SCs’. An example of a composer who uses SCs systematically in this way is the contemporary Finnish composer Magnus Lindberg (b. 1958). Many of his works are ’based upon an extended chaconne principle with chord chains cycling around, undergoing constant transformation and being articulated in a very gestural way’ (Oramo 2004, p. 5). According to Lindberg, these chords are based on hexachords 6-2, 6-5, 6-9, 6-15, 6-16, 6-18, 6-21, 6-22, 6-27, 6-30, 6-31, 6-33 and 6-34 (ibid., p. 7). Lindberg seems to avoid corner SCs such as 6-1 and 6-35, and to ’break colours’ by using 6-2, 6-34 etc. instead. Our hypothesis was that Lindberg’s SCs could still form a fairly large volume. If five of those special corner SCs (6-35, 6-20, 6-7, 6-32 and 6-1) are rejected, then among the remaining 30 ICVs the largest determinant¹⁶ (≈ 0.025) is formed by SCs 6-2, 6-Z6, 6-14, 6-22, 6-30 and 6-33. Four of these belong to Lindberg’s ’vocabulary’. We evaluated all the possible determinants formed by six-member combinations of the SCs that are used in the compositions Corrente and GranDuo.¹⁷ The biggest volume in GranDuo (0.00276) is attained by SCs 6-2, 6-16, 6-21, 6-27, 6-30 and 6-34 and in Corrente (0.00072) by 6-15, 6-21/6-34, 6-22, 6-27, 6-30 and 6-31. The mean of the determinant values in GranDuo is approximately 0.0009 and in Corrente 0.0004. It seems that especially the nearly chromatic SC 6-2 widens the harmonic space of GranDuo compared to the corresponding space of Corrente (without a chromatic relative of genus 6-1, cf. Figure 3).
¹⁵ In fact, the corner-stone sets are comparable to Eriksson’s maxpoint sets, which collect the rest around them.
8
Conclusions
Cosine distance and the determinant of a matrix as well as principal component analysis can well be used to evaluate the ’corner’ SCs and the ’middlemost’ SCs in the set-class space. Using the concept of spanning, the number of the basic SCs, which are needed to determine the pitch-class set genera, can be unambiguously defined in the case of six-dimensional ICV space. Regardless of the methods used, the same groups of SCs seem to emerge. They belong to the group of Eriksson’s ’maxpoint’ sets or maximally symmetrical set classes.
¹⁶ The statistical values for minimum, 1st quartile, median, mean, 3rd quartile and maximum of such determinants are 0.0000, 0.0002, 0.0005, 0.0009, 0.0011 and 0.0252.
¹⁷ The basic hexachords used in Corrente are 6-15A/B, 6-21A/B, 6-22A/B, 6-27A/B, 6-30A/B, 6-31A/B, 6-34A/B (Oramo 2004, p. 9) and in GranDuo 6-2A/B, 6-16A/B, 6-21A/B, 6-22A/B, 6-27A/B, 6-30A/B, 6-31A/B and 6-34A/B (Castrén 2004).
References
Buchler, M.H.: Relative Saturation of Subsets and Interval Cycles as a Means for Determining Set-Class Similarity. Ph.D. dissertation, University of Rochester (1998)
Castrén, M.: Recrel: A Similarity Measure for Set-Classes. Ph.D. dissertation, Sibelius Academy (1994)
Castrén, M.: Aspects of Pitch Organization in Magnus Lindberg’s GranDuo for 24 Wind Instruments. In: Humal, M. (ed.) Proceedings of the Fourth International Conference of Music Theory, Tallinn, April 3-5, 2003. A Composition as a Problem, vol. 4(1) (2004)
Eriksson, T.: The IC Max Point Structure, MM Vectors and Regions. Journal of Music Theory 30(1), 95–111 (1986)
Forte, A.: Pitch-Class Set Genera and the Origin of Modern Harmonic Species. Journal of Music Theory 32(2), 187–270 (1988)
Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer Series in Statistics (2002)
Lipschutz, S., Lipson, M.: Schaum’s Outline of Theory and Problems of Linear Algebra, 3rd edn. McGraw-Hill, New York (1968) (2001)
Morris, R.: Pitch Class Complementation and its Generalizations. Journal of Music Theory 34(2), 175–245 (1990)
Oramo, I.: Chaconne-periaate ja muoto Magnus Lindbergin Correntessa (Chaconne Principle and Form in Magnus Lindberg’s Corrente). Sävellys ja musiikin teoria 11, 5–15 (2004)
Quinn, I.: Listening to Similarity Relations. Perspectives of New Music 39(2), 108–158 (2001)
Quinn, I.: A Unified Theory of Chord Quality in Equal Temperaments. Ph.D. dissertation, Eastman School of Music (2004)
Rogers, D.W.: A Geometric Approach to PCSet Similarity. Perspectives of New Music 37(1), 77–90 (1999)
Samplaski, A.: Mapping the Geometries of Pitch-Class Set Similarity Measures via Multidimensional Scaling. Music Theory Online 11(2) (2005)
Author Index
Adilo˘ glu, Kamil 67, 220, 247 Agon, Carlos 412 Ahn, Yun-Kang 412 Amiot, Emmanuel 469 Anagnostopoulou, Christina 247 Anders, Torsten 52 Andreatta, Moreno 412 Buteau, Chantal
59, 247
Carey, Norman 449 Cartwright, Julyan H.E. 168 Cheng, Eric 347 Chew, Elaine 11, 240, 347 Clampitt, David 464, 477 Clark, Jonathan Owen 330 Dom´ınguez, Manuel
477
Ebeling, Martin 140 Erkkil¨ a, Jaakko 156 Exarchos, Dimitris 419
Lartillot, Olivier 156, 230, 247 Lewis, David 107 Luck, Geoff 156 Majchrzak, Miroslaw 257 Martins, Jos´e Oliveira 499 Miljkovic, Katarina 318 Milmeister, G´erard 19 Morris, Robert 266 M¨ ullensiefen, Daniel 107 Nolan, Catherine 375 Noll, Thomas 477 Obermayer, Klaus 67 Oehler, Michael 189 Parncutt, Richard 124 Peck, Robert 489 Piro, Oreste 168 Pound, Eleri Angharad 198
Ferkov´ a, Eva 250 Foley, Gretchen C. 365 Fran¸cois, Alexandre 11
Rahn, John 289 Reuter, Christoph 189 Rhodes, Christophe 107 Riikkil¨ a, Kari 156 Roeder, John 303
Garbers, J¨ org 78, 97 Giesl, Peter 117 Gollin, Edward 340, 406 Gonz´ alez, Diego L. 168 Grijp, Louis 78, 97 Gualda, Fernando 430
Schlingmann, Dirk 441 Scotto, Ciro 25 Sethares, William A. 1 Shuster, Lawrence B. 354 Sidl´ık, Peter 250
Honingh, Aline
Tanir, G. Ada 220 Tenkanen, Atte 211, 521 Toiviainen, Petri 156
88
Ianni, Jerry G. 354 Ilom¨ aki, Tuukka 386 Jedrzejewski, Franck Johnson, Tom 311
493
Kranenburg, Peter van 78, 97 Kreyszig, Herbert 392 Kreyszig, Walter 392 Kulp, Christopher W. 441
Veltkamp, Remco C. 78, 97 Vipperman, John 59 Volk, Anja 78, 97, 204 Wiering, Frans Yust, Jason Zd´ımal, Milan
78, 97 512 250
Index
algebraic combinatorics on words 464, 484 all-interval trichord 454 all-trichord hexachord 303 analysis (musical) 156,211,412,441,443 - automated 1,347 - computer-aided 250 - melodic 59, 247 - Schenkerian 512 - tonal 250, 251 Argus algorithm 240, 241 Atonality 124, 125 audio signal processing 10 autocorrelation 142-148 automorphism group 512 axis-dyad chord 367, 368 bayesian model selection 107, 111 beat tracking 4, 5, 7 block design 311, 313, 317, 454 Boethian music theory 404 Burnside's lemma 271 canon, rhythmic 469,473 canonical form 419 category theory 19,20,412,415,430 Cayley graphs 310,342,343 cellular automata 318,328 centralizer subgroup 489-491 chord 367 - axis-dyad 367,368 -labels 107 - mystic 214,217,218 -rootofachord 124,127,128 - stability 78-87,97,98, 105 Christoffel word 479,480,481 Christoffel duality 480, 481 cluster, chromatic 472 clustering - melodic 59 Cohn flip 464, 504, 505 combination theory 311,317 combinatorial group theory 342 combinatorial word theory 477 common tone theorem 273, 275
comparison set analysis 211-212,218 complement union property 303, 305 composition - algorithmic 321, 338,436 - by creative analysis 412 - constraint based 52, 53 - mathematically inspired 311 compositional design 305 connectivity component 70,72-74 consonance 134-135, 140-141 constraint satisfaction problem 24, 52, 53 content-based retrieval 97 convex sets 88 correlation coefficient - Pearson 81, 82 - Spearman 253 cycle - maximally smooth 464, 467 - modulating 464 cyclical spectra 190 data format 19 denotator 19,20 devi's staircase 171, 172 diagram - co1imit of a 22, 24 - in category theory 19, 20 -limit of a 20,21 diatonic modes 477,483,484 Dirichlet distribution 107, 110 dispersion (statistics) 253 dissonance 134, 140, 141 distance - earth movers (EMD) 79, 82, 85, 98 - transportation 105 dynamics - nonlinear 168, 185 - nonlinear (of networks) 331 dynamical systems 117-119,330,332 Euler lattice 88, 89 expectation - violated 14 expressive performances 347, 352
Index
Fano plane 454, 455 Farey sum 172-174, 185, 493 feature extraction 156 flat interval distribution 469, 474 folk song (Dutch) 97, 99 forced oscillator 168, 171, 185 formal concept analysis 235 formant 189, 191 Fourier analysis 3-4, 143, 168 Fourier Transform - discrete, in audio signal processing 10 - discrete, in music theory 469 Fourier spectrum 170 frequency ratio 392 fundamental (pitch) - missing 168, 170, 174 Gaffurio number generating function 392, 397, 401 Galois connection 236 generalized coincidence function (in pitch perception) 141, 148, 153 generative processes 318 gestures - characteristic (in transformational theory) 303, 306, 310 golden mean 168, 176, 179 graph 305 - as compositional space 305 - Cayley 310, 342-345 - outerplanar, maximal 512 - theory 489, 490 - underlying a K-net 365, 371-372 group - action 289, 292, 489, 490 - centralizer subgroup 489, 491 - Klein four-group 303, 308 - Schritt/Wechsel 489 - stabilizer subgroup 356 - symmetric 340, 342 - theory 266, 283, 285, 342, 430 - theory, combinatorial 342, 430 - Transposition/Inversion 271, 275, 489 Guidonian system 393, 395 harmonic - analysis (in mathematics) 469 - analysis (in music) 116, 121, 122, 407 - labeling 97, 107, 250
harp 198, 199, 201 hexachord - all-trichord 303, 305 - theorem 274 Hopf Bifurcation 332, 333 Hopf Theorem (equivariant) 332 host set 499, 503, 504, 508 Hungarian minor 464, 467 improvisation 156, 159, 160 inferior nucleus 143 information theory 441 inheritance property 67, 70, 73 inner metric analysis (IMA) 78 interval - all-interval 271, 281 - preservation 489 - System generalized 489, 490 intervallic content 470 isography 386, 390 - axial 358, 363 - double-axial 358, 362 - positive 389 - strong 355, 361, 387, 389, 390 isomorphism 282, 292 key finding algorithms 12, 88, 250, 257 key range 258, 265 K-relation/Kh-relation (Forte) 283 Klumpenhouwer networks (K-nets) 290, 354, 365, 375, 386, 492 - tetrachordal K-nets 354, 362 - trichordal K-nets 354, 362 line of fifths - spiral array 11, 12, 246 - straight line 257-265 linkage cluster 221 local meter 80, 81, 204-205, 208 Lyapunov function 118, 119 matrix multiplication 130 Mathematica (Software) 318, 441 maximally even - *scale, tightly generated 494 - sets 469-471 MaxMSP (Software) 338 mental disorder 156, 159, 165 metric analysis 78, 79, 204
metric weight - inner 78, 79, 204, 209 - levels 80, 81, 97, 100 microtones 198 MIDI 156, 158, 165 modulation 88, 92, 94 monoid 290, 291 - action 289, 290, 292 morphisms 416, 434, 436 motif - description 53, 54, 56, 57 - motivic space 59, 65 - motivic structure 59, 60 - musical 52, 53 - representation 52, 54, 56 - variation 52, 53, 55 motivic - analysis 59, 61, 230, 247 - pattern 167, 230, 247, 248 multiplicities (Deleuze) 335 musical action 289, 296 musical analysis - computer aided 220 musical form 304, 306, 310 musical humour 11, 17 musical prosody 11, 240, 347 music therapy 156, 157, 165 music theory - Boethian 404 - medieval 499, 501 Musica Enchiriadis 511 Muugle (Software) 97, 99 naturalness (of wind instrument sound) 196 nature (and music) 313, 318, 319 neighborhood - (melodic) content 68, 73, 220 - (melodic) presence 68, 72, 75 neo-Riemannian theory 410 network 306, 311, 330, 335, 338, 365, 367, 406 - isographies 375, 378 - isomorphism 293 neuronal processing 141 nonlinear circles 198, 199 nonlinear time series analysis 443 normal form 25, 26, 31, 48 number - chromatic 494, 495 - elementary 392, 401 - factorization (Chinese remainder theorem) 396, 397
- natural 213 - prime 396, 397, 401 - theory 392, 396 octave-complex tones 128, 132 Open Music (Software) 412 ordinary differential equations (ODE) 330 pairwise well-formed scales 464, 466, 467 partials 127 pattern extraction 230, 232 perceptual analysis 135 perceptual salience 127 Perle cycles 365, 367, 370 periodicity (in pitch height) 422, 425, 495 - quasiperiodicity (of a response) 171, 172 permutation 341, 345 phrase detection 347, 350 phrasing strategies 347, 348 physical modeling 189 pitch-class category 434 pitch-class space 499 pitch perception 140, 143, 168, 175, 186 pitch stability 99, 100, 105 polyphony - implicit 22-29, 67-71, 247-249 prime numbers 396, 397, 401 programming paradigm (functional/object oriented) 412, 414 prominence profile 221, 224 proportions 392, 395 psychoacoustics 124, 127 pulse forming 189, 190, 193, 196 pulse width function 195 pure intonation 406 Q-relation 467 redundancy reduction 227, 247 representation - musical, acoustic 95, 347, 348, 353 - musical, symbolic 107, 108, 211, 213 - of a motiv 58 residue behaviour 169, 173, 174 rhythm 1, 2, 5, 10 rhythm analysis 214, 216, 217 rhythm-class set 215 roughness 140, 141 Rubato (Software) 19, 98, 101, 412, 413, 416
scale 88, 89, 91, 198, 200 - chromatic (or equidistant) 421 - Dasian 499, 501, 502, 505 - diatonic 449, 451, 452 - golden 168, 176, 179, 180, 182 - Hungarian minor 464, 467 - maximally even*, tightly generated 494 - pseudo-diatonic 493-495 - theory (musical) 477, 482, 497 - Wyschnegradsky diatonic 493 Schlegel diagram 345 Schumann's principles of Timbre 189 segmentation, automatic 68, 213, 216, 228, 247 self-similarity 318, 320, 322 set (in mathematics) - convex 88, 89, 91 set (of pitch classes) - comparison 211, 213, 215 - complex 268, 283 - maximally even 469, 471, 482, 484 - theory 28, 48, 125, 135, 219, 266, 273, 430, 431, 436, 449, 451, 499 - theory, diatonic 449, 451, 458, 512, 513 Shannon entropy 444 sieve theory (Xenakis) 419, 421, 423 similarity - contour 220 - neighborhood, melodic 59, 67, 69, 70, 97, 101, 220, 221, 247 - rhythmic 78, 82, 86, 204 similarity measures 213 space - topological 59, 60, 64 specificity relation 230, 236, 237 spiral array model 11, 12 statistical modeling 156, 159 Steiner triple system 454, 455 step-class - automorphism 512, 515, 517 - interval 512, 513, 515, 517 - tetrachord, automorphic 513, 515 - trichord, automorphic 449, 452, 454 Stern-Brocot Tree 477, 493, 494 - shuffled 493, 494 structure - melodic 67, 69, 70, 76 - motivic 58, 59, 230, 232 symmetry 303, 310, 421, 424, 428 - in pitch space 11-13, 347
synthesis and analysis framework 193 196 synthesizer 189, 190, 193 systema teleion 393, 394, 397, 401 temperament 469 - Lehmann/Bach 469 - equal 198, 199, 203 - non-equal ("non-linear") 198, 199 tetrachord 354, 357, 358, 361, 362, 375, 379, 499, 500, 501 theorem of Wiener 143 timbre 189, 195, 196 tonal - analysis 512, 517, 519 - fusion 140, 153 - harmony, irrelative system in 257, 263 - implications 124, 126, 135 - region 515, 518, 519 tonality 11, 15, 124, 133, 134 - Perle 386, 388 tonnetz 345, 406, 408, 409, 410 topology - melodic 67, 69, 70, 71, 73-75 - motivic 60, 61 topological model 60, 64, 67 topos 415, 418 transfer entropy 441, 445, 447 transformation - prolongational 512, 515, 517, 518 - triadic, uniform 491, 492 transformatio 499, 502, 504, 506, 508 510 transformational 375, 383, 406, 409, 410 - analysis 290, 365-374 - theory 344, 406 transitus class 407 transpositio 499-503 transpositional type (T-type) 125 Tree 289, 301 - Christoffel 477, 479, 480, 485 - Stern-Brocot 477, 493, 494 trope system (Hauer) 271 tuning systems 198, 199, 200, 203 twelve tone - invariance 268, 269, 270 - system 266, 267, 271-273, 281, 286 - tonality 365, 370, 374 unique factorisation theorem 419
variation - of a folk song 78, 81, 97, 101, 106 - of a motiv 52 Variophon 189, 193, 195-197 virtual pitch 127, 128 visualization - of melodic clusters 59, 247 - of tonal content 388 voice leading 449, 454, 456 volley principle 141
weight - inner metric 78, 79, 81 - spectral (metric) 205, 206, 208 weighted Gaussian mixture 338 well-formed scales 464, 465, 467, 477, 486 whole-tone scale proposition 388-390 words - Christoffel 464, 477-488 - Sturmian 464 Z-relation (Forte) 469, 470, 475