Bach Chorales Data Set

Abstract: Time-series data based on Bach chorales; the challenge is to learn a generative grammar; the data are in Lisp format.

Data Set Characteristics: Univariate, Time-Series
Attribute Characteristics: Categorical, Integer
Associated Tasks:

Chorales: Mainous and Ottman edition.
Mainous, Frank D. and Robert W. Ottman, eds. 1966. The 371 Bach Chorales. Holt, Rinehart and Winston, New York.

Original Owners of Database:

Darrell Conklin
ZymoGenetics Inc.
1201 Eastlake Avenue East
Seattle, WA 98102
conklin '@'

Donor of database:

Same as owner. Ann Blombach of Ohio State University originally supplied me with 4-voice encodings of 100 chorales. The present database is the soprano line, converted into Lisp-readable form, and extensively corrected.

Data Set Information:

Sequential (time-series) domain. Single-line melodies of 100 Bach chorales (originally 4 voices). The melody line can be studied independently of other voices. The grand challenge is to learn a generative grammar for stylistically valid chorales (see references and discussion in "Multiple Viewpoint Systems for Music Prediction").

Attribute Information:

Number of Attributes: 6 (nominal) per event

(a) start-time, measured in 16th notes from chorale beginning (time 0)
(b) pitch, MIDI number (60 = C4, 61 = C#4, 72 = C5, etc.)
(c) duration, measured in 16th notes
(d) key signature, given as the number of sharps or flats: positive for sharps, negative for flats
(e) time signature, in 16th notes per bar
(f) fermata, true or false depending on whether event is under a fermata
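
As an illustration of how the six attributes fit together, the following is a minimal Python sketch of one event record. The class name `Event`, its field names, and the helpers `pitch_name` and `bar_and_offset` are hypothetical and not part of the data set; the pitch naming follows the convention stated above (60 = C4). The bar computation additionally assumes that time 0 falls on a barline and that the time signature is constant within a chorale, neither of which is stated by the description.

```python
from dataclasses import dataclass
from typing import Tuple

# Sharp-based spelling of the 12 pitch classes (an arbitrary choice for display).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass(frozen=True)
class Event:
    """One melody event; field names are illustrative, not from the data set."""
    start: int      # (a) onset, in 16th notes from the chorale beginning
    pitch: int      # (b) MIDI note number
    duration: int   # (c) length, in 16th notes
    key_sig: int    # (d) number of sharps (positive) or flats (negative)
    time_sig: int   # (e) 16th notes per bar
    fermata: bool   # (f) whether the event is under a fermata


def pitch_name(midi: int) -> str:
    """Convert a MIDI number to a note name, using 60 = C4."""
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


def bar_and_offset(event: Event) -> Tuple[int, int]:
    """Return (bar index, offset in 16ths within the bar).

    Assumes time 0 is a barline and the time signature never changes.
    """
    return divmod(event.start, event.time_sig)
```

For example, `pitch_name(72)` gives `"C5"`, and an event starting at 16th note 28 in a chorale with 12 sixteenths per bar lands in bar index 2 at offset 4 under the stated assumptions.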

Attribute domains (all integers):

(a) {0,1,2,...}
(b) {60,...,75}
(c) {1,...,16}
(d) {-4,...,+4}
(e) {12,16}
(f) {0,1}
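
The domains above can serve as a sanity check after loading the data. The sketch below assumes each event has already been parsed into a Python dict of integers; the key names (`start`, `pitch`, and so on) and the function `invalid_fields` are hypothetical, chosen for this example only.

```python
from typing import Callable, Dict, List

# One predicate per attribute, mirroring domains (a) through (f).
DOMAINS: Dict[str, Callable[[int], bool]] = {
    "start": lambda v: v >= 0,           # (a) {0,1,2,...}
    "pitch": lambda v: 60 <= v <= 75,    # (b) {60,...,75}
    "duration": lambda v: 1 <= v <= 16,  # (c) {1,...,16}
    "key_sig": lambda v: -4 <= v <= 4,   # (d) {-4,...,+4}
    "time_sig": lambda v: v in (12, 16), # (e) {12,16}
    "fermata": lambda v: v in (0, 1),    # (f) {0,1}
}


def invalid_fields(event: Dict[str, int]) -> List[str]:
    """Return the names of attributes that are missing, non-integer,
    or outside their documented domain (empty list if the event is valid)."""
    bad = []
    for name, in_domain in DOMAINS.items():
        value = event.get(name)
        if not isinstance(value, int) or not in_domain(value):
            bad.append(name)
    return bad
```

An event such as `{"start": 0, "pitch": 67, "duration": 4, "key_sig": 1, "time_sig": 12, "fermata": 0}` yields an empty list, while a pitch of 80 or a time signature of 8 would be reported by name.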

Relevant Papers:

Conklin, Darrell and Witten, Ian. 1995. Multiple Viewpoint Systems for Music Prediction. Journal of New Music Research. 24(1):51-73.

