Center for Machine Learning and Intelligent Systems

Bach Chorales Data Set

Abstract: Time-series data based on Bach chorales. The challenge is to learn a generative grammar. The data are in Lisp-readable form.

Data Set Characteristics: Univariate, Time-Series
Number of Instances: 100 (chorales)
Attribute Characteristics: Categorical, Integer
Number of Attributes: 6 (per event)
Date Donated:
Associated Tasks:
Missing Values?
Number of Web Hits:


Chorales: Mainous and Ottman edition. Mainous, Frank D. and Robert W. Ottman, eds. 1966. The 371 Bach Chorales. Holt, Rinehart and Winston, New York.

Original Owners of Database:

Darrell Conklin
ZymoGenetics Inc.
1201 Eastlake Avenue East
Seattle WA, 98102
conklin '@'

Donor of database:

Same as owner. Ann Blombach of Ohio State University originally supplied me with 4-voice encodings of 100 chorales. The present database is the soprano line, converted into Lisp-readable form, and extensively corrected.

Data Set Information:

Sequential (time-series) domain. Single-line melodies of 100 Bach chorales (originally 4 voices). The melody line can be studied independently of other voices. The grand challenge is to learn a generative grammar for stylistically valid chorales (see references and discussion in "Multiple Viewpoint Systems for Music Prediction").
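
As a point of reference for the prediction task, the following is a minimal sketch of a first-order (bigram) model over pitch sequences. It is a far simpler baseline than the multiple viewpoint systems of the cited paper, not a reproduction of them. The function names and the toy pitch sequences are hypothetical and are not taken from the data set.

```python
from collections import Counter, defaultdict


def train_bigram(melodies):
    """Count pitch-to-pitch transitions over a list of pitch sequences."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Toy MIDI pitch sequences (hypothetical, for illustration only).
toy = [[67, 67, 69, 71], [67, 69, 67, 65]]
model = train_bigram(toy)
print(transition_prob(model, 67, 69))
```

A model of this kind sees only the previous pitch. It ignores duration, metrical position and fermatas, which are all available as attributes of each event.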

Attribute Information:

Number of Attributes: 6 (nominal) per event

(a) start-time, measured in 16th notes from chorale beginning (time 0)
(b) pitch, MIDI number (60 = C4, 61 = C#4, 72 = C5, etc.)
(c) duration, measured in 16th notes
(d) key signature, number of sharps or flats, positive if key signature has sharps, negative if key signature has flats
(e) time signature, in 16th notes per bar
(f) fermata, true or false depending on whether event is under a fermata
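
The six attributes above can be held in a small record type. The sketch below is one possible representation, not the data set's own format: the class name, field names and helper methods are assumptions made for illustration. Pitch names are always spelled with sharps (70 is shown as A#4, never Bb4), and the bar index is counted from time 0, so it need not match the printed bar number of a chorale that begins with an upbeat.

```python
from dataclasses import dataclass

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Major key implied by each key-signature value (negative = flats, positive = sharps).
# The relative minor shares the same signature, so this is only one reading.
MAJOR_KEYS = {-4: "Ab", -3: "Eb", -2: "Bb", -1: "F", 0: "C",
              1: "G", 2: "D", 3: "A", 4: "E"}


@dataclass(frozen=True)
class Event:
    start: int      # (a) start-time in 16th notes from the chorale beginning
    pitch: int      # (b) MIDI number
    duration: int   # (c) duration in 16th notes
    key_sig: int    # (d) sharps (positive) or flats (negative)
    time_sig: int   # (e) 16th notes per bar
    fermata: bool   # (f) whether the event is under a fermata

    def pitch_name(self) -> str:
        """Note name with octave, using the convention 60 = C4."""
        return f"{NOTE_NAMES[self.pitch % 12]}{self.pitch // 12 - 1}"

    def bar_and_offset(self):
        """(bar index, offset in 16th notes within that bar), counted from time 0."""
        return divmod(self.start, self.time_sig)


# Hypothetical event: a quarter-note G4 starting at time 8 in a 12-per-bar meter.
ev = Event(start=8, pitch=67, duration=4, key_sig=1, time_sig=12, fermata=False)
print(ev.pitch_name(), ev.bar_and_offset(), MAJOR_KEYS[ev.key_sig])
```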

Attribute domains (all integers):

(a) {0,1,2,...}
(b) {60,...,75}
(c) {1,...,16}
(d) {-4,...,+4}
(e) {12,16}
(f) {0,1}
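
The domains above can be used as a sanity check when loading the data. The sketch below assumes that each event is stored as a parenthesized list of (name value) pairs, for example ((st 8) (pitch 67) ...). That layout and the field labels st, pitch, dur, keysig, timesig and fermata are assumptions, since this description does not specify the file syntax. The sample string is hypothetical, and its second event is deliberately given an out-of-range pitch.

```python
import re

# Assumed field labels, in the order (a) to (f) of the attribute list.
FIELDS = ("st", "pitch", "dur", "keysig", "timesig", "fermata")

# Attribute domains as stated in the description.
DOMAINS = {
    "st": lambda v: v >= 0,
    "pitch": lambda v: 60 <= v <= 75,
    "dur": lambda v: 1 <= v <= 16,
    "keysig": lambda v: -4 <= v <= 4,
    "timesig": lambda v: v in (12, 16),
    "fermata": lambda v: v in (0, 1),
}

PAIR = re.compile(r"\((\w+)\s+([+-]?\d+)\)")
# One event: an outer pair of parentheses wrapping one or more (name value) pairs.
EVENT = re.compile(r"\(((?:\(\w+\s+[+-]?\d+\)\s*)+)\)")


def parse_events(text):
    """Return one dict per event found in the assumed Lisp-style layout."""
    return [{name: int(value) for name, value in PAIR.findall(body)}
            for body in EVENT.findall(text)]


def violations(event):
    """Names of fields that are missing or fall outside the stated domain."""
    return [f for f in FIELDS if f not in event or not DOMAINS[f](event[f])]


sample = ("(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))"
          "((st 12) (pitch 80) (dur 8) (keysig 1) (timesig 12) (fermata 1)))")
events = parse_events(sample)
for ev in events:
    print(ev["st"], violations(ev))
```

If the actual file uses different labels or nesting, only FIELDS and the two regular expressions need to change; the domain checks follow the description directly.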

Relevant Papers:

Conklin, Darrell and Witten, Ian. 1995. Multiple Viewpoint Systems for Music Prediction. Journal of New Music Research. 24(1):51-73.

Papers That Cite This Data Set [1]:

Matthew Brand. Pattern discovery via entropy minimization. MERL, A Mitsubishi Electric Research Laboratory. 1998.

Matthew Brand. An Entropic Estimator for Structure Discovery. NIPS. 1998.

Zoubin Ghahramani and Michael I. Jordan. Factorial Hidden Markov Models. Machine Learning, 29. 1997.

Mohammed Waleed Kadous and Claude Sammut. Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series. School of Computer Science and Engineering, The University of New South Wales.

Mohammed Waleed Kadous. Expanding the Scope of Concept Learning Using Metafeatures. School of Computer Science and Engineering, University of New South Wales.

Citation Request:

Please refer to the Machine Learning Repository's citation policy.

[1] Papers were automatically harvested and associated with this data set, in collaboration with
