MicroMass Data Set

Abstract: A dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data.

Source:

Pierre Mahé, pierre.mahe '@', bioMérieux
Jean-Baptiste Veyrieras, jean-baptiste.veyrieras '@', bioMérieux

Data Set Information:

This MALDI-TOF dataset consists of:
A) A reference panel of 20 Gram-positive and Gram-negative bacterial species covering 9 genera, among which several species are known to be hard to discriminate by MALDI-TOF mass spectrometry. Each species is represented by 11 to 60 mass spectra obtained from 7 to 20 bacterial strains, for a total of 571 spectra obtained from 213 strains. The spectra were acquired following the standard culture-based workflow used in clinical routine: the microorganism was first grown on an agar plate for 24 to 48 hours, then a portion of a colony was picked and spotted on a MALDI slide, and a mass spectrum was acquired.
B) A dedicated in vitro mock-up mixture dataset built from this reference panel. For that purpose we considered 10 pairs of species of varying taxonomic proximity:
* 4 mixtures, labelled A, B, C and D, involved species that belong to the same genus,
* 2 mixtures, labelled E and F, involved species that belong to distinct genera, but to the same Gram type,
* 4 mixtures, labelled G, H, I and J, involved species that belong to distinct Gram types.
Each mixture was represented by 2 pairs of strains, which were mixed according to the following 9 concentration ratios: 1:0, 10:1, 5:1, 2:1, 1:1, 1:2, 1:5, 1:10, 0:1. Two replicate spectra were acquired for each concentration ratio and each pair of strains, giving a dataset of 360 spectra, of which 80 are pure-sample spectra (the 1:0 and 0:1 ratios).
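
The spectrum counts for the mixture dataset follow directly from the design described above. The short Python sketch below enumerates that design to make the arithmetic explicit. The function name and record fields (`enumerate_design`, `strain_pair`, `is_pure`, and so on) are hypothetical and chosen for illustration only; they do not reflect the layout of the distributed data files.

```python
from itertools import product

# Mixture labels grouped by taxonomic proximity, as listed in the description.
MIXTURE_GROUPS = {
    "same genus": ["A", "B", "C", "D"],
    "distinct genera, same Gram type": ["E", "F"],
    "distinct Gram types": ["G", "H", "I", "J"],
}
# The 9 concentration ratios; 1:0 and 0:1 correspond to pure samples.
RATIOS = ["1:0", "10:1", "5:1", "2:1", "1:1", "1:2", "1:5", "1:10", "0:1"]
PURE_RATIOS = {"1:0", "0:1"}
STRAIN_PAIRS_PER_MIXTURE = 2
REPLICATES_PER_CONDITION = 2


def enumerate_design():
    """Return one record per expected spectrum in the mixture design.

    Field names are illustrative assumptions, not the dataset's schema.
    """
    mixtures = [label for group in MIXTURE_GROUPS.values() for label in group]
    records = []
    for mixture, pair, ratio, replicate in product(
        mixtures,
        range(1, STRAIN_PAIRS_PER_MIXTURE + 1),
        RATIOS,
        range(1, REPLICATES_PER_CONDITION + 1),
    ):
        records.append(
            {
                "mixture": mixture,
                "strain_pair": pair,
                "ratio": ratio,
                "replicate": replicate,
                "is_pure": ratio in PURE_RATIOS,
            }
        )
    return records


design = enumerate_design()
n_total = len(design)                         # 10 x 2 x 9 x 2
n_pure = sum(r["is_pure"] for r in design)    # 10 x 2 x 2 x 2
print(n_total, n_pure)  # 360 80
```

Enumerating the conditions this way also gives a convenient checklist when verifying that every mixture, strain pair, ratio and replicate is present after loading the spectra.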

Relevant Papers:

Mahé et al. (2014). Automatic identification of mixed bacterial species fingerprints in a MALDI-TOF mass-spectrum. Bioinformatics.
Vervier et al. (submitted). A benchmark of support vector machines strategies for microbial identification by mass-spectrometry data.
