Parkinson's Speech with Multiple Types of Sound Recordings
Donated on 6/11/2014
The training data belongs to 20 Parkinson's Disease (PD) patients and 20 healthy subjects. From each subject, 26 sound recordings of multiple types were taken.
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification, Regression
Feature Type
Integer, Real
# Instances
1040
# Features
-
Dataset Information
Additional Information
The PD database consists of training and test files. The training data belongs to 20 patients with Parkinson's disease (PWP; 6 female, 14 male) and 20 healthy individuals (10 female, 10 male) who presented at the Department of Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. From each subject, multiple types of sound recordings were taken (26 voice samples including sustained vowels, numbers, words and short sentences). A group of 26 linear and time-frequency based features is extracted from each voice sample. The UPDRS (Unified Parkinson's Disease Rating Scale) score of each patient, determined by an expert physician, is also available in this dataset, so the dataset can also be used for regression.

After collecting the training dataset and performing our experiments, and in line with the findings obtained, we collected an independent test set from PWP through the same physician's examination process under the same conditions. For this set, 28 PD patients were asked to say only the sustained vowels 'a' and 'o' three times each, which makes a total of 168 recordings. The same 26 features are extracted from the voice samples of this set. It can be used as an independent test set to validate the results obtained on the training set.

Further details are contained in the following reference; if you use this dataset, please cite:
Erdogdu Sakar, B., Isenkul, M., Sakar, C.O., Sertbas, A., Gurgen, F., Delil, S., Apaydin, H., Kursun, O., 'Collection and Analysis of a Parkinson Speech Dataset with Multiple Types of Sound Recordings', IEEE Journal of Biomedical and Health Informatics, vol. 17(4), pp. 828-834, 2013

Training Data File: Each subject has 26 voice samples including sustained vowels, numbers, words and short sentences. The voice samples in the training data file are given in the following order (sample number: corresponding voice sample):
1: sustained vowel (aaa...)
2: sustained vowel (ooo...)
3: sustained vowel (uuu...)
4-13: numbers from 1 to 10
14-17: short sentences
18-26: words

Test Data File: 28 PD patients were asked to say only the sustained vowels 'a' and 'o' three times each, which makes a total of 168 recordings (each subject has 6 voice samples). The voice samples in the test data file are given in the following order (sample number: corresponding voice sample):
1-3: sustained vowel (aaa...)
4-6: sustained vowel (ooo...)
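The sample ordering described above can be sketched as a small lookup, which also makes the recording counts easy to check. This is only an illustration: the function names and the label strings are hypothetical and do not appear in the dataset files.

```python
from collections import Counter

TRAIN_SAMPLES_PER_SUBJECT = 26
TEST_SAMPLES_PER_SUBJECT = 6


def training_sample_type(sample_no):
    """Map a training-file sample number (1-26) to its recording type."""
    if not 1 <= sample_no <= TRAIN_SAMPLES_PER_SUBJECT:
        raise ValueError(f"training sample number out of range: {sample_no}")
    if sample_no == 1:
        return "sustained vowel a"
    if sample_no == 2:
        return "sustained vowel o"
    if sample_no == 3:
        return "sustained vowel u"
    if sample_no <= 13:
        return "number"
    if sample_no <= 17:
        return "short sentence"
    return "word"


def independent_sample_type(sample_no):
    """Map a test-file sample number (1-6) to its recording type."""
    if not 1 <= sample_no <= TEST_SAMPLES_PER_SUBJECT:
        raise ValueError(f"test sample number out of range: {sample_no}")
    return "sustained vowel a" if sample_no <= 3 else "sustained vowel o"


# Recordings per type for one training subject.
train_counts = Counter(
    training_sample_type(n) for n in range(1, TRAIN_SAMPLES_PER_SUBJECT + 1)
)

# Totals implied by the description: 40 training subjects, 28 test subjects.
train_total = 40 * TRAIN_SAMPLES_PER_SUBJECT
test_total = 28 * TEST_SAMPLES_PER_SUBJECT
```

With 40 training subjects the total matches the 1040 instances listed for the dataset, and 28 test subjects give the 168 test recordings.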
Has Missing Values?
No
Additional Variable Information
Training Data File:
column 1: Subject id
columns 2-27: features
features 1-5: Jitter (local), Jitter (local, absolute), Jitter (rap), Jitter (ppq5), Jitter (ddp)
features 6-11: Shimmer (local), Shimmer (local, dB), Shimmer (apq3), Shimmer (apq5), Shimmer (apq11), Shimmer (dda)
features 12-14: AC, NTH, HTN
features 15-19: Median pitch, Mean pitch, Standard deviation, Minimum pitch, Maximum pitch
features 20-23: Number of pulses, Number of periods, Mean period, Standard deviation of period
features 24-26: Fraction of locally unvoiced frames, Number of voice breaks, Degree of voice breaks
column 28: UPDRS
column 29: class information

Test Data File:
column 1: Subject id
columns 2-27: features (the same 26 features, in the same order, as in the training data file)
column 28: class information
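The column layout above could be turned into named columns along the following lines. This is a minimal sketch under stated assumptions: it assumes the data files are comma-separated with no header row, which is not stated on this page, and the snake_case column names and the `read_rows` helper are hypothetical. The parsed row is synthetic and only demonstrates the layout.

```python
import csv
import io

# Hypothetical snake_case names for the 26 features, in the documented order.
FEATURE_NAMES = [
    "jitter_local", "jitter_local_absolute", "jitter_rap", "jitter_ppq5", "jitter_ddp",
    "shimmer_local", "shimmer_local_db", "shimmer_apq3", "shimmer_apq5",
    "shimmer_apq11", "shimmer_dda",
    "ac", "nth", "htn",
    "median_pitch", "mean_pitch", "std_pitch", "min_pitch", "max_pitch",
    "num_pulses", "num_periods", "mean_period", "std_period",
    "fraction_unvoiced_frames", "num_voice_breaks", "degree_voice_breaks",
]

# Training file: subject id, 26 features, UPDRS, class (29 columns).
TRAIN_COLUMNS = ["subject_id"] + FEATURE_NAMES + ["updrs", "class"]
# Test file: subject id, 26 features, class (28 columns, no UPDRS).
TEST_COLUMNS = ["subject_id"] + FEATURE_NAMES + ["class"]


def read_rows(text, columns):
    """Parse comma-separated rows (assumed format) into dicts keyed by column name."""
    rows = []
    for line_no, fields in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, (float(value) for value in fields))))
    return rows


# Synthetic row (values 1..29), not real data.
synthetic_row = ",".join(str(i) for i in range(1, 30))
rows = read_rows(synthetic_row, TRAIN_COLUMNS)
```

Checking the field count per row catches the most likely mistake, which is reading the 28-column test file with the 29-column training layout.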
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
parkinson_s_speech_with_multiple_types_of_sound_recordings = fetch_ucirepo(id=301)

# data (as pandas dataframes)
X = parkinson_s_speech_with_multiple_types_of_sound_recordings.data.features
y = parkinson_s_speech_with_multiple_types_of_sound_recordings.data.targets

# metadata
print(parkinson_s_speech_with_multiple_types_of_sound_recordings.metadata)

# variable information
print(parkinson_s_speech_with_multiple_types_of_sound_recordings.variables)
Kursun,Olcay, Sakar,Betul, Isenkul,M., Sakar,C., Sertbas,Ahmet, and Gurgen,Fikret. (2014). Parkinson's Speech with Multiple Types of Sound Recordings. UCI Machine Learning Repository. https://doi.org/10.24432/C5NC8M.
@misc{misc_parkinson's_speech_with_multiple_types_of_sound_recordings_301,
  author       = {Kursun,Olcay, Sakar,Betul, Isenkul,M., Sakar,C., Sertbas,Ahmet, and Gurgen,Fikret},
  title        = {{Parkinson's Speech with Multiple Types of Sound Recordings}},
  year         = {2014},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C5NC8M}
}
Creators
Olcay Kursun
Betul Sakar
M. Isenkul
C. Sakar
Ahmet Sertbas
Fikret Gurgen
DOI
10.24432/C5NC8M
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.