Spoken Arabic Digit
Donated on 9/12/2010
This dataset contains time series of mel-frequency cepstral coefficients (MFCCs) corresponding to spoken Arabic digits. It includes data from 44 male and 44 female native Arabic speakers.
Dataset Characteristics
Multivariate, Time-Series
Subject Area
Other
Associated Tasks
Classification
Feature Type
Real
# Instances
8800
# Features
13
Dataset Information
Additional Information
The dataset consists of 8800 time series (10 digits x 10 repetitions x 88 speakers) of 13 Mel Frequency Cepstral Coefficients (MFCCs), taken from 44 male and 44 female native Arabic speakers between the ages of 18 and 40, representing the ten spoken Arabic digits.
Has Missing Values?
No
Variable Information
Each line in the database contains 13 MFCCs in increasing order, separated by spaces, and corresponds to one analysis frame. The 13 Mel Frequency Cepstral Coefficients (MFCCs) were computed under the following conditions:
Sampling rate: 11025 Hz, 16 bits
Window applied: Hamming
Pre-emphasis filter: 1 - 0.97 z^(-1)
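The pre-emphasis filter 1 - 0.97 z^(-1) and the Hamming window named above can be illustrated with a minimal pure-Python sketch. The function names are hypothetical, and the frame length and hop size are not stated on this page, so the sketch only shows the two operations on an arbitrary frame. It is not the creators' feature-extraction code.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], the FIR filter 1 - alpha * z^(-1)."""
    if not signal:
        return []
    # Assumption: the sample before the first one is treated as zero,
    # so the first output equals the first input.
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def hamming_window(length):
    """Symmetric Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def window_frame(frame):
    """Multiply one analysis frame by a Hamming window of the same length."""
    weights = hamming_window(len(frame))
    return [sample * w for sample, w in zip(frame, weights)]


# Illustrative input only: a constant signal is strongly attenuated after
# its first sample, since pre-emphasis suppresses low frequencies.
emphasized = pre_emphasize([1.0, 1.0, 1.0, 1.0, 1.0])
windowed = window_frame(emphasized)
```

Pre-emphasis boosts high frequencies relative to low ones before the spectrum is computed, and the window tapers each frame toward its edges to reduce spectral leakage.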
Dataset Files
File | Size |
---|---|
Train_Arabic_Digit.txt | 27 MB |
Test_Arabic_Digit.txt | 8.9 MB |
graphic.jpg | 31.8 KB |
documentation.html | 20.8 KB |
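Based on the line format described under Variable Information (13 space-separated MFCCs per line, one line per analysis frame), the text files could be parsed roughly as follows. This is a sketch under one loudly stated assumption: that consecutive utterances are separated by blank lines. That separator is not documented on this page, so check it against documentation.html before relying on it. The function name `parse_mfcc_blocks` is hypothetical.

```python
def parse_mfcc_blocks(text, n_coeffs=13):
    """Split raw file text into utterances, each a list of MFCC frames.

    ASSUMPTION: utterances are separated by blank lines. Each non-blank
    line must hold exactly `n_coeffs` space-separated numbers.
    """
    blocks = []
    current = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            # Blank line: close the current utterance, if any.
            if current:
                blocks.append(current)
                current = []
            continue
        if len(fields) != n_coeffs:
            raise ValueError(
                f"line {line_number}: expected {n_coeffs} values, got {len(fields)}"
            )
        current.append([float(value) for value in fields])
    if current:
        blocks.append(current)
    return blocks


# Synthetic sample (not real dataset values): two utterances with
# two frames and one frame respectively.
frame_line = " ".join(["0.5"] * 13)
sample_text = "\n".join([frame_line, frame_line, "", frame_line])
sample_blocks = parse_mfcc_blocks(sample_text)
```

With a downloaded file, the same function would be called as `parse_mfcc_blocks(open("Train_Arabic_Digit.txt").read())`. Because utterances differ in length, the result is a list of variable-length frame lists rather than a single rectangular array.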
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
spoken_arabic_digit = fetch_ucirepo(id=195)

# data (as pandas dataframes)
X = spoken_arabic_digit.data.features
y = spoken_arabic_digit.data.targets

# metadata
print(spoken_arabic_digit.metadata)

# variable information
print(spoken_arabic_digit.variables)
Bedda, M. & Hammami, N. (2008). Spoken Arabic Digit [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C52C9Q.
Creators
Mouldi Bedda
Nacereddine Hammami
DOI
10.24432/C52C9Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.