Spoken Arabic Digits

Abstract: This dataset contains time series of mel-frequency cepstral coefficients (MFCCs) corresponding to spoken Arabic digits. It includes data from 44 male and 44 female native Arabic speakers.

1.       Title of Database: Spoken Arabic Digit

2.       Source(s): Data collected by the Laboratory of Automatic and Signals,
University of Badji-Mokhtar,
Annaba, Algeria.

Direction: Prof. Mouldi Bedda
Participants: H. Dahmani, C. Snani, MC. Amara Korba, S. Atoui
Adapted and preprocessed by:
                          Nacereddine Hammami and Mouldi Bedda
                          Faculty of Engineering,
                          Al-Jouf University,
                          Sakaka, Al-Jouf,
                          Kingdom of Saudi Arabia
                          e-mail: nacereddine.hammami@gmail.com, mouldi_bedda@yahoo.fr
Date: October 2008

3.       Past Usage:

[1] N. Hammami, M. Bedda, "Improved Tree model for Arabic Speech Recognition", Proc. IEEE
ICCSIT10 Conference, 2010.
[2] N. Hammami, M. Sellami, "Tree distribution classifier for automatic spoken Arabic digit
recognition", Proc. IEEE ICITST09 Conference, 2009, pp. 1-4.

4.       Relevant Information Paragraph:
The dataset consists of 8800 time series (10 digits x 10 repetitions x 88 speakers) of 13 Mel
Frequency Cepstral Coefficients (MFCCs), taken from 44 male and 44 female native Arabic speakers
between the ages of 18 and 40, representing the ten spoken Arabic digits.

5.       Number of Instances: 8800

6.       Number of Attributes: 13

7.       Attribute Information:
Each line in the database contains 13 MFCCs in increasing order of coefficient index, separated
by spaces, and corresponds to one analysis frame. The 13 Mel Frequency Cepstral Coefficients
(MFCCs) are computed under the following conditions:
Sampling rate: 11025 Hz, 16 bits
Window applied: Hamming
Pre-emphasis filter: 1 - 0.97 z^(-1)
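
The two signal-conditioning steps listed above can be sketched in Python as follows. This is only an illustration, not the authors' extraction code: the source does not state the frame length or frame shift, the sample before the first one is assumed to be zero, and the 0.54/0.46 Hamming coefficients are the conventional ones rather than values confirmed by the source. The function names are hypothetical.

```python
import math

def preemphasize(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], i.e. the filter 1 - alpha * z^(-1)."""
    out = []
    previous = 0.0  # assumption: the sample before the first one is zero
    for x in samples:
        out.append(x - alpha * previous)
        previous = x
    return out

def hamming(length):
    """Conventional symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def window_frame(frame):
    """Multiply one analysis frame by a Hamming window of the same length."""
    return [s * w for s, w in zip(frame, hamming(len(frame)))]

# Toy input (not real speech): a constant signal is strongly attenuated
# by pre-emphasis after the first sample.
emphasized = preemphasize([1.0, 1.0, 1.0, 1.0, 1.0])
frame = window_frame(emphasized)
```

The MFCC computation itself (filter bank, logarithm, DCT) is not shown, since the source does not give its parameters.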

8.       Missing Attribute Values: None

9.       Class Distribution:
Each line in Train_Arabic_Digit.txt or Test_Arabic_Digit.txt contains 13 MFCCs in increasing
order of coefficient index, separated by spaces, and corresponds to one analysis frame.
Lines are organized into blocks of 4-93 lines, separated by blank lines. Each block
corresponds to a single utterance of a spoken Arabic digit with 4-93 frames.
Each spoken digit is a set of consecutive blocks.
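
A minimal Python sketch of reading this block layout is given below. The function name read_blocks is hypothetical, and the sketch assumes that any whitespace-only line acts as a block separator and that leading or repeated blank lines should be ignored. The toy sample uses 3 values per line for brevity; real files have 13.

```python
def read_blocks(lines):
    """Group frames (lists of floats) into blocks separated by blank lines."""
    blocks = []
    current = []
    for line in lines:
        fields = line.split()
        if fields:
            # One non-blank line is one analysis frame.
            current.append([float(value) for value in fields])
        elif current:
            # A blank line closes the block in progress, if any.
            blocks.append(current)
            current = []
    if current:
        # The last block may not be followed by a blank line.
        blocks.append(current)
    return blocks

sample = """
-0.81 0.50 0.10
-0.80 0.49 0.11

1.20 -0.30 0.00
"""
blocks = read_blocks(sample.splitlines())
frame_counts = [len(block) for block in blocks]
```

For the real data, the same function can be given an open file object, for example `with open("Train_Arabic_Digit.txt") as f: blocks = read_blocks(f)`.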

In Train_Arabic_Digit.txt there are 660 blocks for each spoken digit. The first 330 blocks
represent male speakers and the second 330 blocks represent female speakers. Blocks 1-660
represent the spoken digit "0" (10 utterances of /0/ from 66 speakers), blocks 661-1320 represent
the spoken digit "1" (10 utterances of /1/ from the same 66 speakers, 33 males and 33 females),
and so on up to digit 9.

In Test_Arabic_Digit.txt there are 220 blocks for each of the digits 0 to 9. The first 110 blocks
represent male speakers and the second 110 blocks represent female speakers. Therefore,
blocks 1-220 represent digit "0" (10 utterances of /0/ from 22 speakers), blocks
221-440 represent digit "1" (10 utterances of /1/ from the same 22 speakers, 11 males and 11
females), and so on.
Speakers in the test dataset are different from those in the train dataset.
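
Since the files carry no explicit labels, the digit and speaker gender have to be derived from the block position. The sketch below shows one way to do this in Python. The function name block_label is hypothetical, and the sketch assumes that the male/female split applies within each digit's run of blocks (first half male, second half female), which is how the description above reads.

```python
TRAIN_BLOCKS_PER_DIGIT = 660  # from the description of Train_Arabic_Digit.txt
TEST_BLOCKS_PER_DIGIT = 220   # from the description of Test_Arabic_Digit.txt

def block_label(block_number, blocks_per_digit):
    """Map a 1-based block number to a (digit, gender) pair."""
    if not 1 <= block_number <= 10 * blocks_per_digit:
        raise ValueError("block number out of range")
    # Position of the digit, and position of the block within that digit.
    digit, offset = divmod(block_number - 1, blocks_per_digit)
    # Assumption: within each digit, the first half of the blocks is male.
    gender = "male" if offset < blocks_per_digit // 2 else "female"
    return digit, gender

first_train = block_label(1, TRAIN_BLOCKS_PER_DIGIT)
first_female_train = block_label(331, TRAIN_BLOCKS_PER_DIGIT)
second_digit_train = block_label(661, TRAIN_BLOCKS_PER_DIGIT)
last_test = block_label(2200, TEST_BLOCKS_PER_DIGIT)
```

Speaker identity within a gender group cannot be recovered this way, because the source does not state how the 10 utterances of each speaker are ordered inside a group.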

For Matlab users, "*.mat" files that represent each block separately are also available.