Connectionist Bench (Nettalk Corpus)

Donated on 10/10/1954

The file "nettalk.data" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes, given a string of letters as input.

Dataset Characteristics

Multivariate

Subject Area

Other

Associated Tasks

-

Feature Type

Categorical

# Instances

20008

# Features

-

Dataset Information

Additional Information

This is an updated and corrected version of the data set used by Sejnowski and Rosenberg in their influential study of speech generation using a neural network [1]. The file "nettalk.data" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes, given a string of letters as input. This is an example of an input/output mapping task that exhibits strong global regularities, but also a large number of more specialized rules and exceptional cases. Please see the original readme file for more information.
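To make the input/output mapping concrete, the following is a minimal sketch of how one word might be turned into per-letter training pairs. It assumes each letter is aligned with exactly one phoneme symbol, so both strings have the same length. The function name `windowed_pairs`, the window half-width, the padding character, and the example transcription are all hypothetical choices for illustration, not values taken from the data set or the original study.

```python
from typing import List, Tuple


def windowed_pairs(
    letters: str,
    phonemes: str,
    half_width: int = 3,   # hypothetical context size on each side
    pad: str = "_",        # hypothetical padding symbol for word edges
) -> List[Tuple[str, str]]:
    """Pair a window of letters centered on each position with that position's phoneme."""
    if len(letters) != len(phonemes):
        raise ValueError("letters and phonemes must be aligned one-to-one")
    # Pad both ends so that every letter has a full window around it.
    padded = pad * half_width + letters + pad * half_width
    width = 2 * half_width + 1
    return [(padded[i:i + width], phonemes[i]) for i in range(len(letters))]


# Made-up example word and transcription, for illustration only.
pairs = windowed_pairs("cat", "k@t", half_width=1)
for window, phoneme in pairs:
    print(window, "->", phoneme)
```

Each pair is one input/output example: the window is the input and the phoneme for the center letter is the target. A wider window gives the model more context for the specialized rules and exceptions, at the cost of a larger input.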

Has Missing Values?

No

Variables Table

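Variable Name | Role | Type | Demographic | Description | Units | Missing Values
- | - | - | - | - | - | no
- | - | - | - | - | - | no
- | - | - | - | - | - | no
- | - | - | - | - | - | no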

Additional Variable Information

The pronouncing dictionary was created to study the translation process between written English, using graphemes or letters as units, and spoken English, using phonemes as units. The dictionary includes 20,008 aligned letter and phonetic representations with stresses. It contains four tab-separated fields of information for each word:

1) a letter representation
2) a phonemic representation
3) stress and syllabic structure
4) an integer indicating foreign and irregular words

Please see the original readme file for more information.
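The four-field layout described above can be read with a few lines of Python. This is a minimal sketch, assuming each word sits on its own line with exactly four tab-separated fields and that the fourth field is a non-negative integer. Lines that do not match, such as any header or comment text the file may contain, are skipped. The names `Entry` and `parse_entries` and the sample rows are hypothetical and are not taken from the data set.

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    letters: str    # field 1: letter representation
    phonemes: str   # field 2: phonemic representation
    stress: str     # field 3: stress and syllabic structure
    flag: int       # field 4: integer marking foreign and irregular words


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that has four tab-separated fields."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # assumption: anything else is header or comment text
        letters, phonemes, stress, flag = fields
        if not flag.strip().isdigit():
            continue  # assumption: the last field is a non-negative integer
        yield Entry(letters, phonemes, stress, int(flag))


# Made-up rows in the assumed layout, for illustration only.
sample = [
    "some header text\n",
    "abc\txyz\t123\t0\n",
    "de\tuv\t12\t2\n",
]
entries = list(parse_entries(sample))
for entry in entries:
    print(entry)
```

To read the real file, pass an open file handle instead of the sample list, for example `parse_entries(open("nettalk.data"))`. The meaning of each flag value is documented in the original readme file, so the sketch keeps it as a plain integer.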

Creators

Terry Sejnowski

Charles Rosenberg
