Connectionist Bench (Nettalk Corpus)
Donated on 10/10/1954
The file "nettalk.data" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes, given a string of letters as input.
Dataset Characteristics
Multivariate
Subject Area
Other
Associated Tasks
-
Feature Type
Categorical
# Instances
20008
# Features
-
Dataset Information
Additional Information
This is an updated and corrected version of the data set used by Sejnowski and Rosenberg in their influential study of speech generation using a neural network [1]. The file "nettalk.data" contains a list of 20,008 English words, along with a phonetic transcription for each word. The task is to train a network to produce the proper phonemes, given a string of letters as input. This is an example of an input/output mapping task that exhibits strong global regularities, but also a large number of more specialized rules and exceptional cases. Please see original readme file for more information.
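As a rough illustration of the letters-to-phonemes mapping described above, the sketch below turns one word into per-letter training pairs. It assumes the letter and phoneme strings are aligned one-to-one, and it uses a seven-letter context window, as in the original NETtalk study; the window size is not stated on this page. The function name `letter_windows`, the padding character, and the sample transcription are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def letter_windows(
    letters: str, phonemes: str, half_width: int = 3, pad: str = "_"
) -> List[Tuple[str, str]]:
    """Pair each letter's context window with the phoneme aligned to that letter.

    Assumes `letters` and `phonemes` have equal length (one phoneme symbol per
    letter). The window spans `half_width` letters on each side of the centre.
    """
    if len(letters) != len(phonemes):
        raise ValueError("letters and phonemes must be aligned one-to-one")
    # Pad both ends so that every letter can sit at the centre of a full window.
    padded = pad * half_width + letters + pad * half_width
    width = 2 * half_width + 1
    return [(padded[i : i + width], phonemes[i]) for i in range(len(letters))]


# Hypothetical aligned pair, for illustration only (not taken from nettalk.data).
pairs = letter_windows("cat", "k@t")
for window, target in pairs:
    print(window, "->", target)
```

Each window would then typically be encoded numerically (for example, one-hot per letter position) before being fed to a network; that step is omitted here.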
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
 | | | | | no |
 | | | | | no |
 | | | | | no |
 | | | | | no |
Additional Variable Information
The pronouncing dictionary was created to study the translation process between written English, using graphemes or letters as units, and spoken English, using phonemes as units. The dictionary includes 20,008 aligned letter and phonetic representations with stresses. It contains four tab-separated fields of information for each word:
1) a letter representation
2) a phonemic representation
3) stress and syllabic structure
4) an integer indicating foreign and irregular words
Please see the original readme file for more information.
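The four-field layout described above can be read with a few lines of Python. This is a minimal sketch: the names `NettalkEntry` and `parse_line` are hypothetical, the sample record is illustrative and not guaranteed to match a line in nettalk.data, and any header or comment lines in the real file would need to be skipped separately.

```python
from typing import NamedTuple


class NettalkEntry(NamedTuple):
    letters: str         # field 1: letter representation
    phonemes: str        # field 2: phonemic representation
    stress: str          # field 3: stress and syllabic structure
    irregular_code: int  # field 4: integer flag for foreign/irregular words


def parse_line(line: str) -> NettalkEntry:
    """Split one tab-separated record into its four fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    letters, phonemes, stress, code = fields
    return NettalkEntry(letters, phonemes, stress, int(code))


# Illustrative record only; check it against the actual file before relying on it.
sample = "aardvark\ta-rdvark\t1<<<>2<<\t0"
entry = parse_line(sample)
print(entry)
```

Because the representations are described as aligned, a quick sanity check when reading the real file is that the first three fields of each record have the same length.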
Dataset Files
File | Size |
---|---|
nettalk.data | 528.6 KB |
nettalk.names | 13.4 KB |
Index | 114 Bytes |
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    connectionist_bench_nettalk_corpus = fetch_ucirepo(id=150)

    # data (as pandas dataframes)
    X = connectionist_bench_nettalk_corpus.data.features
    y = connectionist_bench_nettalk_corpus.data.targets

    # metadata
    print(connectionist_bench_nettalk_corpus.metadata)

    # variable information
    print(connectionist_bench_nettalk_corpus.variables)
Sejnowski, T. & Rosenberg, C. (1988). Connectionist Bench (Nettalk Corpus) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5VP6T.
Creators
Terry Sejnowski
Charles Rosenberg
DOI
10.24432/C5VP6T
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.