Molecular Biology (Protein Secondary Structure)

From the CMU Connectionist Bench repository. Classifies the secondary structure of certain globular proteins.

Dataset Characteristics

Sequential

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Categorical

# Instances

128

# Features

-

Dataset Information

Additional Information

This is a data set used by Ning Qian and Terry Sejnowski in their study using a neural net to predict the secondary structure of certain globular proteins [1]. The idea is to take a linear sequence of amino acids and predict, for each amino acid, which secondary structure it is part of within the protein. There are three choices: alpha-helix, beta-sheet, and random coil. The data set contains both a large set of training data and a distinct set of data that can be used to test the resulting network. Qian and Sejnowski use a NETtalk-like approach, in which the network sees a fixed-size window of residues centred on the one being classified, and report an accuracy of 64.3% on the test set; they speculate that this is about the best that can be done using only local context. The folder also contains a domain theory, created and donated by Jude Shavlik and Rich Maclin.
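The local-context formulation described above can be sketched as follows. This is a minimal illustration, not Qian and Sejnowski's code: the 13-residue window, the 20-letter amino-acid alphabet plus a 21st "spacer" unit for window slots past either end of the chain, and the names encode_windows and per_residue_accuracy are all assumptions made for this example. The accuracy helper assumes the reported 64.3% is the fraction of residues assigned the correct one of the three classes.

```python
# Minimal sketch of a NETtalk-style sliding-window encoding.
# ASSUMPTIONS (not from the dataset page): a 13-residue window, the 20 standard
# one-letter amino-acid codes, and one extra "spacer" unit per window slot for
# positions that fall beyond either end of the chain.
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"        # 20 standard one-letter codes
SPACER = len(AMINO_ACIDS)                   # unit 20 marks padding past the chain ends
UNITS_PER_POSITION = len(AMINO_ACIDS) + 1   # 21 input units per window slot


def encode_windows(seq: str, window: int = 13) -> List[List[int]]:
    """Return one flat one-hot vector per residue, centred on that residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    vectors = []
    for centre in range(len(seq)):
        vec = [0] * (window * UNITS_PER_POSITION)
        for slot in range(window):
            pos = centre + slot - half
            if 0 <= pos < len(seq):
                # str.index raises ValueError for letters outside the 20-code alphabet.
                unit = AMINO_ACIDS.index(seq[pos])
            else:
                unit = SPACER
            vec[slot * UNITS_PER_POSITION + unit] = 1
        vectors.append(vec)
    return vectors


def per_residue_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of residues whose predicted class equals the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For example, encode_windows("MKTAY") returns five vectors of 13 × 21 = 273 entries, each containing exactly 13 ones (one per window slot). One-hot codes keep every amino acid equally distant from every other, so the network is not handed a spurious ordering of residues.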

Has Missing Values?

No

Dataset Files

File                                 Size
protein-secondary-structure.train    71.8 KB
protein-secondary-structure.test     14.2 KB
protein-secondary-structure.theory   11.2 KB
protein-secondary-structure.names    1.9 KB
Index                                285 Bytes
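The .train and .test files listed above can be read with a short parser such as the following. The layout it expects is an assumption, not something stated on this page: lines starting with '#' are comments, a '<>' line opens each protein, an 'end' or '<end>' line closes it, and each line in between holds an amino-acid code and a structure label separated by whitespace. Check protein-secondary-structure.names and adjust if the real layout differs; the function name parse_proteins is hypothetical.

```python
# Minimal sketch of a reader for the training/test files.
# ASSUMED layout (verify against protein-secondary-structure.names): '#' comment
# lines, '<>' opens a protein, 'end' or '<end>' closes it, and every other
# non-blank line is "<amino-acid> <label>".
from typing import Iterable, List, Tuple


def parse_proteins(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (amino-acid sequence, structure-label sequence) pairs."""
    proteins: List[Tuple[str, str]] = []
    residues: List[str] = []
    labels: List[str] = []

    def flush() -> None:
        # Store the protein collected so far, if any, then reset the buffers.
        if residues:
            proteins.append(("".join(residues), "".join(labels)))
        residues.clear()
        labels.clear()

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line in ("<>", "end", "<end>"):
            flush()
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"unexpected line: {raw!r}")
        residues.append(fields[0])
        labels.append(fields[1])
    flush()  # keep a final protein that has no closing marker
    return proteins


if __name__ == "__main__":
    # Toy input in the assumed layout; these are not real records from the dataset.
    sample = "# toy example\n<>\nA _\nR h\nend\n<>\nG e\n<end>\n"
    print(parse_proteins(sample.splitlines()))  # [('AR', '_h'), ('G', 'e')]
```

Because file objects iterate line by line, an open file can be passed directly, e.g. `with open("protein-secondary-structure.train") as f: proteins = parse_proteins(f)`.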


Creators

Terry Sejnowski

Ning Qian
