Center for Machine Learning and Intelligent Systems
The data set was contributed to the benchmark collection by Terry Sejnowski, now at the Salk Institute and the University of California at San Diego. It was developed in collaboration with Ning Qian of Johns Hopkins University.
Data Set Information:
This data set was used by Ning Qian and Terry Sejnowski in their study of neural-network prediction of the secondary structure of certain globular proteins. The task is to take a linear sequence of amino acids and predict, for each amino acid, which secondary structure it belongs to within the protein. There are three classes: alpha-helix, beta-sheet, and random-coil. The data set contains a large training set and a distinct test set for evaluating the resulting network. Qian and Sejnowski use a NETtalk-like approach and report an accuracy of 64.3% on the test set; they speculate that this is about the best that can be achieved using only local context.
Ning Qian and Terrence J. Sejnowski (1988), "Predicting the Secondary Structure of Globular Proteins Using Neural Network Models", Journal of Molecular Biology 202, 865-884. Academic Press.
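The "local context" idea described above can be illustrated with a sliding window over the amino-acid sequence. The following is a minimal sketch, not the authors' implementation and not a reader for this data set's file format. The names `AMINO_ACIDS`, `PAD`, `one_hot`, `windows`, and `accuracy` are hypothetical, the 20-letter alphabet and the padding symbol are assumptions, and the window half-width is an illustrative parameter rather than a value taken from this page.

```python
# Illustrative sketch only: all names and the default window size are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed one-letter codes for 20 amino acids
PAD = "-"                             # assumed spacer for positions past either chain end
SYMBOLS = AMINO_ACIDS + PAD


def one_hot(symbol):
    """Return a 21-element 0/1 list with a single 1 at the symbol's index."""
    vec = [0] * len(SYMBOLS)
    vec[SYMBOLS.index(symbol)] = 1
    return vec


def windows(sequence, half_width=6):
    """Yield one flat feature vector per residue, built from its local window.

    Each vector concatenates the one-hot codes of the residue itself and of
    `half_width` neighbours on each side; the chain ends are padded with PAD.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    width = 2 * half_width + 1
    for i in range(len(sequence)):
        window = padded[i:i + width]
        yield [bit for symbol in window for bit in one_hot(symbol)]


def accuracy(predicted, actual):
    """Fraction of residues whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Each vector produced by `windows` would be the input for one residue, with the three-way label (alpha-helix, beta-sheet, or random-coil) of the central residue as the target. A per-residue score such as `accuracy` is one way to compare against the 64.3% test-set figure reported above.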
Papers That Cite This Data Set:
Jianbin Tan and David L. Dowe. MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes. Australian Conference on Artificial Intelligence. 2003.
Copyright (C) 1988 by Terrence J. Sejnowski. Permission is hereby given to use the included data for non-commercial research purposes. Contact the Johns Hopkins University, Cognitive Science Center, Baltimore MD, USA for information on commercial use.
Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.