Yeast

Donated on 8/31/1996

Predicting the Cellular Localization Sites of Proteins

Dataset Characteristics

Multivariate

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Real

# Instances

1484

# Features

8

Dataset Information

Additional Information

Predicted attribute: localization site of protein (non-numeric).

The references below describe a predecessor to this dataset and its development. They also give results (not cross-validated) for classification by a rule-based expert system with that version of the dataset.

References:

"Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.

"A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics 14:897-911, 1992.
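Because the results in those references are described as not cross-validated, anyone benchmarking a classifier on this dataset may want to report cross-validated figures instead. The following is a minimal sketch of one way to build stratified folds in plain Python. The function name `stratified_folds` and the toy labels are hypothetical and are not part of the dataset or its documentation; the assignment is deterministic, so shuffle the indices first if a random split is wanted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int) -> List[List[int]]:
    """Assign row indices to k folds so each class is spread evenly.

    Indices are dealt out round-robin, class by class, so the number of
    rows of any one class differs by at most one between folds.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group row indices by class label.
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0  # keeps rotating across classes so fold sizes stay balanced
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


# Toy labels for illustration only (not taken from the dataset).
toy_labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
toy_folds = stratified_folds(toy_labels, k=3)
```

Each fold can then serve once as the held-out test set while the remaining folds are used for training.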

Has Missing Values?

No

Variables Table

| Variable Name | Role | Type | Description | Units | Missing Values |
| --- | --- | --- | --- | --- | --- |
| Sequence_Name | ID | Categorical | Accession number for the SWISS-PROT database | | no |
| mcg | Feature | Continuous | McGeoch's method for signal sequence recognition. | | no |
| gvh | Feature | Continuous | von Heijne's method for signal sequence recognition. | | no |
| alm | Feature | Continuous | Score of the ALOM membrane spanning region prediction program. | | no |
| mit | Feature | Continuous | Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins. | | no |
| erl | Feature | Continuous | Presence of HDEL substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute. | | no |
| pox | Feature | Continuous | Peroxisomal targeting signal in the C-terminus. | | no |
| vac | Feature | Continuous | Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins. | | no |
| nuc | Feature | Continuous | Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins. | | no |
| localization_site | Target | Categorical | | | no |
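The variables listed above suggest a simple record layout: one identifier, eight numeric features, and one class label. The sketch below parses rows of that shape using only the standard library. It assumes each row is whitespace-separated with columns in the same order as the table, which this page does not state explicitly; `parse_yeast` and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Feature names as listed in the variables table.
FEATURES = ["mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc"]

Record = Tuple[str, Dict[str, float], str]


def parse_yeast(text: str) -> List[Record]:
    """Parse rows of the form: Sequence_Name, 8 feature values, localization_site.

    Assumes whitespace-separated fields in table order (an assumption,
    not something documented on this page).
    """
    expected = len(FEATURES) + 2  # ID + features + target
    records: List[Record] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} fields, got {len(fields)}"
            )
        name, *values, label = fields
        features = dict(zip(FEATURES, (float(v) for v in values)))
        records.append((name, features, label))
    return records


# Made-up rows for illustration only; not copied from yeast.data.
SAMPLE = """\
EXAMPLE1_YEAST  0.50  0.40  0.50  0.20  0.50  0.00  0.50  0.30  AAA
EXAMPLE2_YEAST  0.30  0.60  0.45  0.10  1.00  0.00  0.55  0.25  BBB
"""

records = parse_yeast(SAMPLE)
```

For the real data, the same function could be applied to the contents of a locally downloaded `yeast.data`, provided the layout assumption holds.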


Dataset Files

| File | Size |
| --- | --- |
| yeast.data | 92.8 KB |
| yeast.names | 3.2 KB |

Papers Citing this Dataset

On Possibility and Impossibility of Multiclass Classification with Rejection

By Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama. 2019

Published in ArXiv.

Incremental kernel PCA and the Nyström method

By Fredrik Hallgren, Paul Northrop. 2018

Published in ArXiv.

Degrees of Freedom and Model Selection for k-means Clustering

By David Hofmeyr. 2018

Published in ArXiv.

Multi-Resolution Dual-Tree Wavelet Scattering Network for Signal Classification

By Amarjot Singh, Nick Kingsbury. 2017

Published in ArXiv.

A Siamese Deep Forest

By Lev Utkin, Mikhail Ryabinin. 2017

Published in ArXiv.



Creators

Kenta Nakai
