Yeast
Donated on 8/31/1996
Predicting the Cellular Localization Sites of Proteins
Dataset Characteristics: Multivariate
Subject Area: Biology
Associated Tasks: Classification
Feature Type: Real
Number of Instances: 1484
Number of Features: 8
Dataset Information
Additional Information
Predicted attribute: localization site of the protein (non-numeric).

The references below describe a predecessor of this dataset and its development. They also report results (not cross-validated) for classification by a rule-based expert system on that earlier version of the dataset.

References:
Kenta Nakai & Minoru Kanehisa, "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.
Kenta Nakai & Minoru Kanehisa, "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Genomics 14:897-911, 1992.
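The expert-system results in the references were not cross-validated. If you want a cross-validated estimate on this dataset, one common option for a categorical target is stratified k-fold cross-validation, which keeps each localization class roughly equally represented in every fold. The following is a minimal, stdlib-only Python sketch; it is not part of the dataset or the cited work, and the function names `stratified_kfold` and `train_test_splits`, the default `k=5`, and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, spreading every class
    as evenly as possible across the folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the folds are reproducible
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class round-robin, continuing where the previous class stopped,
        # so overall fold sizes also stay balanced
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset = (offset + len(indices)) % k
    return [sorted(fold) for fold in folds]


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_kfold(labels, k, seed)
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test
```

Pass the list of `localization_site` labels, one per instance. A class with fewer than `k` instances cannot appear in every test fold, so rare classes may be absent from some folds.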
Has Missing Values? No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
Sequence_Name | ID | Categorical | Accession number for the SWISS-PROT database | | no |
mcg | Feature | Continuous | McGeoch's method for signal sequence recognition. | | no |
gvh | Feature | Continuous | von Heijne's method for signal sequence recognition. | | no |
alm | Feature | Continuous | Score of the ALOM membrane-spanning region prediction program. | | no |
mit | Feature | Continuous | Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins. | | no |
erl | Feature | Continuous | Presence of the HDEL substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute. | | no |
pox | Feature | Continuous | Peroxisomal targeting signal in the C-terminus. | | no |
vac | Feature | Continuous | Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins. | | no |
nuc | Feature | Continuous | Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins. | | no |
localization_site | Target | Categorical | Localization site of the protein. | | no |
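As a minimal sketch of how one row maps onto the roles in the table (one ID, eight continuous features, one categorical target), the record type below parses a single line. It assumes each row of yeast.data lists Sequence_Name, the eight features in the table's order, and localization_site, separated by whitespace; check that layout against yeast.names before relying on it. `YeastRecord`, `parse_record`, and `FEATURE_NAMES` are hypothetical names, not part of the dataset.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed column order, taken from the variables table (verify against yeast.names)
FEATURE_NAMES = ("mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc")


@dataclass
class YeastRecord:
    sequence_name: str            # ID: SWISS-PROT accession number
    features: Dict[str, float]    # the eight continuous scores, keyed by name
    localization_site: str        # target: localization site label


def parse_record(line: str) -> YeastRecord:
    """Parse one whitespace-separated row: ID, eight features, target."""
    fields = line.split()
    expected = len(FEATURE_NAMES) + 2
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    name, *raw_features, site = fields
    features = {key: float(value) for key, value in zip(FEATURE_NAMES, raw_features)}
    return YeastRecord(name, features, site)
```

Keying the features by name keeps the mapping to the table explicit; if you only need a numeric matrix, `list(record.features.values())` preserves the table order.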
Dataset Files
File | Size |
---|---|
yeast.data | 92.8 KB |
yeast.names | 3.2 KB |
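If you work from a downloaded copy of yeast.data rather than through ucimlrepo, a loader along these lines splits each row into ID, features, and target. It is a sketch that assumes the same whitespace-separated column layout as the variables table (ID, eight features, target); `load_yeast` is a hypothetical helper, not something shipped with the dataset.

```python
from pathlib import Path

# Assumed feature columns, in the order of the variables table
FEATURE_NAMES = ["mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc"]


def load_yeast(path="yeast.data"):
    """Return (ids, X, y) read from a local copy of yeast.data.

    Assumes whitespace-separated columns: Sequence_Name, the eight
    features in table order, then localization_site.
    """
    ids, X, y = [], [], []
    for line_no, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        fields = line.split()
        if len(fields) != len(FEATURE_NAMES) + 2:
            raise ValueError(
                f"line {line_no}: expected {len(FEATURE_NAMES) + 2} fields, got {len(fields)}"
            )
        ids.append(fields[0])                        # Sequence_Name (ID)
        X.append([float(v) for v in fields[1:-1]])   # eight continuous features
        y.append(fields[-1])                         # localization_site (target)
    return ids, X, y
```

For example, `ids, X, y = load_yeast("yeast.data")` should yield 1484 rows if the file matches the instance count listed above, and `collections.Counter(y).most_common()` lists the localization classes by frequency.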
Papers Citing this Dataset
Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama (2019). Published in arXiv.
Fredrik Hallgren, Paul Northrop (2018). Published in arXiv.
David Hofmeyr (2018). Published in arXiv.
Amarjot Singh, Nick Kingsbury (2017). Published in arXiv.
```shell
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
yeast = fetch_ucirepo(id=110)

# data (as pandas dataframes)
X = yeast.data.features
y = yeast.data.targets

# metadata
print(yeast.metadata)

# variable information
print(yeast.variables)
```
Nakai, K. (1991). Yeast [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5KG68.
Creators
Kenta Nakai
DOI
10.24432/C5KG68
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.