Yeast

Donated on 8/31/1996

Predicting the Cellular Localization Sites of Proteins

Dataset Characteristics

Multivariate

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Real

# Instances

1484

# Features

8

Dataset Information

Additional Information

Predicted attribute: localization site of the protein (non-numeric).

The references below describe a predecessor to this dataset and its development. They also report results (not cross-validated) for classification by a rule-based expert system on that version of the dataset.

Reference: "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.

Reference: "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics 14:897-911, 1992.
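Since the referenced results were not cross-validated, a common way to evaluate a classifier on this dataset is stratified k-fold cross-validation, which keeps each localization class's share roughly equal across folds. This matters when some classes are much rarer than others. The following is a minimal Python sketch, assuming the localization_site labels are already in a list; stratified_folds and train_test_indices are hypothetical helper names, and in practice a library routine such as scikit-learn's StratifiedKFold would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each row index to one of k folds so that, within every class,
    per-fold counts differ by at most one (hypothetical helper)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    # Shuffle each class, then deal its rows round-robin. The rotation
    # continues across classes so small classes do not all start in fold 0.
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            fold_of[idx] = offset % k
            offset += 1
    return fold_of


def train_test_indices(fold_of, fold):
    """Split row indices into (train, test) for one held-out fold."""
    train = [i for i, f in enumerate(fold_of) if f != fold]
    test = [i for i, f in enumerate(fold_of) if f == fold]
    return train, test
```

Each class's rows occupy consecutive positions in the rotation. As a result, every fold receives either the floor or the ceiling of that class's count divided by k.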

Has Missing Values?

No

Variables Table

Variable Name	Role	Type	Description	Missing Values
Sequence_Name	ID	Categorical	Accession number for the SWISS-PROT database	no
mcg	Feature	Continuous	McGeoch's method for signal sequence recognition.	no
gvh	Feature	Continuous	von Heijne's method for signal sequence recognition.	no
alm	Feature	Continuous	Score of the ALOM membrane spanning region prediction program.	no
mit	Feature	Continuous	Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins.	no
erl	Feature	Continuous	Presence of HDEL substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute.	no
pox	Feature	Continuous	Peroxisomal targeting signal in the C-terminus.	no
vac	Feature	Continuous	Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins.	no
nuc	Feature	Continuous	Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins.	no
localization_site	Target	Categorical		no
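The variables table implies one record layout: an ID, eight continuous features, and a categorical target. The following is a minimal Python sketch of reading yeast.data into that layout. It assumes each line is whitespace-separated, with fields in the same order as the table above. That layout is an assumption, not something this page states, and YeastRow, parse_line and load are hypothetical names.

```python
from dataclasses import dataclass

# Assumed column order, following the variables table:
# Sequence_Name, the eight feature scores, then localization_site.
FEATURES = ("mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc")


@dataclass
class YeastRow:
    sequence_name: str          # ID column; not used as a model input
    features: dict[str, float]  # feature name -> score
    localization_site: str      # classification target


def parse_line(line: str) -> YeastRow:
    """Parse one whitespace-separated record (assumed layout)."""
    parts = line.split()
    expected = 1 + len(FEATURES) + 1
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}: {line!r}")
    name, *values, site = parts
    return YeastRow(name, dict(zip(FEATURES, map(float, values))), site)


def load(path: str = "yeast.data") -> list[YeastRow]:
    """Read every non-blank line of the data file into YeastRow records."""
    with open(path, encoding="utf-8") as fh:
        return [parse_line(line) for line in fh if line.strip()]
```

Sequence_Name is kept apart from the feature dictionary because the table marks it as an ID rather than a feature. The length check makes a line with an unexpected layout fail loudly instead of shifting columns silently.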

Dataset Files

File	Size
yeast.data	92.8 KB
yeast.names	3.2 KB
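Because localization_site is categorical, a classifier trained on yeast.data is often compared against a majority-class baseline, which predicts the most frequent training label for every row. The following is a minimal Python sketch, assuming the labels have already been read from the target column; majority_baseline is a hypothetical helper, not part of the dataset distribution.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Predict the most frequent training class for every test row and
    return (predicted_class, accuracy on test_labels)."""
    if not train_labels or not test_labels:
        raise ValueError("need non-empty train and test labels")
    predicted, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == predicted)
    return predicted, hits / len(test_labels)
```

Any model worth reporting should beat this number on the same held-out rows.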

Papers Citing this Dataset

On Possibility and Impossibility of Multiclass Classification with Rejection

By Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama. 2019

Published in ArXiv.

Incremental kernel PCA and the Nyström method

By Fredrik Hallgren, Paul Northrop. 2018

Published in ArXiv.

Degrees of Freedom and Model Selection for k-means Clustering

By David Hofmeyr. 2018

Published in ArXiv.

Multi-Resolution Dual-Tree Wavelet Scattering Network for Signal Classification

By Amarjot Singh, Nick Kingsbury. 2017

Published in ArXiv.

A Siamese Deep Forest

By Lev Utkin, Mikhail Ryabinin. 2017

Published in ArXiv.

Reviews

There are no reviews for this dataset yet.

Download (18.5 KB)

19 citations

28001 views

Creators

Kenta Nakai

DOI

10.24432/C5KG68

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.