Donated on 8/31/1996

Predicting the Cellular Localization Sites of Proteins

Dataset Characteristics


Subject Area


Associated Tasks


Feature Type


# Instances


# Features


Dataset Information

Additional Information

Predicted Attribute: Localization site of protein (non-numeric).

The references below describe a predecessor to this dataset and its development. They also give results (not cross-validated) for classification by a rule-based expert system with that version of the dataset.

Reference: "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.

Reference: "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics 14:897-911, 1992.

Has Missing Values?


Variables Table

| Variable Name | Role | Type | Description | Units | Missing Values |
| --- | --- | --- | --- | --- | --- |
| Sequence_Name | ID | Categorical | Accession number for the SWISS-PROT database | | no |
| mcg | Feature | Continuous | McGeoch's method for signal sequence | | |
| gvh | Feature | Continuous | von Heijne's method for signal sequence | | |
| alm | Feature | Continuous | Score of the ALOM membrane spanning region prediction | | |
| mit | Feature | Continuous | Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial | | |
| erl | Feature | Continuous | Presence of HDEL substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary | | |
| pox | Feature | Continuous | Peroxisomal targeting signal in the | | |
| vac | Feature | Continuous | Score of discriminant analysis of the amino acid content of vacuolar and extracellular | | |
| nuc | Feature | Continuous | Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear | | |
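
The variables listed above suggest a simple record layout: one identifier, eight numeric scores, and the predicted attribute (the localization site). The following is a minimal sketch of parsing such records in Python. It assumes, without confirmation from this page, that the data file is whitespace-separated with the columns in the order listed and the class label last. The column name `localization_site`, the helper `parse_rows`, and the sample rows are hypothetical and are not taken from the dataset.

```python
# Feature names as listed in the variables table.
FEATURES = ["mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc"]

# Assumed column order: ID first, class label last.
# "localization_site" is a hypothetical name for the predicted attribute.
COLUMNS = ["Sequence_Name"] + FEATURES + ["localization_site"]


def parse_rows(text):
    """Parse whitespace-separated rows into a list of dicts (assumed format)."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        record = {
            "Sequence_Name": fields[0],
            "localization_site": fields[-1],  # non-numeric class label
        }
        # The eight scores are continuous, so convert them to float.
        for name, value in zip(FEATURES, fields[1:-1]):
            record[name] = float(value)
        records.append(record)
    return records


# Made-up rows for illustration only; these are NOT real dataset entries.
SAMPLE = """\
EXAMPLE1_YEAST  0.50  0.60  0.40  0.10  0.50  0.00  0.50  0.20  SITE_A
EXAMPLE2_YEAST  0.40  0.70  0.50  0.20  1.00  0.00  0.50  0.30  SITE_B
"""

records = parse_rows(SAMPLE)
```

Keeping the identifier and the class label as strings while converting only the eight scores to floats mirrors the Role and Type columns of the table. If the actual file uses a different delimiter or column order, `COLUMNS` and the `split()` call would need adjusting.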



Papers Citing this Dataset

On Possibility and Impossibility of Multiclass Classification with Rejection

By Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama. 2019

Published in ArXiv.

Incremental kernel PCA and the Nyström method

By Fredrik Hallgren, Paul Northrop. 2018

Published in ArXiv.

Degrees of Freedom and Model Selection for k-means Clustering

By David Hofmeyr. 2018

Published in ArXiv.

Multi-Resolution Dual-Tree Wavelet Scattering Network for Signal Classification

By Amarjot Singh, Nick Kingsbury. 2017

Published in ArXiv.

A Siamese Deep Forest

By Lev Utkin, Mikhail Ryabinin. 2017

Published in ArXiv.

19 citations


Kenta Nakai

