Molecular Biology (Promoter Gene Sequences)

Donated on 6/29/1990

E. coli promoter gene sequences (DNA) with a partial domain theory

Dataset Characteristics

Sequential, Domain-Theory

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Categorical

# Instances

106

# Features

-

Dataset Information

Additional Information

This dataset was developed to help evaluate a "hybrid" learning algorithm ("KBANN") that uses examples to inductively refine preexisting knowledge. Using a "leave-one-out" methodology (each of the 106 examples is held out in turn and classified by a model trained on the remaining 105), various ML algorithms produced the following error counts. (See Towell, Shavlik, & Noordewier, 1990, for details.)

System        Errors    Comments
----------------------------------------------------------------
KBANN         4/106     a hybrid ML system
BP            8/106     standard backpropagation with one hidden layer
O'Neill       12/106    ad hoc technique from the biological literature
Near-Neigh    13/106    a nearest-neighbor algorithm (k=3)
ID3           19/106    Quinlan's decision-tree builder

Type of domain: non-numeric, nominal (one of A, G, T, C)

Note: DNA nucleotides can be grouped into a hierarchy, as shown below:

             X (any)
            /       \
(purine)   R         Y   (pyrimidine)
          / \       / \
         A   G     T   C

Here is that hierarchy in a text-friendly format:

X (any)
. R (purine)
. . A
. . G
. Y (pyrimidine)
. . T
. . C
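The nucleotide hierarchy can be stored as a child-to-parent map, which makes it easy to test whether a base falls under a group such as R (purine) or Y (pyrimidine). This is a minimal sketch; PARENT, ancestors, and matches are hypothetical names, and the code is not taken from promoters.theory.

```python
# Child -> parent links for the hierarchy X (any) -> R/Y -> A, G / T, C.
# Names here are hypothetical, chosen for illustration only.
PARENT = {
    "A": "R", "G": "R",   # purines
    "T": "Y", "C": "Y",   # pyrimidines
    "R": "X", "Y": "X",   # both groups fall under "any"
}

def ancestors(symbol: str) -> list[str]:
    """Return the chain from `symbol` up to the root X, inclusive."""
    symbol = symbol.upper()
    chain = [symbol]
    while symbol in PARENT:
        symbol = PARENT[symbol]
        chain.append(symbol)
    return chain

def matches(nucleotide: str, pattern: str) -> bool:
    """True if `nucleotide` (a, g, t, or c) falls under `pattern` (A, G, T, C, R, Y, or X)."""
    return pattern.upper() in ancestors(nucleotide)

print(ancestors("g"))     # ['G', 'R', 'X']
print(matches("t", "Y"))  # True
print(matches("a", "Y"))  # False
```

Because the data files use lowercase nucleotides, both helpers upper-case their input before looking it up.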
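The leave-one-out protocol can be sketched for the nearest-neighbor baseline (k=3). The page does not say which distance that baseline used; Hamming distance over the aligned nucleotide strings is an assumption here, and the function names and toy sequences are hypothetical, not drawn from promoters.data.

```python
from collections import Counter

# Hypothetical sketch: leave-one-out evaluation of a k-nearest-neighbor classifier.
# Hamming distance is an ASSUMPTION; the source does not state the metric used.

def hamming(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query: str, k: int = 3) -> str:
    """Return the majority label among the k training examples nearest to `query`."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_errors(data, k: int = 3) -> int:
    """Hold out each example once, classify it using all the others, count mistakes."""
    errors = 0
    for i, (seq, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, seq, k) != label:
            errors += 1
    return errors

# Toy sequences, invented for illustration (not from promoters.data).
toy = [
    ("aaaa", "+"), ("aaat", "+"), ("aata", "+"),
    ("cccc", "-"), ("ccct", "-"), ("ccgc", "-"),
]
errors = leave_one_out_errors(toy)
print(f"{errors}/{len(toy)} errors")  # 0/6 errors
```

With the real data, `data` would hold the 106 (sequence, class) pairs. The error counts reported above come from the cited paper, not from this sketch. With two classes and k=3, the vote can never tie.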

Has Missing Values?

No

Variables Table

The table lists 58 variables (columns: Variable Name, Role, Type, Description, Units, Missing Values); none has missing values.

Additional Variable Information

1. One of {+/-}, indicating the class ("+" = promoter).
2. The instance name (non-promoters are named by their position in the 1500-nucleotide sequence provided by T. Record).
3-59. The remaining 57 fields are the sequence, starting at position -50 (p-50) and ending at position +7 (p7). Promoter numbering has no position 0, so this span covers 50 + 7 = 57 nucleotides. Each of these fields is one of {a, g, t, c}.
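A minimal sketch of reading one record from promoters.data and turning it into features. The layout is an assumption: the class and the instance name come first, separated by commas, and everything after the second comma is treated as the 57-nucleotide sequence, whether it is stored as one whitespace-padded string or as 57 comma-separated fields. The 4-bits-per-position one-hot encoding is one common choice for network inputs; the page does not say it is the encoding used in the experiments. parse_line, one_hot, and the demo record are hypothetical.

```python
# Sequence positions: -50..-1, then +1..+7 (promoter numbering skips 0).
POSITIONS = [p for p in range(-50, 8) if p != 0]
BASES = "agtc"  # order of the one-hot bits per position (an assumption)

def parse_line(line: str):
    """Split a record into (class, name, {position: nucleotide})."""
    label, name, rest = line.split(",", 2)
    # Keep only nucleotide letters, dropping whitespace and any extra commas.
    seq = "".join(ch for ch in rest.lower() if ch in BASES)
    if len(seq) != len(POSITIONS):
        raise ValueError(
            f"{name.strip()}: expected {len(POSITIONS)} nucleotides, got {len(seq)}"
        )
    return label.strip(), name.strip(), dict(zip(POSITIONS, seq))

def one_hot(bases_by_pos: dict) -> list:
    """Encode each position as 4 bits (a, g, t, c), giving 57 * 4 = 228 features."""
    vec = []
    for p in POSITIONS:
        vec.extend(1 if bases_by_pos[p] == b else 0 for b in BASES)
    return vec

demo = "+,demo,\t" + "acgt" * 14 + "a"  # invented 57-nucleotide record
label, name, bases = parse_line(demo)
print(label, name, bases[-50], bases[7], len(one_hot(bases)))  # + demo a a 228
```

Keying the sequence by position rather than by list index keeps the biological numbering (p-50 to p7) visible when rules or features refer to specific sites.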

Dataset Files

File               Size
promoters.data     7 KB
promoters.names    3.4 KB
promoters.theory   1.9 KB
Index              172 Bytes


Creators

C. Harley

R. Reynolds

M. Noordewier
