Soybean (Small)
Donated on 12/31/1986
Michalski's famous soybean disease database
Dataset Characteristics
Multivariate
Subject Area
Biology
Associated Tasks
Classification
Feature Type
Categorical
# Instances
47
# Features
35
Dataset Information
Additional Information
A small subset of the original soybean database. See the reference for Fisher and Schlimmer in soybean-large.names for more information.

Steven Souders wrote:

> Figure 15 in the Michalski and Stepp paper (PAMI-82) says that the discriminant values for the attribute CONDITION OF FRUIT PODS for the classes Rhizoctonia Root Rot and Phytophthora Rot are "few or none" and "irrelevant" respectively. However, in the SOYBEAN-SMALL dataset I got from UCI, the value for this attribute is "dna" (does not apply) for both classes. I show the actual data below for cases D3 (Rhizoctonia Root Rot) and D4 (Phytophthora Rot). According to the attribute names given in soybean-large.names, FRUIT-PODS is attribute #28. If you look at column 28 in the data below (marked with arrows) you'll notice that all cases of D3 and D4 have the same value. Thus, the SOYBEAN-SMALL dataset from UCI could NOT have produced the results in the Michalski and Stepp paper.

I do not have that paper, but have found what is probably a later variation of that figure in Stepp's dissertation, which lists the value "normal" for the first 2 classes and "irrelevant" for the latter 2 classes. I believe that "irrelevant" is used here as a synonym for "not-applicable", "dna", and "does-not-apply". I believe that there is a mis-print in the figure he read in their PAMI-83 article.

I have checked over each attribute value in this database. It corresponds exactly with the copies listed in both Stepp's and Fisher's dissertations.
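The observation about attribute #28 can be checked mechanically. The following is a minimal sketch, assuming each record in soybean-small.data is a comma-separated line of attribute values followed by a class label (D1 to D4); that layout is an assumption, not something this page states. The helper name `values_by_class` is hypothetical, and the sample rows are made up for illustration rather than taken from the file.

```python
import csv
import io
from collections import defaultdict


def values_by_class(csv_text, attribute_number):
    """Collect the distinct values of one attribute for each class label.

    Assumes every record is a comma-separated row whose last field is the
    class label and whose earlier fields are the attributes, numbered from 1.
    """
    seen = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        label = row[-1].strip()
        seen[label].add(row[attribute_number - 1].strip())
    return dict(seen)


# Hypothetical three-attribute rows, NOT real records from the dataset.
sample = "1,0,3,D3\n2,1,3,D3\n0,1,3,D4\n"
result = values_by_class(sample, attribute_number=3)
print(result)
```

On the real file one would pass its contents with `attribute_number=28`; if the observation quoted above holds, D3 and D4 should each report a single value, and it should be the same one.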
Has Missing Values?
No
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
date | Feature | Categorical | | | no |
plant-stand | Feature | Categorical | | | no |
precip | Feature | Categorical | | | no |
temp | Feature | Categorical | | | no |
hail | Feature | Categorical | | | no |
crop-hist | Feature | Categorical | | | no |
area-damaged | Feature | Categorical | | | no |
severity | Feature | Categorical | | | no |
seed-tmt | Feature | Categorical | | | no |
germination | Feature | Categorical | | | no |
Showing the first 10 of 36 variables.
Additional Variable Information
1. date: april,may,june,july,august,september,october,?.
2. plant-stand: normal,lt-normal,?.
3. precip: lt-norm,norm,gt-norm,?.
4. temp: lt-norm,norm,gt-norm,?.
5. hail: yes,no,?.
6. crop-hist: diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs,?.
7. area-damaged: scattered,low-areas,upper-areas,whole-field,?.
8. severity: minor,pot-severe,severe,?.
9. seed-tmt: none,fungicide,other,?.
10. germination: 90-100%,80-89%,lt-80%,?.
11. plant-growth: norm,abnorm,?.
12. leaves: norm,abnorm.
13. leafspots-halo: absent,yellow-halos,no-yellow-halos,?.
14. leafspots-marg: w-s-marg,no-w-s-marg,dna,?.
15. leafspot-size: lt-1/8,gt-1/8,dna,?.
16. leaf-shread: absent,present,?.
17. leaf-malf: absent,present,?.
18. leaf-mild: absent,upper-surf,lower-surf,?.
19. stem: norm,abnorm,?.
20. lodging: yes,no,?.
21. stem-cankers: absent,below-soil,above-soil,above-sec-nde,?.
22. canker-lesion: dna,brown,dk-brown-blk,tan,?.
23. fruiting-bodies: absent,present,?.
24. external decay: absent,firm-and-dry,watery,?.
25. mycelium: absent,present,?.
26. int-discolor: none,brown,black,?.
27. sclerotia: absent,present,?.
28. fruit-pods: norm,diseased,few-present,dna,?.
29. fruit spots: absent,colored,brown-w/blk-specks,distort,dna,?.
30. seed: norm,abnorm,?.
31. mold-growth: absent,present,?.
32. seed-discolor: absent,present,?.
33. seed-size: norm,lt-norm,?.
34. shriveling: absent,present,?.
35. roots: norm,rotted,galls-cysts,?.
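The attribute listing above follows a regular "number. name: value,value,..." pattern, so it can be turned into a lookup structure. The sketch below is one possible way to do that with a regular expression; the function name `parse_attribute_spec` is hypothetical, and the `?` entry is kept exactly as listed rather than interpreted.

```python
import re

# One entry looks like "12. leaves: norm,abnorm." and may be followed by
# another entry on the same line or on the next line.
ENTRY = re.compile(r"(\d+)\.\s+([^:]+?):\s*(.*?)\.\s*(?=\d+\.\s|$)", re.S)


def parse_attribute_spec(spec):
    """Return a list of (number, name, values) tuples parsed from the text."""
    entries = []
    for number, name, raw_values in ENTRY.findall(spec):
        # Split on commas and drop stray whitespace around each value.
        values = [v.strip() for v in raw_values.split(",") if v.strip()]
        entries.append((int(number), name.strip(), values))
    return entries


# Three entries copied from the listing above.
sample = (
    "5. hail: yes,no,?. "
    "6. crop-hist: diff-lst-year,same-lst-yr,same-lst-two-yrs, same-lst-sev-yrs,?. "
    "12. leaves: norm,abnorm."
)
parsed = parse_attribute_spec(sample)
for number, name, values in parsed:
    print(number, name, values)
```

The pattern relies on periods appearing only as entry terminators, which holds for the 35 entries listed here but would need revisiting for a listing whose values contain periods.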
Dataset Files
File | Size |
---|---|
soybean-explanation | 26 KB |
fisher-order | 3.4 KB |
stepp-order | 3.4 KB |
soybean-small.data | 3.4 KB |
soybean-small.names | 2.5 KB |
Showing the first 5 of 7 files.
Install the ucimlrepo package:

    pip install ucimlrepo

Load the dataset in Python:

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    soybean_small = fetch_ucirepo(id=91)

    # data (as pandas dataframes)
    X = soybean_small.data.features
    y = soybean_small.data.targets

    # metadata
    print(soybean_small.metadata)

    # variable information
    print(soybean_small.variables)
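After loading, it can be worth confirming that the feature table matches the counts listed on this page (47 instances, 35 features). The sketch below is a hedged illustration: `check_dimensions` and `FakeTable` are hypothetical names, and `FakeTable` only stands in for a pandas DataFrame so the example runs without a network connection.

```python
def check_dimensions(features, expected_rows=47, expected_columns=35):
    """Compare a table's shape with the counts listed on this page.

    `features` only needs a `.shape` attribute of the form (rows, columns),
    which pandas DataFrames provide.
    """
    rows, columns = features.shape
    problems = []
    if rows != expected_rows:
        problems.append(f"expected {expected_rows} instances, found {rows}")
    if columns != expected_columns:
        problems.append(f"expected {expected_columns} features, found {columns}")
    return problems


class FakeTable:
    """Hypothetical stand-in for a DataFrame so the sketch runs offline."""

    def __init__(self, shape):
        self.shape = shape


print(check_dimensions(FakeTable((47, 35))))
print(check_dimensions(FakeTable((46, 35))))
```

With the real data one would call `check_dimensions(X)`; an empty list means the shape agrees with the page.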
Michalski, R. (1980). Soybean (Small) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DS3P.
Creators
R. Michalski
DOI
10.24432/C5DS3P
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.