Soybean (Large)
Donated on 7/10/1988
Michalski's famous soybean disease database
Dataset Characteristics
Multivariate
Subject Area
Biology
Associated Tasks
Classification
Feature Type
Categorical
# Instances
307
# Features
35
Dataset Information
Additional Information
There are 19 classes, only the first 15 of which have been used in prior work. The folklore seems to be that the last four classes are unjustified by the data since they have so few examples. There are 35 categorical attributes, some nominal and some ordered. The value "dna" means "does not apply". Attribute values are encoded numerically, with the first value encoded as "0", the second as "1", and so forth. An unknown value is encoded as "?".
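The encoding described above can be read with a few lines of standard-library Python. This is a minimal sketch, assuming each record is a comma-separated line with the class label first and the attribute codes after it; the function name `parse_records` and the two short sample rows are hypothetical and are not taken from the dataset files.

```python
import csv
import io


def parse_records(text):
    """Parse records of the assumed form: class,code,code,...

    Unknown values ("?") become None; every other attribute value
    is converted to an int code.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        label, raw_values = row[0].strip(), row[1:]
        values = [None if v.strip() == "?" else int(v) for v in raw_values]
        records.append((label, values))
    return records


# Hypothetical three-attribute rows for illustration only.
SAMPLE = "charcoal-rot,6,0,?\nbrown-spot,4,?,1\n"

records = parse_records(SAMPLE)
missing = sum(v is None for _, values in records for v in values)
print(records)
print("missing values:", missing)
```

Keeping unknown values as `None` rather than a sentinel integer avoids confusing them with a legitimate code such as 0.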
Has Missing Values?
Yes
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
class | Target | Categorical | diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury | | no |
date | Feature | Categorical | april,may,june,july,august,september,october,? | | yes |
plant-stand | Feature | Categorical | normal,lt-normal,? | | yes |
precip | Feature | Categorical | lt-norm,norm,gt-norm,? | | yes |
temp | Feature | Categorical | lt-norm,norm,gt-norm,? | | yes |
hail | Feature | Categorical | yes,no,? | | yes |
crop-hist | Feature | Categorical | diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs,? | | yes |
area-damaged | Feature | Categorical | scattered,low-areas,upper-areas,whole-field,? | | yes |
severity | Feature | Categorical | minor,pot-severe,severe,? | | yes |
seed-tmt | Feature | Categorical | none,fungicide,other,? | | yes |
Additional Variable Information
1. date: april,may,june,july,august,september,october,?.
2. plant-stand: normal,lt-normal,?.
3. precip: lt-norm,norm,gt-norm,?.
4. temp: lt-norm,norm,gt-norm,?.
5. hail: yes,no,?.
6. crop-hist: diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs,?.
7. area-damaged: scattered,low-areas,upper-areas,whole-field,?.
8. severity: minor,pot-severe,severe,?.
9. seed-tmt: none,fungicide,other,?.
10. germination: 90-100%,80-89%,lt-80%,?.
11. plant-growth: norm,abnorm,?.
12. leaves: norm,abnorm.
13. leafspots-halo: absent,yellow-halos,no-yellow-halos,?.
14. leafspots-marg: w-s-marg,no-w-s-marg,dna,?.
15. leafspot-size: lt-1/8,gt-1/8,dna,?.
16. leaf-shread: absent,present,?.
17. leaf-malf: absent,present,?.
18. leaf-mild: absent,upper-surf,lower-surf,?.
19. stem: norm,abnorm,?.
20. lodging: yes,no,?.
21. stem-cankers: absent,below-soil,above-soil,above-sec-nde,?.
22. canker-lesion: dna,brown,dk-brown-blk,tan,?.
23. fruiting-bodies: absent,present,?.
24. external decay: absent,firm-and-dry,watery,?.
25. mycelium: absent,present,?.
26. int-discolor: none,brown,black,?.
27. sclerotia: absent,present,?.
28. fruit-pods: norm,diseased,few-present,dna,?.
29. fruit spots: absent,colored,brown-w/blk-specks,distort,dna,?.
30. seed: norm,abnorm,?.
31. mold-growth: absent,present,?.
32. seed-discolor: absent,present,?.
33. seed-size: norm,lt-norm,?.
34. shriveling: absent,present,?.
35. roots: norm,rotted,galls-cysts,?.
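Numeric codes can be turned back into readable labels by indexing into each attribute's value list. The sketch below assumes the codes follow the order given in the list above (first value 0, second 1, and so on) and that "?" is a missing-value marker rather than a coded value. `ATTRIBUTE_VALUES` and `decode` are hypothetical names, and only three attributes are filled in for brevity.

```python
# Value lists copied from the attribute list, without the "?" marker.
ATTRIBUTE_VALUES = {
    "date": ["april", "may", "june", "july", "august", "september", "october"],
    "plant-stand": ["normal", "lt-normal"],
    "precip": ["lt-norm", "norm", "gt-norm"],
}


def decode(attribute, code):
    """Return the label for a numeric code, or None for a missing value."""
    if code is None:
        return None
    values = ATTRIBUTE_VALUES[attribute]
    if not 0 <= code < len(values):
        raise ValueError(f"code {code} is out of range for {attribute!r}")
    return values[code]


print(decode("date", 6))
print(decode("precip", 0))
```

The explicit range check matters because a negative index would otherwise silently select a value from the end of the list.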
Class Labels
19 classes: diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury.
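Since only the first 15 classes have been used in prior work, it may be useful to filter records down to that subset before training. This is a rough sketch, assuming records are held as `(label, values)` pairs; the names `CLASSES`, `PRIOR_WORK_CLASSES`, and `keep_prior_work_classes` are hypothetical, and the sample records are invented for illustration.

```python
# Class labels in the order listed in this document.
CLASSES = [
    "diaporthe-stem-canker", "charcoal-rot", "rhizoctonia-root-rot",
    "phytophthora-rot", "brown-stem-rot", "powdery-mildew", "downy-mildew",
    "brown-spot", "bacterial-blight", "bacterial-pustule",
    "purple-seed-stain", "anthracnose", "phyllosticta-leaf-spot",
    "alternarialeaf-spot", "frog-eye-leaf-spot",
    "diaporthe-pod-&-stem-blight", "cyst-nematode", "2-4-d-injury",
    "herbicide-injury",
]

# The first 15 classes, i.e. everything except the last four.
PRIOR_WORK_CLASSES = set(CLASSES[:15])


def keep_prior_work_classes(records):
    """Drop records whose label is one of the last four classes."""
    return [(label, values) for label, values in records
            if label in PRIOR_WORK_CLASSES]


# Invented records for illustration only.
sample = [("brown-spot", [4, 0, 1]), ("cyst-nematode", [2, 1, 0])]
print(keep_prior_work_classes(sample))
```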
Dataset Files
File | Size |
---|---|
backup-large.test | 33.4 KB |
soybean-large.test | 31.8 KB |
backup-large.data | 26.7 KB |
soybean-large.data | 26 KB |
soybean-explanation | 26 KB |
pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# fetch dataset
soybean_large = fetch_ucirepo(id=90)

# data (as pandas dataframes)
X = soybean_large.data.features
y = soybean_large.data.targets

# metadata
print(soybean_large.metadata)

# variable information
print(soybean_large.variables)
Michalski, R. & Chilausky, R. (1980). Soybean (Large) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5JG6Z.
Creators
R.S. Michalski
R.L. Chilausky
DOI
10.24432/C5JG6Z
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.