Soybean (Large)

Donated on 7/10/1988

Michalski's famous soybean disease database

Dataset Characteristics

Multivariate

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Categorical

# Instances

307

# Features

35

Dataset Information

Additional Information

There are 19 classes, only the first 15 of which have been used in prior work. The folklore seems to be that the last four classes are unjustified by the data since they have so few examples. There are 35 categorical attributes, some nominal and some ordered. The value "dna" means "does not apply". Attribute values are encoded numerically, with the first value encoded as "0", the second as "1", and so forth. An unknown value is encoded as "?".
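The encoding described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes each row is comma-separated with the class label in the first field and the attribute codes after it, which the description above does not state. The helper name `parse_rows` and the shortened sample rows are made up for illustration.

```python
import csv
import io

MISSING = "?"  # unknown values are encoded as "?" in the data


def parse_rows(text):
    """Parse comma-separated rows into (label, values) pairs.

    Assumption: the class label is the first field and the numeric
    attribute codes follow. "?" becomes None; other codes become ints.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        label = fields[0].strip()
        values = [None if f.strip() == MISSING else int(f) for f in fields[1:]]
        rows.append((label, values))
    return rows


# Made-up, shortened rows (real rows would carry 35 attribute codes).
sample = "charcoal-rot,4,0,?,1\nbrown-spot,?,1,2,0\n"
rows = parse_rows(sample)
```

Mapping "?" to `None` keeps missing values distinct from the code 0, which stands for the first listed value of an attribute.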

Has Missing Values?

Yes

Variables Table

Variable Name	Role	Type	Description	Missing Values
class	Target	Categorical	diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury	no
date	Feature	Categorical	april,may,june,july,august,september,october,?	yes
plant-stand	Feature	Categorical	normal,lt-normal,?	yes
precip	Feature	Categorical	lt-norm,norm,gt-norm,?	yes
temp	Feature	Categorical	lt-norm,norm,gt-norm,?	yes
hail	Feature	Categorical	yes,no,?	yes
crop-hist	Feature	Categorical	diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs,?	yes
area-damaged	Feature	Categorical	scattered,low-areas,upper-areas,whole-field,?	yes
severity	Feature	Categorical	minor,pot-severe,severe,?	yes
seed-tmt	Feature	Categorical	none,fungicide,other,?	yes

(First 10 of 36 variables shown.)

Additional Variable Information

1. date: april,may,june,july,august,september,october,?
2. plant-stand: normal,lt-normal,?
3. precip: lt-norm,norm,gt-norm,?
4. temp: lt-norm,norm,gt-norm,?
5. hail: yes,no,?
6. crop-hist: diff-lst-year,same-lst-yr,same-lst-two-yrs,same-lst-sev-yrs,?
7. area-damaged: scattered,low-areas,upper-areas,whole-field,?
8. severity: minor,pot-severe,severe,?
9. seed-tmt: none,fungicide,other,?
10. germination: 90-100%,80-89%,lt-80%,?
11. plant-growth: norm,abnorm,?
12. leaves: norm,abnorm
13. leafspots-halo: absent,yellow-halos,no-yellow-halos,?
14. leafspots-marg: w-s-marg,no-w-s-marg,dna,?
15. leafspot-size: lt-1/8,gt-1/8,dna,?
16. leaf-shread: absent,present,?
17. leaf-malf: absent,present,?
18. leaf-mild: absent,upper-surf,lower-surf,?
19. stem: norm,abnorm,?
20. lodging: yes,no,?
21. stem-cankers: absent,below-soil,above-soil,above-sec-nde,?
22. canker-lesion: dna,brown,dk-brown-blk,tan,?
23. fruiting-bodies: absent,present,?
24. external decay: absent,firm-and-dry,watery,?
25. mycelium: absent,present,?
26. int-discolor: none,brown,black,?
27. sclerotia: absent,present,?
28. fruit-pods: norm,diseased,few-present,dna,?
29. fruit spots: absent,colored,brown-w/blk-specks,distort,dna,?
30. seed: norm,abnorm,?
31. mold-growth: absent,present,?
32. seed-discolor: absent,present,?
33. seed-size: norm,lt-norm,?
34. shriveling: absent,present,?
35. roots: norm,rotted,galls-cysts,?
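Since values are stored as numeric codes, a lookup table is needed to turn a code back into its name. The sketch below covers three attributes only and assumes that the order of the values listed above is the encoding order (first listed value is 0, second is 1, and so on). The names `VALUE_NAMES` and `decode` are hypothetical.

```python
# Value names for a few attributes, in the order listed above.
# Assumption: list position equals the numeric code used in the data.
VALUE_NAMES = {
    "date": ["april", "may", "june", "july", "august", "september", "october"],
    "precip": ["lt-norm", "norm", "gt-norm"],
    "severity": ["minor", "pot-severe", "severe"],
}


def decode(attribute, code):
    """Return the value name for a numeric code, or None if it is unknown."""
    if code is None or code == "?":
        return None
    return VALUE_NAMES[attribute][int(code)]


month = decode("date", 6)
rain = decode("precip", "0")
unknown = decode("severity", "?")
```

The same pattern extends to the remaining attributes by adding their value lists to the table.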

Class Labels

19 classes: diaporthe-stem-canker, charcoal-rot, rhizoctonia-root-rot, phytophthora-rot, brown-stem-rot, powdery-mildew, downy-mildew, brown-spot, bacterial-blight, bacterial-pustule, purple-seed-stain, anthracnose, phyllosticta-leaf-spot, alternarialeaf-spot, frog-eye-leaf-spot, diaporthe-pod-&-stem-blight, cyst-nematode, 2-4-d-injury, herbicide-injury.
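To follow prior work and use only the first 15 classes, rows can be filtered by label. This sketch assumes that "the first 15" refers to the order of the class list above, and that rows are available as (label, values) pairs. The names `COMMON_CLASSES` and `keep_common` and the sample rows are made up for illustration.

```python
CLASSES = [
    "diaporthe-stem-canker", "charcoal-rot", "rhizoctonia-root-rot",
    "phytophthora-rot", "brown-stem-rot", "powdery-mildew", "downy-mildew",
    "brown-spot", "bacterial-blight", "bacterial-pustule",
    "purple-seed-stain", "anthracnose", "phyllosticta-leaf-spot",
    "alternarialeaf-spot", "frog-eye-leaf-spot",
    "diaporthe-pod-&-stem-blight", "cyst-nematode", "2-4-d-injury",
    "herbicide-injury",
]

# Assumption: "the first 15 classes" means the first 15 in listing order.
COMMON_CLASSES = set(CLASSES[:15])


def keep_common(rows):
    """Keep only rows whose label is one of the first 15 classes."""
    return [(label, values) for label, values in rows if label in COMMON_CLASSES]


# Made-up, shortened rows for illustration.
rows = [
    ("charcoal-rot", [4, 0]),
    ("cyst-nematode", [2, 1]),
    ("herbicide-injury", [1, None]),
]
filtered = keep_common(rows)
```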

Dataset Files

File	Size
backup-large.test	33.4 KB
soybean-large.test	31.8 KB
backup-large.data	26.7 KB
soybean-large.data	26 KB
soybean-explanation	26 KB

(First 5 of 10 files shown.)

Creators

R.S. Michalski

R.L. Chilausky

DOI

10.24432/C5JG6Z

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.