Secondary Mushroom

Donated on 8/13/2023

Dataset of simulated mushrooms for binary classification into edible and poisonous classes.

Dataset Characteristics

Tabular

Subject Area

Biology

Associated Tasks

Classification

Feature Type

Real

# Instances

61068

# Features

20

Dataset Information

For what purpose was the dataset created?

The dataset was inspired by the Mushroom Data Set of J. Schlimmer (https://archive.ics.uci.edu/ml/datasets/Mushroom).

Additional Information

This information describes the Secondary Mushroom Dataset. The Primary Mushroom Dataset used for the simulation, together with the respective metadata, can be found in the zip file. The dataset includes 61069 hypothetical mushrooms with caps, based on 173 species (353 mushrooms per species). Each mushroom is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; the last class was combined with the poisonous class. The related Python project contains a module, secondary_data_generation.py, which generates this data from primary_data_edited.csv, also found in the repository. Both nominal and metrical variables are the result of randomization. The simulated version, ordered by species, is in secondary_data_generated.csv; the randomly shuffled version is in secondary_data_shuffled.csv.
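As a starting point for working with these files, the sketch below reads rows with Python's standard `csv` module and counts the two classes. It assumes a header row whose column names match the variable list, a semicolon delimiter, and that missing values appear as empty cells; all three are assumptions to check against the extracted files. `read_mushrooms` is a hypothetical helper, not part of the related Python project, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Union

# Metrical columns according to the variable list; every other column holds
# a single-letter nominal code.
METRICAL = {"cap-diameter", "stem-height", "stem-width"}

Cell = Union[str, float, None]


def read_mushrooms(handle: TextIO, delimiter: str = ";") -> List[Dict[str, Cell]]:
    """Parse mushroom rows from an open text handle (hypothetical helper).

    Assumes a header row and a semicolon delimiter; pass delimiter="," if
    your copy of the file uses commas. Empty cells become None.
    """
    rows: List[Dict[str, Cell]] = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        row: Dict[str, Cell] = {}
        for key, value in raw.items():
            if value is None or value.strip() == "":
                row[key] = None              # treat empty cells as missing
            elif key in METRICAL:
                row[key] = float(value)      # cm, or mm for stem-width
            else:
                row[key] = value.strip()     # nominal code, e.g. "x" or "p"
        rows.append(row)
    return rows


# Made-up rows for illustration only, not taken from the dataset.
sample = io.StringIO(
    "class;cap-diameter;cap-shape;cap-surface;stem-height;stem-width\n"
    "p;15.3;x;g;17.0;17.1\n"
    "e;4.2;f;;5.1;6.3\n"
    "p;8.0;b;s;9.5;12.0\n"
)
rows = read_mushrooms(sample)
print(Counter(row["class"] for row in rows))  # Counter({'p': 2, 'e': 1})
```

On the real data you would extract the zip, open secondary_data_shuffled.csv with `newline=""`, and pass the handle in; the class counts then show how the combined poisonous class compares with the edible one.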

Has Missing Values?

No

Introductory Paper

Mushroom data creation, curation, and simulation to support classification tasks

By Dennis Wagner, D. Heider, Georges Hattab. 2021

Published in Scientific Reports

Variables Table

Variable Name          Role     Type         Missing Values
class                  Target   Categorical  no
cap-diameter           Feature  Continuous   no
cap-shape              Feature  Categorical  no
cap-surface            Feature  Categorical  yes
cap-color              Feature  Categorical  no
does-bruise-or-bleed   Feature  Categorical  no
gill-attachment        Feature  Categorical  yes
gill-spacing           Feature  Categorical  yes
gill-color             Feature  Categorical  no
stem-height            Feature  Continuous   no

(Rows 1 to 10 of 21 shown.)
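The table marks cap-surface, gill-attachment and gill-spacing as having missing values, so counting empty cells per column is a sensible check before training. The sketch below assumes missing values appear as empty fields in a semicolon-delimited file; `count_missing` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Optional


def count_missing(rows: Iterable[Dict[str, Optional[str]]]) -> Counter:
    """Count empty cells per column (hypothetical helper).

    A cell counts as missing if it is None or contains only whitespace.
    """
    missing: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return missing


# Made-up rows using columns from the table above.
sample = io.StringIO(
    "class;cap-surface;gill-attachment;gill-spacing\n"
    "p;g;;c\n"
    "e;;a;\n"
    "e;s;x;d\n"
)
missing = count_missing(csv.DictReader(sample, delimiter=";"))
print(missing["cap-surface"], missing["class"])
```

Because `Counter` returns 0 for absent keys, columns without gaps (such as class here) report 0 without needing special handling.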

Additional Variable Information

One binary class, divided into edible=e and poisonous=p (the latter also contains mushrooms of unknown edibility).

Twenty remaining variables (n: nominal, m: metrical):

1. cap-diameter (m): floating-point number in cm
2. cap-shape (n): bell=b, conical=c, convex=x, flat=f, sunken=s, spherical=p, others=o
3. cap-surface (n): fibrous=i, grooves=g, scaly=y, smooth=s, shiny=h, leathery=l, silky=k, sticky=t, wrinkled=w, fleshy=e
4. cap-color (n): brown=n, buff=b, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y, blue=l, orange=o, black=k
5. does-bruise-or-bleed (n): bruises-or-bleeding=t, no=f
6. gill-attachment (n): adnate=a, adnexed=x, decurrent=d, free=e, sinuate=s, pores=p, none=f, unknown=?
7. gill-spacing (n): close=c, distant=d, none=f
8. gill-color (n): see cap-color, plus none=f
9. stem-height (m): floating-point number in cm
10. stem-width (m): floating-point number in mm
11. stem-root (n): bulbous=b, swollen=s, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r
12. stem-surface (n): see cap-surface, plus none=f
13. stem-color (n): see cap-color, plus none=f
14. veil-type (n): partial=p, universal=u
15. veil-color (n): see cap-color, plus none=f
16. has-ring (n): ring=t, none=f
17. ring-type (n): cobwebby=c, evanescent=e, flaring=r, grooved=g, large=l, pendant=p, sheathing=s, zone=z, scaly=y, movable=m, none=f, unknown=?
18. spore-print-color (n): see cap-color
19. habitat (n): grasses=g, leaves=l, meadows=m, paths=p, heaths=h, urban=u, waste=w, woods=d
20. season (n): spring=s, summer=u, autumn=a, winter=w
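Every nominal value is stored as a single letter, and the same letter means different things in different columns (p is spherical for cap-shape but paths for habitat), so decoding has to be done per column. The sketch below transcribes part of the variable list into lookup tables; `CAP_COLOR`, `CODES` and `decode` are hypothetical names, and the remaining variables would be added the same way.

```python
from typing import Dict

# Color codes shared by several variables (transcribed from cap-color).
CAP_COLOR: Dict[str, str] = {
    "n": "brown", "b": "buff", "g": "gray", "r": "green", "p": "pink",
    "u": "purple", "e": "red", "w": "white", "y": "yellow", "l": "blue",
    "o": "orange", "k": "black",
}

# Per-column code tables; only a subset of the 21 variables is included here.
CODES: Dict[str, Dict[str, str]] = {
    "class": {"e": "edible", "p": "poisonous"},
    "cap-shape": {"b": "bell", "c": "conical", "x": "convex", "f": "flat",
                  "s": "sunken", "p": "spherical", "o": "others"},
    "cap-color": CAP_COLOR,
    "gill-color": {**CAP_COLOR, "f": "none"},   # "see cap-color, plus none=f"
    "stem-color": {**CAP_COLOR, "f": "none"},
    "spore-print-color": CAP_COLOR,             # "see cap-color"
    "does-bruise-or-bleed": {"t": "bruises-or-bleeding", "f": "no"},
    "habitat": {"g": "grasses", "l": "leaves", "m": "meadows", "p": "paths",
                "h": "heaths", "u": "urban", "w": "waste", "d": "woods"},
    "season": {"s": "spring", "u": "summer", "a": "autumn", "w": "winter"},
}


def decode(column: str, code: str) -> str:
    """Translate a code for one column; unknown columns or codes pass through."""
    return CODES.get(column, {}).get(code, code)


print(decode("cap-shape", "p"))  # spherical
print(decode("habitat", "p"))    # paths
```

The `{**CAP_COLOR, "f": "none"}` pattern mirrors the list's "see cap-color, plus none=f" entries, so the shared color codes are written only once.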

Class Labels

edible=e, poisonous=p
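With only two labels, a useful first check for the classification task is the accuracy of always predicting the more frequent class; a trained model should beat that number. The sketch below assumes the encoding e→0 and p→1, treating poisonous (which also holds the unknown-edibility mushrooms) as the positive class. `LABEL_TO_INT`, `encode_labels` and `majority_baseline_accuracy` are hypothetical names, and the labels are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Assumed encoding: poisonous is the positive class, a common choice when
# missing a poisonous mushroom is the costlier mistake.
LABEL_TO_INT: Dict[str, int] = {"e": 0, "p": 1}


def encode_labels(labels: Sequence[str]) -> List[int]:
    """Map the class codes e/p to 0/1; raises KeyError on any other code."""
    return [LABEL_TO_INT[label] for label in labels]


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy of always predicting the most frequent class."""
    if not labels:
        raise ValueError("need at least one label")
    _, top_count = Counter(labels).most_common(1)[0]
    return top_count / len(labels)


labels = ["e", "p", "p", "p"]  # made-up labels
print(encode_labels(labels))               # [0, 1, 1, 1]
print(majority_baseline_accuracy(labels))  # 0.75
```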

Dataset Files

File                   Size
MushroomDataset.zip    462.1 KB


Creators

Dennis Wagner

dwagner93@gmx.de

Product of a bachelor's thesis at Philipps-University Marburg, Bioinformatics Division

D. Heider

Georges Hattab
