Secondary Mushroom
Donated on 8/13/2023
Dataset of simulated mushrooms for binary classification into edible and poisonous.
Dataset Characteristics
Tabular
Subject Area
Biology
Associated Tasks
Classification
Feature Type
Real
# Instances
61068
# Features
20
Dataset Information
For what purpose was the dataset created?
Inspired by the Mushroom Data Set of J. Schlimmer (https://archive.ics.uci.edu/ml/datasets/Mushroom).
Additional Information
The information given here describes the Secondary Mushroom Dataset. The Primary Mushroom Dataset used for the simulation, together with its metadata, can be found in the zip file. This dataset includes 61069 hypothetical mushrooms with caps, based on 173 species (353 mushrooms per species). Each mushroom is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; the last class was merged into the poisonous class. The related Python project contains a module, secondary_data_generation.py, that generates this data from primary_data_edited.csv, which is also found in the repository. Both nominal and metrical variables are the result of randomization. The simulated version, ordered by species, is in secondary_data_generated.csv; the randomly shuffled version is in secondary_data_shuffled.csv.
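The relabelling and the instance count described above can be checked with a few lines of Python. This is a minimal sketch, not a reproduction of secondary_data_generation.py: the spellings of the three edibility categories and the helper `to_binary_class` are hypothetical.

```python
# Hypothetical spellings of the three edibility categories; the primary data
# may name them differently.
EDIBILITY_TO_CLASS = {
    "edible": "e",     # definitely edible
    "poisonous": "p",  # definitely poisonous
    "unknown": "p",    # unknown edibility, not recommended: merged into poisonous
}


def to_binary_class(edibility: str) -> str:
    """Collapse a three-way edibility label into the binary class code e/p."""
    key = edibility.strip().lower()
    if key not in EDIBILITY_TO_CLASS:
        raise ValueError(f"unrecognised edibility label: {edibility!r}")
    return EDIBILITY_TO_CLASS[key]


# 173 species x 353 simulated mushrooms per species
N_SPECIES, PER_SPECIES = 173, 353
print(N_SPECIES * PER_SPECIES)  # 61069
print([to_binary_class(s) for s in ("edible", "poisonous", "unknown")])  # ['e', 'p', 'p']
```

Rejecting unrecognised labels, rather than silently defaulting to poisonous, keeps typos in the source data from slipping into the target column.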
Has Missing Values?
Yes
Introductory Paper
By Dennis Wagner, D. Heider, Georges Hattab. 2021
Published in Scientific Reports
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
class | Target | Categorical | | | no |
cap-diameter | Feature | Continuous | | cm | no |
cap-shape | Feature | Categorical | | | no |
cap-surface | Feature | Categorical | | | yes |
cap-color | Feature | Categorical | | | no |
does-bruise-or-bleed | Feature | Categorical | | | no |
gill-attachment | Feature | Categorical | | | yes |
gill-spacing | Feature | Categorical | | | yes |
gill-color | Feature | Categorical | | | no |
stem-height | Feature | Continuous | | cm | no |
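The table marks cap-surface, gill-attachment and gill-spacing as having missing values. One common way to handle gaps in nominal variables is to treat "missing" as a category of its own. The sketch below does that with plain dictionaries, assuming missing entries arrive as empty strings or `None` (as they do for rows read with Python's `csv` module). The token `"missing"` and the helper `fill_missing` are illustrative choices, not part of the dataset; imputing the most frequent value would be an alternative.

```python
from typing import Dict, Iterable, List, Optional

# Columns marked "yes" for missing values in the variables table. Only part of
# the 21 variables is listed there, so other columns may also contain gaps.
MISSING_COLUMNS = ("cap-surface", "gill-attachment", "gill-spacing")

# Hypothetical placeholder, deliberately different from "?", which
# gill-attachment and ring-type already use as a genuine "unknown" code.
MISSING_TOKEN = "missing"


def fill_missing(rows: Iterable[Dict[str, Optional[str]]],
                 columns: Iterable[str] = MISSING_COLUMNS,
                 token: str = MISSING_TOKEN) -> List[Dict[str, Optional[str]]]:
    """Return copies of rows with empty, None or absent values replaced by token."""
    cols = tuple(columns)
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        for col in cols:
            value = new_row.get(col)
            if value is None or value.strip() == "":
                new_row[col] = token
        filled.append(new_row)
    return filled


# Toy rows for illustration, not taken from the dataset.
rows = [
    {"cap-surface": "", "gill-attachment": "a", "gill-spacing": "c"},
    {"cap-surface": "s", "gill-attachment": None, "gill-spacing": ""},
]
print(fill_missing(rows))
```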
Additional Variable Information
One binary class, divided into edible=e and poisonous=p (the latter also contains mushrooms of unknown edibility).
The remaining twenty variables (n: nominal, m: metrical):
1. cap-diameter (m): float number in cm
2. cap-shape (n): bell=b, conical=c, convex=x, flat=f, sunken=s, spherical=p, others=o
3. cap-surface (n): fibrous=i, grooves=g, scaly=y, smooth=s, shiny=h, leathery=l, silky=k, sticky=t, wrinkled=w, fleshy=e
4. cap-color (n): brown=n, buff=b, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y, blue=l, orange=o, black=k
5. does-bruise-or-bleed (n): bruises-or-bleeding=t, no=f
6. gill-attachment (n): adnate=a, adnexed=x, decurrent=d, free=e, sinuate=s, pores=p, none=f, unknown=?
7. gill-spacing (n): close=c, distant=d, none=f
8. gill-color (n): see cap-color + none=f
9. stem-height (m): float number in cm
10. stem-width (m): float number in mm
11. stem-root (n): bulbous=b, swollen=s, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r
12. stem-surface (n): see cap-surface + none=f
13. stem-color (n): see cap-color + none=f
14. veil-type (n): partial=p, universal=u
15. veil-color (n): see cap-color + none=f
16. has-ring (n): ring=t, none=f
17. ring-type (n): cobwebby=c, evanescent=e, flaring=r, grooved=g, large=l, pendant=p, sheathing=s, zone=z, scaly=y, movable=m, none=f, unknown=?
18. spore-print-color (n): see cap-color
19. habitat (n): grasses=g, leaves=l, meadows=m, paths=p, heaths=h, urban=u, waste=w, woods=d
20. season (n): spring=s, summer=u, autumn=a, winter=w
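The single-letter codes and the mixed units (stem-width in mm, the other metrical variables in cm) can be normalised while reading rows. A minimal sketch using only two of the code tables listed above; the helper `decode_row` is hypothetical, and the remaining nominal variables would follow the same pattern.

```python
from typing import Dict, Union

# Code tables copied from the variable list (only cap-shape and season shown).
CAP_SHAPE: Dict[str, str] = {
    "b": "bell", "c": "conical", "x": "convex", "f": "flat",
    "s": "sunken", "p": "spherical", "o": "others",
}
SEASON: Dict[str, str] = {"s": "spring", "u": "summer", "a": "autumn", "w": "winter"}
DECODERS: Dict[str, Dict[str, str]] = {"cap-shape": CAP_SHAPE, "season": SEASON}


def decode_row(row: Dict[str, str]) -> Dict[str, Union[str, float]]:
    """Expand known single-letter codes and express all metrical variables in cm."""
    out: Dict[str, Union[str, float]] = {}
    for col, value in row.items():
        if col in DECODERS:
            out[col] = DECODERS[col].get(value, value)  # unknown codes pass through
        elif col in ("cap-diameter", "stem-height"):
            out[col] = float(value)  # already in cm
        elif col == "stem-width":
            out[col] = float(value) / 10.0  # mm -> cm
        else:
            out[col] = value  # other columns are left as-is
    return out


# Illustrative values, not a row from the dataset.
print(decode_row({"cap-shape": "x", "season": "a", "cap-diameter": "10.5",
                  "stem-height": "8.0", "stem-width": "12.0"}))
```

Converting stem-width to cm keeps the three metrical features on a common scale, which matters for distance-based models.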
Class Labels
edible=e, poisonous=p
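Most classifiers expect the two class codes as integers. A minimal sketch, assuming poisonous is treated as the positive class (1) because labelling a poisonous mushroom edible is the costlier mistake; that choice and the helpers `encode_labels` and `class_share` are illustrative, not part of the dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Assumption: poisonous is the positive class.
LABEL_TO_INT: Dict[str, int] = {"e": 0, "p": 1}


def encode_labels(labels: Iterable[str]) -> List[int]:
    """Map class codes e/p to 0/1, rejecting anything else."""
    encoded = []
    for label in labels:
        if label not in LABEL_TO_INT:
            raise ValueError(f"unexpected class label: {label!r}")
        encoded.append(LABEL_TO_INT[label])
    return encoded


def class_share(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of samples per class label (empty input gives an empty dict)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


toy_labels = ["p", "e", "p", "p"]  # toy labels, not taken from the dataset
print(encode_labels(toy_labels))  # [1, 0, 1, 1]
print(class_share(toy_labels))    # {'p': 0.75, 'e': 0.25}
```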
Dataset Files
File | Size |
---|---|
MushroomDataset.zip | 462.1 KB |
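The CSV files can also be read straight from the archive without unpacking it. A minimal sketch using only the standard library; it assumes the archive contains a member with one of the file names given in the description (its folder inside the zip is not stated here, so the helper matches by file name at any depth), and it assumes a semicolon delimiter and UTF-8 encoding. If every row comes back as a single column, the delimiter assumption is wrong. The helper `read_csv_from_zip` is hypothetical.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, List, Union


def read_csv_from_zip(archive: Union[str, BinaryIO], member_name: str,
                      delimiter: str = ";") -> List[Dict[str, str]]:
    """Read one CSV member of a zip archive as a list of dicts (one per row).

    The member is located by its file name, regardless of the folder it sits in.
    The default delimiter ";" is an assumption about the file format.
    """
    with zipfile.ZipFile(archive) as zf:
        matches = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == member_name]
        if not matches:
            raise FileNotFoundError(f"{member_name} not found in archive")
        with io.TextIOWrapper(zf.open(matches[0]), encoding="utf-8", newline="") as text:
            return list(csv.DictReader(text, delimiter=delimiter))
```

For example, `read_csv_from_zip("MushroomDataset.zip", "secondary_data_shuffled.csv")` would return the shuffled rows as dictionaries of strings, with empty strings where values are missing.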
```
pip install ucimlrepo
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
secondary_mushroom = fetch_ucirepo(id=848)

# data (as pandas dataframes)
X = secondary_mushroom.data.features
y = secondary_mushroom.data.targets

# metadata
print(secondary_mushroom.metadata)

# variable information
print(secondary_mushroom.variables)
```
Wagner, D., Heider, D., & Hattab, G. (2021). Secondary Mushroom [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5FP5Q.
Creators
Dennis Wagner
dwagner93@gmx.de
Product of a bachelor's thesis at Philipps-University Marburg, Bioinformatics Division
D. Heider
Georges Hattab
DOI
10.24432/C5FP5Q
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the dataset for any purpose, provided that appropriate credit is given.