Toxicity
Donated on 5/4/2022
The dataset includes 171 molecules designed for functional domains of CRY1, a core clock protein responsible for generating circadian rhythm. Of these, 56 molecules are toxic and the remaining 115 are non-toxic.
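The class counts above imply a moderate imbalance, which is worth keeping in mind when reading accuracy figures. The short sketch below only redoes that arithmetic; the variable names are illustrative, and the "majority-class baseline" is a generic reference point rather than anything reported by the dataset creators.

```python
# Counts taken from the dataset description.
n_total = 171
n_toxic = 56

# Remaining molecules are non-toxic.
n_non_toxic = n_total - n_toxic

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(n_toxic, n_non_toxic) / n_total

print(n_non_toxic)                   # 115
print(round(n_toxic / n_total, 3))   # 0.327
print(round(majority_baseline, 3))   # 0.673
```

Any classifier trained on this data should be compared against roughly 67% accuracy, not 50%.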
Dataset Characteristics
Tabular
Subject Area
Biology
Associated Tasks
Classification
Feature Type
-
# Instances
171
# Features
1203
Dataset Information
What do the instances in this dataset represent?
Small molecules
Was there any data preprocessing performed?
The data consist of a complete set of 1203 molecular descriptors and need feature selection before classification, since some of the features are redundant. We used Recursive Feature Elimination together with a Decision Tree Classifier (DTC) to find the best set of molecular descriptors for the DTC. A subset of the data with 13 features is included as a supplementary file.
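The selection step described above can be approximated with scikit-learn's `RFE` wrapped around a `DecisionTreeClassifier`. This is a minimal sketch, not the creators' actual pipeline. The helper name `select_descriptors`, the `step` and `seed` values, and the synthetic stand-in data are all assumptions made for illustration; only the row count (171) and the target of 13 features come from the description.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier


def select_descriptors(X, y, n_features=13, step=0.1, seed=0):
    """Return a boolean mask marking the columns kept by RFE.

    step=0.1 removes about 10% of the original columns per iteration
    (an assumed setting, not taken from the dataset page).
    """
    tree = DecisionTreeClassifier(random_state=seed)
    selector = RFE(estimator=tree, n_features_to_select=n_features, step=step)
    selector.fit(X, y)
    return selector.support_


# Synthetic stand-in: 171 rows and 60 random "descriptors" (hypothetical data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(171, 60))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

mask = select_descriptors(X_demo, y_demo)
print(mask.sum())  # number of retained columns: 13
```

With the real data, `X` and `y` would be the feature and target frames returned by `fetch_ucirepo`. The descriptors that survive depend on the tree's random seed and the elimination step, so the result need not match the 13 features in the supplementary file.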
Has Missing Values?
No
Introductory Paper
By Seref Gul, F. Rahim, Safak Isin, Fatma Yilmaz, Nuri Ozturk, M. Turkay, I. Kavakli. 2021
Published in Scientific Reports
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
MATS3v | Feature | Continuous | | | no |
nHBint10 | Feature | Integer | | | no |
MATS3s | Feature | Continuous | | | no |
MATS3p | Feature | Continuous | | | no |
nHBDon_Lipinski | Feature | Integer | | | no |
minHBint8 | Feature | Continuous | | | no |
MATS3e | Feature | Continuous | | | no |
MATS3c | Feature | Continuous | | | no |
minHBint2 | Feature | Continuous | | | no |
MATS3m | Feature | Continuous | | | no |
Showing 10 of 1204 variables.
Dataset Files
File | Size |
---|---|
data.csv | 1.2 MB |
Classification_figure.png | 21.3 KB |
Toxicity-13F.csv | 17 KB |
pip install ucimlrepo

from ucimlrepo import fetch_ucirepo

# fetch dataset
toxicity = fetch_ucirepo(id=728)

# data (as pandas dataframes)
X = toxicity.data.features
y = toxicity.data.targets

# metadata
print(toxicity.metadata)

# variable information
print(toxicity.variables)
Gül, Ş. & RAHIM, F. (2021). Toxicity [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C59313.
Creators
Şeref Gül
serefgul@ku.edu.tr
Koç University
FATIH RAHIM
frahim@ku.edu.tr
DOI
10.24432/C59313
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.