Toxicity

Donated on 5/4/2022

The dataset includes 171 molecules designed for functional domains of a core clock protein, CRY1, which is responsible for generating circadian rhythm. Of these, 56 molecules are toxic and the remaining 115 are non-toxic.

Dataset Characteristics

Tabular

Subject Area

Biology

Associated Tasks

Classification

Feature Type

-

# Instances

171

# Features

1203

Dataset Information

What do the instances in this dataset represent?

Small molecules

Was there any data preprocessing performed?

The data consist of a complete set of 1203 molecular descriptors and need feature selection before classification, since some of the features are redundant. We used Recursive Feature Elimination together with a Decision Tree Classifier (DTC) to obtain the best set of molecular descriptors for the DTC. A subset of the data with 13 features is included as a supplementary file.
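The feature-selection step described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual pipeline: the data are synthetic (generated with make_classification as a stand-in for the descriptor matrix), the 60-column width is chosen only to keep the example fast, and the random seeds and step size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the descriptor table (assumption: the real data
# has 171 rows and 1203 descriptor columns; 60 columns are used here so
# the example runs quickly). Roughly 115 of 171 samples fall in class 0.
X, y = make_classification(
    n_samples=171,
    n_features=60,
    n_informative=8,
    n_redundant=20,
    weights=[115 / 171],
    random_state=0,
)

# Recursive Feature Elimination wrapped around a decision tree:
# at each round the tree is refit and the least important feature
# (by the tree's feature_importances_) is dropped, until 13 remain.
selector = RFE(
    estimator=DecisionTreeClassifier(random_state=0),
    n_features_to_select=13,
    step=1,
)
selector.fit(X, y)

# Column indices of the retained features and the reduced matrix.
selected = np.flatnonzero(selector.support_)
X_reduced = selector.transform(X)
print(X_reduced.shape)
```

On the real data, X would be the 1203 descriptor columns and y the toxicity label; which columns survive depends on the tree's random state, so the 13 features selected this way need not match those in the supplementary file.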

Has Missing Values?

No

Introductory Paper

Structure-based design and classifications of small molecules regulating the circadian rhythm period

By Seref Gul, F. Rahim, Safak Isin, Fatma Yilmaz, Nuri Ozturk, M. Turkay, I. Kavakli. 2021

Published in Scientific Reports

Variables Table

Variable Name    Role     Type        Description  Units  Missing Values
MATS3v           Feature  Continuous  -            -      no
nHBint10         Feature  Integer     -            -      no
MATS3s           Feature  Continuous  -            -      no
MATS3p           Feature  Continuous  -            -      no
nHBDon_Lipinski  Feature  Integer     -            -      no
minHBint8        Feature  Continuous  -            -      no
MATS3e           Feature  Continuous  -            -      no
MATS3c           Feature  Continuous  -            -      no
minHBint2        Feature  Continuous  -            -      no
MATS3m           Feature  Continuous  -            -      no

Showing the first 10 of 1204 variables.

Dataset Files

File                       Size
data.csv                   1.2 MB
Classification_figure.png  21.3 KB
Toxicity-13F.csv           17 KB

Creators

Şeref Gül

serefgul@ku.edu.tr

Koç University

Fatih Rahim

frahim@ku.edu.tr
