Anticancer peptides
Donated on 10/10/2019
Peptides with experimental annotations on their anticancer action on breast and lung cancer cells.
Dataset Characteristics
Sequential
Subject Area
Biology
Associated Tasks
Classification
Feature Type
-
# Instances
1850
# Features
2
Dataset Information
Additional Information
Membranolytic anticancer peptides (ACPs) are drawing increasing attention as potential future therapeutics against cancer, due to their ability to hinder the development of cellular resistance and their potential to overcome common hurdles of chemotherapy, e.g., side effects and cytotoxicity. This dataset contains peptides (given as one-letter amino acid sequences) and their anticancer activity on breast and lung cancer cell lines.
Two peptide datasets targeting breast and lung cancer cells were assembled and curated manually from CancerPPD. EC50, IC50, LD50 and LC50 annotations on breast and lung cancer cells were retained (breast cell lines: MCF7 = 57%, MDA-MB-361 = 11%, MT-1 = 9%; lung cell lines: H-1299 = 45%, A-549 = 17.7%); values reported in mg ml−1 were converted to μM. Linear and l-chiral peptides were retained, while cyclic, mixed or d-chiral peptides were discarded. When both amidated and non-amidated data were available for the same sequence, only the value for the amidated peptide was retained.
Peptides were split into three classes for model training: (1) very active (EC/IC/LD/LC50 ≤ 5 μM), (2) moderately active (EC/IC/LD/LC50 above 5 μM and up to 50 μM) and (3) inactive (EC/IC/LD/LC50 > 50 μM). Duplicates with conflicting class annotations were compared manually to the original sources and corrected if necessary. If multiple class annotations were present for the same sequence, the most frequently represented class was chosen; in case of ties, the less active class was chosen.
Since CancerPPD is biased towards the annotation of active peptides, a set of presumably inactive peptides was built by randomly extracting 750 alpha-helical sequences (7–30 amino acids) from crystal structures deposited in the Protein Data Bank. The final training sets contained 949 peptides for breast cancer and 901 peptides for lung cancer.
The datasets were used to develop neural network models for anticancer peptide design and are provided as .csv files in a .zip archive. Additional details can be found in: Grisoni, F., Neuhaus, C.S., Hishinuma, M., Gabernet, G., Hiss, J.A., Kotera, M. and Schneider, G., 2019. De novo design of anticancer peptides by ensemble artificial neural networks. Journal of Molecular Modeling, 25(5), 112.
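The curation rules described above (unit conversion, activity thresholds, and tie-breaking between conflicting annotations) can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' code: the function names, the label strings, and the need to supply a molecular weight for the conversion are assumptions, and the manual comparison against the original sources is not modelled.

```python
from collections import Counter

# Class labels ordered from most to least active (label strings are illustrative).
CLASSES = ["very active", "moderately active", "inactive"]


def mg_per_ml_to_micromolar(conc_mg_ml, mol_weight_g_mol):
    """Convert mg/ml to µM: (g/L) / (g/mol) = mol/L, then scale by 1e6."""
    return conc_mg_ml / mol_weight_g_mol * 1e6


def activity_class(value_um):
    """Map an EC/IC/LD/LC50 value in µM to one of the three classes."""
    if value_um <= 5:
        return CLASSES[0]
    if value_um <= 50:
        return CLASSES[1]
    return CLASSES[2]


def consensus_class(labels):
    """Pick the most frequent class; on ties, pick the less active one."""
    counts = Counter(labels)
    best = max(counts.values())
    tied = [label for label in counts if counts[label] == best]
    # A higher index in CLASSES means a less active class.
    return max(tied, key=CLASSES.index)


# Hypothetical activity values (µM) reported for a single sequence.
values_um = [3.0, 12.0, 80.0, 4.5]
labels = [activity_class(v) for v in values_um]
print(labels)
print(consensus_class(labels))
```

With these hypothetical values, two of the four annotations fall at or below 5 μM, so the majority rule applies; the tie-break only matters when two classes are equally frequent, e.g. `consensus_class(["very active", "inactive"])`.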
Has Missing Values?
No
Variable Information
The dataset contains three attributes:
1. Peptide ID
2. One-letter amino-acid sequence
3. Class (active, moderately active, experimental inactive, virtual inactive)
Dataset Files
File | Size |
---|---|
ACPs_Breast_cancer.csv | 36.5 KB |
ACPs_Lung_cancer.csv | 35.7 KB |
__MACOSX/._ACPs_Breast_cancer.csv | 509 Bytes |
__MACOSX/._ACPs_Lung_cancer.csv | 278 Bytes |
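As a rough sketch of how one of the CSV files might be read with the Python standard library, the snippet below parses the three attributes and counts peptides per class. The header names (`ID`, `sequence`, `class`) are an assumption and should be checked against the actual files; the inline sample is synthetic and stands in for the real data so the example runs on its own.

```python
import csv
import io
from collections import Counter

# Synthetic stand-in for a dataset file. The header names and all rows
# are assumptions for illustration, not actual dataset content.
SAMPLE_CSV = """ID,sequence,class
1,AAKKLLKK,active
2,GLFKKLLK,moderately active
3,AAAAGGGG,experimental inactive
4,LLEELLKK,virtual inactive
"""

# The 20 standard amino acids in one-letter code.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_peptides(handle):
    """Return (peptide_id, sequence, label) tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [
        (row["ID"], row["sequence"].strip().upper(), row["class"].strip())
        for row in reader
    ]


def is_standard(sequence):
    """True if the sequence is non-empty and uses only standard residues."""
    return bool(sequence) and set(sequence) <= STANDARD_AA


records = read_peptides(io.StringIO(SAMPLE_CSV))
counts = Counter(label for _, _, label in records)

for label, n in counts.items():
    print(label, n)
print(all(is_standard(seq) for _, seq, _ in records))
```

To read a real file instead of the sample, pass an open file handle, for example `open("ACPs_Breast_cancer.csv", newline="")`, to `read_peptides`.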
    pip install ucimlrepo

    from ucimlrepo import fetch_ucirepo

    # fetch dataset
    anticancer_peptides = fetch_ucirepo(id=589)

    # data (as pandas dataframes)
    X = anticancer_peptides.data.features
    y = anticancer_peptides.data.targets

    # metadata
    print(anticancer_peptides.metadata)

    # variable information
    print(anticancer_peptides.variables)
Anticancer peptides [Dataset]. (2019). UCI Machine Learning Repository. https://doi.org/10.24432/C5T90F.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.