Anticancer peptides

Donated on 10/10/2019

Peptides with experimental annotations of their anticancer activity against breast and lung cancer cells.

Dataset Characteristics


Subject Area

Life Science

Associated Tasks


Feature Type


# Instances


# Features


Dataset Information

Additional Information

Membranolytic anticancer peptides (ACPs) are drawing increasing attention as potential future therapeutics against cancer, due to their ability to hinder the development of cellular resistance and their potential to overcome common hurdles of chemotherapy, e.g., side effects and cytotoxicity. This dataset contains information on peptides (given as one-letter amino acid sequences) and their anticancer activity on breast and lung cancer cell lines.

Two peptide datasets targeting breast and lung cancer cells were assembled and curated manually from CancerPPD. EC50, IC50, LD50 and LC50 annotations on breast and lung cancer cells were retained (breast cell lines: MCF7 = 57%, MDA-MB-361 = 11%, MT-1 = 9%; lung cell lines: H-1299 = 45%, A-549 = 17.7%); values reported in mg ml−1 were converted to μM. Linear and L-chiral peptides were retained, while cyclic, mixed or D-chiral peptides were discarded. Where both amidated and non-amidated data were available for the same sequence, only the value for the amidated peptide was retained.

Peptides were split into three classes for model training: (1) very active (EC/IC/LD/LC50 ≤ 5 μM), (2) moderately active (5 μM < EC/IC/LD/LC50 ≤ 50 μM) and (3) inactive (EC/IC/LD/LC50 > 50 μM). Duplicates with conflicting class annotations were compared manually to the original sources and corrected if necessary. If multiple class annotations were present for the same sequence, the most frequently represented class was chosen; in case of ties, the less active class was chosen.

Since CancerPPD is biased towards the annotation of active peptides, we built a set of presumably inactive peptides by randomly extracting 750 alpha-helical sequences (7–30 amino acids) from crystal structures deposited in the Protein Data Bank. The final training sets contained 949 peptides for breast cancer and 901 peptides for lung cancer.

The datasets were used to develop neural network models for anticancer peptide design and are provided as .csv files in a .zip archive. Additional details can be found in: Grisoni, F., Neuhaus, C.S., Hishinuma, M., Gabernet, G., Hiss, J.A., Kotera, M. and Schneider, G., 2019. De novo design of anticancer peptides by ensemble artificial neural networks. Journal of Molecular Modeling, 25(5), 112.
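The curation rules described above (unit conversion, activity thresholds, and the tie-breaking rule for duplicate sequences) can be sketched in Python as follows. This is only an illustration, not the authors' code: the function names are hypothetical, the class labels are shorthand for the three training classes, and the conversion assumes the peptide's molecular weight in g/mol is known, which the description does not spell out.

```python
from collections import Counter

# Training classes ordered from most to least active (labels are illustrative).
CLASSES = ["very active", "moderately active", "inactive"]


def mg_per_ml_to_um(conc_mg_per_ml: float, mol_weight_g_per_mol: float) -> float:
    """Convert a mass concentration (mg/mL) to a molar concentration (uM).

    Assumes the molecular weight of the peptide is available.
    """
    # mg/mL equals g/L; dividing by g/mol gives mol/L; 1 mol/L = 1e6 uM.
    return conc_mg_per_ml / mol_weight_g_per_mol * 1e6


def activity_class(value_um: float) -> str:
    """Map an EC/IC/LD/LC50 value in uM to one of the three classes."""
    if value_um <= 5:
        return "very active"
    if value_um <= 50:
        return "moderately active"
    return "inactive"


def consensus_class(labels: list[str]) -> str:
    """Pick the most frequent class; break ties towards the less active class."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label in counts if counts[label] == top]
    # A larger index in CLASSES means a less active class.
    return max(tied, key=CLASSES.index)
```

For example, a peptide of 2000 g/mol measured at 0.01 mg/mL corresponds to 5 μM and would fall in the very active class, while a sequence annotated once as very active and once as inactive would be resolved to inactive.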

Has Missing Values?


Variable Information

The dataset contains three attributes:
1. Peptide ID
2. One-letter amino-acid sequence
3. Class (active, moderately active, experimental inactive, virtual inactive)
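A minimal sketch of reading records in this three-attribute layout is shown below. The column names (`ID`, `sequence`, `class`), the `load_peptides` helper, and the sample rows are all assumptions made for illustration; the rows are made up and not taken from the dataset, and the actual header names in the distributed .csv files may differ. The residue check against the 20 standard amino-acid letters is an optional sanity check, not part of the dataset specification.

```python
import csv
import io
from collections import Counter

# Made-up rows in the three-attribute layout; not taken from the dataset.
SAMPLE = """ID,sequence,class
1,AAWKWAWAKKWAKAKKWAKAA,moderately active
2,GLFDIVKKVVGALGSL,active
3,ALEKLLRE,virtual inactive
"""

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def load_peptides(handle):
    """Read (id, sequence, class) records, rejecting non-standard residues."""
    records = []
    for row in csv.DictReader(handle):
        seq = row["sequence"].strip().upper()
        if not set(seq) <= VALID_RESIDUES:
            raise ValueError(f"unexpected residue in peptide {row['ID']}: {seq}")
        records.append((row["ID"], seq, row["class"].strip()))
    return records


records = load_peptides(io.StringIO(SAMPLE))
# Count how many peptides fall into each class label.
class_counts = Counter(label for _, _, label in records)
```

To read a real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened .csv file, with the column names adjusted to match its header.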


