Deepfakes: Medical Image Tamper Detection
Donated on 3/10/2020
Medical deepfakes: CT scans of human lungs, some of which have been tampered with by adding or removing cancer. Can you find them?
Dataset Characteristics
Multivariate
Subject Area
Computer Science
Associated Tasks
Classification
Feature Type
Real
# Instances
20000
# Features
200000
Dataset Information
Additional Information
Attackers can intercept medical imagery and use deep learning to add or remove medical evidence with high realism. This dataset presents medical deepfakes: 3D CT scans of human lungs, some of which have been tampered with by removing real cancer or injecting fake cancer. The objective is to distinguish real cancers from fake ones and to identify where scans have been tampered with. Three expert radiologists evaluated this dataset and could not reliably tell real cancers from fake ones, which indicates that the fake cancers are realistic and that this detection task is very challenging. For more information, please see our paper 'CT-GAN'.

The dataset consists of two sets (80 scans and 20 scans). The first 80 scans were used in a blind trial with the radiologists (they were not told that scans had been tampered with), and the remaining 20 scans were used in an open trial (they were told about the tampering and asked to identify it).

Provided with the scans is a table with the ground truth: for each scan, where a cancer is located (x, y, and z [slice #]) and its classification. A location can be classified as one of the following:

True-Benign (TB): A location that actually has no cancer.
True-Malicious (TM): A location that has real cancer.
False-Benign (FB): A location that had real cancer, which was removed.
False-Malicious (FM): A location that does not have cancer, but where fake cancer was injected.

Access to the dataset is via this link: https://drive.google.com/open?id=1R0WD_5IZ3NlyCiOPf1Ex74nBnZYQegwr
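As an illustration of how the four ground-truth classes might be used, the sketch below parses a small table and derives two flags per location: whether it was tampered with (FB, FM) and whether the scan as distributed shows cancer there (TM, FM). The column names (`scan_id`, `x`, `y`, `slice`, `label`) and the sample rows are made up for this example and may not match the table shipped with the dataset.

```python
import csv
import io

# Meaning of each class, following the dataset description.
# tampered:     the location was altered (cancer removed or injected)
# shows_cancer: the scan, as distributed, appears to contain cancer there
LABELS = {
    "TB": {"tampered": False, "shows_cancer": False},
    "TM": {"tampered": False, "shows_cancer": True},
    "FB": {"tampered": True, "shows_cancer": False},
    "FM": {"tampered": True, "shows_cancer": True},
}

# Hypothetical rows for illustration only; not taken from the real table.
SAMPLE = """scan_id,x,y,slice,label
1001,200,310,45,TM
1001,150,220,60,FM
1002,330,180,72,FB
1003,256,256,90,TB
"""


def load_ground_truth(text):
    """Parse CSV text into a list of dicts with integer coordinates and flags."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "scan_id": rec["scan_id"],
            "x": int(rec["x"]),
            "y": int(rec["y"]),
            "z": int(rec["slice"]),
            "label": rec["label"],
            **LABELS[rec["label"]],
        })
    return rows


def tampered_scans(rows):
    """Return the sorted IDs of scans with at least one tampered location."""
    return sorted({r["scan_id"] for r in rows if r["tampered"]})


rows = load_ground_truth(SAMPLE)
print(tampered_scans(rows))
```

Keeping the two flags separate makes it possible to treat the data either as a tamper-detection task (predict `tampered`) or as a real-versus-fake cancer task (compare TM against FM).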
Has Missing Values?
No
Variable Information
Each scan is in the medical DICOM format, but it can be loaded as a 3D matrix in Python using the tools provided in our code repository: https://github.com/ymirsky/CT-GAN

A scan is a series of 512x512 images, usually about 100-300 slices long (the z-axis). Cancers can occupy multiple slices along the z-axis. The value at each pixel is the radiodensity at that location, in Hounsfield units.
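Because a cancer can span several slices, a detector would typically look at a small cube around a labelled location rather than a single 2D slice. The sketch below shows one way to cut such a cube and rescale Hounsfield values to [0, 1]. It uses nested lists indexed as `volume[z][y][x]` as a stand-in for the 3D matrix a loader would return. The helper names, the cube size, and the clipping window are assumptions made for this example and are not part of the dataset or the CT-GAN repository.

```python
# Clipping window in Hounsfield units; an illustrative choice, not a dataset value.
HU_MIN, HU_MAX = -1000.0, 400.0


def normalize_hu(value):
    """Clip an HU value to [HU_MIN, HU_MAX] and scale it to [0, 1]."""
    clipped = max(HU_MIN, min(HU_MAX, value))
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)


def extract_cube(volume, x, y, z, half=2):
    """Return the voxels within `half` steps of (x, y, z), clamped to the scan bounds.

    `volume` is indexed as volume[z][y][x], where z is the slice number.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z0, z1 = max(0, z - half), min(depth, z + half + 1)
    y0, y1 = max(0, y - half), min(height, y + half + 1)
    x0, x1 = max(0, x - half), min(width, x + half + 1)
    return [[row[x0:x1] for row in sl[y0:y1]] for sl in volume[z0:z1]]


# Synthetic stand-in for a scan: 6 slices of 16x16 voxels filled with air (-1000 HU),
# with a single denser voxel (40 HU) at x=8, y=8, slice 3.
volume = [[[-1000.0] * 16 for _ in range(16)] for _ in range(6)]
volume[3][8][8] = 40.0

cube = extract_cube(volume, x=8, y=8, z=3, half=2)
normalized = [[[normalize_hu(v) for v in row] for row in sl] for sl in cube]
print(len(cube), len(cube[0]), len(cube[0][0]))
```

With a real scan the same indexing applies to a 512x512 slice stack. Clamping at the borders means that cubes near the edge of the scan come out smaller than the requested size.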
Dataset Files
File | Size |
---|---|
data.zip | 6 GB |
Dataset Access.txt | 258 Bytes |
pip install ucimlrepo

from ucimlrepo import fetch_ucirepo

# fetch dataset
deepfakes_medical_image_tamper_detection = fetch_ucirepo(id=520)

# data (as pandas dataframes)
X = deepfakes_medical_image_tamper_detection.data.features
y = deepfakes_medical_image_tamper_detection.data.targets

# metadata
print(deepfakes_medical_image_tamper_detection.metadata)

# variable information
print(deepfakes_medical_image_tamper_detection.variables)
Deepfakes: Medical Image Tamper Detection [Dataset]. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C5J318.
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.