Deepfakes: Medical Image Tamper Detection

Donated on 3/10/2020

Medical deepfakes: CT scans of human lungs, some of which have been tampered with to add or remove cancer. Can you find them?

Dataset Characteristics

Multivariate

Subject Area

Computer Science

Associated Tasks

Classification

Feature Type

Real

# Instances

20000

# Features

200000

Dataset Information

Additional Information

Attackers can use deep learning to intercept medical imagery and add or remove medical evidence with high realism. In this dataset we present medical deepfakes: 3D CT scans of human lungs, some of which have been tampered with to remove real cancers or to inject fake ones. The objective is to distinguish between real and fake cancers and to identify where scans have been tampered with. Three expert radiologists evaluated this dataset and could not reliably tell real cancers from fake ones, which indicates that the fake cancers are realistic and that this detection task is very challenging. For more information, please see our paper 'CT-GAN'.

The dataset consists of two sets of scans (80 and 20). The first 80 were used in a blind trial, in which the radiologists were not told that any scans had been tampered with. The remaining 20 were used in an open trial, in which the radiologists were told the truth and asked to identify the tampered scans.

The scans are accompanied by a ground-truth table that lists, for each scan, the locations of interest (x, y, and z, where z is the slice number) and the classification of each location. A location is classified as one of:

True-Benign (TB): a location that actually has no cancer.
True-Malicious (TM): a location that has real cancer.
False-Benign (FB): a location that has real cancer, but the cancer was removed.
False-Malicious (FM): a location that has no cancer, but fake cancer was injected there.

Access to the dataset is via this link: https://drive.google.com/open?id=1R0WD_5IZ3NlyCiOPf1Ex74nBnZYQegwr
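The four classes can be read as two independent facts about a location: whether the patient really has cancer there, and whether the scan was modified there. The minimal Python sketch below encodes that reading. The `Label` enum, the `Location` record, and especially the column names used in `parse_row` (`uuid`, `x`, `y`, `slice`, `type`) are hypothetical and should be checked against the actual header of the ground-truth table.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The four location classes described in the ground-truth table."""
    TB = "True-Benign"      # no cancer, location untouched
    TM = "True-Malicious"   # real cancer, location untouched
    FB = "False-Benign"     # real cancer that was removed by tampering
    FM = "False-Malicious"  # no real cancer; a fake one was injected

    @property
    def tampered(self) -> bool:
        # The "False-" classes are the locations that were modified.
        return self in (Label.FB, Label.FM)

    @property
    def appears_cancerous(self) -> bool:
        # What the (possibly tampered) scan shows at this location.
        return self in (Label.TM, Label.FM)

    @property
    def truly_cancerous(self) -> bool:
        # What the patient actually has at this location.
        return self in (Label.TM, Label.FB)


@dataclass
class Location:
    """One ground-truth entry (hypothetical field names)."""
    scan_id: str
    x: int
    y: int
    z: int  # slice number
    label: Label


def parse_row(row: dict) -> Location:
    # Hypothetical column names; adjust to the real table header.
    return Location(
        scan_id=str(row["uuid"]),
        x=int(row["x"]),
        y=int(row["y"]),
        z=int(row["slice"]),
        label=Label[row["type"]],  # look up by abbreviation, e.g. "FM"
    )
```

For a tamper detector, `label.tampered` gives a binary target per location; for telling real from fake cancers among locations that look cancerous, `label is Label.FM` is the positive class.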

Has Missing Values?

Variable Information

Each scan is stored in the medical DICOM format, but it can be loaded into Python as a 3D matrix using the tools in our code repository (https://github.com/ymirsky/CT-GAN). A scan is a series of 512x512 images (slices), usually about 100 to 300 slices long along the z axis. A single cancer can span multiple slices along the z axis. The value at each pixel is the Hounsfield unit (radiodensity) at that location.
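The sketch below shows one way to build the 3D matrix directly with pydicom and NumPy instead of the repository's own tools, whose API is not reproduced here. It assumes each scan folder holds a single series of `.dcm` files, converts stored values to Hounsfield units with the standard DICOM `RescaleSlope` and `RescaleIntercept` tags, orders slices by the z component of `ImagePositionPatient`, and crops a cube around a ground-truth location. The function names, the `.dcm` file pattern, and the mapping of x to columns, y to rows and z to the sorted slice index are assumptions; verify the slice ordering against the ground-truth table before relying on it.

```python
from pathlib import Path

import numpy as np


def to_hounsfield(raw: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Apply the DICOM linear rescale: HU = raw * slope + intercept."""
    return raw.astype(np.float32) * slope + intercept


def load_volume(folder: str) -> np.ndarray:
    """Load one DICOM series into a (z, y, x) array of Hounsfield units.

    Assumes the folder holds the .dcm files of a single series and that
    pydicom is installed.
    """
    import pydicom  # imported lazily so the other helpers work without it

    datasets = [pydicom.dcmread(p) for p in sorted(Path(folder).glob("*.dcm"))]
    # Order slices along the z axis by their patient-space position.
    datasets.sort(key=lambda ds: float(ds.ImagePositionPatient[2]))
    slices = [
        to_hounsfield(
            ds.pixel_array,
            float(getattr(ds, "RescaleSlope", 1.0)),
            float(getattr(ds, "RescaleIntercept", 0.0)),
        )
        for ds in datasets
    ]
    return np.stack(slices, axis=0)


def crop_around(volume: np.ndarray, x: int, y: int, z: int, half: int = 16) -> np.ndarray:
    """Return a cube of side 2*half centred on (x, y, z), clipped at the borders.

    Assumes x indexes columns, y indexes rows and z is the slice index.
    """
    z0, y0, x0 = (max(c - half, 0) for c in (z, y, x))
    return volume[z0:z + half, y0:y + half, x0:x + half]
```

Cropping a fixed-size cube keeps the whole extent of a cancer that spans several slices, which is usually a more convenient input for a classifier than the full 512x512xN scan.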

Dataset Files

File	Size
data.zip	6 GB
Dataset Access.txt	258 Bytes


DOI

10.24432/C5J318

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows the dataset to be shared and adapted for any purpose, provided that appropriate credit is given.