Sepsis Survival Minimal Clinical Records

Donated on 7/9/2023

The dataset consists of 110,204 admissions of 84,811 hospitalized subjects in Norway between 2011 and 2012 who were diagnosed with infections, systemic inflammatory response syndrome, sepsis by causative microbes, or septic shock. The prediction task is to determine whether a patient was alive or deceased about 9 days after their medical record was collected at the hospital. This is an important prediction problem in clinical medicine. Sepsis is a life-threatening condition triggered by an immune overreaction to infection, which can lead to organ failure or death. It carries an immediate risk of death, often killing patients within one hour, which makes many laboratory tests and hospital analyses impractical for timely diagnosis and treatment. Predicting patient survival within minutes, from as few and as easy-to-retrieve medical features as possible, is therefore very important.

Dataset Characteristics

Multivariate

Subject Area

Health and Medicine

Associated Tasks

Classification

Feature Type

Integer

# Instances

110341

# Features

Dataset Information

What do the instances in this dataset represent?

For the primary cohort, they represent records of patients affected by potential preconditions of sepsis (before the Sepsis-3 definition); for the study cohort, they represent only the patient admissions that meet the novel Sepsis-3 definition.

Are there recommended data splits?

No specific split is recommended; a standard train-test split can be used. A three-way holdout split (training, validation/development, testing) can be used when doing model selection.
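
As an illustration of the three-way holdout split mentioned above, here is a minimal sketch in plain Python. The function name `three_way_split`, the 60/20/20 fractions, and the choice to stratify by outcome label are assumptions made for this example; the dataset page does not prescribe any of them.

```python
import random
from collections import defaultdict


def three_way_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) lists of row indices, stratified by label.

    The fractions and the seed are illustrative defaults, not values
    recommended by the dataset authors.
    """
    rng = random.Random(seed)

    # Group row indices by class so each split keeps the class balance.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains after train and validation goes to test.
        test += indices[n_train + n_val:]
    return train, val, test
```

The validation indices would be used to choose between candidate models, and the test indices only once, for the final estimate.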

Does the dataset contain data that might be considered sensitive in any way?

Yes. It contains information about the gender and age of the patient.

Was there any data preprocessing performed?

All the categorical variables have been encoded (so no preprocessing is necessary).

Additional Information

Primary cohort from Norway:
- 4 features for 110,204 patient admissions
- file: 's41598-020-73558-3_sepsis_survival_primary_cohort.csv'

Study cohort (a subset of the primary cohort) from Norway:
- 4 features for 19,051 patient admissions
- file: 's41598-020-73558-3_sepsis_survival_study_cohort.csv'

Validation cohort from South Korea:
- 4 features for 137 patients
- file: 's41598-020-73558-3_sepsis_survival_validation_cohort.csv'

The validation cohort from South Korea was used by Chicco and Jurman (2020) as an external validation cohort to confirm the generalizability of their proposed approach.
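
A cohort file could be read along the following lines. This is a sketch under two assumptions that are not confirmed on this page: that each CSV is comma-delimited, and that its header row uses the variable names listed in the Variables Table. The helper names `load_cohort` and `survival_rate` are hypothetical.

```python
import csv

# Column names taken from the Variables Table; assumed to match the CSV header.
COLUMNS = (
    "age_years",
    "sex_0male_1female",
    "episode_number",
    "hospital_outcome_1alive_0dead",
)


def load_cohort(path):
    """Read one cohort CSV into a list of dicts with integer values."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return [{name: int(row[name]) for name in COLUMNS} for row in reader]


def survival_rate(records):
    """Fraction of records whose outcome is encoded as 1 (alive)."""
    alive = sum(record["hospital_outcome_1alive_0dead"] for record in records)
    return alive / len(records)
```

For example, `load_cohort("s41598-020-73558-3_sepsis_survival_primary_cohort.csv")` would load the primary cohort, assuming the archive has been extracted into the working directory.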

Has Missing Values?

No

Introductory Paper

Survival prediction of patients with sepsis from age, sex, and septic episode number alone

By D. Chicco, Giuseppe Jurman. 2020

Published in Scientific Reports 10

Variables Table

Variable Name	Role	Type	Demographic	Description	Units	Missing Values
age_years	Feature	Integer	Age	Age of the patient in years.	years	no
sex_0male_1female	Feature	Binary	Gender	Gender of the patient. Values are encoded as follows: {0: male, 1: female}		no
episode_number	Feature	Integer		Number of prior Sepsis episodes		no
hospital_outcome_1alive_0dead	Target	Binary		Status of the patient about 9.351 days after being admitted to the hospital. Values are encoded as follows: {1: Alive, 0: Dead}		no
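
The integer encodings in the table can be turned back into readable labels. The sketch below assumes each record is a dict keyed by the variable names above; the helper name `decode_record` and the output keys `sex` and `outcome` are hypothetical choices for this example.

```python
# Encodings as stated in the Variables Table.
SEX_LABELS = {0: "male", 1: "female"}
OUTCOME_LABELS = {1: "alive", 0: "dead"}


def decode_record(record):
    """Return a copy of one record with the binary codes replaced by labels.

    Raises ValueError if a binary field holds a value other than 0 or 1.
    """
    sex = record["sex_0male_1female"]
    outcome = record["hospital_outcome_1alive_0dead"]
    if sex not in SEX_LABELS:
        raise ValueError(f"unexpected sex code: {sex!r}")
    if outcome not in OUTCOME_LABELS:
        raise ValueError(f"unexpected outcome code: {outcome!r}")
    return {
        "age_years": record["age_years"],
        "sex": SEX_LABELS[sex],
        "episode_number": record["episode_number"],
        "outcome": OUTCOME_LABELS[outcome],
    }
```

Note that the two binary variables use opposite conventions for the value 1 (female for sex, alive for outcome), so decoding through explicit lookup tables avoids mixing them up.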

Dataset Files

File	Size
s41598-020-73558-3_sepsis_survival_dataset.zip	219.8 KB

Creators

Davide Chicco

davidechicco@davidechicco.it

Giuseppe Jurman

DOI

10.24432/C53C8N

License

This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.