Sepsis Survival Minimal Clinical Records
Donated on 7/9/2023
The dataset consists of 110,204 admissions of 84,811 hospitalized subjects in Norway between 2011 and 2012 who were diagnosed with infections, systemic inflammatory response syndrome, sepsis by causative microbes, or septic shock. The prediction task is to determine whether a patient is alive or deceased about 9 days after their medical record was collected at the hospital. This is an important prediction problem in clinical medicine. Sepsis is a life-threatening condition triggered by an immune overreaction to infection, leading to organ failure or even death. It carries an immediate risk of death, often killing patients within one hour, which renders many laboratory tests and hospital analyses impractical for timely diagnosis and treatment. Predicting patient survival within minutes, using as few and as easy-to-retrieve medical features as possible, is therefore very important.
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Integer
# Instances
110341
# Features
3
Dataset Information
What do the instances in this dataset represent?
For the primary cohort, they represent records of patients affected by potential preconditions of sepsis (ante Sepsis-3 definition); for the study cohort, they represent only the patient admissions defined by the novel Sepsis-3 definition.
Are there recommended data splits?
There is no recommended split; a standard train-test split can be used. A three-way holdout split (i.e., training, validation/development, and testing) can be used when doing model selection.
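As an illustration of the three-way holdout split mentioned above, here is a minimal sketch that uses only the Python standard library. The function name `three_way_split`, the 70/15/15 proportions, and the synthetic label list are assumptions made for the example and are not part of the dataset documentation. The split is stratified by the target so that each partition keeps roughly the same share of each outcome class.

```python
import random
from collections import defaultdict


def three_way_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return (train, val, test) index lists, stratified by label.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    rng = random.Random(seed)

    # Group row indices by class label so each class is split separately.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return sorted(train), sorted(val), sorted(test)


# Synthetic stand-in for hospital_outcome_1alive_0dead (1 = alive, 0 = dead).
labels = [1] * 900 + [0] * 100
train_idx, val_idx, test_idx = three_way_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

In practice, a library routine such as scikit-learn's `train_test_split` with its `stratify` argument, applied twice, is a common way to get the same kind of split. The validation partition is then used for model selection and the test partition only for the final evaluation.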
Does the dataset contain data that might be considered sensitive in any way?
Yes. It contains information about the gender and age of the patient.
Was there any data preprocessing performed?
All the categorical variables have been encoded (so no preprocessing is necessary).
Additional Information
Primary cohort from Norway:
- 4 features for 110,204 patient admissions
- file: 's41598-020-73558-3_sepsis_survival_primary_cohort.csv'
Study cohort (a subset of the primary cohort) from Norway:
- 4 features for 19,051 patient admissions
- file: 's41598-020-73558-3_sepsis_survival_study_cohort.csv'
Validation cohort from South Korea:
- 4 features for 137 patients
- file: 's41598-020-73558-3_sepsis_survival_validation_cohort.csv'
The validation cohort from South Korea was used by Chicco and Jurman (2020) as an external validation cohort to confirm the generalizability of their proposed approach.
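The cohort files listed above are plain CSV files, so they can be read with the standard library. The sketch below parses a small in-memory sample instead of a real file. The rows are synthetic, the helper names `load_cohort` and `survival_rate` are hypothetical, and the header is assumed to match the variable names listed for this dataset.

```python
import csv
import io

# Synthetic rows standing in for one cohort file. The header is an assumption
# based on the documented variable names, and the values are made up.
SAMPLE_CSV = """age_years,sex_0male_1female,episode_number,hospital_outcome_1alive_0dead
21,1,1,1
77,0,2,0
64,0,1,1
45,1,1,1
"""


def load_cohort(handle):
    """Parse a cohort CSV into a list of dicts with integer values."""
    return [
        {key: int(value) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


def survival_rate(rows):
    """Fraction of rows with hospital_outcome_1alive_0dead equal to 1."""
    return sum(r["hospital_outcome_1alive_0dead"] for r in rows) / len(rows)


rows = load_cohort(io.StringIO(SAMPLE_CSV))
rate = survival_rate(rows)
print(f"{len(rows)} rows, survival rate {rate:.2f}")
```

To read an extracted cohort file, the `io.StringIO(SAMPLE_CSV)` argument can be replaced with a handle from `open(path, newline="")`, where `path` points at one of the three CSV files. Checking the survival rate per cohort is a quick way to see how imbalanced the target is before training a classifier.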
Has Missing Values?
No
Introductory Paper
By D. Chicco, Giuseppe Jurman. 2020
Published in Scientific Reports 10
Variables Table
Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
---|---|---|---|---|---|---|
age_years | Feature | Integer | Age | Age of the patient in years. | years | no |
sex_0male_1female | Feature | Binary | Gender | Gender of the patient. Values are encoded as follows: {0: male, 1: female} | | no |
episode_number | Feature | Integer | | Number of prior Sepsis episodes | | no |
hospital_outcome_1alive_0dead | Target | Binary | | Status of the patient 9.351 days after being admitted to the hospital. Values are encoded as follows: {1: Alive, 0: Dead} | | no |
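The encodings in the table can be turned into a simple sanity check on individual records. This is a minimal sketch: the function `validate_record` and the two example records are hypothetical, and the only rules it enforces are the ones stated in the table (binary columns are 0 or 1) plus the assumption that ages and episode counts are non-negative integers.

```python
BINARY_COLUMNS = ("sex_0male_1female", "hospital_outcome_1alive_0dead")
INTEGER_COLUMNS = ("age_years", "episode_number")


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name in BINARY_COLUMNS:
        value = record.get(name)
        if isinstance(value, bool) or value not in (0, 1):
            problems.append(f"{name} must be 0 or 1, got {value!r}")
    for name in INTEGER_COLUMNS:
        value = record.get(name)
        # Assumption: these counts cannot be negative.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{name} must be a non-negative integer, got {value!r}")
    return problems


# Made-up records for illustration only.
good = {"age_years": 64, "sex_0male_1female": 0,
        "episode_number": 1, "hospital_outcome_1alive_0dead": 1}
bad = {"age_years": -3, "sex_0male_1female": 2,
       "episode_number": 1, "hospital_outcome_1alive_0dead": 0}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
print(good_problems)
print(bad_problems)
```

A check like this is mainly useful when merging the cohorts or adding new records, since the published files are documented as already encoded and free of missing values.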
Dataset Files
File | Size |
---|---|
s41598-020-73558-3_sepsis_survival_dataset.zip | 219.8 KB |
pip install ucimlrepo

from ucimlrepo import fetch_ucirepo

# fetch dataset
sepsis_survival_minimal_clinical_records = fetch_ucirepo(id=827)

# data (as pandas dataframes)
X = sepsis_survival_minimal_clinical_records.data.features
y = sepsis_survival_minimal_clinical_records.data.targets

# metadata
print(sepsis_survival_minimal_clinical_records.metadata)

# variable information
print(sepsis_survival_minimal_clinical_records.variables)

Chicco, D. & Jurman, G. (2020). Sepsis Survival Minimal Clinical Records [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C53C8N.
Creators
Davide Chicco
davidechicco@davidechicco.it
Giuseppe Jurman
DOI
10.24432/C53C8N
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.