SUPPORT2


Linked on 9/14/2023

This dataset comprises 9,105 critically ill patients from five United States medical centers, accessioned in 1989-1991 and 1992-1994. Each row is the record of a hospitalized patient who met the inclusion and exclusion criteria for one of nine disease categories: acute respiratory failure, chronic obstructive pulmonary disease, congestive heart failure, liver disease, coma, colon cancer, lung cancer, multiple organ system failure with malignancy, and multiple organ system failure with sepsis. The goal is to determine these patients' 2- and 6-month survival from physiologic, demographic, and disease-severity information. The problem matters because it addresses the growing national concern over patients' loss of control near the end of life: better prognoses enable earlier decisions and planning that can reduce the frequency of a mechanical, painful, and prolonged dying process.

Dataset Characteristics

Tabular, Multivariate

Subject Area

Health and Medicine

Associated Tasks

Classification, Regression, Other

Feature Type

Real, Categorical, Integer

# Instances

9105

# Features

42

Dataset Information

For what purpose was the dataset created?

To develop and validate a prognostic model that estimates survival over a 180-day period for seriously ill hospitalized adults (Phase I of SUPPORT), and to compare this model's predictions with those of an existing prognostic system and with physicians' independent estimates (Phase II of SUPPORT).

Who funded the creation of the dataset?

Funded by the Robert Wood Johnson Foundation

What do the instances in this dataset represent?

The instances represent records of critically ill patients admitted to United States hospitals with advanced stages of serious illness.

Are there recommended data splits?

No split is recommended; a standard train-test split can be used. When doing model selection, a three-way holdout split (train, validation, and test) can be used instead.
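A minimal sketch of the three-way holdout split, assuming rows are addressed by integer position and using an illustrative 60/20/20 ratio (the documentation does not prescribe any proportions). `three_way_split` is a hypothetical helper, not part of any dataset tooling.

```python
import random
from typing import List, Tuple


def three_way_split(
    n_rows: int, train_frac: float = 0.6, val_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row positions 0..n_rows-1 and cut them into train/validation/test lists.

    The default 60/20/20 proportions are an assumption for illustration only.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # seeded, so the split is reproducible
    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)
    train = positions[:n_train]
    val = positions[n_train:n_train + n_val]
    test = positions[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test


# 9105 is the number of instances listed for SUPPORT2.
train_idx, val_idx, test_idx = three_way_split(9105)
print(len(train_idx), len(val_idx), len(test_idx))
```

Model selection then uses only the train and validation positions, and the test positions are scored once at the end. With pandas, the returned lists can be passed to `DataFrame.iloc`. For a binary target such as hospdead, a stratified split is a reasonable alternative; that is a design choice, not a recommendation from the source.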

Does the dataset contain data that might be considered sensitive in any way?

Yes. There is information about race, gender, income, and education level.

Was there any data preprocessing performed?

No. However, because many values are missing, some default imputation values are recommended. According to the HBiostat Repository (https://hbiostat.org/data/repo/supportdesc, Professor Frank Harrell), the following normal fill-in values have been found useful for imputing missing baseline physiologic data:

Baseline variable: normal fill-in value
- Serum albumin (alb): 3.5
- PaO2/FiO2 ratio (pafi): 333.3
- Bilirubin (bili): 1.01
- Creatinine (crea): 1.01
- Blood urea nitrogen (bun): 6.51
- White blood count (wblc): 9 (thousands)
- Urine output (urine): 2502

There are 159 patients who survived 2 months but for whom there were no patient or surrogate interviews; these patients have missing sfdm2.
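The fill-in values can be applied as sketched below. The sketch assumes each patient record is a Python dict keyed by the column names given in parentheses (alb, pafi, bili, crea, bun, wblc, urine), with missing values loaded as None or NaN; `impute_baseline` and `NORMAL_FILL_VALUES` are hypothetical names.

```python
import math
from typing import Any, Dict

# Normal fill-in values for baseline physiologic variables, as recommended by the
# HBiostat Repository. Keys are dataset column names; wblc is in thousands.
NORMAL_FILL_VALUES: Dict[str, float] = {
    "alb": 3.5,
    "pafi": 333.3,
    "bili": 1.01,
    "crea": 1.01,
    "bun": 6.51,
    "wblc": 9.0,
    "urine": 2502.0,
}


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption about how blanks are loaded)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_baseline(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one patient record with missing baseline values filled in.

    A column that is absent from the record is also treated as missing.
    The input dict is not modified.
    """
    filled = dict(record)
    for column, default in NORMAL_FILL_VALUES.items():
        if is_missing(filled.get(column)):
            filled[column] = default
    return filled


patient = {"id": 1, "alb": None, "pafi": float("nan"), "bili": 2.3}
print(impute_baseline(patient))
```

With pandas, the same dictionary can be passed to `DataFrame.fillna`, which accepts a mapping from column names to fill values. sfdm2 is deliberately left out of the dictionary: its missing values come from missing interviews, not from an unmeasured physiologic quantity with a "normal" value.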

Additional Information

Data sources are medical records, personal interviews, and the National Death Index (NDI). For each patient, administrative, clinical, and survey data were collected. The objective of the SUPPORT project was to improve decision-making in order to address the growing national concern over the loss of control that patients have near the end of life, and to reduce the frequency of a mechanical, painful, and prolonged process of dying. SUPPORT comprised a two-year prospective observational study (Phase I) followed by a two-year controlled clinical trial (Phase II).

Phase I of SUPPORT collected data from patients accessioned during 1989-1991 to characterize the care, treatment preferences, and patterns of decision-making among critically ill patients. It also served as a preliminary step for devising an intervention strategy to improve the care of critically ill patients and for building statistical models that predict patient prognosis and functional status.

An intervention was implemented in Phase II of SUPPORT, which accessioned patients during 1992-1994. The Phase II intervention provided physicians with accurate predictive information on future functional ability, the probability of surviving to six months, and patients' preferences for end-of-life care. In addition, a skilled nurse was provided as part of the intervention to elicit patient preferences, provide prognoses, enhance understanding, enable palliative care, and facilitate advance planning. The intervention was expected to increase communication, resulting in earlier decisions to have orders against resuscitation, less time spent by patients in undesirable states (e.g., in the Intensive Care Unit, on a ventilator, or in a coma), better physician understanding of patients' preferences for care, less patient pain, and lower hospital resource use.
Data collection in both phases of SUPPORT consisted of questionnaires administered to patients, their surrogates, and physicians, plus chart reviews for abstracting clinical, treatment, and decision information. Phase II also collected information about the implementation of the intervention, such as patient-specific logs maintained by the nurses assigned to patients. SUPPORT patients were followed for six months after inclusion in the study. Patients who did not die during follow-up, or who were lost to follow-up, were matched against the National Death Index to identify deaths through 1997. The study population comprised all patients at five United States medical centers who met the inclusion and exclusion criteria for nine disease categories: acute respiratory failure, chronic obstructive pulmonary disease, congestive heart failure, liver disease, coma, colon cancer, lung cancer, multiple organ system failure with malignancy, and multiple organ system failure with sepsis. SUPPORT combines patients from two studies, each lasting two years: Phase I includes 4,301 patients and Phase II includes 4,804 patients. Patients were accessioned from June 12, 1989, through June 11, 1991, for Phase I and from January 7, 1992, through January 24, 1994, for Phase II.

Has Missing Values?

Yes

Introductory Paper

A controlled trial to improve care for seriously ill hospitalized patients. The study to understand prognoses and preferences for outcomes and risks of treatments (SUPPORT)

By The SUPPORT Principal Investigators. 1995

Published in the Journal of the American Medical Association, 274(20):1591–1598

Variables Table

Variable Name | Role | Type | Demographic | Description | Units | Missing Values
id | ID | Integer | | | | no
age | Feature | Continuous | Age | Age of the patient in years | years | no
death | Target | Continuous | | Death at any time up to the National Death Index (NDI) date of December 31, 1994. Some patients were discharged before the end of the study and were not followed up; the authors looked up their death information. | | no
sex | Feature | Categorical | Sex | Gender of the patient. Listed values are {male, female}. | | no
hospdead | Target | Binary | | Death in hospital | | no
slos | Other | Continuous | | Days from study entry to discharge | | no
d.time | Other | Continuous | | Days of follow-up | | no
dzgroup | Feature | Categorical | | The patient's disease subcategory, one of ARF/MOSF w/Sepsis, CHF, COPD, Cirrhosis, Colon Cancer, Coma, Lung Cancer, MOSF w/Malig. | | no
dzclass | Feature | Categorical | | The patient's disease category, one of "ARF/MOSF", "COPD/CHF/Cirrhosis", "Cancer", "Coma". | | no
num.co | Feature | Continuous | | The number of simultaneous diseases (comorbidities) exhibited by the patient. Values are ordinal, with higher values indicating a worse condition and lower chances of survival. | | no

(Showing the first 10 of 48 variables.)
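The overview frames the goal as 2- and 6-month survival. One hedged way to derive such labels from death and d.time is sketched below, taking care with patients whose follow-up ends before the horizon. It assumes d.time counts days from study entry to death when death is 1, and to last follow-up when death is 0, and it approximates 2 months as 60 days and 6 months as 180 days; `survival_label` is a hypothetical helper.

```python
from typing import Optional

# Day counts are assumptions: 60 days for 2 months, 180 days for the 180-day window.
TWO_MONTHS, SIX_MONTHS = 60, 180


def survival_label(death: int, d_time: float, horizon_days: int) -> Optional[int]:
    """Return 1 if alive at the horizon, 0 if died by it, None if censored before it.

    Assumes d_time is the day of death when death == 1 and the day of
    last follow-up when death == 0.
    """
    if death == 1 and d_time <= horizon_days:
        return 0  # death observed on or before the horizon
    if d_time >= horizon_days:
        return 1  # known to be alive through the horizon
    return None  # lost to follow-up before the horizon: outcome unknown


print(survival_label(1, 45, TWO_MONTHS))   # 0: died on day 45
print(survival_label(0, 45, TWO_MONTHS))   # None: censored on day 45
print(survival_label(1, 100, TWO_MONTHS))  # 1: alive at day 60, died later
print(survival_label(1, 100, SIX_MONTHS))  # 0: died before day 180
```

Returning None rather than 0 for censored patients keeps them from being miscounted as deaths; whether to drop them or switch to a survival model that handles censoring is a modeling decision.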

Additional Variable Information

Class Labels

According to the HBiostat Repository (https://hbiostat.org/data/repo/supportdesc, Professor Frank Harrell), the following tasks have been found useful for educational purposes:
- Binary classification: hospital death (hospdead).
- Ordinal regression: the functional disability of the patient (variable sfdm2) on a 5-point scale (with 5 being the most severely disabled), measured 2 months after study entry through patient or surrogate interviews. It uses the Sickness Impact Profile (SIP), a behavior-based measure of health status. The 5 levels are mapped as follows:
  1: No signs of moderate to severe functional disability from the interview.
  2: Patient was unable to do 4 or more activities of daily living.
  3: Sickness Impact Profile total score at 2 months is greater than or equal to 30.
  4: Patient intubated or in coma.
  5: Patient died before 2 months after study entry.
  For more detail on the scale used, refer to https://www.sciencedirect.com/science/article/pii/089543569090224D?via%3Dihub
- Regression: predict the total hospital costs per patient, or the length of stay.
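One common way to set up the ordinal regression task is to expand each sfdm2 level into cumulative binary targets ("is the disability level greater than k?"), which a set of binary classifiers or a cumulative-link model can then consume. The sketch assumes sfdm2 has already been mapped to the integers 1 to 5 described above, with missing values represented as None; it illustrates the technique and is not the method used by the dataset's authors. `cumulative_targets` and `encode_sfdm2` are hypothetical names.

```python
from typing import List, Optional

N_LEVELS = 5  # assumed integer coding: 1 (no signs of disability) .. 5 (died before 2 months)


def cumulative_targets(level: int) -> List[int]:
    """Expand one ordinal level into N_LEVELS - 1 binary targets [level > 1, ..., level > 4]."""
    if not 1 <= level <= N_LEVELS:
        raise ValueError(f"sfdm2 level must be in 1..{N_LEVELS}, got {level}")
    return [int(level > k) for k in range(1, N_LEVELS)]


def encode_sfdm2(levels: List[Optional[int]]) -> List[List[int]]:
    """Encode a column of sfdm2 levels, skipping missing values (e.g. uninterviewed survivors)."""
    return [cumulative_targets(level) for level in levels if level is not None]


print(encode_sfdm2([1, 3, None, 5]))
# [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
```

Skipping None entries mirrors the 159 survivors with no interview; if rows must stay aligned with a feature matrix, filter the features with the same mask instead of dropping targets alone.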


Citations/Acknowledgements

If you use this dataset, please follow the acknowledgment policy on the original dataset website.

Creators

Frank Harrell

fh@fharrell.com

Department of Biostatistics
