Interobserver variability in data collection of the APACHE II score in teaching and community hospitals

Crit Care Med. 1999 Sep;27(9):1999-2004. doi: 10.1097/00003246-199909000-00046.

Abstract

Objectives: To examine the interobserver reliability of the Acute Physiology and Chronic Health Evaluation (APACHE) II score and to identify the major causes of variability in data collection.

Design: Descriptive, comparative analysis.

Setting: Nine intensive care units in two teaching and six community hospitals.

Subjects: A random sample of 342 patient records selected from a network database.

Intervention: None.

Measurements and main results: Data were reabstracted and compared with the original records. Individual physiologic points assigned by the APACHE II scoring system (rather than the actual physiologic values) were compared using the kappa statistic. Paired measurements of the continuous variables were compared using the intraclass correlation coefficient and Bland-Altman plots. Agreement was excellent for most demographic, admission, and discharge data. Both data collectors identified the same system failure requiring intensive care unit admission in 88% of cases, but only 66% agreed on the exact admitting diagnosis. For APACHE II score components, the kappa statistic ranged from 0.315 for the Glasgow Coma Scale point to 0.976 for the age point. Some patient records showed significant disagreement in the probability of death derived from the APACHE II model. At the group level, however, agreement on the APACHE II score was good, with no significant difference in the mean score (20.2 vs. 20.1; p = .758). The predicted mortality from the reabstracted data was 30%, similar to the 27% predicted from the original data (p = .380).
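To make two of the agreement statistics named above concrete, here is a minimal Python sketch: unweighted Cohen's kappa for categorical APACHE II points, and the Bland-Altman bias with 95% limits of agreement for paired continuous values. The abstract does not say which kappa variant or which intraclass correlation form was used, so unweighted kappa is an assumption, the intraclass correlation coefficient is not shown, and the function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same records (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: fraction of records where both raters gave the same point value.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        # Both raters used one identical category throughout; kappa is undefined,
        # and returning 1.0 here is only a convention of this sketch.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def bland_altman_limits(original, reabstracted):
    """Mean difference (bias) and 95% limits of agreement for paired measurements."""
    if len(original) != len(reabstracted) or len(original) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [o - r for o, r in zip(original, reabstracted)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical Glasgow Coma Scale points from the original and reabstracted records.
    gcs_original = [0, 0, 1, 2, 0, 3, 0, 1]
    gcs_reabstracted = [0, 1, 1, 2, 0, 2, 0, 0]
    print(round(cohens_kappa(gcs_original, gcs_reabstracted), 3))  # 0.429

    # Hypothetical paired APACHE II totals.
    bias, lower, upper = bland_altman_limits([20, 18, 25, 30], [19, 18, 27, 29])
    print(bias, round(lower, 2), round(upper, 2))
```

Kappa discounts the agreement expected by chance from each collector's marginal frequencies, so it can sit well below raw percentage agreement; the Bland-Altman limits describe how far an individual pair of measurements may differ even when the mean bias is near zero.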

Conclusion: Reliability of data collection varied widely across the components of the APACHE II probability-of-death model. Significant discrepancies in some components suggested a lack of explicit definitions and timing rules for consistent data collection between institutions or between data collectors. Nonetheless, variability resulting from data collection appears to be randomly distributed, so comparisons of group means remain valid.
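For context, the APACHE II probability-of-death model is usually published in the form below (Knaus et al., 1985). The abstract does not reproduce it, so the coefficients are quoted from that standard description rather than from this study. Here $R$ is the predicted risk of hospital death, $S$ is the APACHE II score, $E$ is 1 if the patient was admitted after emergency surgery and 0 otherwise, and $w_{\mathrm{dx}}$ is the weight for the admitting diagnostic category. Because $w_{\mathrm{dx}}$ depends on the diagnosis, disagreement on the exact admitting diagnosis can shift an individual patient's predicted mortality even when both collectors record the same score.

```latex
\ln\!\left(\frac{R}{1-R}\right) = -3.517 + 0.146\,S + 0.603\,E + w_{\mathrm{dx}},
\qquad
R = \frac{1}{1 + \exp\!\left[-\left(-3.517 + 0.146\,S + 0.603\,E + w_{\mathrm{dx}}\right)\right]}
```

Under this form, each additional APACHE II point multiplies the odds of death by $e^{0.146} \approx 1.16$, holding the other terms fixed.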

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • APACHE*
  • Glasgow Coma Scale
  • Hospital Mortality
  • Hospitals, Community / statistics & numerical data
  • Hospitals, Teaching / statistics & numerical data
  • Humans
  • Intensive Care Units / statistics & numerical data*
  • Observer Variation*
  • Ontario
  • Reproducibility of Results
  • Risk Assessment*