Review Article
A review of uses of health care utilization databases for epidemiologic research on therapeutics

https://doi.org/10.1016/j.jclinepi.2004.10.012

Abstract

Objective

Large health care utilization databases are frequently used in a variety of settings to study the use and outcomes of therapeutics. Their size allows the study of infrequent events, their representativeness of routine clinical care makes it possible to study real-world effectiveness and utilization patterns, and their availability at relatively low cost without long delays makes them accessible to many researchers. However, concerns about database studies include data validity, lack of detailed clinical information, and a limited ability to control confounding.

Study Design and Setting

We consider the strengths, limitations, and appropriate applications of health care utilization databases in epidemiology and health services research, with particular reference to the study of medications.

Conclusion

In recent years, progress has been made on many methodologic issues related to the use of health care utilization databases, but important unresolved issues persist and merit scrutiny.

Introduction

It is widely accepted that randomized clinical trials (RCTs) cannot provide all necessary information about the safe and effective use of medicines at the time they are marketed. This stems from the inherent limitations of RCTs during drug development: they usually have a small sample size that often under-represents vulnerable patient groups, and they focus on short-term efficacy and safety in a controlled environment that is often far from routine clinical practice. Moreover, the RCT outcome sufficient to win marketing approval (short-term improvement in a surrogate marker compared with the effect of placebo) often fails to answer the more relevant questions that face doctors and patients. Such limitations make it inevitable that epidemiologic research is performed after marketing to address these issues [1]. Although the focus of pharmacoepidemiology is on post-marketing surveillance of drugs, biologics [2], and medical devices [3], the approach has valuable applications in the pre-marketing phase to assess the safety profile of drugs and put them into the context of the natural history of the condition they are designed to treat [4].

Although pharmacoepidemiology makes use of all epidemiologic study designs and data sources, in recent years there has been enormous growth in the use of large health care databases [5]. These are made up of the automated electronic recording of filled prescriptions, professional services, and hospitalizations; such data are increasingly collected routinely for the payment and administration of health services. Beyond this, electronic medical records often contain detailed clinical information, patients' reports of symptoms, the findings of physical examinations, and the results of diagnostic tests. However, researchers more frequently use insurance data on submitted claims for specific services, procedures, and pharmaceuticals. These are usually less detailed in their clinical content but are often representative and complete for very large patient populations, including elderly patients, children, the very poor, and those in nursing homes, groups that are most often under-represented in or totally excluded from clinical trials.

Clinical epidemiologists can answer a wide spectrum of research questions with database studies, but they must be aware of the specific issues that can compromise their validity and of recent methodologic advances to address these shortcomings. This article outlines the breadth of research applications using databases, the issues that may compromise the validity of such studies, and approaches to managing such analytic challenges. The goal is to provide researchers with a methodological framework and to comment on the value of new techniques for more advanced users.

Section snippets

Research applications with database studies

Research applications of databases vary broadly. Most take advantage of the strengths of these datasets: (1) Their large size allows the study of rare events, (2) their representativeness of routine clinical care makes it possible to study real-world effectiveness and utilization patterns, and (3) their availability at relatively low cost and without long delays makes them accessible and efficient.

From patients to records and claims databases

One advantage of health care utilization databases (i.e., their representativeness of routine clinical practice in large populations) is also a disadvantage (i.e., the reliance on previously collected data generated primarily for administrative purposes). In other epidemiologic studies that use primary data collection, the timing of data collection and the detail and accuracy of data are to a large extent under the control of the investigator. By contrast, in administrative databases a record

Validity of clinical information

As in any epidemiologic study, one must consider a cascade of potential biases that may come into play between an underlying causal relation and the reported findings of a database study [41]. The following issues are more likely to be present in database studies, although they are not unique to them.

Primary medical record review

Validation of diagnoses is generally recommended if there is any doubt about the specificity of the coding of the study outcome because specificity of outcome classification is key for unbiased relative risk estimates. Sudden, highly symptomatic events that lead to hospitalizations are more likely to have a low rate of false positives, whereas outcomes with insidious onset and less clearly defined diagnostic criteria are likely to be less specific. Findings of published validation studies
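
The point about specificity can be illustrated with a small numerical sketch. It is not taken from the article: the risks, sensitivity, and specificity values are hypothetical, the function name `observed_risk_ratio` is introduced here for illustration, and the calculation assumes that outcome misclassification is the same in exposed and unexposed patients (nondifferential).

```python
def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Expected risk ratio under nondifferential outcome misclassification.

    All inputs are hypothetical illustration values, not data from the article.
    """
    def observed_risk(true_risk):
        # Coded cases = true cases that are detected
        #             + non-cases that are falsely coded as cases.
        detected = sensitivity * true_risk
        false_positives = (1.0 - specificity) * (1.0 - true_risk)
        return detected + false_positives

    return observed_risk(risk_exposed) / observed_risk(risk_unexposed)


# Hypothetical true risks: 2% in the exposed, 1% in the unexposed.
true_rr = 0.02 / 0.01

# Perfect specificity, poor sensitivity: missed cases scale both risks
# by the same factor, so the ratio is preserved.
rr_low_sens = observed_risk_ratio(0.02, 0.01, sensitivity=0.60, specificity=1.00)

# Perfect sensitivity, slightly imperfect specificity: false positives add
# a similar number of coded cases to both groups and dilute the ratio.
rr_low_spec = observed_risk_ratio(0.02, 0.01, sensitivity=1.00, specificity=0.98)

print(f"true RR               = {true_rr:.2f}")
print(f"RR, sens 60%, spec 100% = {rr_low_sens:.2f}")
print(f"RR, sens 100%, spec 98% = {rr_low_spec:.2f}")
```

Under these assumed values, losing 40% of true cases leaves the risk ratio at 2, whereas a 2% false-positive rate pulls it down to roughly 1.33. When the outcome is rare, even a small loss of specificity can matter more than a large loss of sensitivity.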

Conclusion

The growing trend of recording data on all medical encounters in electronic format is making large utilization-based datasets more and more common in health care. Their representativeness, large size, and capacity to contain large quantities of longitudinal clinical data on each patient can make such datasets useful for clinical epidemiologic research, especially on medication utilization and outcomes. Such pharmacoepidemiologic applications extend from studies of physician prescribing and

Acknowledgments

Dr. Schneeweiss received support from the National Institute on Aging (RO1-AG021950) and the Agency for Healthcare Research and Quality (2-RO1-HS10881), Department of Health and Human Services, Rockville, MD. We thank our colleagues in the Division of Pharmacoepidemiology and Pharmacoeconomics at the Brigham and Women's Hospital for their helpful discussions: Robert J Glynn, PhD, ScD; Til Stürmer, MD, MPH; Allan Brookhart, PhD; and Ken Rothman, DMD, DrPH.

References (117)

  • M. Wilchesky et al.

    Validation of diagnostic codes within medical services claims

    J Clin Epidemiol

    (2004)
  • Y. Kiyota et al.

    The accuracy of Medicare claims-based diagnosis of acute myocardial infarction: estimating positive predictive value based on review of hospital records

    Am Heart J

    (2004)
  • C.R. Meier et al.

    Acute respiratory-tract infections and risk of first-time acute myocardial infarction

    Lancet

    (1998)
  • F. Barbone et al.

    Association of road-traffic accidents with benzodiazepine use

    Lancet

    (1998)
  • R.A. Deyo et al.

    Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases

    J Clin Epidemiol

    (1992)
  • M.E. Charlson et al.

    A new method of classifying prognostic comorbidity in longitudinal studies: development and validation

    J Chron Dis

    (1987)
  • H. Quan et al.

    Assessing accuracy of diagnosis-type indicators for flagging complications in administrative data

    J Clin Epidemiol

    (2004)
  • R.J. Marshall

    Validation study methods for estimating exposure proportions and odds ratios with misclassified data

    J Clin Epidemiol

    (1990)
  • S. Schneeweiss et al.

    A Medicare database review found that physician preferences increasingly outweighed patient characteristics as determinants of first-time prescriptions for cox-2 inhibitors

    J Clin Epidemiol

    (2005)
  • P. Peduzzi et al.

    A simulation study of the number of events per variable in logistic regression analysis

    J Clin Epidemiol

    (1996)
  • W.A. Ray et al.

    Non-steroidal anti-inflammatory drugs and risk of serious coronary heart disease: an observational cohort study

    Lancet

    (2002)
  • J.P. Vandenbroucke

    When are observational studies as credible as randomized trials?

    Lancet

    (2004)
  • P.S. Wang et al.

    Use of the case-crossover design to study prolonged drug exposures and insidious outcomes

    Ann Epidemiol

    (2004)
  • J. Avorn

    Powerful medicines: the benefits, risks, and costs of prescription drugs

    (2004)
  • The Centers for Education and Research on Therapeutics (CERTs) Risk Assessment Workshop Participants

    Risk assessment of drugs, biologics and therapeutic devices: present and future issues

    Pharmacoepidemiol Drug Saf

    (2003)
  • A. Arana et al.

    What do we show and who does so? An analysis of the abstracts presented at the 19th ICPE

    Pharmacoepidemiol Drug Saf

    (2004)
  • J. Hallas et al.

    The waiting time distribution as a graphical approach to epidemiologic measures of drug utilization

    Epidemiology

    (1997)
  • J.S. Benner et al.

    Long-term persistence in use of statin therapy in elderly patients

    JAMA

    (2002)
  • S.B. Soumerai et al.

    Adverse outcomes of underuse of beta-blockers in elderly survivors of acute myocardial infarction

    JAMA

    (1997)
  • E.L. Knight et al.

    Quality indicators for appropriate medication use in vulnerable elders

    Ann Intern Med

    (2001)
  • Brookhart MA, Solomon DH, Wang P, Glynn RJ, Avorn J, Schneeweiss S. Quantifying sources of explained variation in...
  • R. Tamblyn et al.

    Physician and practice characteristics associated with the early utilization of new prescription drugs

    Med Care

    (2003)
  • C.A. Gatsonis et al.

    Variations in the utilization of coronary angiography for elderly patients with an acute myocardial infarction: an analysis using hierarchical logistic regression

    Med Care

    (1995)
  • N. Goldfield

    Physician profiling and risk adjustment

    (1999)
  • E.M. Rodriguez et al.

    The role of databases in drug postmarketing surveillance

    Pharmacoepidemiol Drug Saf

    (2001)
  • B.L. Strom

    Sample size considerations for pharmacoepidemiology studies

  • L.B. Weatherby et al.

    The impact of wording in “Dear doctor” letters and in black box labels

    Clin Pharmacol Ther

    (2002)
  • B. Rockhill et al.

    Use and misuse of population attributable fractions

    Am J Public Health

    (1998)
  • O.S. Miettinen

    The need for randomization in the study of intended effects

    Stat Med

    (1983)
  • F. Grodstein et al.

    Understanding the divergent data on postmenopausal hormone therapy

    N Engl J Med

    (2003)
  • A.M. Walker et al.

    Mortality in current and former users of clozapine

    Epidemiology

    (1997)
  • The West of Scotland Coronary Prevention Study Group

    Computerised record linkage: compared with traditional patient follow-up methods in clinical trials and illustrated in a prospective epidemiological study

    J Clin Epidemiol

    (1995)
  • S.B. Soumerai et al.

    A critical analysis of studies of state drug reimbursement policies: research in need of discipline

    Milbank Q

    (1993)
  • R. Tamblyn et al.

    Adverse events associated with prescription drug cost-sharing among poor and elderly persons

    JAMA

    (2001)
  • S.B. Soumerai et al.

    Effects of Medicaid drug-payment limits on admission to hospitals and nursing homes

    N Engl J Med

    (1991)
  • S. Schneeweiss et al.

    Outcomes of reference pricing for angiotensin-converting enzyme inhibitors

    N Engl J Med

    (2002)
  • S. Hennessy et al.

    Descriptive analyses of the integrity of a US Medicaid claims database

    Pharmacoepidemiol Drug Saf

    (2003)
  • H.T. Sørensen et al.

    A framework for evaluation of secondary data sources for epidemiological research

    Int J Epidemiol

    (1996)
  • J. Hallas

    Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis

    Epidemiology

    (1996)
  • A.M. Walker

    Confounding by indication

    Epidemiology

    (1996)