Review Article
A review of uses of health care utilization databases for epidemiologic research on therapeutics
Introduction
It is widely accepted that randomized clinical trials (RCTs) cannot provide all necessary information about the safe and effective use of medicines at the time they are marketed. This stems from the inherent limitations of RCTs during drug development: they usually have a small sample size that often under-represents vulnerable patient groups, and they focus on short-term efficacy and safety in a controlled environment that is often far from routine clinical practice. Moreover, the RCT outcome sufficient to win marketing approval (short-term improvement in a surrogate marker compared with the effect of placebo) often fails to answer the more relevant questions that face doctors and patients. Such limitations make it inevitable that epidemiologic research is performed after marketing to address these issues [1]. Although the focus of pharmacoepidemiology is on post-marketing surveillance of drugs, biologics [2], and medical devices [3], the approach also has valuable applications in the pre-marketing phase, where it can be used to assess the safety profile of drugs and put them into the context of the natural history of the condition they are designed to treat [4].
Although pharmacoepidemiology makes use of all epidemiologic study designs and data sources, in recent years there has been enormous growth in the use of large health care databases [5]. These consist of automated electronic records of filled prescriptions, professional services, and hospitalizations; such data are increasingly collected routinely for the payment and administration of health services. Beyond this, electronic medical records often contain detailed clinical information, patients' reports of symptoms, the findings of physical examinations, and the results of diagnostic tests. However, researchers more frequently use insurance data on submitted claims for specific services, procedures, and pharmaceuticals. These are usually less detailed in their clinical content but are often representative and complete for very large patient populations, including elderly patients, children, the very poor, and those in nursing homes, who are most often under-represented in or totally excluded from clinical trials.
Clinical epidemiologists can answer a wide spectrum of research questions with database studies, but they must be aware of the specific issues that can compromise their validity and of recent methodologic advances to address these shortcomings. This article outlines the breadth of research applications using databases, the issues that may compromise the validity of such studies, and approaches to managing such analytic challenges. The goal is to provide researchers with a methodological framework and to comment on the value of new techniques for more advanced users.
Section snippets
Research applications with database studies
Research applications of databases vary broadly. Most take advantage of the strengths of these datasets: (1) their large size allows the study of rare events, (2) their representativeness of routine clinical care makes it possible to study real-world effectiveness and utilization patterns, and (3) their availability at relatively low cost and without long delays makes them accessible and efficient.
From patients to records and claims databases
One advantage of health care utilization databases (i.e., their representativeness of routine clinical practice in large populations) is also a disadvantage (i.e., the reliance on previously collected data generated primarily for administrative purposes). In other epidemiologic studies that use primary data collection, the timing of data collection and the detail and accuracy of data are to a large extent under the control of the investigator. By contrast, in administrative databases a record
Validity of clinical information
As in any epidemiologic study, one must consider a cascade of potential biases that may come into play between an underlying causal relation and the reported findings of a database study [41]. The following issues are more likely to be present in database studies, although they are not unique to them.
Primary medical record review
Validation of diagnoses is generally recommended if there is any doubt about the specificity of the coding of the study outcome because specificity of outcome classification is key for unbiased relative risk estimates. Sudden, highly symptomatic events that lead to hospitalizations are more likely to have a low rate of false positives, whereas outcomes with insidious onset and less clearly defined diagnostic criteria are likely to be less specific. Findings of published validation studies
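The point about specificity can be illustrated with a small numerical sketch. It assumes non-differential outcome misclassification (the same sensitivity and specificity in exposed and unexposed patients); the risks, sensitivity, and specificity values are hypothetical and chosen only for illustration, and the function names are not taken from the article.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion of patients coded as having the outcome.

    True cases are captured with probability `sensitivity`;
    non-cases are falsely coded with probability 1 - specificity.
    """
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Risk ratio computed from the misclassified outcome."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


# Hypothetical true risks: 2% in the exposed, 1% in the unexposed (true RR = 2).
true_rr = 0.02 / 0.01

# Low sensitivity but perfect specificity: the risk ratio is preserved.
rr_perfect_spec = observed_risk_ratio(0.02, 0.01, sensitivity=0.6, specificity=1.0)

# Same sensitivity, specificity of 98%: false positives dilute the contrast.
rr_imperfect_spec = observed_risk_ratio(0.02, 0.01, sensitivity=0.6, specificity=0.98)

print(f"true RR: {true_rr:.2f}")
print(f"observed RR, specificity 1.00: {rr_perfect_spec:.2f}")
print(f"observed RR, specificity 0.98: {rr_imperfect_spec:.2f}")
```

Under these assumptions, incomplete capture of true events leaves the risk ratio unchanged as long as no false positives occur, whereas even a small false-positive rate pulls the estimate toward the null when the outcome is rare.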
Conclusion
The growing trend of recording data on all medical encounters in electronic format is making large utilization-based datasets more and more common in health care. Their representativeness, large size, and capacity to contain large quantities of longitudinal clinical data on each patient can make such datasets useful for clinical epidemiologic research, especially on medication utilization and outcomes. Such pharmacoepidemiologic applications extend from studies of physician prescribing and
Acknowledgments
Dr. Schneeweiss received support from the National Institute on Aging (RO1-AG021950) and the Agency for Healthcare Research and Quality (2-RO1-HS10881), Department of Health and Human Services, Rockville, MD. We thank our colleagues in the Division of Pharmacoepidemiology and Pharmacoeconomics at the Brigham and Women's Hospital for their helpful discussions: Robert J Glynn, PhD, ScD; Til Stürmer, MD, MPH; Allan Brookhart, PhD; and Ken Rothman, DMD, DrPH.
References (117)
- et al.
A retrospective cohort study of implantable medical devices and selected chronic disease in Medicare claims data
Ann Epidemiol
(2000)
Pharmacoepidemiology in pre-approval clinical trial safety monitoring
J Clin Epidemiol
(1991)
- et al.
The effect of physicians' training on prescribing beta-blockers for secondary prevention of myocardial infarction in the elderly
Ann Epidemiol
(2002)
- et al.
Post-marketing studies of drug efficacy: how?
Am J Med
(1984)
- et al.
Medicaid data as a resource for epidemiologic studies: strengths and limitations
J Clin Epidemiol
(1989)
- et al.
Statins and the risk of dementia
Lancet
(2000)
- et al.
What is Germany's experience on reference based drug pricing and the etiology of adverse health outcomes or substitution?
Health Policy
(1998)
- et al.
The validity of Medicaid pharmacy claims for estimating drug use among elderly nursing home residents: the Oregon experience
J Clin Epidemiol
(2000)
- et al.
Completeness of prescription recording in outpatients medical records from a health maintenance organization
J Clin Epidemiol
(1994)
Behavior of the exposure odds ratio in a case-control study when the hazard function is not constant over time
J Clin Epidemiol
(1989)
Validation of diagnostic codes within medical services claims
J Clin Epidemiol
The accuracy of Medicare claims-based diagnosis of acute myocardial infarction: estimating positive predictive value based on review of hospital records
Am Heart J
Acute respiratory-tract infections and risk of first-time acute myocardial infarction
Lancet
Association of road-traffic accidents with benzodiazepine use
Lancet
Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases
J Clin Epidemiol
A new method of classifying prognostic comorbidity in longitudinal studies: development and validation
J Chron Dis
Assessing accuracy of diagnosis-type indicators for flagging complications in administrative data
J Clin Epidemiol
Validation study methods for estimating exposure proportions and odds ratios with misclassified data
J Clin Epidemiol
A Medicare database review found that physician preferences increasingly outweighed patient characteristics as determinants of first-time prescriptions for COX-2 inhibitors
J Clin Epidemiol
A simulation study of the number of events per variable in logistic regression analysis
J Clin Epidemiol
Non-steroidal anti-inflammatory drugs and risk of serious coronary heart disease: an observational cohort study
Lancet
When are observational studies as credible as randomized trials?
Lancet
Use of the case-crossover design to study prolonged drug exposures and insidious outcomes
Ann Epidemiol
Powerful medicines: the benefits, risks, and costs of prescription drugs
Risk assessment of drugs, biologics and therapeutic devices: present and future issues
Pharmacoepidemiol Drug Saf
What do we show and who does so? An analysis of the abstracts presented at the 19th ICPE
Pharmacoepidemiol Drug Saf
The waiting time distribution as a graphical approach to epidemiologic measures of drug utilization
Epidemiology
Long-term persistence in use of statin therapy in elderly patients
JAMA
Adverse outcomes of underuse of beta-blockers in elderly survivors of acute myocardial infarction
JAMA
Quality indicators for appropriate medication use in vulnerable elders
Ann Intern Med
Physician and practice characteristics associated with the early utilization of new prescription drugs
Med Care
Variations in the utilization of coronary angiography for elderly patients with an acute myocardial infarction: an analysis using hierarchical logistic regression
Med Care
Physician profiling and risk adjustment
The role of databases in drug postmarketing surveillance
Pharmacoepidemiol Drug Saf
Sample size considerations for pharmacoepidemiology studies
The impact of wording in “Dear doctor” letters and in black box labels
Clin Pharmacol Ther
Use and misuse of population attributable fractions
Am J Public Health
The need for randomization in the study of intended effects
Stat Med
Understanding the divergent data on postmenopausal hormone therapy
N Engl J Med
Mortality in current and former users of clozapine
Epidemiology
Computerised record linkage: compared with traditional patient follow-up methods in clinical trials and illustrated in a prospective epidemiological study
J Clin Epidemiol
A critical analysis of studies of state drug reimbursement policies: research in need of discipline
Milbank Q
Adverse events associated with prescription drug cost-sharing among poor and elderly persons
JAMA
Effects of Medicaid drug-payment limits on admission to hospitals and nursing homes
N Engl J Med
Outcomes of reference pricing for angiotensin-converting enzyme inhibitors
N Engl J Med
Descriptive analyses of the integrity of a US Medicaid claims database
Pharmacoepidemiol Drug Saf
A framework for evaluation of secondary data sources for epidemiological research
Int J Epidemiol
Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis
Epidemiology
Confounding by indication
Epidemiology