Routinely collected data (RCD) are increasingly used for biomedical research. Extensive resources have been invested in this field: they include the set-up of disease registries and clinical databases at regional, national or international levels; the promotion of the use of electronic health records; and making use of wearable devices for the collection of health data. Analyses of these data can yield descriptive estimates (prevalence or incidence of disease, treatments and risk factors), associations with putative risk factors, and treatment effects of interventions (e.g., drugs, surgery, psychotherapy or medical devices).
Although descriptive estimates and associations offer interesting information, treatment effects are most important for clinical decision-making. They are the core of comparative effectiveness research. In this article, we focus primarily on RCD for determining treatment effects, because they are increasingly considered mainstream options for building evidence on treatment choices. The promises and hype of personalized medicine (or precision medicine, predictive medicine, participatory medicine, 4P or stratified medicine) are similarly fueled by the widespread use of RCD. We do not use these terms here, because these promises face the same major challenges as traditional comparative effectiveness research, and to an even higher degree, because they aim to identify the best options for single patients or small subgroups rather than larger populations. In this overview, we contrast the expectations many have of the use of RCD versus their limitations, discuss which expectations can be met and suggest potential changes in the research agenda for RCD.
Main strengths and weaknesses of routinely collected data
Big data studies with enormous sample sizes, or real-world analyses presented as near-perfect representations of routine care, fuel tremendous expectations for RCD in clinical decision-making. Such extremes amplify both the strengths and the weaknesses of this research, and the traditional limitations of observational research remain. The weaknesses may be greatly magnified by challenges specific to the very nature of data not collected for the purpose of research (e.g., additional biases or errors occurring when gigantic datasets have to be assembled, cleaned, processed, linked and retrospectively analyzed).
In theory, RCD have several advantages. Data collection under real-world circumstances maximizes representativeness and generalizability, minimizes costs and effort, and allows the capture of information in large populations and many clinical events in large datasets that are continuously updated and cover long periods.
However, these theoretical advantages should be viewed cautiously. First, many RCD are collected in situations where populations, diseases, settings and/or interventions are not representative (e.g., when data are collected in tertiary referral hospitals or in health care systems where the population or use of specific interventions are selected by ability to pay or other filters). Evaluation of newly approved drugs may be difficult because there are few existing routine data, and barriers to access innovative drugs may create strong confounding by indication. Second, costs are not necessarily low in all cases (e.g., many hospitals and health care systems make large investments in infrastructure and maintenance because of the increasing popularity of electronic health records). Fragmentation of efforts escalates cost compared with centralized systems that include all health care facilities in a country (e.g., the health care system in Taiwan1). Third, large sample sizes without thorough analytical safeguards can produce highly precise but biased results, yielding confident false-positive and false-negative conclusions.
The observational nature of RCD is an inherent limitation for the study of treatment effects. Which treatment is chosen depends on various known (e.g., severity of disease) or unknown factors that may be associated with the outcome. Such confounding by indication can invalidate real-world observations. Multiple statistical methods are used to reduce these biases (e.g., propensity scores and instrumental variables analyses),2,3 but only properly designed randomized controlled trials (RCTs) can pre-emptively overcome such biases.
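To make the problem of confounding by indication concrete, the following is a minimal simulation sketch, not an analysis of any real RCD source: all variable names, parameter values and the data-generating model are illustrative assumptions. It shows how a naive treated-versus-untreated comparison is biased when sicker patients are treated more often, and how inverse-probability-of-treatment weighting based on a propensity score can reduce (though never guarantee the removal of) that bias when the confounder is measured.

```python
# Illustrative simulation of confounding by indication and a propensity-score
# correction (inverse-probability-of-treatment weighting, IPTW).
# All names and values are hypothetical assumptions for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Measured confounder: disease severity drives both treatment choice and outcome.
severity = rng.normal(size=n)
treat = (rng.random(n) < 1 / (1 + np.exp(-1.5 * severity))).astype(float)

# True treatment effect is 1.0; severity independently worsens the outcome score.
outcome = 1.0 * treat + 2.0 * severity + rng.normal(size=n)

# Naive comparison: biased, because treated patients are sicker on average.
naive_effect = outcome[treat == 1].mean() - outcome[treat == 0].mean()

# Fit a logistic propensity model P(treat | severity) by gradient descent.
X = np.column_stack([np.ones(n), severity])
beta = np.zeros(2)
for _ in range(3000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta -= 0.5 * X.T @ (p - treat) / n
ps = 1 / (1 + np.exp(-X @ beta))

# Hajek (normalized) IPTW estimate of the average treatment effect.
w1, w0 = treat / ps, (1 - treat) / (1 - ps)
iptw_effect = (w1 * outcome).sum() / w1.sum() - (w0 * outcome).sum() / w0.sum()
```

The weighted estimate recovers approximately the true effect here only because the single confounder is measured and correctly modeled; with unmeasured or mismeasured confounders, as is common in RCD, no such adjustment can guarantee validity.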
Multiple errors and biases may interfere with routine data collection and processing (e.g., data linkage problems, misclassification bias and underreporting).3,4 This further reduces the validity of RCD. Additional steps, such as manual reviews of patient records, are sometimes incorporated to improve the quality of the data. However, this adds to the cost and does not solve misclassification problems that occur when risk exposures and/or outcomes are ascertained in a nonstandardized way and when coding practices differ. Differences in management practice within and across institutions can reflect differences in several other confounding factors (e.g., disease severity).
Studies of RCD or better RCTs?
To understand how best to use RCD for health care decision-making, we should revisit the limitations of RCTs (the gold standard for studying treatment effects) and ask whether overcoming these limitations requires a better RCT agenda or the use of RCD.
Generalizability and real-world relevance of clinical studies, in particular those that are used for drug approval, are often limited by narrow inclusion and exclusion criteria,5 and trial participants may have different characteristics than non-participants. Trials are frequently conducted under artificial conditions that differ from routine care (e.g., use of run-in periods, structured follow-up visits or standardized cotreatments). Certain populations are frequently underrepresented in RCTs, including children, women, older adults or patients with comorbidities and polypharmacy.6–10 Drug–drug interactions or adverse effects occurring in routine care may be overlooked. Cost considerations prohibit large studies that would be informative for subgroup-specific effects.
Some of these deficiencies may be best solved by improving the RCT agenda rather than turning to RCD. For example, the cost of RCTs can be reduced substantially, allowing very large sample sizes and better representativeness of the enrolled populations, if simple, pragmatic megatrials are adopted and RCD are used for collecting outcome information.11,12 Nevertheless, such megatrials are uncommon, and thus observational RCD studies are used to fill the evidence gap. For uncommon conditions, even megatrials would include too few patients to inform on outcomes in such subgroups. Studies using RCD can reach sample sizes that are 100- to 1000-fold larger than those of large trials. However, the planning and reporting of claims of subgroup differences in clinical research have been dismal, and most claims are not validated.13 For example, it remains unknown whether the treatment effect suggested by RCD studies involving patients over 80 years of age who have modest renal impairment and hypertension and are taking three other drugs would be more reliable than the average treatment effect suggested by an RCT that involved patients with none or few of these characteristics.
Given the limited funds for RCTs, many important health care questions are not studied. Such evidence gaps could be addressed by a better RCT research agenda that prioritizes the use of pragmatic, patient-important outcomes14 and relevant head-to-head comparisons.5,15,16 Some comparative effectiveness evidence may also be accommodated by network meta-analyses of RCTs.5,15,17 However, even then, an exhaustive evaluation of treatment effects on mortality and other patient-important outcomes (including major harms) with RCTs alone is unrealistic. Here, RCD could fill many evidence gaps. One may then decide that the RCD evidence is strong enough to lead to policy or guideline changes, or the RCD evidence may be used to guide the design of future RCTs. There are also situations where conducting RCTs would be unrealistic or perceived as unethical.18
Randomized controlled trials currently differ from RCD studies in many features besides randomization. Many of the features that improve the validity of RCTs, either directly or indirectly, may also contribute to the perceived practical disadvantages of this type of research. For example, the regulatory requirements that must be fulfilled before a trial can start are often cumbersome.19 These requirements are a direct result of the experimental nature and ethical implications of randomization.20 They include thorough reflections about the intended purpose of the research to justify randomization, study protocols clearly stating assumptions, hypotheses and calculations of sample size, and submission of protocols to regulatory authorities. Working in large collaborative groups of researchers with various backgrounds, and exchanging with involved stakeholders, ethics committees or data safety monitoring boards, generates feedback loops that may improve initial RCT research plans.
Most of these steps are often not undertaken for RCD research. Some of the perceived practical advantages of RCD studies may actually be limitations. Available datasets may be rapidly analyzed by small teams or a single researcher. Studies of RCD are often vastly overpowered, so that even tiny effects reach nominal statistical significance.21 Post hoc explanations are easily invoked, increasing confidence in spurious findings.22,23 Results can remain unpublished, or may be published selectively depending on the plausibility of explanations, preconceived hypotheses, commercial interests or the researcher's personal need for scientific reward.
In Table 1, we summarize some of the limitations of current RCTs, beginning with those that may be the most amenable to improvement of the current RCT agenda. We list ways to bypass these limitations with RCD and highlight residual caveats of RCD studies.
The status quo of routinely collected data
We recently conducted an empirical analysis of how RCD studies try to complement RCTs to understand treatment effects.24 We assessed 337 RCD studies that investigated the comparative effectiveness of medical treatments on mortality. Seventy percent of these studies were incremental research that supplemented existing RCTs but did not fill fundamental knowledge gaps (i.e., questions never evaluated in RCTs). In only six (1.8%) of these RCD studies did the authors state that conducting RCTs on their research topic would be unethical, and in only 18 (5.3%) did they state that it would be difficult. Typically, investigators conducting the RCD studies reasoned that RCT results had limited generalizability (37.6%), did not adequately address specific outcomes (31.9%) or certain populations (23.5%), or were inconclusive or inconsistent (25.8%).
Most RCD studies focus on questions that have been addressed by RCTs or could be definitively addressed by RCTs.24 Agreement between the results of such RCD studies and the results of the RCTs offers some incremental reassurance, but the benefit for clinical decision-making is limited or nonexistent. When RCTs and observational studies disagree,25 the situation becomes complicated. Much of the interpretation of inconsistent results between such sources of evidence is currently a case-by-case discussion. Eventually, residual bias owing to nonrandomization or the artificial RCT setting may be used as arguments for almost any disagreement. Consensus becomes difficult to reach.
In areas without evidence from RCTs, studies of RCD may provide the only guidance on a critical health care question, albeit with recognizable limitations. Policy or guideline changes based on RCD should acknowledge the limitations of RCD, and strategic plans should be in place to monitor the clinical impact of these changes. Unfortunately, current RCD studies do not focus on the large numbers of critical health care questions that lack evidence from RCTs.24 For example, comparisons of drug and nondrug treatments, and evaluations of inexpensive drugs, are lacking. Evidence from RCD studies would be useful in providing answers to these vital questions.
Changes in the RCD research agenda and practices
Overall, expectations about the utility of RCD studies for understanding treatment effects are probably overestimated. We discuss what improvements can be made in RCD studies and what resources would be required (Table 2).
Selecting priorities
In selecting research questions, prior evidence must be systematically reviewed; another study or analysis may not even be necessary. Routinely collected data studies should focus more on questions that have not been addressed, or that are difficult or impossible to address with other study designs.
Protocols and prespecification
Research using RCD may or may not use explicit protocols and prespecified analyses. It is important to know what was not prespecified. Exploratory analyses should be described as such; they need further prospective validation with protocol-based, prespecified studies. Wherever prespecification is not feasible, transparent and complete documentation of the conduct of the study is still useful. The validity of RCD and their proper interpretation can be improved by using falsification end points (negative controls of known null associations),27 validation datasets28 and prespecified rules for when the study hypotheses should be considered confirmed or rejected.
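The idea of a falsification end point can be sketched in a few lines. The following is an illustrative simulation, not a real analysis: all names, values and the flagging threshold are hypothetical assumptions. A negative-control outcome is one the treatment is known not to affect; if the same naive analysis pipeline nevertheless "detects" an association with it, residual confounding in that pipeline is likely.

```python
# Illustrative falsification end point (negative-control outcome) check.
# All names, values and thresholds are hypothetical assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Confounder: sicker patients are treated more often.
severity = rng.normal(size=n)
treat = (rng.random(n) < 1 / (1 + np.exp(-1.5 * severity))).astype(float)

# Negative control: by construction, treatment has NO effect on this outcome,
# but the confounder does.
control_outcome = 2.0 * severity + rng.normal(size=n)

# Run the same naive comparison the main analysis would use.
naive_control_diff = (control_outcome[treat == 1].mean()
                      - control_outcome[treat == 0].mean())

# Crude prespecified falsification rule: flag the pipeline if the known-null
# association exceeds an arbitrary tolerance (0.1 here, chosen for illustration).
confounding_suspected = abs(naive_control_diff) > 0.1
```

In this sketch the check correctly flags the naive pipeline, because the confounder pushes the known-null association well away from zero; a pipeline that passes such a check is not proven unbiased, but one that fails it should not be trusted.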
Registration
Registration of RCD studies that have prospective design and/or analysis elements and explicit protocols would help shape a more efficient research agenda and reduce selective reporting of methods and findings. For explorative research, it may be best to register datasets; this would facilitate planning a concerted research agenda, data-sharing activities and using datasets for validation.29,30
Reporting
Incomplete or unusable reporting wastes research resources.31 Studies using RCD are often poorly reported.32 Recently, the RECORD (REporting of studies Conducted using Observational Routinely-collected health Data) statement was published,33 which aims to improve the reporting quality specifically of observational RCD studies by providing an extension to the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) statement.34 In addition to transparent reporting, the results need to be embedded in a systematic review of the available evidence. Journals, peer reviewers, funders and authorities can help to improve the reporting quality of RCD studies.
Access to raw data
Lack of access to raw data makes it impossible to independently assess analytic errors and biases, and limits opportunities for joint analyses. Facilitated availability of different datasets would support external validation and improve standardization and efforts to enhance quality. Patients should be asked for explicit consent up front for prospective data sharing of RCD, as is required for RCTs. The misleading view that health information is not really protected data when it is routinely collected creates serious problems.35 Consent issues would be best decided during database building. Data deidentification should also be carefully planned.
Research networks
Large research networks can foster the joint use of RCT and RCD datasets. Research networks may be in the best position to face the challenges involved in establishing harmonized/standardized research. This includes outcome definitions (e.g., by developing and validating universally accepted lists of diagnostic codes for specific outcomes), time points of outcome assessments, risk exposures to be analyzed, subgroup analyses to be explored, and predetermined effect sizes and other criteria for clinically significant outcome differences. Standardized guidance can be developed for organizing and implementing data sharing. Collaborators with various levels of expertise and backgrounds would provide diverse perspectives to maximize research applicability.
Research on research
More research on the reliability of RCD results is necessary (e.g., on the performance of approaches to deal with confounding by indication, such as propensity scores, instrumental variables or the use of falsification end points). Compared with RCTs, there is little empirical guidance on the interpretation of RCD evidence. We need to develop a better understanding of and tools for assessment of risk of bias, generalizability and data validity.
Conclusion
Research using RCD is becoming increasingly popular, but its limitations should not be understated. Several of the improvements suggested here may increase the utility of this research, though they would require additional resources. Studies using RCD should be prioritized for situations where RCTs cannot be conducted; even then, they must be interpreted with caution.
KEY POINTS
Routinely collected data (RCD) are increasingly used for biomedical research; however, their utility for understanding treatment effects is probably overestimated.
Many of the perceived advantages of RCD should be viewed cautiously, because of the inevitable biases of observational research and specific biases due to the nature of these data.
Improvements may increase the utility of RCD but require resources for implementation; they include improvements in research priority setting, transparency of data and protocols, and collaborative research networks.
Although many evidence gaps may be better addressed by an improved randomized controlled trial (RCT) agenda, RCD studies may be required in situations where RCTs are difficult or impossible to perform; interpretation of these studies should be cautious.
Footnotes
See also www.cmaj.ca/lookup/doi/10.1503/cmaj.160410, www.cmaj.ca/lookup/doi/10.1503/cmaj.151470 and CMAJ Open article www.cmajopen.ca/content/4/2/E132
Competing interests: None declared.
This article has been peer reviewed.
Contributors: Lars Hemkens wrote the first draft of the article. All of the authors contributed to the writing and editing of the manuscript, revised it critically for intellectual content, approved the final version to be published and agreed to act as guarantors of the work.