Review Article
A framework provided an outline toward the proper evaluation of potential screening strategies

https://doi.org/10.1016/j.jclinepi.2012.09.018

Abstract

Objectives

Screening tests are often introduced into clinical practice without proper evaluation, despite the increasing awareness that screening is a double-edged sword that can lead to either net benefits or harms. Our objective was to develop a comprehensive framework for the evaluation of new screening strategies.

Study Design and Setting

Building on existing concepts proposed by experts, we propose a stepwise framework to evaluate whether a potential screening test can be introduced as a screening strategy into clinical practice. The principle of screening strategy evaluation is illustrated for cervical cancer, which serves as a template for screening because of the existence of an easily detectable and treatable precursor lesion.

Results

The evaluation procedure consists of six consecutive steps. In steps 1–4, the technical accuracy, place of the test in the screening pathway, diagnostic accuracy, and longitudinal sensitivity and specificity of the screening test are assessed. In steps 5 and 6, the impact of the screening strategy on the patient and population levels, respectively, is evaluated. The framework incorporates a harm and benefit trade-off and cost-effectiveness analysis.

Conclusion

Our framework provides an outline toward the proper evaluation of potential screening strategies before considering implementation.

Introduction

Almost 40 years ago, Wilson and Jungner [1], for the World Health Organization, formulated a number of criteria (called “principles”) that a screening strategy should meet. One of the criteria was that there should be a suitable screening test or examination detecting latent or early phases of the target disease. Unfortunately, no comprehensive guideline yet exists concerning the assessment of screening strategies. Moreover, the specific context of screening, which is applied to large groups of apparently healthy persons among whom the disease is usually rare, makes the evaluation of screening strategies a difficult, delicate, and costly exercise.

In this article, we propose a comprehensive framework for the evaluation of new screening strategies, using cervical cancer screening as a case example. Terms such as screening tests, strategies, or programs need clear definitions. Evaluation of a potential screening strategy, involving a new screening test, comprises the test, patient, and population levels. Generally, it includes the determination of age ranges and screening intervals and assessment of its cost-effectiveness. Although the effectiveness of a screening program depends on the properties of the screening test itself, other factors including natural history of the disease, screening organization, level of participation of the target population, compliance with follow-up and efficacy of workup, and treatment of the screen-detected lesion also determine the success [2], [3], [4], [5]. The evaluation of screening programs in all their aspects, however, lies beyond the scope of this article, which focuses on the evaluation of new screening strategies.

Our reasoning starts from the assumption that before a possible screening strategy is considered, a clear decision has been made on the exact aim and the general target of the intervention. The aim should be formulated as the net benefit for the screenees and in terms of avoiding worsening public health. The broad target population can be, depending on the target condition, either an age and sex subgroup of the general population or a high-risk subgroup, for example, people working or living in specific conditions or exposed to risk factors [6]. These general ideas will guide the researcher to precise screening intervals and target populations chosen for the individual observational studies and trials.

When people actively present with a health problem that requires treatment, they accept that the diagnostic process or treatment carries some risk of inflicting harm. When the same processes are applied to healthy people, the acceptable level of risk is much lower. Additionally, participation in screening is often encouraged by invitations and often involves some degree of social pressure.

Screening can effectively prevent cervical cancer. The International Agency for Research on Cancer estimated that well-organized cytologic screening for cervical cancer precursors every 3–5 years between the ages of 35 and 64 years can reduce the incidence of cervical cancer by 80% or more among the women screened [7]. Nevertheless, in 2008, cervical cancer was the third most common cancer in women worldwide and the fourth most common cause of cancer death, and even the most common cause of cancer death in many developing countries [8]. It occurs at a relatively young age when women are actively involved in their careers or caring for their families, resulting in proportionally more life-years lost compared with most cancers [8]. The rationale of cervical cancer cytologic screening is to identify and treat high-grade cervical intraepithelial neoplasia (CIN), a precancerous lesion, and so prevent its progression to invasive cancer. The mean time from initial dysplasia to invasion is at least 10 years, and the probability of detection increases as the preclinical phase progresses [9], [10]. Removing these precursor lesions is effective in avoiding progression to invasive malignancy. Although screening for cervical cancer is well established, there were until recently no randomized clinical trials to demonstrate its effectiveness. The observational evidence showing a reduced incidence of and mortality from cervical cancer is, however, widely accepted [11], [12]. The recognition of a strong causal relationship between persistent high-risk human papillomavirus (HPV) infection of the genital tract and the occurrence of cervical cancer has resulted in the development of several HPV detection systems, providing new preventive strategies that could potentially result in an even greater reduction in incidence and mortality than cytology.

The main purpose of screening is to reduce the disease-specific mortality. Therefore, the primary indicator of effect is the observed disease-specific mortality compared with the expected mortality in the absence of screening, best expressed in terms of absolute risk difference or its reciprocal, the number needed to screen. In addition, several alternative end points can be used as a proxy. Table 1 shows a list of indicators used to establish effectiveness of cervical cancer screening ranked by decreasing level of evidence [13].
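As a minimal sketch of the two effect measures named above, the following Python snippet computes the absolute risk difference and its reciprocal, the number needed to screen. The function name and the mortality figures are hypothetical, chosen only for illustration, and are not taken from any study.

```python
def number_needed_to_screen(expected_mortality, observed_mortality):
    """Return (absolute risk difference, number needed to screen).

    expected_mortality: disease-specific mortality risk without screening
    observed_mortality: disease-specific mortality risk with screening
    Both are proportions over the same follow-up period.
    """
    ard = expected_mortality - observed_mortality
    if ard <= 0:
        raise ValueError("no mortality reduction: NNS is undefined")
    return ard, 1.0 / ard


# Hypothetical risks: 4 vs. 2 deaths per 1,000 women over the follow-up period.
ard, nns = number_needed_to_screen(0.004, 0.002)
print(f"ARD = {ard:.4f}, NNS = {nns:.0f}")
```

Expressing the effect as an absolute difference rather than a relative reduction keeps the baseline risk visible: the same relative reduction yields a much larger number needed to screen when the disease is rare.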

Studying cervical cancer mortality is particularly difficult because the certified cause of death often does not indicate the exact anatomical origin but is instead recorded as death from uterine cancer. An alternative end point can be all-cause mortality, as has been advocated for breast cancer [14], but a significant effect on all-cause mortality is rarely demonstrable with screening. In cervical cancer screening, in which precursors are detected and treated, a reduction in cervical cancer incidence is a convincing end point, but demonstrating it requires monitoring hundreds of thousands of women over many years. CIN3 as a direct precursor of invasive cancer is an acceptable proxy outcome of effectiveness [15]. The increased detection of CIN2+ or CIN3+ is clinically not so relevant as they rarely progress to cancer [16], leading to overtreatment. Consequently, outcomes 6 and 7 in Table 1 should not be targeted by a screening strategy.

In contrast to most diagnostic studies, in screening the prevalence of disease, especially for cancer, is typically low. This has an impact on the predictive values. Sensitivity is an indicator of the proportion of detected and missed prevalent predisease and determines the effectiveness. A very high specificity is needed to minimize the number of false-positive test cases. However, when prevalence is low, for instance in a population that is well covered by HPV vaccination, even a high specificity can still be associated with high absolute numbers of false-positive test results (and thus anxiety, costs, and additional procedures in a lot of people). With a sensitivity and specificity of 90%, around 78%, 92%, and 97% of positive results are false positives if the outcome prevalence is 3%, 1%, and 0.3%, respectively, proportions we are dealing with in screening situations.
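The percentages quoted above follow from Bayes' rule and can be reproduced with a short Python sketch. The function name is ours and purely illustrative; the inputs are the 90% sensitivity and specificity and the three prevalences given in the text.

```python
def false_positive_share(sensitivity, specificity, prevalence):
    """Proportion of positive screening results that are false positives (1 - PPV)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return false_pos / (true_pos + false_pos)


for prevalence in (0.03, 0.01, 0.003):
    share = false_positive_share(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.1%}: {share:.0%} of positives are false")
```

The share of false positives is the complement of the positive predictive value, which is why it rises so steeply as prevalence falls even though sensitivity and specificity stay fixed.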

What should a screening intervention in healthy people achieve? With regard to cervical cancer, people who profit from screening are those who (1) would have died of the cancer but are cured, owing to earlier detection; (2) would have been successfully treated for their cancer anyhow, but whose quality of life is improved owing to earlier detection (down staging) and less mutilating treatment; and (3) do not have a cancer or cancer precursor and are reassured by the negative results of a screening test that correctly shows that they do not have the disease.

However, screening can also be harmful. People who might be harmed by screening are those who (1) die from a screen-detected cancer and whose clinical course was not improved by treatment; (2) have cancer that normally would have shown up clinically at a later point of life, but whose mortality and morbidity do not differ from what they would have been without early detection; (3) have a screen-detected nonprogressive cancer precursor, resulting in overdiagnosis and unnecessary treatment; (4) have cancer or a progressive precursor but have a false-negative screening test result, leading to a false sense of security and possibly delayed diagnosis and treatment; and (5) have a false-positive result, which results in anxiety or unnecessary further investigation and treatment. A screening strategy may cause both benefits and harms, which must be weighed against each other to determine the net benefit or harm [2], [4].

For screening, the randomized controlled trial (RCT) with mortality (or incidence of overt disease when the screening targets preclinical disease) as the outcome and intention-to-treat analysis is the only study design that allows unbiased comparison of outcomes in screened and unscreened groups.

All observational studies are prone to biases, which can be summarized into two broad types: selection and information biases. Some biases, however, are particularly important for studies evaluating screening strategies. In observational studies comparing screened and unscreened people, those whose disease was diagnosed through screening can appear to survive longer than those who presented with symptoms, even if there is no benefit from screening. This is because clinical symptoms present later in the natural history than abnormal screening results, which is called “lead time bias.” In addition, for cancer screening, tumors that are detected as a result of screening are more likely to be indolent, slow growing, or less aggressive than tumors in nonscreened patients who present with symptoms or in the interval between two scheduled rounds of screening (interval cancers). This phenomenon is referred to as “length bias” and results in the false conclusion that patients die less often or later if their cancer is detected by screening. Overdiagnosis is an extreme case of length bias: it refers to the detection of nonprogressive (pre)disease, which would never have caused overt disease. This results in unnecessary treatment and, at the very least, causes anxiety and possible adverse effects [17], [18]. Whether a screen-detected predisease is an overdiagnosis cannot be determined in the individual case.

In the case of HPV-based cervical cancer screening, for example, the extent of overdiagnosis can be estimated from randomized trials, in which an initially elevated incidence of cancer or precursor in the HPV arm during the first screening rounds persists during subsequent years or screening rounds [6], [19].

In a diagnostic study, all subjects, test positives and test negatives, should receive the reference test or gold standard to assess the accuracy of the test [20]. However, in screening, because of ethical or cost considerations, especially when the ascertainment of true disease status requires invasive testing, none or only a small proportion of the patients whose test results are negative may receive the reference test. This results in an inflated estimate of the sensitivity and an underestimate of the specificity. However, this verification bias influences neither the relative sensitivity (the detection rate of confirmed disease among subjects with a positive screen test A vs. screen test B) nor the relative positive predictive value (PPV) in studies comparing the effect of two or more screening tests [13], [21].
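The direction of verification bias can be illustrated with a small Python sketch. The cohort, the counts, the 10% verification fraction, and the function names are all hypothetical and serve only to show the arithmetic; the comparison of tests A and B assumes both are applied to the same women.

```python
def naive_accuracy(tp, fp, fn, tn, verified_negative_fraction):
    """Sensitivity and specificity estimated from verified subjects only.

    All screen positives (tp, fp) receive the reference test; only the given
    fraction of screen negatives (fn, tn) does.
    """
    fn_verified = fn * verified_negative_fraction
    tn_verified = tn * verified_negative_fraction
    sensitivity = tp / (tp + fn_verified)
    specificity = tn_verified / (tn_verified + fp)
    return sensitivity, specificity


def relative_sensitivity(tp_a, tp_b):
    """Ratio of confirmed cases detected by test A vs. test B in the same population."""
    return tp_a / tp_b


# Hypothetical cohort: 10,000 women, 100 with disease, test with 90% sensitivity and specificity.
full = naive_accuracy(tp=90, fp=990, fn=10, tn=8910, verified_negative_fraction=1.0)
partial = naive_accuracy(tp=90, fp=990, fn=10, tn=8910, verified_negative_fraction=0.1)
print("all negatives verified:", full)
print("10% of negatives verified:", partial)
# Uses only verified screen positives, so it does not depend on how many negatives are verified.
print("relative sensitivity A vs. B:", relative_sensitivity(90, 60))
```

With only a tenth of the screen negatives verified, the apparent sensitivity rises and the apparent specificity falls, whereas the ratio of detected cases between the two tests is unchanged.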

Assessing the gold standard with knowledge of the screening test result carries a serious risk of overestimating both the sensitivity and specificity. Therefore, in diagnostic research, in which the objective is to evaluate the cross-sectional accuracy of a screening test, verification should be performed independently. This can be difficult when the screening test and gold standard are based on the same principle, for instance when visual inspection with acetic acid screening (visual inspection of the cervix after application of acetic acid) is validated using colposcopy. It is usually assumed that colposcopy followed by histologic examination of material obtained from suspected areas provides a valid ascertainment of the true disease status, making it the gold standard. Recent prospective studies, however, suggested that up to 50% of prevalent precancers might be missed during colposcopy [22], [23], which also has a high interobserver variability. Random biopsies from normal-appearing regions and follow-up can be used to compensate partially for the lack of sensitivity of colposcopy.

A framework for the evaluation of screening strategies

Introducing a new screening strategy in clinical practice requires the evaluation of its characteristics as an added value to the existing procedures. Guidelines regarding appropriate study designs to address questions on benefits of screening for disease precursors in which the target disease is not yet present and in which the management is restricted to screen positives are urgently needed [13]. We propose a stepwise framework for the evaluation of screening strategies building further on

Conclusion

Screening strategies are often introduced into clinical practice without proper evaluation. We have proposed a stepwise framework to evaluate new screening strategies, in which an increasing level of evidence is gathered but which requires progressively more stringent and expensive studies. Regional, national, or international authorities should impose requirements for testing and implementing new screening strategies, as has already been done in a number of countries. Together with other recently

Acknowledgments

All authors contributed substantially to conception and design, revised the article critically for important intellectual content, and gave final approval of the version to be published.

Special thanks should be given to the Belgian Foundation Against Cancer (Brussels, Belgium) for providing the financial means.

M.A. received financial support from the Belgian Foundation Against Cancer (Brussels, Belgium), the International Agency for Research on Cancer (IARC, Lyon, France), the 7th Framework

References (42)

  • A. Van den Bruel et al. The evaluation of diagnostic tests: evidence on technical and diagnostic accuracy, impact on patient outcome and cost-effectiveness is needed. J Clin Epidemiol (2007)
  • P. Martin-Hirsch et al. Efficacy of cervical-smear collection devices: a systematic review and meta-analysis. The Lancet (1999)
  • M. Arbyn et al. European guidelines for quality assurance in cervical cancer screening. Second edition - Summary Document. Ann Oncol (2010)
  • M. van Ballegooijen et al. Overview of important cervical cancer screening process values in European Union (EU) countries, and tentative predictions of the corresponding effectiveness and cost-effectiveness. Eur J Cancer (2000)
  • Wilson J, Jungner G. Principles and practice of screening for disease. WHO public health pap. 1968;34. Available at:...
  • R. Harris et al. Reconsidering the criteria for evaluating proposed screening programs: reflections from 4 current and former members of the U.S. Preventive Services Task Force. Epidemiol Rev (2011)
  • Health Council of the Netherlands. Population screening for cervical cancer (2011)
  • Cervix Cancer Screening/IARC working group on the evaluation of cancer-preventive strategies (2004)
  • G.J. Van Oortmarssen et al. Epidemiological evidence for age-dependent regression of pre-invasive cervical cancer. Br J Cancer (1991)
  • M. Hakama. Implications of screening on the biology of cervical cancer. Nowotwory (1986)
  • M. Zwahlen et al. Population-based screening—the difficulty of how to do more good than harm and how to achieve it. Swiss Med Wkly (2010)

Conflict of interest statement: the authors report no biomedical financial interests or potential conflicts of interest.