From the Health Information Research Unit (Wilczynski, Haynes), the Department of Clinical Epidemiology and Biostatistics (Haynes, Lavis), the Department of Medicine (Haynes), the Centre for Health Economics and Policy Analysis (Lavis) and the Department of Political Science (Lavis), McMaster University Health Sciences Centre, and the Department of Humanities and Social Sciences, Mohawk College (Ramkissoonsingh), Hamilton, Ont.; and the Department of Psychology, York University, Toronto, Ont. (Arnold-Oatley)
Members of the HSR Hedges team: Nancy L. Wilczynski, R. Brian Haynes, John N. Lavis, Ann McKibbon, Douglas Morgan, Adrienne Stevens, Stephen Walter and Stephen Werre, McMaster University, Hamilton, Ont.; Ravi Ramkissoonsingh, Mohawk College, Hamilton, Ont.; Alexandra E. Arnold-Oatley, York University, Toronto, Ont.
| Abstract |
|---|
Methods: The retrieval performance of 7445 methodologic search terms and phrases in MEDLINE (the test) was compared with a hand search of the literature (the gold standard) for each issue of 68 journal titles for the year 2000 (a total of 25 936 articles). We determined the sensitivity, specificity and precision (the positive predictive value) of the MEDLINE search strategies.
Results: A majority of the articles that were classified as outcome assessment, but fewer than half of those in the other categories, were considered methodologically acceptable (no methodologic criteria were applied for cost studies). Combining individual search terms to maximize sensitivity, while keeping specificity at 50% or more, led to sensitivities in the range of 88.1% to 100% for several categories (specificities ranged from 52.9% to 97.4%). When terms were combined to maximize specificity while keeping sensitivity at 50% or more, specificities of 88.8% to 99.8% were achieved. When terms were combined to maximize sensitivity and specificity while minimizing the differences between the 2 measurements, most strategies for HSR categories achieved sensitivity and specificity of at least 80%.
Interpretation: Sensitive and specific search strategies were validated for retrieval of HSR literature from MEDLINE. These strategies have been made available for public use by the US National Library of Medicine at www.nlm.nih.gov/nichsr/hedges/search.html.
HSR has been defined as the scientific study of the effect of health care delivery; the organization and management of health care access, quality, cost and financing; and the evaluation of the impact of health services and technology (Allmang NA, Koonce TY. Health services research topic searches. Bethesda [MD]: National Library of Medicine; 2000. Unpublished report). More recently, HSR has been defined as the multidisciplinary field of scientific investigation that studies how social factors, financing systems, organizational structures and processes, health technologies and personal behaviours affect access to health care, the quality and cost of health care and, ultimately, health and well-being.5 HSR articles constitute only a tiny fraction of the MEDLINE database and are spread through a large number of journals; hence, MEDLINE searching is challenging. Conversely, journal browsing is impractical as a means of retrieving all relevant studies for a given question or staying abreast of the literature. Our aim was to develop methodologic search filters for MEDLINE to enable end-users to efficiently retrieve articles of relevance to clinical practice guidelines (CPGs) and the appropriateness, process, outcomes, cost and economics of health services.
| Methods |
|---|
Search terms
Candidate content and methodologic terms (text words and Medical Subject Headings [MeSH] [exploded and nonexploded], publication types) were compiled by reviewing "gold standard" articles and their MEDLINE indexing, the definitions in Table 1 and the criteria in Table 2; by consulting experts in bibliographic database searching for HSR topics (mainly health sciences librarians); and by consulting experts in studying HSR-related questions. All suggested search terms were tested. The terms are available on request from the corresponding author.
McMaster HSR database
A database of journals containing relevant studies of HSR was created. We looked for journals that published adequate numbers of relevant articles, such that manual searching of these journals would provide an adequate benchmark against which the MEDLINE searches could be compared. Three independently derived lists were examined to identify appropriate journals.
The first list comprised journals that are reviewed by 4 publications: ACP Journal Club, Evidence-Based Medicine, Evidence-Based Nursing and Evidence-Based Mental Health. These publications provide synopses of the articles in 170 journals with the intent of giving health care workers an overview of new developments in medicine and nursing; the journals have been selected on the basis of their yield of studies that meet explicit criteria for methodologic merit and relevance to clinical practice.11 This list of 170 journals was reduced to 161 by including only those that were indexed in MEDLINE and by conducting hand searches of issues for the year 2000 to determine which journals had published at least 1 study concerning the appropriateness, process or outcomes of care or CPGs.
The second set of journals was derived from a survey by Elixhauser and associates12 of HSR literature for studies of the economics of health care and the Science Citation Index listing of top-rated journals in the field for the category health care sciences and services. We deleted 2 pharmacy journals from this list because we judged them too narrow in focus for our purposes. Input by a convenience sample of policy-makers led to 2 journal nominations and resulted in 10 unique HSR journals (i.e., 10 titles that were not included in the first list). The third list consisted of journals identified in a report on MEDLINE searches for HSR written by 2 National Library of Medicine associate fellows (Allmang NA, Koonce TY. Health services research topic searches. Bethesda [MD]: National Library of Medicine; 2000. Unpublished report). Three HSR experts had selected the journals in that list. The 3 journal lists were merged and duplicates deleted to yield the final list of 68 journals (for the complete merged list, see the online appendix at www.cmaj.ca/cgi/content/full/171/10/1179/DC1).
Manual review of the literature
Four research assistants reviewed each issue of the 68 journal titles for the calendar year 2000. Each journal article was read independently by 2 research assistants and coded for the following HSR categories, according to definitions derived using the MeSH scope notes (Table 1): appropriateness, process assessment, outcome assessment, CPGs, cost and economics. All original research and review articles that met the category definitions were evaluated for scientific merit on the basis of the criteria in Table 2, which were based on the "Users' guides to the medical literature" articles published in the Journal of the American Medical Association.6,7,8,9 Although empirical evidence of design-related bias is not directly available for the HSR categories, research concerning diagnosis13 and treatment14 shows that studies with methodologic shortcomings may overestimate the accuracy or the effect being studied. To pass the criteria, an explicit statement relevant to each criterion had to appear in the article, and all criteria for the appropriate category had to be met. When disagreements arose between the assessments of the 2 research assistants, a third research assistant, blinded to the other assessments, reviewed the article in question. If the coding of the third appraiser agreed with the coding of 1 of the original reviewers, that coding was taken to be correct; otherwise, the article was referred to a more senior member of the research team, who reviewed all coding and determined the final classification.
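The adjudication rule described above can be sketched as a small function. This is only an illustration of the stated procedure, not the team's actual tooling; the function name and the two reviewer callables are hypothetical stand-ins for human reviewers.

```python
def adjudicate(code_a, code_b, third_reviewer, senior_reviewer):
    """Resolve the final coding of one article.

    code_a, code_b   -- codings from the 2 independent research assistants
    third_reviewer   -- hypothetical callable returning a blinded third coding
    senior_reviewer  -- hypothetical callable that reviews all codings and
                        returns the final classification
    """
    # No disagreement: the shared coding stands.
    if code_a == code_b:
        return code_a

    # Disagreement: a third assistant, blinded to the others, codes the article.
    code_c = third_reviewer()

    # If the third coding matches either original coding, that coding is used.
    if code_c in (code_a, code_b):
        return code_c

    # Otherwise a more senior team member reviews all coding and decides.
    return senior_reviewer([code_a, code_b, code_c])
```

For example, `adjudicate("cost", "economics", lambda: "economics", lambda codes: codes[0])` resolves to `"economics"` because the third coding agrees with one of the original reviewers.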
Assessment of search terms
The candidate search terms were treated as "diagnostic tests" for sound studies, and the manual review of the literature was treated as the "gold standard." The concepts of diagnostic test evaluation and library science were used to determine the sensitivity, specificity and precision of MEDLINE searches as shown in Table 3. The sensitivity for a given topic was defined as the proportion of high-quality articles for that topic that were retrieved, specificity was the proportion of low-quality or nonrelevant articles that were not retrieved, and precision was the proportion of retrieved articles that were of high quality. Search performance was determined by an iterative computer program for each single term. Single terms that yielded sensitivity greater than 25% and specificity greater than 75% were used to form 2-term Boolean "or" strategy combinations. Two-term strategies that yielded sensitivity greater than 75% and specificity greater than 50% were used in 3-term Boolean "or" strategy development to optimize sensitivity. Two-term strategies that yielded sensitivity greater than 50% and specificity greater than 75% were used in 3-term Boolean "or" strategy development to optimize specificity. We did not test "and" combinations because of their predictably adverse effect on sensitivity. We also did not test "and not" combinations because, when we have tested this approach for clinical topics, the performance of the search strategies was not materially affected.
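The three measures and the first combination step can be illustrated with a minimal sketch. It assumes each term's retrievals and the hand-search results are available as sets of article identifiers; the function names, term labels and toy data are hypothetical, and this is not the iterative program used in the study.

```python
from itertools import combinations


def performance(retrieved, sound, all_ids):
    """Return (sensitivity, specificity, precision) for one search strategy.

    retrieved -- IDs returned by the search (the "test")
    sound     -- IDs judged high quality by hand search (the "gold standard")
    all_ids   -- every article ID in the database
    """
    tp = len(retrieved & sound)            # sound articles retrieved
    fp = len(retrieved - sound)            # other articles retrieved
    fn = len(sound - retrieved)            # sound articles missed
    tn = len(all_ids - retrieved - sound)  # other articles correctly not retrieved
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, precision


def two_term_or_strategies(term_hits, sound, all_ids, min_sens=0.25, min_spec=0.75):
    """Screen single terms, then combine the survivors pairwise with Boolean OR."""
    kept = []
    for term, hits in term_hits.items():
        sens, spec, _ = performance(hits, sound, all_ids)
        if sens > min_sens and spec > min_spec:
            kept.append(term)
    results = {}
    for a, b in combinations(sorted(kept), 2):
        # Boolean OR of two terms is the union of their retrieval sets.
        results[(a, b)] = performance(term_hits[a] | term_hits[b], sound, all_ids)
    return results


# Toy data (hypothetical): 10 articles, 4 of them methodologically sound.
all_ids = set(range(1, 11))
sound = {1, 2, 3, 4}
term_hits = {
    "term_a": {1, 2, 5},
    "term_b": {3, 4, 6},
    "term_c": set(range(1, 11)),  # retrieves everything, so specificity is 0
}
pairs = two_term_or_strategies(term_hits, sound, all_ids)
```

In this toy example only `term_a` and `term_b` pass the single-term screen, and combining them with "or" raises sensitivity from 50% to 100% at some cost in specificity. The same function could be reapplied with the 2-term thresholds given above to build 3-term strategies.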
MEDLINE searches were conducted through Ovid (Ovid Technologies, New York; http://gateway2.ovid.com). For the defined subset of journal issues included in the database, we downloaded the full MEDLINE record, including full citation, abstract and MeSH terms. The MEDLINE records were then matched to the corresponding records in the hand-search file, by means of unique identifiers.
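The matching step might look something like the following sketch, which joins downloaded records to hand-search records on a shared identifier. The field name `uid` and the record layout are assumptions made for illustration; the source does not describe the file formats.

```python
def match_records(medline_records, hand_search_records, key="uid"):
    """Join MEDLINE records to hand-search records on a unique identifier.

    Both inputs are lists of dicts; `key` is a hypothetical field name.
    Returns (matched, unmatched), where unmatched holds MEDLINE records
    with no hand-search counterpart.
    """
    hand_by_id = {rec[key]: rec for rec in hand_search_records}
    matched, unmatched = [], []
    for rec in medline_records:
        hand = hand_by_id.get(rec[key])
        if hand is None:
            unmatched.append(rec)
        else:
            # Merge the two records; hand-search fields win on name clashes.
            matched.append({**rec, **hand})
    return matched, unmatched
```

Keeping the unmatched records separate makes it easy to check that every article in the hand-search file has a corresponding MEDLINE record before performance is calculated.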
| Results |
|---|
In total, 7445 unique search terms were tested, of which 5330 returned results. Predictably, single search terms had lower yields than 2-term strategies, but the difference between 2- and 3-term combinations was small. As expected, combining terms increased the sensitivity over single search terms. A somewhat unexpected finding was that some combinations of terms for each category of studies also led to increases in specificity and precision. Thus, for brevity, only 3-term strategies are presented here (see Table 5 for the denominators of the data presented in the tables for these strategies, as detailed below), unless 2-term or single-term strategies performed as well as the best 3-term strategy. The best sensitivities ranged from 95% to 100% for methodologically sound articles for all categories, including appropriateness studies (Table 6), but the estimate for the latter category was imprecise, as only 5 studies in the database met this criterion. Precision was 9.5% or less for all searches, a consequence of the low prevalence of HSR even in these selected journals and the suboptimal specificities for the most sensitive searches.
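The dependence of precision on prevalence follows from the standard relation among sensitivity, specificity and prevalence $p$. The worked values below are hypothetical, chosen only to show the scale of the effect, and are not taken from the study's tables.

```latex
\[
\text{precision} =
  \frac{\text{sensitivity} \times p}
       {\text{sensitivity} \times p + (1 - \text{specificity}) \times (1 - p)}
\]
% Hypothetical example: sensitivity 0.95, specificity 0.90, prevalence 0.01
\[
\frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.10 \times 0.99}
  = \frac{0.0095}{0.1085} \approx 0.088
\]
```

Even with high sensitivity and 90% specificity, a prevalence of 1% would yield a precision below 9% under these assumed values.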
Terms that yielded the best specificity while maintaining sensitivity of 50% or more for each HSR category are presented in Table 7. In achieving the highest specificity for combined terms, sensitivity decreased in all HSR categories while precision rose somewhat.
The combinations of terms that optimized both sensitivity and specificity while minimizing the differences between the 2 measurements for each HSR category are presented in Table 8. These strategies provide the best separation of relevant from nonrelevant retrievals, but do so without regard for whether sensitivity or specificity is affected.
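The three selection rules reported in Tables 6 to 8 can be sketched as follows, given a mapping from each strategy to its (sensitivity, specificity) pair. The "balanced" rule here is an assumption: it picks the strategy whose lower measure is highest, which is one plausible way to favour high, near-equal sensitivity and specificity; the source does not state the exact formula. Strategy names and values are hypothetical.

```python
def most_sensitive(strategies, min_spec=0.5):
    """Highest sensitivity among strategies with specificity >= min_spec."""
    eligible = [k for k, (_, spec) in strategies.items() if spec >= min_spec]
    return max(eligible, key=lambda k: strategies[k][0], default=None)


def most_specific(strategies, min_sens=0.5):
    """Highest specificity among strategies with sensitivity >= min_sens."""
    eligible = [k for k, (sens, _) in strategies.items() if sens >= min_sens]
    return max(eligible, key=lambda k: strategies[k][1], default=None)


def best_balanced(strategies):
    """Assumed rule: maximize the lower of the two measures,
    breaking ties by the smaller absolute difference between them."""
    return max(
        strategies,
        key=lambda k: (min(strategies[k]), -abs(strategies[k][0] - strategies[k][1])),
        default=None,
    )


# Hypothetical (sensitivity, specificity) values for four strategies.
strategies = {
    "s1": (0.98, 0.55),
    "s2": (0.60, 0.97),
    "s3": (0.85, 0.82),
    "s4": (1.00, 0.40),
}
```

With these toy values, `s1` is chosen as most sensitive (since `s4` falls below the 50% specificity floor), `s2` as most specific and `s3` as best balanced.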
| Interpretation |
|---|
Few search filters have been developed for HSR, and those that exist retrieve journal articles on only a small range of topics of direct relevance to the field. A pilot project created preliminary search strategies for economics and qualitative research in the HSR literature in 2000 (Allmang NA, Koonce TY. Health services research topic searches. Bethesda [MD]: National Library of Medicine; 2000. Unpublished report) but lacked a gold standard against which to assess the quality of the searches. Search filters developed for the National Health Service Economic Evaluation Database,16 the Health Economic Evaluation Database17 and the London School of Economics (LSE) Strategy,18 which are designed to retrieve economic evaluation articles, were compared with one another to generate a relative standard, giving estimates of sensitivity of 72% and specificity of 75% for the LSE strategy in MEDLINE.18 Our findings for economics articles appear to be somewhat better but are not directly comparable, as our gold standard was a hand search. Additional filters have been designed to retrieve articles on outcome measurement19 (just 3 strategies based on hand searches in just 2 journals) and quality of care20 (in which only precision was measured).
Our study had some limitations. First, we could not find secure methodologic features for the HSR categories of appropriateness and cost that lend themselves to retrieving the best studies. Second, the number of appropriateness articles in our database was small, giving rise to imprecise estimates of search performance for that category. Third, our database was not large enough to permit test-retest searches to validate the strategies. Fourth, we have not studied the effect of combining research filters with content terms (such as a disease, technology or type of health service) and thus cannot report on the characteristics of such searches; such a study would require considerably more resources than were available to us. Fifth, we tested only Ovid's search engine for MEDLINE; other search engines, including the PubMed search engine of the National Library of Medicine, may handle terms somewhat differently, with slightly differing results.
The best search strategies found in our research leave some room for improvement. Better search performance may require maturation of research methods for HSR, similar to those for some forms of clinical research, and better indexing. Improvements may also be possible through more sophisticated search strategies, for example, with more search terms, use of other Boolean operators ("and," "and not"), natural language processing and multivariate statistical techniques such as logistic regression and discriminant function analysis. In our limited experience with the use of other Boolean operators and logistic regression for clinical topics such as diagnostic tests,21 we have observed trade-offs between sensitivity and specificity and no substantive improvements with more complex search strategies, but we have not attempted these approaches for HSR topics. We look forward to other researchers taking up the challenge of developing better search strategies for HSR.
| Footnotes |
|---|
Contributors: Nancy Wilczynski and Brian Haynes contributed to the conception and design of the study and to the analysis and interpretation of data. Nancy Wilczynski, Ravi Ramkissoonsingh and Alexandra Arnold-Oatley contributed to the acquisition of the data. John Lavis and Ravi Ramkissoonsingh contributed to the analysis and interpretation of the data. Nancy Wilczynski, Brian Haynes, Ravi Ramkissoonsingh and Alexandra Arnold-Oatley were involved in drafting the article. All authors were involved in critically revising the article for important intellectual content and gave final approval of the version submitted to be published.
Acknowledgements: This study was conducted under a contract from the National Information Center on Health Services Research and Health Care Technology (NICHSR). We thank Ione Auston for encouragement and constructive comments on study reports. Ovid Technologies Inc. relaxed its limits on search volumes. Vivek Goel, Professor in the Department of Health Policy, Management and Evaluation, University of Toronto, provided assistance concerning health services managers and their information needs, as well as a list of key journals for publication of health services research.
Competing interests: None declared.
Correspondence to: Dr. R. Brian Haynes, Department of Clinical Epidemiology and Biostatistics, Room 2C10B, Health Sciences Centre, McMaster University Faculty of Health Sciences, 1200 Main St. W, Hamilton ON L8N 3J5; fax 905 577-0017; bhaynes{at}mcmaster.ca