Abstract
BACKGROUND: Guideline recommendations may be affected by flaws in the process, inappropriate panel member selection or conduct, conflicts of interest and other factors. To our knowledge, no validated tool exists to evaluate guideline development from the perspective of those directly involved in the process. Our objective was to develop and validate a universal tool, the PANELVIEW instrument, to assess guideline processes, methods and outcomes from the perspective of the participating guideline panellists and group members.
METHODS: We performed a systematic literature search and surveys of guideline groups (identified through contacting international organizations and convenience sampling of working panels) to inform item generation. Subsequent groups of guideline methodologists and panellists reviewed items for face validity and missing items. We used surveys, interviews and expert review for item reduction and phrasing. For reliability assessment and feedback, we tested the PANELVIEW tool in 8 international guideline groups.
RESULTS: We surveyed 62 members from 13 guideline panels, contacted 19 organizations and reviewed 20 source documents to generate items. Fifty-three additional key informants provided feedback about phrasing of the items and response options. We reduced the number of items from 95 to 34 across domains that included administration, training, conflict of interest, group dynamics, chairing, evidence synthesis, formulating recommendations and publication. The tool takes about 10 minutes to complete and showed acceptable measurement properties.
INTERPRETATION: The PANELVIEW instrument fills a gap by enabling guideline organizations to involve clinicians, patients and other participants in evaluating their guideline processes. The tool can inform quality improvement of existing or new guideline programs, focusing on insight into and transparency of the guideline development process, methods and outcomes.
As a product of a group process that involves project planning, synthesizing evidence and deliberation by guideline group members to reach consensus and formulate recommendations, health guidelines are highly influential in determining practice.1,2 Guideline development also requires careful coordination of multiple teams with specialized knowledge.1,3,4 These teams typically include an oversight committee responsible for project planning, working groups responsible for preparation and technical aspects of evidence synthesis, and a guideline panel tasked with prioritizing questions and formulating recommendations.
Guideline group processes may be prone to influence by people with strong opinions, imbalanced group member characteristics or unqualified members, or may not use the best available evidence.2,5–8 Currently available instruments assessing the trustworthiness of practice guidelines rely on what the guideline authors report, typically in peer-reviewed publications, or reports of organizations or manuals.9,10 However, what authors report may be generic or incomplete, lack transparency or be inconsistent with the assessment of all group members, and what is reported may not always reflect what happened.11 For example, the Appraisal of Guidelines, Research and Evaluation (AGREE)9 and Reporting Items for Practice Guidelines in Healthcare (RIGHT)10 tools appropriately call for conduct of systematic reviews as part of guideline development and appropriate disclosure of potential influence of conflicts. Although systematic reviews should inform guidelines, their conduct guarantees neither their quality nor that they are used appropriately by the panel for making recommendations. Likewise, having a conflict-of-interest declaration and management policy for guidelines does not necessarily guarantee that conflicts are well managed when guideline groups make recommendations.
Existing tools do not evaluate essential steps and processes, such as giving appropriate consideration to the evidence and ensuring that all panel members have an equal voice, as they take place.9,10,12 An internal evaluation by participating guideline group members would provide this valuable insight. In addition, ensuring that panel members view the process as appropriate and one that results in a credible guideline will help ensure they see value in their contribution. By obtaining an assessment from the participants, guideline developers could identify areas of their processes participants view as needing improvement, as well as dissenting views among participants. They could then use this information to modify their methods and approaches, and to ensure the credibility of their guidelines and the trustworthiness of the recommendations.
The objective of this research was to develop and validate a tool for assessing guideline panel members’ perception of the appropriateness of, and satisfaction with, the process, methods and outcome of the development of a health guideline.
Methods
For the development of the instrument, which we named PANELVIEW, we followed methods for scale development, including item generation based on existing literature, item reduction through key informant and expert feedback and consensus, and field testing with guideline panels (Table 1).13 At each step, participating panel members and guideline methodologists, drawn from organizations in diverse geographic areas that produced guidelines about different clinical topics, assessed information about the items to evaluate the appropriateness of guideline development.
Table 1: Overview of steps and participants in the PANELVIEW tool development
Item generation
Item generation began with discussion by 2 investigators (W.W., H.J.S.) of key domains for capturing the evaluation of guideline-related processes based on domains in the GIN-McMaster Guideline Development Checklist.1 We hypothesized that all parts of the process might be relevant for assessing appropriateness and satisfaction of panel members.
We then conducted a systematic literature search to identify steps and themes in guideline development that relate to the appropriateness of the process. We searched MEDLINE and Embase from inception to November 2018 to identify studies that discussed or evaluated steps of guideline development.1 We used controlled vocabulary and keywords to capture evaluation of the guideline development process and panel member perceptions (see Appendix 1, Supplemental Figure S1 and Supplemental Box S1, available at www.cmaj.ca/lookup/doi/10.1503/cmaj.200193/tab-related-content, for additional details).
To supplement our literature search, we contacted a convenience sample of 19 key informants, identified in a previous project,1 who represented major guideline-development organizations globally. We asked whether the organizations currently conducted internal evaluation of their guideline-development processes or used specific tools (Table 1, step 1). For each step involving key informants, we used a new sample of participants as a method of confirming data and views obtained in the preceding step, and to ensure broad representation of views and perspectives.
Panel surveys
We surveyed members of 13 guideline panels to obtain primary data (Table 1, step 1). After their panel meetings had adjourned, 62 panellists completed hard-copy surveys consisting of 6 open-ended questions inquiring about the factors that affected their satisfaction and perception of the appropriateness of the process (Appendix 1, Supplemental Table S1, Supplemental Figure S2). We included the survey responses as a source document for data abstraction.
Data abstraction
We developed and pilot tested a structured data-abstraction form. Study team members (Y.Z., R.L.M., K.-T.L., U.R., M.V., J.J.Y.-N., R.A.M., N.S., S.K., T.B.) reviewed full texts of source documents and, independently and in duplicate, abstracted items that related to the appropriateness of the methods or processes of guideline development, panel members’ views about methods or processes, or panel members’ satisfaction. For each item, we included supporting quotations from the source document, along with proposed themes for grouping of items (e.g., conflict-of-interest management, training, group interaction).
A subgroup of the study team members (W.W., Y.Z., R.L.M., S.K., J.J.Y.-N.) independently identified and merged duplicate items (i.e., items that measured or asked about the same aspect of the guideline-development process). The decisions were assessed by a second reviewer and discussed in a team meeting during which we finalized the deduplication, initial item phrasing and allocation to specific themes of the guideline-development process.
Item reduction
Feedback from key informants
We surveyed and conducted interviews with a convenience sample of 22 key informants, including guideline developers, methodologists and panel members, to obtain feedback about the initial list of items (Table 1, step 2; Appendix 1, Supplemental Table S2). We asked respondents to rate, on a 7-point Likert-type scale ranging from 1 (not important) to 7 (very important), how important they considered each of the items to be for evaluating the guideline-development process, to suggest modifications and to identify any missing items. We also sought to obtain in-depth feedback about the initial list and asked participants to comment on the level of detail, clarity and redundancy in the items. We sent the list of items to participants for review in advance and conducted the interviews in person at guideline panel meetings in the presence of a note taker. We pilot tested both the survey and the interview guide.
Study team review and consensus meeting
Concurrently with the key informant surveys and interviews, we provided study team members (N.S., I.E.I., Y.Z., K.-T.L., S.K., R.B.-P., M.V., M.F., G.P.M.) with a structured feedback form to review the initial list of items and provide suggestions for modifications and theme categorization, or to suggest potentially missing items. For each item, we summarized the study team’s suggestions, and the key informants’ feedback and rating of importance. Study team members reviewed the summary and individually suggested keeping, modifying, merging or deleting items in preparation for a consensus meeting, during which the study team finalized decisions about each item based on discussion and group consensus. We then refined item wording based on our experience with developing other measurement instruments.14,15
Phrasing of items and response options
We conducted surveys with 26 additional key informants to determine the phrasing of items and response options (Table 1, step 3; Appendix 1, Supplemental Table S2). We asked respondents to indicate their preference for one of three 7-point Likert-type options regarding phrasing of responses and items, presenting 8 example items from the tool. The first option asked about appropriateness, the second about satisfaction, and the third, representing the original Likert scale, about agreement with the topic presented by the item.
Testing with panels
After item reduction, we pilot tested the PANELVIEW tool with 1 guideline panel consisting of 12 members. Subsequently, we made minor revisions to clarify item wording and the order of items in the tool and then used the tool with an additional 8 guideline panels consisting of 94 panellists (Table 1, step 4; Appendix 1, Supplemental Table S3). Panel members completed the PANELVIEW survey individually, expressing their agreement with each survey item on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree) (e.g., “There was appropriate management of potential bias in panel members’ interpretation of evidence and alignment with prior beliefs”). Panel members also provided feedback on the clarity of the instructions, clarity of items and the survey length.
Analysis
We used generalizability theory (G theory), a method for evaluating the reliability of measurements, to assess the reliability of scores obtained across the different panel groups.13 We calculated the item mean scores, standard deviations (SDs) and ranges across individual panellists. For individual panellists, we calculated overall scores as the mean of their item ratings. We conducted the preliminary reliability analyses at the individual panellist level as well as at the panel level, whereby we obtained item means by collapsing across individual panellists. We estimated multiple sources of variance (G), including the respondent, panel, item and domain, using a nested G theory study.13 The guideline-development process of different panel groups served as the object of measurement, individual respondents were nested within panels, and individual items were nested within the PANELVIEW survey domains (Appendix 1, Supplemental Table S5, Supplemental Figure S5).
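The two levels of aggregation described above can be illustrated with a minimal Python sketch. The ratings, panel names and item names below are hypothetical, and the functions are illustrative helpers rather than the analysis code used in the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format ratings: (panel, panellist, item, rating on the 1-7 scale).
ratings = [
    ("panel_A", "p1", "item_1", 6),
    ("panel_A", "p1", "item_2", 7),
    ("panel_A", "p2", "item_1", 4),
    ("panel_A", "p2", "item_2", 5),
    ("panel_B", "p3", "item_1", 7),
    ("panel_B", "p3", "item_2", 6),
]


def panellist_overall_scores(rows):
    """Overall score per panellist: the mean of that panellist's item ratings."""
    by_person = defaultdict(list)
    for panel, person, _item, rating in rows:
        by_person[(panel, person)].append(rating)
    return {key: mean(values) for key, values in by_person.items()}


def panel_item_means(rows):
    """Panel-level item means: collapse each item across the panellists of a panel."""
    by_cell = defaultdict(list)
    for panel, _person, item, rating in rows:
        by_cell[(panel, item)].append(rating)
    return {key: mean(values) for key, values in by_cell.items()}


overall = panellist_overall_scores(ratings)
item_means = panel_item_means(ratings)
```

The individual-level dictionary supports analyses of how panellists differ from one another, whereas the panel-level dictionary treats each panel as a single unit.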
Ethics approval
The Hamilton Integrated Research Ethics Board approved this study before data collection.
Results
Our systematic literature search, contact with key informants and surveys of 13 guideline panels yielded 17 published articles,6,16–31 3 additional source documents (Appendix 1, Supplemental Figure S3) and 62 survey responses. We abstracted a list of 694 items, which, after evaluation and deduplication, resulted in 95 items grouped across 17 themes covering guideline development (Appendix 1, Supplemental Table S4).
Informed by the rating of importance of the 95 items and feedback from key informants, we removed 23 items that scored low on importance as part of our consensus process. We merged 38 items with other items considered redundant. We phrased each item to ensure it assessed only 1 component of the guideline-development process. The final list included 34 items. Of the 26 key informants surveyed about the phrasing of response options, 19 (73%) indicated preference for using the Likert scale.
Generalizability and reliability
The analysis of variance from the nested G study showed an overall test reliability coefficient of 0.35 (Appendix 1, Supplemental Table S5). This result is likely an effect of enrolling panels that were homogeneous with regard to processes and methods. The tool’s domains and individual items within the domains each accounted for 4% of the variance, which also suggests that the processes for the guideline efforts we evaluated were similar across the domains and items. The guideline panels accounted for 28% of the variance, and participants within panels accounted for 55% of the variance in scores. This indicates that variation was captured in panellists’ assessments between the guideline panels and in panellists’ ratings within the panels. Despite the homogeneity of the groups, the tool was able to identify varying views of guideline panel members, indicating higher and lower satisfaction or perception of appropriateness.
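To illustrate what it means for panels and for participants within panels to account for shares of the variance, the following Python sketch estimates variance components with a one-way random-effects analysis of variance. This is a deliberate simplification and not the nested G study reported here: it assumes a balanced design, ignores the item and domain facets, and uses hypothetical overall scores.

```python
from statistics import mean


def variance_components(panels):
    """Estimate between-panel and within-panel variance (balanced one-way design).

    `panels` maps a panel name to a list of panellist overall scores.
    The within-panel term lumps respondent differences together with residual error.
    """
    k = len(panels)
    sizes = {len(scores) for scores in panels.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of panellists per panel")
    n = sizes.pop()
    if k < 2 or n < 2:
        raise ValueError("need at least 2 panels and 2 panellists per panel")

    all_scores = [x for scores in panels.values() for x in scores]
    grand_mean = mean(all_scores)
    panel_means = {name: mean(scores) for name, scores in panels.items()}

    # Mean squares between and within panels.
    ms_between = n * sum((m - grand_mean) ** 2 for m in panel_means.values()) / (k - 1)
    ms_within = sum(
        (x - panel_means[name]) ** 2 for name, scores in panels.items() for x in scores
    ) / (k * (n - 1))

    # Expected mean squares give the component estimates; negatives are truncated to 0.
    var_panel = max(0.0, (ms_between - ms_within) / n)
    var_within = ms_within
    total = var_panel + var_within
    share = var_panel / total if total else 0.0
    return {"panel": var_panel, "within_panel": var_within, "panel_share": share}


# Hypothetical overall scores for three panels of three panellists each.
example = {"A": [6, 7, 5], "B": [4, 5, 3], "C": [6, 6, 6]}
components = variance_components(example)
```

In this simplified setting, a small panel share means that most of the spread in scores lies among panellists within the same panel rather than between panels, which is the pattern expected when the panels follow similar processes.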
Response variation, item–item correlation and internal consistency
Within the panels, mean scores for items ranged from 4.0 to 7.0, item–item correlation values ranged from −0.76 to 0.96, and item–total correlation values ranged from −0.17 to 0.89. Across the 8 panels, the mean scores for items ranged from 5.5 to 6.8 (Table 2). There was high internal consistency in rating of satisfaction and appropriateness of the process within the 8 panels, with Cronbach’s α ranging from 0.85 to 0.98 (Table 3). For individual panellists, item responses ranged from 1 to 7, and item–item correlation values ranged from 0.003 to 0.719. This suggests, on an individual respondent level, that the tool distinguished between responses and that there was no end-of-scale aversion. Item–total correlation values by individual raters ranged from 0.40 to 0.80, which suggests that the items were measuring different aspects of the guideline process.
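For readers less familiar with these statistics, the following Python sketch shows how Cronbach’s α and an item–total correlation can be computed from a respondent-by-item matrix. The data are hypothetical, and the use of the corrected item–total correlation (each item against the sum of the remaining items) is an assumption, as the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item ratings per respondent."""
    k = len(rows[0])
    item_columns = list(zip(*rows))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows, item_index):
    """Correlate one item with the sum of all other items for each respondent."""
    item = [row[item_index] for row in rows]
    rest = [sum(row) - row[item_index] for row in rows]
    return pearson(item, rest)


# Hypothetical ratings: 4 respondents by 3 items on the 1-7 scale.
scores = [
    [6, 7, 6],
    [5, 5, 4],
    [7, 7, 7],
    [4, 5, 5],
]
alpha = cronbach_alpha(scores)
r_item_1 = corrected_item_total(scores, 0)
```

Values of α close to 1 indicate that respondents rate the items very consistently, which can reflect a coherent scale but also overlapping items.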
Table 2: Mean scores across panels for the PANELVIEW tool
Table 3: PANELVIEW tool mean scores and internal consistency across guideline panels
Feedback from guideline panel group members
Respondents reported that they did not have difficulty completing the questionnaire (mean rating on Likert scale 6.4 [SD 0.6]). Respondents, on average, felt that the questionnaire was neither too long nor too short (mean rating 3.5 [SD 1.7], with a rating < 4 suggesting that the questionnaire was not too long). For the 68 respondents who completed the survey online, the average completion time was 12 (SD 7) minutes, and the median time was 10 minutes (we excluded 12 respondents with a recorded completion time of ≥ 30 minutes, who presumably took a break while completing the questionnaire). In response to a suggestion from 8 respondents, we added an option to respond to relevant items as “not applicable” (e.g., to allow panel chairs to skip items that request evaluation of their chairing of the panel). The final PANELVIEW tool is available at https://heigrade.mcmaster.ca/guideline-development/panelview.
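The handling of completion times can be sketched in Python as follows. The recorded times are hypothetical, and the function name and its `cutoff` parameter are illustrative; only the 30-minute exclusion rule comes from the text.

```python
from statistics import mean, median, stdev


def summarize_completion_times(minutes, cutoff=30):
    """Summarize completion times after excluding records at or above the cutoff."""
    kept = [m for m in minutes if m < cutoff]
    return {
        "n": len(kept),
        "excluded": len(minutes) - len(kept),
        "mean": mean(kept),
        "sd": stdev(kept),
        "median": median(kept),
    }


# Hypothetical recorded completion times in minutes.
times = [8, 10, 12, 45, 9, 11, 60]
summary = summarize_completion_times(times)
```

Reporting the median alongside the mean is useful here because completion times are typically right-skewed, even after long records are excluded.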
Interpretation
We developed a tool, the PANELVIEW instrument, that allows guideline developers to assess their processes, methods and outcomes by directly involving clinicians, patients and any other guideline group member in the evaluation.
Existing instruments for assessing guideline credibility rely on the guideline authors’ report, which may describe the process as planned but not as implemented or as viewed by all group members, and may not reflect all relevant nuances of the process that affect the trustworthiness of recommendations. The PANELVIEW tool focuses on these important nuances and on the transparency of the guideline-development process, allowing organizations responsible for guideline development to inform their quality-improvement efforts.
We followed best practice for instrument development, including reviewing the literature, contacting key informants at guideline organizations and surveying panellists about key factors affecting guideline development. We tested the tool successfully with panels from international guideline organizations.
The PANELVIEW instrument is designed to identify strengths and weaknesses of a guideline-development group’s process and methods in a structured manner, and to highlight specific areas for improvement, as identified by the participants, by assessing ratings within individual domains. The tool enables participating group members to evaluate guideline development in its entirety or in phases. How the guideline process is organized may differ between organizations, for example, between those that convene 1 final panel meeting and those that maintain a standing panel with repeated meetings. This will determine whether developers administer the PANELVIEW tool once at the conclusion of a guideline project or throughout the process as the steps take place.
The PANELVIEW instrument is not intended to replace existing tools that offer guidance on the appropriate steps for guideline development or assess the credibility of published guideline reports. It offers an approach for identification of issues in the guideline-development process and methods by those who participate in it or directly observe it, such as technical experts and methodologists. The tool can serve to inform evaluation or quality improvement of new or existing guideline programs, respectively.
The rigour of development with the end-user in mind is the main strength of our work. First, we applied item-generation methods drawing on multiple sources: literature, contact with organizations, panel surveys and a team with extensive experience in the guideline field. Second, we involved other key informants, who came from multiple organizations and had participated on panels, for input on items and face validity, allowing data saturation. Third, we field tested the tool with groups that focused on a variety of guideline topics.
We plan to administer the PANELVIEW tool with additional, diverse panels from various guideline organizations. Guideline organizations can access the tool at https://heigrade.mcmaster.ca/guideline-development/panelview to participate. We will seek further feedback on use of the tool, for example, about the potential for public reporting of PANELVIEW assessments to increase transparency. The high Cronbach’s α coefficients may indicate the presence of redundant items. Sampling of more panels will allow us to assess whether any refinement of tool items is necessary and to conduct factor analysis for further evaluation of tool domains. Additional opportunities include comparative studies, for example, comparing PANELVIEW assessments of panellists with those of other group members (e.g., nonvoting observers), as well as evaluating global ratings and judgments of panel success and guideline credibility against ratings of the tool.
Limitations
A potential limitation of our research is that we did not conduct systematic searches of the nonmedical literature in the areas of business, education and policy-making for relevant items. At each step involving key informants, we used convenience sampling, which may introduce sampling bias. To address this, we drew on a broad representation of working guideline panellists, with varying levels of experience, as well as guideline-development experts from organizations representing a wide range of processes and methods.
The 8 guideline groups involved in field testing the PANELVIEW tool were recruited through key informants, and, for some aspects of development, the groups used similar methods (e.g., using the GRADE approach for assessing quality of evidence and strength of recommendations) and involved experienced group chairs. The high scores on many items and the low overall reliability coefficient of 0.35 for discriminating between groups indicated that the groups were likely all high performing. Despite this, we observed variability in scores within the groups, which would allow guideline developers to identify whether individual panellists viewed the process and specific aspects of the process as more or less appropriate.
Conclusion
The PANELVIEW instrument captures the perspectives of panellists who participate in guideline development, which addresses a gap in the field. Given the importance of guidelines and their impact on recipients and providers of care, optimizing the quality of their development is a logical step.
Acknowledgements
The authors acknowledge Dr. John J. Riva for helping to pilot test the item-reduction survey and Dr. Gordon Guyatt for helping to field test the PANELVIEW tool. They also acknowledge all the participating guideline panel members who provided feedback for item generation and reduction, and helped with field testing of the tool.
Footnotes
* PANELVIEW Working Group members: Yngve Falck-Ytter MD, Division of Gastroenterology, Case Western Reserve University, Cleveland, Ohio; Iván D. Florez MD PhD, Department of Pediatrics, Universidad de Antioquia, Medellin, Colombia; Amir Qaseem MD PhD, American College of Physicians, Philadelphia, Penn.; Richard M. Rosenfeld MD MPH, Department of Otolaryngology, SUNY Downstate Medical Center, Brooklyn, NY; Craig W. Robbins MD MPH, Center for Clinical Information Services, Kaiser Permanente Care Management Institute, Oakland, Calif.; Judith Thornton PhD, Centre for Clinical Practice, National Institute for Health and Clinical Excellence, Manchester, UK
Competing interests: Authors of this manuscript have been involved in the development of various guideline manuals that are referenced in this article. Wojtek Wiercioch, Elie Akl and Holger Schünemann will retain copyright of the PANELVIEW tool through McMaster University (Technology ID 20-031), and the tool may be licensed in the future. Maicon Falavigna received consulting fees from Novartis, AbbVie, Roche, Boehringer Ingelheim, PTC Therapeutics and Sanofi Genzyme, unrelated to the submitted work. No other competing interests were declared.
This article has been peer reviewed.
Contributors: Holger Schünemann was the principal investigator. Holger Schünemann, Elie Akl and Wojtek Wiercioch conceptualized and designed the study with input from Nancy Santesso and Meghan McConnell. Meghan McConnell and Wojtek Wiercioch analyzed the data. Wojtek Wiercioch and Holger Schünemann drafted the manuscript. All of the authors contributed to the acquisition and interpretation of data, revised the manuscript critically for important intellectual content, approved the final version submitted for publication and agreed to be accountable for all aspects of the work.
Funding: This study was supported by internal funds from the McMaster GRADE centre, McMaster University. Kaja-Triin Laisaar receives institutional research support through grant IUT34-17 from the Ministry of Education and Research through the Estonian Research Council.
Data sharing: Data not shared.
Accepted June 10, 2020.