- © 2005 CMA Media Inc. or its licensors
Abstract
Background: Tools for early identification of workers with back pain who are at high risk of adverse occupational outcome would help concentrate clinical attention on the patients who need it most, while helping reduce unnecessary interventions (and costs) among the others. This study was conducted to develop and validate clinical rules to predict the 2-year work disability status of people consulting for nonspecific back pain in primary care settings.
Methods: This was a 2-year prospective cohort study conducted in 7 primary care settings in the Quebec City area. The study enrolled 1007 workers (participation, 68.4% of potential participants expected to be eligible) aged 18–64 years who consulted for nonspecific back pain associated with at least 1 day's absence from work. The majority (86%) completed 5 telephone interviews documenting a large array of variables. Clinical information was abstracted from the medical files. The outcome measure was “return to work in good health” at 2 years, a variable that combined patients' occupational status, functional limitations and recurrences of work absence. Predictive models of 2-year outcome were developed with a recursive partitioning approach on a 40% random sample of our study subjects, then validated on the rest.
Results: The best predictive model included 7 baseline variables (patient's recovery expectations, radiating pain, previous back surgery, pain intensity, frequent change of position because of back pain, irritability and bad temper, and difficulty sleeping) and was particularly efficient at identifying patients with no adverse occupational outcome (negative predictive value 78%–94%).
Interpretation: A clinical prediction rule accurately identified a large proportion of workers with back pain consulting in a primary care setting who were at a low risk of an adverse occupational outcome.
Since the 1950s, back pain has taken on the proportions of a veritable epidemic, counting now among the 5 most frequent reasons for visits to physicians' offices in North America1,2,3 and ranking sixth among health problems generating the highest direct medical costs.4 Because of its high incidence and associated expense, effective intervention for back pain has great potential for improving population health and for freeing up extensive societal resources.
Specific causes of back pain, which so-called red flags are meant to identify (i.e., tumours, fractures, infections, cauda equina syndrome, visceral pain and systemic disease),5 account for about 3% of all cases of back pain.6 The overwhelming majority of back-pain problems are thus nonspecific. One important feature of nonspecific back pain among workers is that a small proportion of cases (< 10%) accounts for most of the costs (> 70%).7,8,9,10,11,12,13,14 This fact has led investigators to focus on the early identification of patients who are at higher risk of disability, so that specialized interventions can be provided to them earlier, while other patients can be expected to recover with conservative care.9,15,16,17,18,19,20,21,22,23,24,25 Although this goal is widely pursued in back-pain research, most available studies in this area have 3 methodological problems:
- Potential predictors are often limited to administrative or clinical data, whereas it is clear that back pain is a multidimensional health problem.
- The outcome variable is most often a 1-point dichotomous measure of return to work, time off work or duration of compensation, although some authors have warned against the use of first return to work as a measure of recovery. Baldwin and colleagues,26 for instance, point out that first return to work is frequently followed by recurrences of work absence.
- Most published prediction rules developed for back pain have not been successfully validated on any additional samples of patients.
Our study aimed to build a simple predictive tool that could be used by primary care physicians to identify workers with nonspecific back pain who are at higher risk of long-term adverse occupational outcomes, and then to validate this tool on a fresh sample of subjects.
Methods
From 1998 through 2002, we conducted a prospective cohort study called the RAMS-Prognosis Study (RAMS stands for recherche sur les affections musculo-squelettiques), which had a qualitative and a quantitative phase. The study was approved annually by the ethics committees of the participating hospitals.
In the first phase, 2 focus groups composed of people who had experienced severe back pain were held in November and December 1998. The focus groups enriched the list of potential predictors (Appendix 1)10,27 to be investigated quantitatively in the interview-based second phase. The variables they identified were added to those drawn from the existing literature, so as to document as many potential predictors as possible.
Subjects for the second phase were recruited in 7 primary care settings of the Quebec City area: 4 emergency departments and 3 family medicine units, all having medical teaching responsibilities. Subjects of interest were adult workers aged 18–64 years who consulted for back pain, whatever its character or duration, between June 1999 and September 2000. Patients were eligible if the back pain was nonspecific and had caused them to be absent from their regular job for at least 1 day. Excluded were patients with any other condition that could affect their work capacity (pregnancy, for example, or a serious comorbidity) and those whose pain was located in the cervical spine only or had a specific cause.
Lists of potential subjects at each site were submitted weekly. A medical archivist telephoned these patients to request their participation in the study; those who agreed and were eligible were mailed an informed consent form to sign and return.
Participants were telephoned by trained interviewers for a baseline interview at around 3 weeks after their medical consultation, with repeated measurements at 6 and 12 weeks and 1 and 2 years. (Note that all timeframes are relative to each patient's index medical consultation.) The data collected described demographic, socioeconomic, behavioural, anthropometric, clinical, occupational and psychosocial variables, as well as information on the utilization of health services for back pain. Clinical information was drawn by a single individual from study participants' medical files. An online appendix (available at www.cmaj.ca/cgi/content/full/172/12/1559/DC1) summarizes the standardized instruments used to measure key constructs.28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60 When an instrument was unavailable in French, Vallerand's double-inverse translation method61 was applied.
Return to work in good health (RWGH) is an index of back-pain outcome that takes into account work status, functional limitations and number of days of work-absence since the medical consultation. This variable has 4 categories based on the work of Baldwin's group26 (presented in detail in Table 1): success, partial success, failure after attempt, and failure. Our work-absence data were self-reported; however, a validation study was conducted with data obtained from the employers of 40 volunteers. The type 1 intraclass coefficient correlating the 2 sets of data was 0.97 (95% confidence interval [CI] 0.93–0.98) for number of days of modified work, and 0.99 (95% CI 0.98–1.00) for number of days off any work.
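As an illustration of how agreement between self-reported and employer-recorded days can be quantified, the following minimal sketch computes a one-way random-effects intraclass correlation for single measurements. It assumes that "type 1" refers to this one-way form; the function name `icc_oneway` and the example data are hypothetical and are not taken from the study.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC for single measurements.

    pairs: one tuple per subject holding that subject's k measurements
    (here k = 2: self-reported days and employer-recorded days).
    Assumes at least 2 subjects and some variation in the data.
    """
    n = len(pairs)
    k = len(pairs[0])
    grand_mean = mean(value for row in pairs for value in row)
    row_means = [mean(row) for row in pairs]

    # Between-subject and within-subject mean squares (one-way ANOVA).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(pairs, row_means) for value in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data (NOT from the study): days off work for 5 workers,
# given as (self-reported, employer record).
example = [(10, 10), (3, 4), (25, 24), (0, 0), (14, 14)]
icc_value = icc_oneway(example)
print(f"ICC = {icc_value:.3f}")
```

Values close to 1 indicate that almost all of the variation lies between workers rather than between the 2 data sources, which is the pattern reported for the validation study.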
Table 1.
Subsequent to descriptive analysis, participants were randomly assigned (by means of a randomization function in SAS software) into 2 subgroups: about 40% into a “training sample” and the rest into a “validation sample.” Predictive analyses were conducted to identify, not causal relationships, but only prognostic indicators that could be clinically useful.62 Recursive partitioning with KnowledgeSeeker software (version 3.0, Angoss Software International Ltd., Toronto, Ont.) was used to build a predictive model of 2-year RWGH in the training sample.63 Because the distinction between the “failure after attempt” and “failure” groups has no clinical pertinence, data for these 2 categories of outcome were lumped together. The threshold for statistical significance was fixed first at 0.01, to favour the selection of the strongest predictors and simplicity in the model built. Afterward, it was relaxed to p ≤ 0.05 to permit us to identify complementary and alternative associations.
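The random assignment described above can be sketched as follows. This is only an illustration in Python, not the SAS procedure actually used (which the text does not detail); the function name `split_sample`, the seed, the use of an exact 40% cut and the choice of 860 subject identifiers are assumptions made for the example.

```python
import random


def split_sample(subject_ids, training_fraction=0.40, seed=2005):
    """Randomly assign subjects to a training and a validation sample."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    n_training = round(training_fraction * len(ids))
    return ids[:n_training], ids[n_training:]


# Hypothetical identifiers 1..860, one per subject.
training_ids, validation_ids = split_sample(range(1, 861))
print(len(training_ids), len(validation_ids))
```

The model is built only on the training identifiers; the validation identifiers are set aside until the model is final, so that its performance is assessed on subjects who played no part in its construction.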
Classification error rate, sensitivity, specificity, and positive and negative predictive values (the probability, respectively, of having and not having an actual adverse occupational outcome when so predicted by the model) were computed for each model by comparing the predicted classification of subjects to their actual status at 2 years. In these analyses, the subjects were dichotomized along different groupings of the outcome categories. The “success” category was always considered among the “non-diseased” group. Area under the receiver operating characteristic curve64 was used as the main criterion to select one model over another, followed by the number of variables. Each selected model was then applied to the validation sample. We “pruned” the final model to try to keep it as simple as possible. All baseline, 6-week and 12-week variables were candidate predictors, including individual items or questions of specific measurement tools (Appendix 1).
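The measures of validity named above can be illustrated with a short sketch. The code below dichotomizes predicted and actual outcome categories according to a chosen grouping of adverse outcomes and derives the usual 2 × 2 measures. The function names, the category labels and the 10 example subjects are hypothetical; they are not the study data, and the sketch assumes that every cell needed as a denominator is non-empty.

```python
def validity_measures(predicted, actual, adverse):
    """Sensitivity, specificity, PPV and NPV after dichotomizing outcomes.

    predicted, actual: sequences of category labels, one per subject.
    adverse: set of labels counted as the adverse ("diseased") outcome.
    """
    tp = fp = fn = tn = 0
    for pred, act in zip(predicted, actual):
        pred_adverse, act_adverse = pred in adverse, act in adverse
        if pred_adverse and act_adverse:
            tp += 1
        elif pred_adverse:
            fp += 1
        elif act_adverse:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # adverse outcome observed when predicted
        "npv": tn / (tn + fn),  # no adverse outcome when none predicted
    }


def classification_error_rate(predicted, actual):
    """Proportion of subjects whose predicted category differs from the actual one."""
    errors = sum(pred != act for pred, act in zip(predicted, actual))
    return errors / len(actual)


# Hypothetical example with 10 subjects (NOT study data).
actual = ["success"] * 5 + ["partial"] * 3 + ["failure"] * 2
predicted = ["success", "success", "success", "success", "partial",
             "partial", "partial", "success", "failure", "success"]

failure_only = validity_measures(predicted, actual, {"failure"})
failure_or_partial = validity_measures(predicted, actual, {"failure", "partial"})
error_rate = classification_error_rate(predicted, actual)
```

Changing the `adverse` set reproduces the idea of dichotomizing subjects along different groupings while "success" always remains in the non-diseased group.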
Results
Numbers of eligible subjects, refusals and participants are shown in Fig. 1. The proportion of eligibility among subjects who were reached (37.6%) was applied to those not reached, to estimate a total number of eligible subjects. A total of 1007 subjects (68.4% of the 1471 expected to have been eligible) participated in the baseline interview, conducted an average of 25 days after the index consultation (standard deviation [SD] 10.2 d). Of those who participated in the baseline interview, 923 (91.7%) completed the interview at 6 weeks of follow-up, 907 (90.1%) at 12 weeks, 913 (90.7%) at 1 year, and 864 (85.8%) at 2 years. Three patients died during the follow-up period. Completed records of all 5 interviews were available for 860 participants, whose data were used for the recursive partitioning analysis.
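The extrapolation of eligibility and the follow-up proportions reported above amount to simple arithmetic, sketched below. The interview counts are those given in the text; the counts passed to `estimate_total_eligible` are hypothetical round numbers chosen only to show the calculation (the actual counts appear in Fig. 1), and both function and variable names are illustrative.

```python
def estimate_total_eligible(eligible_reached, reached, not_reached):
    """Apply the eligibility proportion among reached subjects to those not reached."""
    eligibility_rate = eligible_reached / reached
    return eligible_reached + eligibility_rate * not_reached


# Hypothetical counts (NOT from Fig. 1): 376 of 1000 reached subjects eligible,
# 500 subjects never reached.
hypothetical_total = estimate_total_eligible(376, 1000, 500)

# Follow-up proportions, using the counts reported in the text.
baseline = 1007
completed = {"6 weeks": 923, "12 weeks": 907, "1 year": 913, "2 years": 864}
retention = {
    time_point: round(100 * n / baseline, 1)
    for time_point, n in completed.items()
}
print(retention)
```

The follow-up percentages obtained this way match those reported in the text for each interview.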
Fig. 1: Eligible subjects, refusals and participants in the study. *Eligibility unknown. R = randomization.
The mean age of the participants was just under 39 years; a majority (58.5%) were male (Table 2). Almost half (47.8%) had earned a postsecondary diploma, from either a community college or a university.
Table 2.
At baseline, a majority of subjects reported their back pain to be recurrent (“it comes and goes”) or persistent (“the pain is always there, to different degrees”); fewer than one-quarter reported that theirs was a one-time problem (“never had back pain before”). The median time since the beginning of patients' first episode was nevertheless 6 years. Pain was mostly situated in the lumbar and lumbosacral areas. Over half of our study subjects reported pain radiating to the arms or legs.
Fig. 2: Progress among study participants toward return to work in good health. Note that subjects could not go back into the “failure” group; this category could thus only diminish over time.
The 2-year evolution of RWGH among study subjects is illustrated in Fig. 2. The most important changes in RWGH occurred at about 12 weeks, at which time about 50% of subjects were in the RWGH success category (compared with 18% at 6 weeks). At 2 years, close to 20% were still in the “failure after attempt(s)” and “failure” groups.
Fig. 3: Clinical algorithm to predict an outcome at 2 years of return to work in good health (RWGH) among workers consulting in primary care settings for back pain. All values shown are percentages. High-probability categories in each group, as were used to calculate the measures of validity, are circled. Note that the “failure” category includes lack of successful return to work at 2 years of follow-up, either with no attempt to return or despite 1 or more attempts to return to work. CI = confidence interval, Q = question. Q1 is item 15 of the Fear Avoidance Beliefs Questionnaire. Questions 5–7 are from the Roland–Morris Disability Questionnaire: Q5 is item 2; Q6, item 22; and Q7, item 18.
Example 1. Mr. Jones answered “Yes” to question 1, “No” to question 3, “7” to question 4 and “No” to question 5. (Notice that because Mr. Jones said “Yes” to question 1, question 2 is not useful.) His estimated probability of success at 2 years in returning to work in good health is 84%. The clinician will reassure him and use a conservative approach. A rapid return to normal activities is the objective.
Example 2. Mr. Smith answered “No” to question 1 and “Yes” to question 2. (Questions 3–7 are unnecessary for Mr. Smith.) He thus appears to have a particularly high probability (46%) of failure to return to work in good health by 2 years. The clinician may wish to refer him to a specialized rehabilitation program.
Example 3. Mrs. Watson answered “Yes” to question 1, “No” to question 3, “8” to question 4, “Yes” to questions 5 and 6, and “No” to question 7. (Again, question 2 is not needed in Mrs. Watson's case.) Her probability of either success or partial success in returning to work in good health by 2 years is quite high (50% + 45% = 95%). The clinician could ask to see her again and eventually refer her to occupational health services to monitor and improve her work conditions. Keeping the patient at work is the objective.
Fig. 3 presents the final predictive model as a clinical algorithm that allows estimation of the probability, for a given individual, of RWGH success, partial success and failure (which includes failure after attempt). The classification error rate was 37.0% in the training sample and 40.5% in the validation sample. Measures of validity to detect RWGH failure, partial success or both outcomes taken together for this model are presented for both samples in Table 3, with a set of example calculations shown in Fig. 4. All validity measures were quite stable when applied to the validation sample. In all cases, findings for negative predictive value were high (74%–91%). It was highest for predicting “failure after attempt/failure” (91%), whereas the highest positive predictive value was for detecting the combined outcome of “failure after attempt/failure” or “partial success” simultaneously (57%).
Table 3.
Fig. 4: Example of calculations of the measures of validity presented in Table 3.
Interpretation
This study corroborates the complex nature of back pain and the inherent difficulty in developing clinical prediction tools for such conditions. With a large coverage of potential predictors measured at baseline and at 6 and 12 weeks, the best model we obtained contains 7 predictors measured at baseline (patient's recovery expectations, radiating pain, previous back surgery, intense pain, frequent change of position because of back pain, irritability and bad temper, and difficulty sleeping). It is far from perfect; nonetheless, its high negative predictive value may constitute a strong advantage.
Most previous studies on the prediction of the long-term outcome of back pain used a dichotomous measure of return to work, duration of work absence or compensation data. Because of methodological differences and the nature of predictive analyses (which are not intended to identify causal relationships, and should not be interpreted as doing so),64 it is not relevant to directly compare the predictors. In fact, 2 studies could end up with different sets of predictors; the nature of the predictors is not very important, as long as they constitute an efficient and reproducible prognostic tool. The comparison must be made on the predictive validity of the models, that is, on their capacity to classify subjects correctly with respect to their outcome and on demonstration of the reproducibility of this validity. The few previously published models explained some 25%–30% of the variance of continuous outcomes,25,65 similar to what we observed when we applied our final predictive model to long-term absence (data not shown). With regard to other measures of predictive validity, it is quite often negative predictive values that are highest,65,66,67 as in our study.
This study had several strengths: the inclusion of all workers consulting for back pain, whatever the source of pain (i.e., not only workers' compensation cases); the large sample; the prospective design, with repeated measures taken at key points in the natural history of the disease; a high participation rate; coverage of a wide range of potential predictors; the use of a more specific and potentially more valid measure of occupational outcome; and the use of recursive partitioning.
Classically in prospective cohort studies, subjects must be free of the disease at the beginning of observation, so that only new (incident) cases are observed and the directionality of associations is non-equivocal. Since only about 1 in 5 back-pain patients consulting in primary care settings have never had back pain before, a so-called inception cohort study of incident cases would require great resources and include only subjects with homogeneous characteristics, which would limit its external validity. Primary care physicians meet with a heterogeneous population of back-pain patients (most with recurrent or persistent pain) for whom they must give a prognosis. A prediction tool that works for a subgroup of patients only is unlikely to be useful to clinicians, especially if the subgroup includes only a minority of their clients.
In this study, several measures of the type and severity of pain were used as potential predictors. If these variables had been important to the prognosis, they would have been retained. Our model is thus applicable to all sorts of back-pain problems, as seen in day-to-day primary care practice. It can be used systematically to assist the physician in deciding the best allocation of clinical resources for patients with back pain (see the example cases accompanying Fig. 3).
Reproducibility of predictors is an important consideration in building a prognostic instrument.68 The fact that 5 of the variables included in our final model are items drawn from well-validated measurement instruments is reassuring. The 2 other items, radiating pain and previous back surgery, are relatively “hard” events, measurement of which is likely to be highly reproducible.
Because the baseline interviews were conducted some 3 weeks after the related medical consultation, some variables may have changed during that period, which could have made them more predictive than the same variables would have been if measured at the time of consultation. However, variables measured at 6 weeks and 12 weeks offered no better prediction than the baseline measures, which is reassuring.
Traditionally, physicians look for clinical decision rules that have high positive predictive value, and attribute generally less importance to negative predictive value. However, it is the nature of the outcome that must determine the most important measures of validity for a given clinical decision tool. Considering the frequency of back pain and the resources that are spent on benign cases, an instrument that allows identification of a group of subjects who are at low risk of adverse outcomes may be quite useful.
See related article page 1575
Appendix 1
Footnotes
- This article has been peer reviewed.
Contributors: Clermont Dionne initiated the study. Clermont Dionne, Renée Bourbonnais, Pierre Frémont, Michel Rossignol and Susan Stock designed the study protocol. All of the authors discussed core ideas, participated in analysis and interpretation of the data, and contributed to the writing of the paper under Clermont Dionne's lead. Clermont Dionne and Isabelle Larocque coordinated the study and data collection. All of the authors are guarantors for the paper.
Acknowledgements: We thank all of the study participants, the staff of the Unité de recherche en santé des populations and all of the research assistants who worked on this study. Thanks also to physicians Stéphane Bergeron, Alexandra Dansereau, Georges Dufresne, Louis Larue, Natalie Le Sage, Jean Maziade and Jean Ouellet for their help with the recruitment of subjects. Special thanks to Arie Nouwen for his help with the questionnaire on self-efficacy, Julie Soucy for her participation in the coordination of the study and Eric Demers for his contribution to the statistical analyses.
This study was supported by grant 97-061 from the Quebec Institute for Occupational Safety and Health (Institut de recherche Robert-Sauvé en santé et en sécurité du travail du Québec), which did not interfere in any way in the scientific and publication processes. Clermont Dionne and Renée Bourbonnais are Quebec Health Research Fund (Fonds de la recherche en santé du Québec) Scholars.
Competing interests: None declared.