Abstract
Background: Physicians diagnose and treat suspected hypogonadism in older men by extrapolating from the defined clinical entity of hypogonadism found in younger men. We conducted a systematic review to estimate the accuracy of clinical symptoms and signs for predicting low testosterone among aging men.
Methods: We searched the MEDLINE and Embase databases (January 1966 to July 2014) for studies that compared clinical features with a measurement of serum testosterone in men. Three of the authors independently reviewed articles for inclusion, assessed quality and extracted data.
Results: Among 6053 articles identified, 40 met the inclusion criteria. The prevalence of low testosterone ranged between 2% and 77%. Threshold testosterone levels used for reference standards also varied substantially. The summary likelihood ratio associated with decreased libido was 1.6 (95% confidence interval [CI] 1.3–1.9), and the likelihood ratio for absence of this finding was 0.72 (95% CI 0.58–0.85). The likelihood ratio associated with the presence of erectile dysfunction was 1.5 (95% CI 1.3–1.8) and with absence of erectile dysfunction was 0.83 (95% CI 0.76–0.91). Of the multiple-item instruments, the ANDROTEST showed both the most favourable positive likelihood ratio (range 1.9–2.2) and the most favourable negative likelihood ratio (range 0.37–0.49).
Interpretation: We found weak correlation between signs, symptoms and testosterone levels, uncertainty about what threshold testosterone levels should be considered low for aging men and wide variation in estimated prevalence of the condition. It is therefore difficult to extrapolate the method of diagnosing pathologic hypogonadism in younger men to clinical decisions regarding age-related testosterone decline in aging men.
Male hypogonadism is defined as the presence of low serum testosterone and spermatozoa levels, accompanied by clinical signs and symptoms.1 The Endocrine Society divides the symptoms and signs of androgen deficiency into 2 groups, based on expert consensus.1 The first group, which is considered more specific, includes incomplete or delayed sexual development; eunuchoidism; reduced sexual desire (libido); erectile dysfunction; gynecomastia; decreased axillary, facial and pubic hair; small testes (i.e., volume < 5 mL); infertility; low-trauma fracture; low bone mineral density; and hot flushes.1 The second group includes less specific signs and symptoms, such as decreased energy and motivation, depressed mood, poor concentration and memory, sleep disturbance, mild anemia, reduced muscle bulk and strength, increased body fat or body mass index, and diminished physical performance.1 Similar definitions have recently been developed by the Canadian Men’s Health Foundation Multidisciplinary Guidelines Task Force on Testosterone Deficiency.2
In young men, hypogonadism is more commonly characterized by signs and symptoms from the first group, such as reduced libido and erectile dysfunction. This condition is most often caused by testicular or pituitary pathology, including hyperprolactinemia, pituitary or hypothalamic disorders, testicular disease, radiation exposure or genetic diseases such as Klinefelter syndrome.3 Testosterone replacement is indicated in these cases of “classic hypogonadism,” as it ameliorates the clinical symptoms.4
In contrast, although these entities exist in older men too, they are less frequent causes of low testosterone than age-related changes. There is evidence that testosterone levels decline with age in all men, regardless of symptoms, at an estimated rate of 1%–3% per year.5,6 One study found that serum testosterone levels were below the normal range in 20% of men in their 60s and in close to 50% of men in their 80s.7 However, the prevalence of symptomatic low testosterone (hypogonadism) is estimated by some to be much lower in this population, at about 2%.8 Given the high prevalence of low testosterone and more limited correlation with symptoms in aging men, it is uncertain to what extent this represents a physiologic or pathologic event.6,7 Moreover, symptoms typically associated with low testosterone are less specific in older men and may be caused by other comorbidities. For example, erectile dysfunction can be the result of vascular insufficiency, neurologic impairment, psychogenic causes or substance use.9 Conditions such as diabetes mellitus and atherosclerosis are more common in older men, with up to 40% of men over 50 years of age having evidence of vascular insufficiency as the primary cause of their erectile dysfunction.10 Low libido similarly can result from psychiatric or medical conditions that are more common in older men.11
Currently, many clinicians diagnose hypogonadism in older men on the basis of low serum testosterone levels, with or without symptoms, largely on the assumption that this is a pathologic condition requiring treatment. The purpose of this study was to systematically review the available literature to estimate the accuracy and operating characteristics of signs and symptoms for predicting low testosterone in aging men.
Methods
Literature search and quality assessment
Using MEDLINE and Embase (January 1966 to July 2014), 3 of the authors (A.C.M., A.N.C.L., A.K.) retrieved articles on patient history or physical examination findings used in the diagnosis of hypogonadism in aging men. Medical Subject Headings and keywords included “hypogonadism,” “androgen deficiency” and relevant terms for various signs and symptoms of male hypogonadism (see Appendix 1, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.150262/-/DC1). This search was supplemented with a manual review of the bibliographies of the identified articles, as well as additional articles that have been used to develop recent guidelines on treatment of male hypogonadism. We included studies that compared clinical findings with a measurement of testosterone (total, bioavailable or free), with a defined range of normal testosterone values. We excluded review articles, as well as those that were nonclinical in context, that focused on therapy for hypogonadism, that had no raw data available for calculations of likelihood ratios or that were limited to specific disease conditions.
Reference standard for low testosterone
The reference standard for diagnosing low testosterone is 2 values for morning serum total, free or bioavailable testosterone below a defined normal limit, determined with an accurate and reliable assay.1 The free or bioavailable testosterone measure is preferred for cases in which alterations in sex hormone binding globulin are suspected; for example, obesity, diabetes and glucocorticoids may lower this protein, whereas cirrhosis, HIV and anticonvulsants may increase it.1 There is no universally accepted threshold value for low serum testosterone in older men. Testosterone thresholds vary greatly across studies, with many studies not stating how their threshold values were determined. We therefore based the reference standard for low serum testosterone on study-specific thresholds. When available, the bioavailable testosterone measurement was the preferred basis for diagnosis because it is considered by many to be most accurate.12 If bioavailable testosterone was unavailable, we used total testosterone, and if neither of these was available we used free testosterone. Although free testosterone measured by equilibrium dialysis is considered highly accurate, calculated free testosterone values vary depending on the formula used,13 and analogue methods of measurement are recognized as having poor accuracy.14
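The order of preference described above (bioavailable, then total, then free testosterone) can be written as a simple rule. The following is a minimal sketch only; the function name and the set-based representation of what a study reports are illustrative assumptions, not part of the review's actual workflow.

```python
from typing import Optional, Set

# Preference order stated in the text: bioavailable first, then total, then free.
PREFERENCE = ("bioavailable", "total", "free")


def choose_reference_measure(available: Set[str]) -> Optional[str]:
    """Return the preferred testosterone measure among those a study reports.

    `available` is a hypothetical set of labels such as {"total", "free"}.
    Returns None when the study reports none of the three measures.
    """
    for measure in PREFERENCE:
        if measure in available:
            return measure
    return None


# Example with invented inputs: a study reporting total and free testosterone.
print(choose_reference_measure({"total", "free"}))
```

Under this rule, a study reporting only free testosterone would still be retained, with free testosterone as its reference measure.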
Data extraction and analysis
Three authors (A.C.M., A.N.C.L., A.K.) independently reviewed the selected articles for inclusion and quality. The potential for bias in all studies was assessed using the Quality Assessment of Diagnostic Accuracy tool (Appendix 2, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.150262/-/DC1), adapted to the topic of hypogonadism.15,16 Two of the authors (A.C.M., A.N.C.L.) independently extracted the data from each of the selected studies and compared the electronic tables of results for any discrepancies, which were resolved by discussion. A third author (G.T.) reviewed the results from each study to ensure internal consistency.
We used the extracted data to calculate sensitivity, specificity and likelihood ratios associated with signs, symptoms or combinations thereof.17 Additionally, for each study, we calculated the kappa values measuring agreement between low testosterone and the clinical variables examined. We considered signs and symptoms with a positive likelihood ratio greater than 2.0 or a negative likelihood ratio less than 0.5 to be clinically useful.18 For clinical variables that were examined in only 2 studies, we reported the range. For findings from 3 or more studies, we derived the summary sensitivity, specificity, likelihood ratios and 95% confidence intervals (CIs) using the DerSimonian and Laird random-effects approach.19 We used the usual method for ratios of proportions to estimate variances of log-likelihood ratios and used their reciprocals as study weights; we pooled sensitivity and specificity on the logit scale. We assessed heterogeneity between multiple studies examining the same clinical variable with the I² statistic and used a test of heterogeneity based on Cochran’s Q statistic.20 Where findings were reported in 4 or more studies, we also calculated summary measures of diagnostic accuracy from a bivariable model21,22 using the mada package of R (Meta-analysis of diagnostic accuracy, R package, version 0.5.7/r79). Where there were 8 or more studies, we used bivariable meta-regression to assess the dependence of diagnostic accuracy on the mean age of men in the study. All analyses were performed using R version 3.1.
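To make the calculations concrete, the sketch below computes sensitivity, specificity and likelihood ratios from a 2 × 2 table and pools log positive likelihood ratios with a DerSimonian and Laird random-effects estimator. It is an illustration in Python under stated assumptions, not the authors' code (the analyses were run in R): the function names are hypothetical, the three 2 × 2 tables are invented, and no continuity correction for zero cells is applied.

```python
import math


def diagnostic_stats(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sens": sens,
        "spec": spec,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


def var_log_lr_pos(tp, fp, fn, tn):
    """Large-sample variance of log(LR+), a ratio of two proportions."""
    return 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)


def dersimonian_laird(effects, variances):
    """Random-effects pooling; also returns Cochran's Q and I^2 (as a fraction)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Invented (tp, fp, fn, tn) tables for three hypothetical studies.
studies = [(40, 60, 30, 120), (25, 50, 20, 100), (60, 90, 50, 200)]
log_lrs = [math.log(diagnostic_stats(*s)["lr_pos"]) for s in studies]
variances = [var_log_lr_pos(*s) for s in studies]
fit = dersimonian_laird(log_lrs, variances)

# Back-transform the pooled log(LR+) and its 95% CI to the ratio scale.
low = math.exp(fit["pooled"] - 1.96 * fit["se"])
high = math.exp(fit["pooled"] + 1.96 * fit["se"])
print(f"pooled LR+ = {math.exp(fit['pooled']):.2f} (95% CI {low:.2f} to {high:.2f})")
```

Pooling on the log scale keeps the confidence interval for the ratio positive and asymmetric, which is why the summary estimate is exponentiated only at the end.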
Results
In total, 6053 articles were identified by the search strategy, of which 40 met the inclusion criteria and were included in the analysis (Figure 1 and Appendices 3 and 4, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.150262/-/DC1). Overall, these articles accounted for a total of 37 565 patients. In the 27 studies that reported mean age, the range was 43 to 82 years. Seven of the studies included men under age 40 years, but in all of these, the mean age was over 40. According to the Quality Assessment of Diagnostic Accuracy system, the most frequent causes of suspected bias were lack of justification for the cut-off used to define hypogonadism, use of nonconsecutive patients and lack of explanation for patients withdrawn from studies (Table 1).
Figure 1: Literature search and study selection. ADAM = Androgen Deficiency in Aging Males, AMS = Aging Males’ Symptoms, MMAS = Massachusetts Male Aging Study.
Table 1: Assessment of bias for included studies
Prevalence of low testosterone
The prevalence of low testosterone differed widely among the studies, ranging between 2%62 and 77%.49 Testosterone thresholds and approaches to measuring testosterone also varied considerably across the studies (total testosterone, 29 studies, range 200–433 ng/dL [6.9–15 nmol/L]; bioavailable testosterone, 9 studies, range 69.4–198.4 ng/dL [2.4–6.9 nmol/L]; free testosterone, 4 studies, range 4.6–7.0 ng/dL [0.16–0.24 nmol/L]) (see Appendices 3 and 4 for details).23,60 Whereas some researchers assumed a normal distribution of testosterone values to define their threshold level, others used the cut-off values provided by the manufacturer of the testosterone kit. Some investigators relied on testosterone thresholds proposed by consensus guideline statements, whereas others provided no reasoning as to why they selected a specific threshold value.
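The paired units reported above can be checked with a short conversion. This is a minimal sketch that assumes a molar mass for testosterone of about 288.4 g/mol; the function name is illustrative.

```python
# Approximate molar mass of testosterone (C19H28O2) in g/mol (assumed value).
TESTOSTERONE_G_PER_MOL = 288.42


def ng_dl_to_nmol_l(ng_dl):
    """Convert a testosterone concentration from ng/dL to nmol/L.

    ng/dL times 10 gives ng/L; dividing by the molar mass (numerically
    equal to ng per nmol) gives nmol/L.
    """
    return ng_dl * 10 / TESTOSTERONE_G_PER_MOL


# Threshold ranges quoted in the text, in ng/dL.
for label, value in [("total, lower", 200), ("total, upper", 433),
                     ("bioavailable, lower", 69.4), ("free, upper", 7.0)]:
    print(f"{label}: {value} ng/dL = {ng_dl_to_nmol_l(value):.2f} nmol/L")
```

The conversion factor works out to roughly 0.0347 nmol/L per ng/dL, which agrees with the bracketed values given for each threshold range.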
Accuracy of signs and symptoms
Of the 40 studies included in this review, 26 examined individual signs and symptoms and their relation to low testosterone in aging men (Table 2). We did not evaluate “classic” signs, such as testicular volume and gynecomastia, because of a lack of adequate data in the included studies. The description of how each sign and symptom was defined can be found in Appendix 3.
Table 2: Diagnostic accuracy of symptoms, signs and composite measures for hypogonadism
The positive likelihood ratio was less than 2.0 for all tests, except for the following: hot flushes (positive likelihood ratio 2.0); decreased pubic hair (2.4); inability to complete chair stands, that is, to stand from a seated position at least 5 times without support from the arms of a chair (2.4); and inability to perform the Nottingham power rig test, which uses a device to measure leg extension power (2.1). No negative likelihood ratio was lower than 0.5, except for decreased vigour (negative likelihood ratio 0.32). The specific test characteristics of each sign and symptom are presented in Appendix 5 (available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.150262/-/DC1). For most studies, the test characteristics were close to the line of identity on the receiver operating characteristic curve, which is the diagonal line where sensitivity equals the complement of specificity (i.e., 1 – specificity), indicating that these clinical features changed the diagnostic probability of low testosterone very little (Figure 2). The agreement between low testosterone and individual signs and symptoms was weak, with only 3 of the 70 kappa values being larger than 0.3 and only one having an upper 95% CI above 0.5 (Figure 3). Additionally, we created forest plots of the sensitivity and specificity of each individual sign and symptom (Appendix 6, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.150262/-/DC1).
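The link between the line of identity and weak agreement can be illustrated with Cohen's kappa for a 2 × 2 table. The sketch below is an assumption-laden illustration, not the authors' code: the function name is hypothetical and the counts are invented. When sensitivity equals 1 – specificity, the finding is statistically independent of testosterone status, and kappa is zero.

```python
def cohen_kappa_2x2(tp, fp, fn, tn):
    """Cohen's kappa for agreement between a finding and low testosterone.

    tp: finding present, testosterone low     fp: finding present, testosterone normal
    fn: finding absent, testosterone low      tn: finding absent, testosterone normal
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Invented table lying on the line of identity: sensitivity = 0.5 = 1 - specificity.
print(cohen_kappa_2x2(25, 25, 25, 25))

# Invented table with modest discrimination: sensitivity = specificity = 0.6.
print(cohen_kappa_2x2(30, 20, 20, 30))
```

In the second invented table, sensitivity and specificity of 0.6 correspond to a positive likelihood ratio of 1.5, yet kappa is only about 0.2, a pattern similar in spirit to the weak agreement reported above.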
Figure 2: Receiver operating characteristics of individual findings. The figure has a point for each of the individual reported sensitivities and specificities for the studies included in the review. Findings for combinations are plotted in greys and black, for sexual function in red and orange, for activities in greens, for mood in browns, for signs of low testosterone in blues and for body mass index in purple. The area of each circle is proportional to the sample size used to calculate the findings. The diagonal line shows where sensitivity equals the complement of specificity (i.e., 1 – specificity), which indicates that a clinical feature added little change to the diagnostic probability of low testosterone. The particular sign or symptom for any point can be located by referring to Figure 3, which uses the same colour coding. ADAM = Androgen Deficiency in Aging Males, AMS = Aging Males’ Symptoms, BMI = body mass index, MMAS = Massachusetts Male Aging Study, Sex fn = sexual function, T = testosterone.
Figure 3: Display of kappa values for each study. For each finding, the square shows the estimated kappa value between the 2 dichotomous variables “low testosterone” and “positive sign or symptom”; horizontal lines show the 95% confidence intervals, and the area of each square is proportional to the estimated variance of the kappa value. Each sign or symptom is plotted in a different colour, with similar signs and symptoms sharing the same colour family (see Figure 2). ADAM = Androgen Deficiency in Aging Males, AMS = Aging Males’ Symptoms, BMI = body mass index, ED = erectile dysfunction, MMAS = Massachusetts Male Aging Study, T = testosterone.
Accuracy of multiple-item instruments
Of the 40 included studies, 16 measured the accuracy of prespecified questionnaires of signs and symptoms to identify low testosterone (Table 2). Five instruments to identify low testosterone in older men have been studied. The ANDROTEST appears to have both the most favourable positive likelihood ratio (range 1.9–2.2) and negative likelihood ratio (range 0.37–0.49), but not all instruments have undergone head-to-head comparisons to determine which is the most accurate. Their specific test characteristics can be found in Appendix 7 (available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.150262/-/DC1). In addition, we created forest plots of the sensitivity and specificity of the multiple-item instruments (Appendix 6). None of the multiple-item instruments had clinically useful (as defined in this paper) positive or negative likelihood ratios.
Bivariable models and meta-regression
For all variables, summary estimates of sensitivity and specificity from bivariable models were within 0.015 of values from univariable models. Similarly, values of likelihood ratios from the 2 approaches were within 0.1 of each other. Age was not a statistically significant predictor of sensitivity or specificity in any of the bivariable meta-regression models for Androgen Deficiency in Aging Males (ADAM) score, erectile dysfunction or libido.
Interpretation
Our review of 40 studies showed weak associations between signs and symptoms and serum testosterone levels in aging men. The unimpressive positive likelihood ratios may arise because many symptoms and signs of low testosterone are nonspecific and can result from other comorbid conditions that commonly occur in older men. In addition, the weak negative likelihood ratios may arise because a high proportion of older men, many of whom are asymptomatic, have testosterone levels below the currently proposed thresholds derived from younger men. The true threshold below which serum testosterone is abnormal may be lower in older men, and thresholds at which different signs and symptoms occur may also vary.63
This review raises the following important question: In the face of a low correlation between symptoms and biochemical testosterone levels, how should low testosterone in older men be defined and interpreted? To answer this question, we first require rigorously performed studies comparing the signs and symptoms of hypogonadism in aging men to the results of standardized testosterone assays to determine whether a correlation exists. Next, we need to determine the threshold testosterone level that discriminates those with the syndrome from those without. Third, we need rigorously performed large-scale trials to determine the benefits and risks of testosterone replacement in men categorized as having testosterone deficiency.
Because the likelihood ratios of the clinical findings were mostly between 0.5 and 2.0, the estimate of prevalence becomes the main determinant of post-test probability. In other words, the post-test probability is not altered from the pretest probability in any meaningful way. This highlights the importance of generating better information on the actual prevalence of clinically significant low testosterone in older men.
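The dependence of post-test probability on prevalence follows from the odds form of Bayes' theorem: post-test odds equal pretest odds multiplied by the likelihood ratio. The sketch below applies the summary likelihood ratios for decreased libido from this review (1.6 when present, 0.72 when absent) to a pretest probability of 20%, which is an assumed value chosen only for illustration; the function name is hypothetical.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRETEST = 0.20  # assumed pretest probability of low testosterone (illustrative)

# Summary likelihood ratios for decreased libido reported in this review.
with_finding = post_test_probability(PRETEST, 1.6)
without_finding = post_test_probability(PRETEST, 0.72)

print(f"decreased libido present: {with_finding:.1%}")
print(f"decreased libido absent:  {without_finding:.1%}")
```

Under these assumptions, the presence or absence of decreased libido moves a 20% pretest probability to roughly 29% or 15%, respectively, so the starting prevalence largely determines the result.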
A high-quality study by Wu and associates8 could not be included in the present study because published raw data necessary for the calculation of sensitivity, specificity and likelihood ratios were lacking. That study of 3369 older men suggested that, compared with the absence of any symptoms, combined symptoms of poor morning erection, low sexual desire and erectile dysfunction were associated with a modest odds ratio of 1.7 (95% CI 1.1–2.6) for androgen deficiency, based on a serum total testosterone of less than 11 nmol/L (317 ng/dL) in men between the ages of 40 and 79 years.8 Similar to our findings, Wu and associates8 noted a “weak overall association between symptoms and testosterone levels in this population.” They also stated that there was “substantial overlap between late-onset hypogonadism and nonspecific symptoms of aging.” They concluded that applying their criteria could “guard against the excessive diagnosis of hypogonadism and curb the injudicious use of testosterone therapy in older men.”8
The diagnosis of low testosterone in older men is complicated by controversies surrounding the potential benefits and harms of testosterone replacement therapy. A joint US Food and Drug Administration advisory committee recently stated that “both safety and efficacy of testosterone replacement in older men has not been established.”64 It recommended that a potential signal regarding cardiovascular risk be included in labelling and that “the use of testosterone replacement should exclude men with age-related testosterone decline.”64
The recently published Testosterone Trials consisted of 3 trials that examined the effects of testosterone therapy in symptomatic men 65 years of age and older with total testosterone levels less than 275 ng/dL (9.54 nmol/L).65 These trials showed modest improvements in measures of sexual function, although these effects declined over time. Small improvements in mood, depressive symptoms and walking distance were also reported. As noted in the accompanying editorial,66 the clinical significance of these treatment responses remains unclear, and no benefits for overall vitality were noted. In addition, the sample sizes were too small to determine potential risks of testosterone therapy in this population.65 These findings further support the lack of clarity regarding how to define and treat low testosterone in older men.
Limitations
This review had several important limitations. First, the studies included in the review used different assays for measuring testosterone and different thresholds for defining abnormal values. Recognizing the substantial variability among testosterone assays, the US Centers for Disease Control and Prevention is leading the Hormone Standardization Project, which will help to standardize testosterone measurement in the United States.67 Second, many of the studies had modest sample sizes and used nonconsecutive patients, which limits the quality of their data. Third, there were no data on the accuracy of physical findings such as gynecomastia (no studies) and testicular size (one study,68 which was ultimately excluded because raw data were unavailable). Fourth, many of the studies had heterogeneous definitions for terms relating to signs and symptoms, such as libido, which can be difficult to quantify objectively. Fifth, many of the studies assessed in this paper did not examine patients with other major comorbidities.
Conclusion
This systematic review, based on all relevant existing data, highlights the current lack of clarity regarding the definition and management of age-related declines in testosterone levels. Weak correlations between signs, symptoms and testosterone levels, uncertainty about what threshold testosterone levels should be considered low for aging men and wide variation in estimated prevalence of the condition make it difficult to extrapolate the method of diagnosing pathologic hypogonadism in younger men to clinical decisions regarding age-related testosterone decline in aging men.
Acknowledgements:
The authors thank Paul Shekelle, Sheri A. Keitz, Matthew J. Crowley and Cathleen Colon-Emeric for their thoughtful comments on earlier drafts of the manuscript.
Footnotes
Competing interests: David Simel receives honoraria for work submitted to JAMAEvidence.com.
No other competing interests were declared.
This article has been peer reviewed.
Contributors: Adam Millar, Allan Detsky, Adrian Lau, David Simel and Lorraine Lipscombe contributed to the study concept and design. George Tomlinson was responsible for the data analysis, and all of the authors contributed to the interpretation of the data. Adam Millar, Adrian Lau and Alan Kraguljac systematically reviewed and rated the studies. Adam Millar and Adrian Lau drafted the manuscript, and all of the authors revised it critically for important intellectual content. All of the authors gave final approval of the version to be published and agreed to act as guarantors of the work.
Funding: Lorraine Lipscombe is supported by a Canadian Institutes of Health Research New Investigator Award.
Accepted March 14, 2016.