Variable | Development | Validation | Combined | Reduced |
---|---|---|---|---|
Male model | | | | |
Discrimination | | | | |
C-statistic (95% CI) | 0.82 (0.81–0.83) | 0.79 (0.76–0.81) | 0.82 (0.81–0.83) | 0.82 (0.81–0.83) |
Ratio of 95th to 5th risk percentile (95th/5th) | 298.2 (0.0963/0.0003) | 468.7 (0.0770/0.0002) | 345.3 (0.0914/0.0003) | 337.8 (0.0913/0.0003) |
Calibration | | | | |
Observed v. predicted, % | 0.08 | 1.38 | 0.28 | 0.28 |
5-year cumulative incidence (observed) (95% CI) | 0.027 (0.026–0.029) | 0.023 (0.020–0.025) | 0.026 (0.025–0.028) | 0.026 (0.025–0.028) |
5-year risk (predicted) | 0.027 | 0.022 | 0.026 | 0.026 |
Overall performance | | | | |
Scaled Brier score | 0.025 | 0.022 | 0.024 | 0.024 |
Nagelkerke R² | 0.096 | 0.086 | 0.089 | 0.089 |
Female model | | | | |
Discrimination | | | | |
C-statistic (95% CI) | 0.87 (0.86–0.88) | 0.85 (0.83–0.87) | 0.86 (0.85–0.87) | 0.86 (0.85–0.87) |
Ratio of 95th to 5th risk percentile (95th/5th) | 645.0 (0.0811/0.0001) | 810.5 (0.0709/0.0001) | 482.3 (0.0794/0.0002) | 477.5 (0.0794/0.0002) |
Calibration | | | | |
Observed v. predicted, % | 0.30 | 7.13 | 0.39 | 0.38 |
5-year cumulative incidence (observed) (95% CI) | 0.018 (0.017–0.019) | 0.017 (0.015–0.019) | 0.018 (0.017–0.019) | 0.018 (0.017–0.019) |
5-year risk (predicted) | 0.018 | 0.016 | 0.018 | 0.018 |
Overall performance | | | | |
Scaled Brier score | 0.017 | 0.016 | 0.017 | 0.017 |
Nagelkerke R² | 0.124 | 0.126 | 0.117 | 0.117 |

Note: CI = confidence interval

\* Three types of performance tests were examined:28

1) Discrimination is the ability of a prediction model to differentiate between those who do and do not develop the outcome of interest. The C-statistic is a rank-order statistic for predictions against true outcomes.18,29 It ranges from 0 to 1: a value of 0.5 indicates the model is no better than random prediction, and a value of 1 indicates the model perfectly separates those who will develop the outcome of interest from those who will not. The ratio of the 95th to the 5th risk percentile is also a measure of discrimination; a higher ratio indicates a more discriminating algorithm. For example, a ratio of 100 indicates that the absolute risk is 100 times higher for a person at the 95th percentile than for a person at the 5th percentile. The ratio can be used to gauge the potential absolute benefit of treatment for different individuals in the development and validation cohorts: for an intervention with the same relative benefit, a ratio of 100 indicates that the person at the 95th percentile will have 100 times the absolute benefit of the person at the 5th percentile.

2) Calibration (or accuracy) describes how well the predicted probability of disease agrees with the observed outcome. Observed versus predicted (O v. P) is the relative difference between the observed incidence and the predicted risk; an O v. P of 1% indicates that 1% more cardiovascular events were observed than predicted. This table shows overall O v. P; Appendices 6 and 7, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.170914/-/DC1, show O v. P for specific subgroups. The table also presents O v. P in absolute terms, as the observed 5-year cumulative incidence and the predicted 5-year risk. A graphical assessment of calibration (calibration plots) is presented in Appendix 8, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.170914/-/DC1.

3) Overall performance. The scaled Brier score (Brier_scaled) is a measure of overall agreement between observed outcomes and predicted risk, with values between 0 and 1.30 The scaled Brier score is very similar to the Pearson R² statistic.31 Nagelkerke R² measures how much of the variation in risk between respondents in the development or validation data is explained by the model, with values from 0 to 1.32,33 Larger R² values indicate that more of the variation is explained by the model, to a maximum of 1.
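The performance measures defined in this footnote can be computed from individual predicted risks and observed outcomes. The Python sketch below is illustrative only and rests on assumptions that the source does not state: the outcome is treated as a binary 5-year event with complete follow-up (no censoring), so it does not reproduce the cumulative-incidence estimates in the table; O v. P is taken as (observed − predicted) / predicted × 100, following the footnote's wording; the scaled Brier score is taken as 1 − Brier/Brier_max with Brier_max = ȳ(1 − ȳ), a common definition; and percentiles use linear interpolation. All function names are hypothetical, and real cohort data with censored follow-up would need time-to-event versions of these measures, which this sketch does not attempt.

```python
import math
import statistics
from typing import Sequence

# Hypothetical helpers. outcome[i] is 1 if the event occurred within 5 years,
# else 0. ASSUMPTION: complete follow-up (no censoring), unlike a real cohort.


def c_statistic(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Share of (event, non-event) pairs in which the event has the higher risk.

    Tied risks count as half-concordant. O(n^2), fine for illustration.
    """
    events = [r for r, y in zip(risk, outcome) if y == 1]
    non_events = [r for r, y in zip(risk, outcome) if y == 0]
    score = 0.0
    for r_event in events:
        for r_non in non_events:
            if r_event > r_non:
                score += 1.0
            elif r_event == r_non:
                score += 0.5
    return score / (len(events) * len(non_events))


def percentile_ratio(risk: Sequence[float]) -> float:
    """Ratio of the 95th to the 5th percentile of predicted risk."""
    # n=20 yields the 5th, 10th, ..., 95th percentiles (linear interpolation).
    cuts = statistics.quantiles(risk, n=20, method="inclusive")
    return cuts[-1] / cuts[0]


def observed_vs_predicted(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Relative difference (%) between observed incidence and mean predicted risk."""
    observed = sum(outcome) / len(outcome)
    predicted = sum(risk) / len(risk)
    return 100.0 * (observed - predicted) / predicted


def brier(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for r, y in zip(risk, outcome)) / len(risk)


def scaled_brier(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """1 - Brier / Brier_max; Brier_max is the score of predicting the incidence for everyone."""
    incidence = sum(outcome) / len(outcome)
    brier_max = incidence * (1.0 - incidence)
    return 1.0 - brier(risk, outcome) / brier_max


def nagelkerke_r2(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Cox-Snell R^2 rescaled by its maximum so that it can reach 1."""
    n = len(outcome)
    incidence = sum(outcome) / n
    ll_model = sum(math.log(r) if y else math.log(1.0 - r) for r, y in zip(risk, outcome))
    ll_null = sum(math.log(incidence) if y else math.log(1.0 - incidence) for y in outcome)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


if __name__ == "__main__":
    # Toy data invented for illustration; not from the study.
    risk = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    outcome = [0, 0, 1, 0, 1, 0, 1, 1]
    print(c_statistic(risk, outcome))            # 0.8125 (13 of 16 pairs concordant)
    print(percentile_ratio(risk))                # about 6.41 (0.865 / 0.135)
    print(observed_vs_predicted(risk, outcome))  # about 0: mean risk equals incidence
    print(scaled_brier(risk, outcome))           # about 0.30
    print(nagelkerke_r2(risk, outcome))          # about 0.41
```

With the toy data, 13 of the 16 event/non-event pairs are ranked correctly, giving a C-statistic of 0.8125; because the mean predicted risk equals the observed incidence, O v. P is essentially 0 even though individual predictions are imperfect, which is why calibration-in-the-large and discrimination are reported separately.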