Biostatistical seminar by Andrew Vickers, Memorial Sloan Kettering Cancer Center

If calibration, discrimination, Brier, lift, gain, precision, recall, F1, Youden, AUC, and 27 other accuracy metrics can’t tell you whether a prediction model (or diagnostic test, or marker) is of clinical value, what should you use instead?

By Andrew Vickers, Attending Research Methodologist, Memorial Sloan Kettering Cancer Center.

Abstract: A typical paper on a prediction model (or diagnostic test or marker) presents some accuracy metrics (say, an AUC of 0.75 and a calibration plot that doesn’t look too bad) and then recommends that the model (or test or marker) be used in clinical practice. But how high an AUC (or Brier or F1 score) is high enough? What level of miscalibration is too much? The problem is compounded when comparing two different models (or tests or markers). What if one prediction model has better discrimination but the other has better calibration? What if one diagnostic test has better sensitivity but worse specificity? It does not help to state a general preference, such as “if we think sensitivity is more important, we should take the test with the higher sensitivity”, because this does not allow us to evaluate trade-offs (e.g. test A with a sensitivity of 80% and a specificity of 70% vs. test B with a sensitivity of 81% and a specificity of 30%). The talk will start by showing a series of everyday examples of prognostic models, demonstrating that it is difficult to tell which is the better model, or whether to use a model at all, on the basis of routinely reported accuracy metrics such as AUC, Brier score or calibration. It will then give the background to decision curve analysis, a net benefit approach first introduced about 15 years ago, and show how this methodology gives clear answers about whether to use a model (or test or marker) and which one is best. Decision curve analysis has been recommended in editorials in many major journals, including JAMA, JCO and the Annals of Internal Medicine, and is very widely used in the medical literature, with well over 1000 empirical uses a year.
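To make the test A vs. test B trade-off concrete, here is a minimal sketch of the net benefit calculation that decision curve analysis is built on. It uses the standard formula from the decision curve literature: net benefit = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability at which a patient would opt for treatment. The sensitivity and specificity values come from the abstract. The 20% prevalence and the listed thresholds are hypothetical values chosen only for illustration and are not taken from the talk.

```python
"""Net benefit of two binary tests at a few threshold probabilities.

Hypothetical inputs: the 20% prevalence and the thresholds below are
illustrative assumptions. The sensitivity/specificity pairs are the
test A / test B example from the abstract.
"""


def net_benefit(sensitivity: float, specificity: float,
                prevalence: float, threshold: float) -> float:
    """Net benefit = TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    true_pos_per_patient = sensitivity * prevalence
    false_pos_per_patient = (1.0 - specificity) * (1.0 - prevalence)
    # The threshold odds set how many false positives one true positive is worth.
    threshold_odds = threshold / (1.0 - threshold)
    return true_pos_per_patient - false_pos_per_patient * threshold_odds


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    # Treating everyone behaves like a test with sensitivity 1 and specificity 0.
    return net_benefit(1.0, 0.0, prevalence, threshold)


PREVALENCE = 0.20  # hypothetical
TESTS = {"test A": (0.80, 0.70), "test B": (0.81, 0.30)}

if __name__ == "__main__":
    for pt in (0.05, 0.10, 0.20):
        results = {name: net_benefit(se, sp, PREVALENCE, pt)
                   for name, (se, sp) in TESTS.items()}
        results["treat all"] = net_benefit_treat_all(PREVALENCE, pt)
        results["treat none"] = 0.0  # treating no one has zero net benefit
        best = max(results, key=results.get)
        summary = ", ".join(f"{k}: {v:.3f}" for k, v in results.items())
        print(f"threshold {pt:.0%} -> {summary} (best: {best})")
```

With these hypothetical inputs, test A beats both test B and treating everyone at a 10% threshold, while at 5% treating everyone comes out ahead. A full decision curve simply plots these net benefit values across a range of thresholds.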

The seminar will be held at CSS (“det gamle Kommunehospital”, the old Municipal Hospital), Øster Farimagsgade 5, 1353 Copenhagen K, room 5.2.46. Tea will be served in the library of the Section of Biostatistics half an hour before the seminar starts.

For a detailed overview of planned future seminars at the section of Biostatistics, UCPH, see