Introducing some of the most common or useful (and sometimes not-so-common or not-so-useful) performance metrics and curves.
Discussing how and when to use (and when not to use) the mentioned performance metrics and curves.
All interactive plots in this presentation were created with rtichoke (I am the author).
You are also invited to explore the rtichoke blog for reproducible examples and some theory.
Discrimination: The model's ability to separate between events and non-events.
Calibration ⚖️: Agreement between predicted probabilities and the observed outcomes.
Utility: The usefulness of the model in terms of decision-making.
When the intervention carries a potential risk, and there is a trade-off between the risk of the intervention and the risk of the outcome, we use a probability threshold in order to classify each probability as Predicted Negative (Do Not Treat) or Predicted Positive (Treat 💊).
This type of dichotomization accommodates individuals with different preferences.
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
---|---|---|---|---|---|---|---|---|---|---|
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
Low Probability Threshold means that I'm worried about the outcome:
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
---|---|---|---|---|---|---|---|---|---|---|
Ŷ | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
 | | | | 💊 | 💊 | 💊 | 💊 | 💊 | 💊 | 💊 |
 | TN | TN | TN | FP | TP | FP | TP | FP | TP | TP |
High Probability Threshold means that I'm worried about the Intervention:
I'm worried about Biopsy
I'm worried about Statins
I'm worried about Antibiotics
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
---|---|---|---|---|---|---|---|---|---|---|
Ŷ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
 | | | | | | | | | 💊 | 💊 |
 | TN | TN | TN | TN | FN | TN | FN | TN | TP | TP |
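The two dichotomizations above can be sketched in a few lines. This is a minimal sketch: the threshold values 0.2 and 0.5 and the helper name `confusion_counts` are mine, chosen so the counts reproduce the low- and high-threshold tables.

```python
probs = [0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72]
reals = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

def confusion_counts(probs, reals, threshold):
    # "Treat" (Predicted Positive) whenever p-hat reaches the threshold
    preds = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(preds, reals))
    return {
        "TP": pairs.count((1, 1)),
        "FP": pairs.count((1, 0)),
        "TN": pairs.count((0, 0)),
        "FN": pairs.count((0, 1)),
    }

low = confusion_counts(probs, reals, 0.2)   # worried about the outcome
high = confusion_counts(probs, reals, 0.5)  # worried about the intervention
```

Lowering the threshold trades False Negatives for False Positives; raising it trades back.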
Curve | Sens | Spec | PPV | PPCR | Lift |
---|---|---|---|---|---|
ROC | y | x | | | |
Lift | | | | x | y |
Precision-Recall | x | | y | | |
Gains | y | | | x | |
The most famous form of Performance Metrics Visualization
Displays Sensitivity (also known as True Positive Rate or Recall) on the y axis.
Displays 1 - Specificity (also known as False Positive Rate) on the x axis.
Honestly, I didnβt find anywhere why 1 - Specificity is more insightful than just Specificity.
Sensitivity: \(\begin{aligned} \ {\scriptsize \frac{\text{TP}}{\text{TP + FN}} = \text{Prob( Predicted Positive | Real Positive )}}\end{aligned}\)
Specificity: \(\begin{aligned} \ {\scriptsize \frac{\text{TN}}{\text{TN + FP}} = \text{Prob( Predicted Negative | Real Negative )} } \end{aligned}\)
We do not know the condition of the conditional probability: neither the number of future Real Positives nor the number of future Real Negatives.
PPV: \(\begin{aligned} \ {\scriptsize \frac{\text{TP}}{\text{TP + FP}} = \text{Prob( Real Positive | Predicted Positive )}}\end{aligned}\)
NPV: \(\begin{aligned} \ {\scriptsize \frac{\text{TN}}{\text{TN + FN}} = \text{Prob( Real Negative | Predicted Negative )} } \end{aligned}\)
We know the condition of the Conditional Probability: The number of Predicted Positives and the number of Predicted Negatives.
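As a sketch of the four metrics just defined, using the counts from the low-threshold example above (TP = 4, FP = 3, TN = 3, FN = 0):

```python
# Counts taken from the low-threshold classification above
TP, FP, TN, FN = 4, 3, 3, 0

sensitivity = TP / (TP + FN)  # P(Predicted Positive | Real Positive)
specificity = TN / (TN + FP)  # P(Predicted Negative | Real Negative)
ppv = TP / (TP + FP)          # P(Real Positive | Predicted Positive)
npv = TN / (TN + FN)          # P(Real Negative | Predicted Negative)
```

Note how Sensitivity and Specificity condition on the (unknown) real class, while PPV and NPV condition on the (known) predicted class.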
Generally speaking, more area under a curve made of two "good" performance metrics means a better model. Other than that, there is no context, and performance metrics without context might lead to ambiguity and bad decisions.
Another Curve: Precision-Recall is made of PPV (Precision) and Sensitivity (Recall). How much PRAUC is enough?
High Ink-to-information ratio 🔵
One might suggest that the visual aspect is useful, but as human beings we are really bad at interpreting round things (that's why pie charts are considered bad practice).
If you randomly take one event and one non-event, the probability that the event will be assigned a higher estimated probability than the non-event is exactly the AUROC.
AUROC = P( p̂(🤨) < p̂(🤢) )
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.47 | 0 | 🤨 |
0.45 | 1 | 🤢 |
0.33 | 0 | 🤨 |
0.31 | 1 | 🤢 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.72 | 🤢 🤨 | 0.47 | |
0.72 | 🤢 🤨 | 0.33 | |
0.72 | 🤢 🤨 | 0.29 | |
0.72 | 🤢 🤨 | 0.18 | |
0.72 | 🤢 🤨 | 0.15 | |
0.72 | 🤢 🤨 | 0.11 | |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.72 | 🤢 > 🤨 | 0.47 | 👍 |
0.72 | 🤢 > 🤨 | 0.33 | 👍 |
0.72 | 🤢 > 🤨 | 0.29 | 👍 |
0.72 | 🤢 > 🤨 | 0.18 | 👍 |
0.72 | 🤢 > 🤨 | 0.15 | 👍 |
0.72 | 🤢 > 🤨 | 0.11 | 👍 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + }}{\text{6 + }}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.63 | 🤢 🤨 | 0.47 | |
0.63 | 🤢 🤨 | 0.33 | |
0.63 | 🤢 🤨 | 0.29 | |
0.63 | 🤢 🤨 | 0.18 | |
0.63 | 🤢 🤨 | 0.15 | |
0.63 | 🤢 🤨 | 0.11 | |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + }}{\text{6 + }}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.63 | 🤢 > 🤨 | 0.47 | 👍 |
0.63 | 🤢 > 🤨 | 0.33 | 👍 |
0.63 | 🤢 > 🤨 | 0.29 | 👍 |
0.63 | 🤢 > 🤨 | 0.18 | 👍 |
0.63 | 🤢 > 🤨 | 0.15 | 👍 |
0.63 | 🤢 > 🤨 | 0.11 | 👍 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + 6 + }}{\text{6 + 6 + }}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.45 | 🤢 🤨 | 0.47 | |
0.45 | 🤢 🤨 | 0.33 | |
0.45 | 🤢 🤨 | 0.29 | |
0.45 | 🤢 🤨 | 0.18 | |
0.45 | 🤢 🤨 | 0.15 | |
0.45 | 🤢 🤨 | 0.11 | |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + 6 + }}{\text{6 + 6 + }}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.45 | 🤢 < 🤨 | 0.47 | 👎 |
0.45 | 🤢 > 🤨 | 0.33 | 👍 |
0.45 | 🤢 > 🤨 | 0.29 | 👍 |
0.45 | 🤢 > 🤨 | 0.18 | 👍 |
0.45 | 🤢 > 🤨 | 0.15 | 👍 |
0.45 | 🤢 > 🤨 | 0.11 | 👍 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + 6 + 5 + }}{\text{6 + 6 + 6 + }}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.31 | 🤢 🤨 | 0.47 | |
0.31 | 🤢 🤨 | 0.33 | |
0.31 | 🤢 🤨 | 0.29 | |
0.31 | 🤢 🤨 | 0.18 | |
0.31 | 🤢 🤨 | 0.15 | |
0.31 | 🤢 🤨 | 0.11 | |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + 6 + 5 + }}{\text{6 + 6 + 6 + }}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | | p̂ | |
---|---|---|---|
0.31 | 🤢 < 🤨 | 0.47 | 👎 |
0.31 | 🤢 < 🤨 | 0.33 | 👎 |
0.31 | 🤢 > 🤨 | 0.29 | 👍 |
0.31 | 🤢 > 🤨 | 0.18 | 👍 |
0.31 | 🤢 > 🤨 | 0.15 | 👍 |
0.31 | 🤢 > 🤨 | 0.11 | 👍 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{#Concordants}}{\text{#Concordants + #Nonconcordants}} = \frac{\text{6 + 6 + 5 + 4}}{\text{6 + 6 + 6 + 6}}}\end{aligned}\)
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{21}}{\text{24}} = 0.875}\end{aligned}\)
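The pair-by-pair walkthrough above can be sketched directly: compare every event to every non-event and count the concordant pairs (a minimal sketch; all p̂ here are distinct, so ties, which by convention count as half, are ignored).

```python
probs_events = [0.31, 0.45, 0.63, 0.72]
probs_nonevents = [0.11, 0.15, 0.18, 0.29, 0.33, 0.47]

# Compare every event to every non-event: 4 * 6 = 24 pairs
concordant = sum(
    pe > pn for pe in probs_events for pn in probs_nonevents
)
c_index = concordant / (len(probs_events) * len(probs_nonevents))
# 21 concordant pairs out of 24 -> 0.875
```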
probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)
pROC::auc(reals, probs)
Area under the curve: 0.875
probs_events <- probs[reals == 1]
probs_nonevents <- probs[reals == 0]
prop.table(
table(
sample(probs_events, replace = TRUE, size = 10000) >
sample(probs_nonevents, replace = TRUE, size = 10000)
)
)
FALSE TRUE
0.121 0.879
import numpy as np
import random
probs = np.array([0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72])
reals = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1])
probs_events = probs[reals == 1]
probs_nonevents = probs[reals == 0]
event_prob_greater_than_nonevent_prob = np.greater(
random.choices(sorted(probs_events),
k = 10000),
random.choices(sorted(probs_nonevents),
k = 10000)
)
unique_elements, counts_elements = np.unique(
event_prob_greater_than_nonevent_prob, return_counts=True)
counts_elements / 10000
array([0.13, 0.87])
Age | 7 | 6 | 49 | 56 | 64 | 54 | 72 | 68 | 91 | 86 |
---|---|---|---|---|---|---|---|---|---|---|
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
AUROC shows how well your model discriminates between events and non-events given a target population.
This model has AUROC = 0.875, but the number is misleading:
The Target Population is not well defined.
Age | 49 | 56 | 64 | 54 | 72 | 68 |
---|---|---|---|---|---|---|
p̂ | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 |
Y | 0 | 0 | 1 | 0 | 1 | 0 |
 | 🤨 | 🤨 | 🤢 | 🤨 | 🤢 | 🤨 |
This model has AUROC = 0.625, and the number is not misleading:
The Target Population is well defined.
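The same pairwise counting, restricted to the well-defined target population (the six adults, excluding the children and the elderly), reproduces the lower AUROC. A minimal sketch:

```python
# Adults only: two events and four non-events remain
probs_events = [0.31, 0.45]
probs_nonevents = [0.18, 0.29, 0.33, 0.47]

concordant = sum(
    pe > pn for pe in probs_events for pn in probs_nonevents
)
auroc = concordant / (len(probs_events) * len(probs_nonevents))  # 5 / 8
```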
p̂ | Y | |
---|---|---|
0.72 | 1 | 🤢 |
0.63 | 1 | 🤢 |
0.45 | 1 | 🤢 |
0.31 | 1 | 🤢 |

p̂ | Y | |
---|---|---|
0.47 | 0 | 🤨 |
0.33 | 0 | 🤨 |
0.29 | 0 | 🤨 |
0.18 | 0 | 🤨 |
0.15 | 0 | 🤨 |
0.11 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{21}}{\text{24}} = 0.875}\end{aligned}\)
AGE | p̂ | Y | |
---|---|---|---|
86 | 0.72 | 1 | 👵 |
91 | 0.63 | 1 | 👵 |
72 | 0.45 | 1 | 🤢 |
64 | 0.31 | 1 | 🤢 |

AGE | p̂ | Y | |
---|---|---|---|
68 | 0.47 | 0 | 🤨 |
54 | 0.33 | 0 | 🤨 |
56 | 0.29 | 0 | 🤨 |
49 | 0.18 | 0 | 🤨 |
6 | 0.15 | 0 | 🧒 |
7 | 0.11 | 0 | 🧒 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{21}}{\text{24}} = 0.875}\end{aligned}\)
AGE | p̂ | Y | |
---|---|---|---|
72 | 0.45 | 1 | 🤢 |
64 | 0.31 | 1 | 🤢 |

AGE | p̂ | Y | |
---|---|---|---|
68 | 0.47 | 0 | 🤨 |
54 | 0.33 | 0 | 🤨 |
56 | 0.29 | 0 | 🤨 |
49 | 0.18 | 0 | 🤨 |

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{5}}{\text{8}} = 0.625}\end{aligned}\)
Curve | Sens | Spec | PPV | PPCR | Lift |
---|---|---|---|---|---|
ROC | y | x | | | |
Lift | | | | x | y |
Precision-Recall | x | | y | | |
Gains | y | | | x | |
 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | | | 4 (40%) |
Real Negatives | | | 6 (60%) |
 | | | 10 (100%) |
\[\frac{\sum \text{Real-Positives}}{\sum \text{Observations}} = \frac{4}{10}\]
1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
🤢 | 🤢 | 🤨 | 🤢 | 🤨 | 🤢 | 🤨 | 🤨 | 🤨 | 🤨 |
\(\begin{aligned} \ {\scriptsize \frac{\text{TP + FP}}{\text{TP + FP + TN + FN}}}\end{aligned} = \begin{aligned} \ {\scriptsize \frac{\text{Predicted Positives}}{\text{Total Population}}}\end{aligned}\)
Sometimes we classify each observation according to the ranking of the risk, in order to prioritize high-risk patients regardless of their absolute risk.
The implied assumption is that the highest-risk patients will gain the highest benefit from the treatment, and that the treatment does not carry a significant potential risk.
This type of dichotomization is used when the organization faces a resource constraint; in healthcare we also call it the risk percentile.
 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | | | |
Real Negatives | | | |
 | 2 (20%) | 8 (80%) | 10 (100%) |

\[\frac{\sum \text{Predicted-Positives}}{\sum \text{Observations}} = \frac{2}{10}\]
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
💊 | 💊 | | | | | | | | |
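The rank-based dichotomization can be sketched as follows (a minimal sketch; the choice of k = 2 is mine, matching the two-pill example above):

```python
probs = [0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72]
k = 2  # resource constraint: we can only treat 2 patients

# Rank by estimated risk and flag the top k as Predicted Positive
ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
preds = [0] * len(probs)
for i in ranked[:k]:
    preds[i] = 1

ppcr = sum(preds) / len(preds)  # Predicted Positives rate
```

Note that no probability threshold appears anywhere: only the ranking matters.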
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
---|---|---|---|---|---|---|---|---|---|---|
R | | | | | | | | | | |
Ŷ | | | | | | | | | | |
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
---|---|---|---|---|---|---|---|---|---|---|
R | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
Ŷ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
 | | | | | | | | | 💊 | 💊 |
 | TN | TN | TN | TN | FN | TN | FN | TN | TP | TP |
\(\begin{aligned} \text{Lift} = \frac{\text{PPV}}{\text{Prevalence}} = \frac{\cfrac{\text{TP}}{\text{TP + FP}}}{\cfrac{\text{TP + FN}}{\text{TP + FP + TN + FN}}} \end{aligned}\)
Lift Curve displays Lift on the Y axis and PPCR (Predicted Positives Conditional Rate) on the X axis.
In other words, Lift shows how much better the prediction does than a random guess in terms of PPV.
The reference line stands for a random guess: the Lift is equal to 1 (PPV = Prevalence).
The Curve is not defined if there are no Predicted Positives (probability threshold is too high or PPCR = 0).
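One point on the Lift Curve can be sketched directly from the definition: treat the k highest-risk patients, compute PPV, and divide by the Prevalence. A minimal sketch with k = 2 (PPCR = 0.2), using the ten-patient example:

```python
probs = [0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72]
reals = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

prevalence = sum(reals) / len(reals)  # 4 / 10

# Treat the k = 2 highest-risk patients (PPCR = 0.2)
top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
ppv = sum(reals[i] for i in top2) / 2  # both top-risk patients are events
lift = ppv / prevalence
```

Here the two highest-risk patients are both events, so PPV = 1 and the Lift is 1 / 0.4 = 2.5.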
 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 0 (0%) | 4 (40%) | 4 (40%) |
Real Negatives | 0 (0%) | 6 (60%) | 6 (60%) |
 | 0 (0%) | 10 (100%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 1 (10%) | 3 (30%) | 4 (40%) |
Real Negatives | 0 (0%) | 6 (60%) | 6 (60%) |
 | 1 (10%) | 9 (90%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 2 (20%) | 2 (20%) | 4 (40%) |
Real Negatives | 0 (0%) | 6 (60%) | 6 (60%) |
 | 2 (20%) | 8 (80%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 2 (20%) | 2 (20%) | 4 (40%) |
Real Negatives | 1 (10%) | 5 (50%) | 6 (60%) |
 | 3 (30%) | 7 (70%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 3 (30%) | 1 (10%) | 4 (40%) |
Real Negatives | 1 (10%) | 5 (50%) | 6 (60%) |
 | 4 (40%) | 6 (60%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 3 (30%) | 1 (10%) | 4 (40%) |
Real Negatives | 2 (20%) | 4 (40%) | 6 (60%) |
 | 5 (50%) | 5 (50%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 4 (40%) | 0 (0%) | 4 (40%) |
Real Negatives | 2 (20%) | 4 (40%) | 6 (60%) |
 | 6 (60%) | 4 (40%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 4 (40%) | 0 (0%) | 4 (40%) |
Real Negatives | 3 (30%) | 3 (30%) | 6 (60%) |
 | 7 (70%) | 3 (30%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 4 (40%) | 0 (0%) | 4 (40%) |
Real Negatives | 4 (40%) | 2 (20%) | 6 (60%) |
 | 8 (80%) | 2 (20%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 4 (40%) | 0 (0%) | 4 (40%) |
Real Negatives | 5 (50%) | 1 (10%) | 6 (60%) |
 | 9 (90%) | 1 (10%) | 10 (100%) |

 | Predicted Positives | Predicted Negatives | |
---|---|---|---|
Real Positives | 4 (40%) | 0 (0%) | 4 (40%) |
Real Negatives | 6 (60%) | 0 (0%) | 6 (60%) |
 | 10 (100%) | 0 (0%) | 10 (100%) |
Curve | Sens | Spec | PPV | PPCR | Lift |
---|---|---|---|---|---|
ROC | y | x | | | |
Lift | | | | x | y |
Precision-Recall | x | | y | | |
Gains | y | | | x | |
Precision-Recall Curve displays PPV on the y axis and Sensitivity on the x axis.
The reference line stands for a random guess: the PPV is equal to the Prevalence, the Sensitivity depends on the Probability Threshold or PPCR.
The Curve is not defined if there are no Predicted Positives (probability threshold is too high or PPCR = 0).
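The points of the Precision-Recall Curve can be sketched by sweeping the PPCR from 1/10 to 10/10 (a minimal sketch; all p̂ in the example are distinct, so ties need no special handling):

```python
reals = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
probs = [0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72]

order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
points = []  # (sensitivity, ppv) for each PPCR = k / 10
tp = 0
for k, i in enumerate(order, start=1):
    tp += reals[i]  # events captured among the k highest risks
    points.append((tp / sum(reals), tp / k))
```

The first point (top patient only) has PPV = 1; the last point (everyone Predicted Positive) has PPV equal to the Prevalence.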
Curve | Sens | Spec | PPV | PPCR | Lift |
---|---|---|---|---|---|
ROC | y | x | | | |
Lift | | | | x | y |
Precision-Recall | x | | y | | |
Gains | y | | | x | |
Gains Curve displays Sensitivity on the y axis and PPCR on the x axis.
Gains shows the Sensitivity for a given PPCR.
Reference Line for a Random Guess: The sensitivity is equal to the proportion of predicted positives.
Reference Line for a Perfect Prediction: All Predicted Positives are Real Positives until there are no more Real Positives (PPCR = Prevalence, Sensitivity = 1).
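The Gains Curve points follow the same sweep: for each PPCR, the Sensitivity among the top-ranked patients. A minimal sketch on the ten-patient example:

```python
reals = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
probs = [0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72]

order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
gains = []  # (ppcr, sensitivity) as we flag more of the top-ranked patients
tp = 0
for k, i in enumerate(order, start=1):
    tp += reals[i]
    gains.append((k / len(order), tp / sum(reals)))
```

In this example all four events are captured by PPCR = 0.6, so the curve reaches Sensitivity = 1 well before the Treat All corner.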
How well the model is "calibrated": patients with an estimated probability of about 0.2 are expected to have a proportion of about 0.2 observed events.
In order to assess calibration we need to use quantiles of the estimated probabilities (a discrete version of calibration) or some kind of smoothing algorithm.
The main idea is to visually inspect the curve for similarity with the 45-degree line.
Visual inspection might be problematic, but from our experience it is a good-enough practice.
A model that is accurate in terms of discrimination might still produce uncalibrated estimated probabilities, which will lead to poor decisions.
Logistic Regression is calibrated by default, but fancy ML prediction models might not be calibrated.
Uncalibrated models can be fixed by remodeling the predictions with a simple logistic regression (recalibration).
Python users might use sklearn.calibration.CalibratedClassifierCV.
Optimizing the logloss function will produce a calibrated model.
p̂ | 0.11 | 0.15 | 0.18 | 0.29 | 0.31 | 0.33 | 0.45 | 0.47 | 0.63 | 0.72 |
---|---|---|---|---|---|---|---|---|---|---|
R | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
Y | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
p̂ | 0.11 | 0.15 | 0.18 | 0.29 |
---|---|---|---|---|
R | 10 | 9 | 8 | 7 |
Ŷ | 0 | 0 | 0 | 0 |
Y | 0 | 0 | 0 | 0 |
 | 🤨 | 🤨 | 🤨 | 🤨 |
\[\begin{aligned} \scriptsize{ \\\text{Observed: }\frac{\text{0}}{\text{4}} = 0} \end{aligned}\]
\[\begin{aligned} \scriptsize{ \\\text{Predicted: }\frac{\text{0.11 + 0.15 + 0.18 + 0.29}}{\text{4}} = 0.1825} \end{aligned}\]
p̂ | 0.31 | 0.33 | 0.45 |
---|---|---|---|
R | 6 | 5 | 4 |
Y | 1 | 0 | 1 |
 | 🤢 | 🤨 | 🤢 |
\[\begin{aligned} \scriptsize{ \\\text{Observed: }\frac{\text{2}}{\text{3}} \approx 0.667} \end{aligned}\]
\[\begin{aligned} \scriptsize{ \\\text{Predicted: }\frac{\text{0.31 + 0.33 + 0.45}}{\text{3}} \approx 0.363} \end{aligned}\]
p̂ | 0.47 | 0.63 | 0.72 |
---|---|---|---|
R | 3 | 2 | 1 |
Y | 0 | 1 | 1 |
 | 🤨 | 🤢 | 🤢 |
\[\begin{aligned} \scriptsize{ \\\text{Observed: }\frac{\text{2}}{\text{3}} \approx 0.667} \end{aligned}\]
\[\begin{aligned} \scriptsize{ \\\text{Predicted: }\frac{\text{0.47 + 0.63 + 0.72}}{\text{3}} = 0.607} \end{aligned}\]
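The binned calibration check above can be sketched in a few lines, using the same three bins (lowest four probabilities, middle three, highest three):

```python
probs = [0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72]
reals = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

# Same bins as in the slides: indices [0:4], [4:7], [7:10]
bins = [(0, 4), (4, 7), (7, 10)]
observed = [sum(reals[lo:hi]) / (hi - lo) for lo, hi in bins]
predicted = [sum(probs[lo:hi]) / (hi - lo) for lo, hi in bins]
```

Plotting `predicted` against `observed` (one point per bin) gives the discrete calibration curve; a well-calibrated model puts the points near the 45-degree line.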
A different approach is to use a smoothing algorithm; in {rtichoke} I use gam for large samples and lowess for small samples.
If you use smooth calibration, take a moment to explore the ranges of the curve!
It might look bad if you don't zoom in to the reasonable range of estimated probabilities. That's why you will often see Histograms or Rug plots under the Calibration Curve.
\[\begin{aligned} \\{\text{Net Benefit}} = \frac{\text{TP}}{\text{N}} - \frac{\text{FP}}{\text{N}} * {\frac{{p_{t}}}{{1 - p_{t}}}} \end{aligned}\]
In order to make a decision we need to optimize utility, which requires some kind of price from the clinicians.
This price is the odds of the probability threshold.
Unlike other performance metrics, Net Benefit is grounded in decision-making theory (which is reasonable, because we want to make better decisions).
Always consider two baseline approaches: Treat All and Treat None.
\[\begin{aligned} \scriptsize{ \\{\text{Net Benefit}} = \frac{\text{TP}}{\text{N}} - \frac{\text{FP}}{\text{N}} * {\frac{{p_{t}}}{{1 - p_{t}}}}} \end{aligned}\]
\[\begin{aligned} \scriptsize{ \text{Net Benefit Treat All} = {\text{Prevalence}} - {\text{(1 - Prevalence)}} *{\frac{{p_{t}}}{{1 - p_{t}}}}} \end{aligned}\]
\[\begin{aligned} \scriptsize{ \text{Net Benefit Treat None} = {\text{0}} } \end{aligned}\]
I will be indifferent 😐 about having 1 TP for 4 FP \[p_t = \frac{1}{1 + 4} = 0.2\] \[\frac{p_t}{1 - p_t} = \frac{0.2}{1 - 0.2} = \frac{1}{4}\]
\[\begin{aligned}[t] {\text{Net Benefit}} &= {\frac{\text{1}}{\text{5}} - \frac{\text{4}}{\text{5}} * {\frac{1}{4}} = 0} \end{aligned}\]
I will be sad 🙁 about having 1 TP for 5 FP \[p_t = \frac{1}{1 + 4} = 0.2\] \[\frac{p_t}{1 - p_t} = \frac{0.2}{1 - 0.2} = \frac{1}{4}\] \[\begin{aligned}[t] {\text{Net Benefit}} &= {\frac{\text{1}}{\text{6}} - \frac{\text{5}}{\text{6}} * {\frac{1}{4}} \approx -0.0417} \end{aligned}\]
I will be happy 🙂 about having 1 TP for 3 FP \[p_t = \frac{1}{1 + 4} = 0.2\] \[\frac{p_t}{1 - p_t} = \frac{0.2}{1 - 0.2} = \frac{1}{4}\]
\[\begin{aligned}[t] {\text{Net Benefit}} &= {\frac{\text{1}}{\text{4}} - \frac{\text{3}}{\text{4}} * {\frac{1}{4}} = 0.0625} \end{aligned}\]
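The three worked examples above can be reproduced directly from the Net Benefit formula (a minimal sketch; the helper name `net_benefit` is mine):

```python
def net_benefit(tp, fp, n, p_t):
    # NB = TP/N - FP/N * odds(p_t)
    return tp / n - fp / n * (p_t / (1 - p_t))

p_t = 0.2  # indifferent at 1 TP per 4 FP

indifferent = net_benefit(tp=1, fp=4, n=5, p_t=p_t)  # exactly the trade-off: NB = 0
sad = net_benefit(tp=1, fp=5, n=6, p_t=p_t)          # worse than the trade-off: NB < 0
happy = net_benefit(tp=1, fp=3, n=4, p_t=p_t)        # better than the trade-off: NB > 0
```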
Decision Curve displays Net Benefit on the y axis and Probability Threshold on the x axis.
Reference Line for Treat All Strategy.
Reference line for Treat None Strategy.