I’ve got the following table:
x
SELECT key, Precision, Recall, F1_score, Accuracy
FROM scores_table
LIMIT 5;
key Precision Recall F1_score Accuracy
2500004088-783 13.33 18.18 15.38 77.32
2500004086-918 11.76 22.22 15.38 77.32
2500010000-768 100 100 100 100
2500010000-908 100 71.43 83.33 97.92
2500010000-305 85.71 92.31 88.89 96.88
I can bucket those percentages into deciles and create a crosstab table in Pandas like this:
pd.crosstab(pd.cut(df.Precision, bins=range(0, 101, 10), include_lowest=True), pd.cut(df.Recall, bins=range(0, 101, 10), include_lowest=True))
Recall (0, 10] (10, 20] (20, 30] (30, 40] (40, 50] (50, 60] (60, 70] (70, 80] (80, 90] (90, 100]
Precision
(0, 10] 593 178 56 39 20 3 2 1 0 4
(10, 20] 13 107 149 121 90 14 14 7 2 12
(20, 30] 0 14 36 54 44 13 14 4 2 4
(30, 40] 0 3 5 13 12 16 15 3 0 3
(40, 50] 0 1 3 5 2 5 5 0 0 3
(50, 60] 0 0 0 0 4 2 1 3 0 1
(60, 70] 0 0 0 1 1 1 2 2 1 1
(70, 80] 0 0 0 0 1 4 2 4 1 1
(80, 90] 0 0 0 0 0 0 6 2 3 2
(90, 100] 0 0 0 0 1 2 1 4 3 7
I’m trying to find a way to do this entirely in MySQL, without saving the table to a Pandas DataFrame first. Is that possible?
Thanks!!
Advertisement
Answer
Here’s the final code. To exactly match the Pandas output, I needed to force Precision values of 0 up slightly to put those values into the first bucket of >0-10
select (ceiling(case when Precision = 0 then 0.1 else Precision end) / 10) * 10 as Precision,
sum(case when Recall >= 0 and Recall <= 10 then 1 else 0 end) as recall_0_10,
sum(case when Recall > 10 and Recall <= 20 then 1 else 0 end) as recall_10_20,
sum(case when Recall > 20 and Recall <= 30 then 1 else 0 end) as recall_20_30,
sum(case when Recall > 30 and Recall <= 40 then 1 else 0 end) as recall_30_40,
sum(case when Recall > 40 and Recall <= 50 then 1 else 0 end) as recall_40_50,
sum(case when Recall > 50 and Recall <= 60 then 1 else 0 end) as recall_50_60,
sum(case when Recall > 60 and Recall <= 70 then 1 else 0 end) as recall_60_70,
sum(case when Recall > 70 and Recall <= 80 then 1 else 0 end) as recall_70_80,
sum(case when Recall > 80 and Recall <= 90 then 1 else 0 end) as recall_80_90,
sum(case when Recall > 90 and Recall <= 100 then 1 else 0 end) as recall_90_100
from scores_table
group by 1
order by Precision;