I’ve got the following table:
SELECT key, Precision, Recall, F1_score, Accuracy FROM scores_table LIMIT 5; key Precision Recall F1_score Accuracy 2500004088-783 13.33 18.18 15.38 77.32 2500004086-918 11.76 22.22 15.38 77.32 2500010000-768 100 100 100 100 2500010000-908 100 71.43 83.33 97.92 2500010000-305 85.71 92.31 88.89 96.88
I can bucket those percentages into deciles and create a crosstab table in Pandas like this:
pd.crosstab(pd.cut(df.Precision, bins=range(0, 101, 10), include_lowest=True), pd.cut(df.Recall, bins=range(0, 101, 10), include_lowest=True)) Recall (0, 10] (10, 20] (20, 30] (30, 40] (40, 50] (50, 60] (60, 70] (70, 80] (80, 90] (90, 100] Precision (0, 10] 593 178 56 39 20 3 2 1 0 4 (10, 20] 13 107 149 121 90 14 14 7 2 12 (20, 30] 0 14 36 54 44 13 14 4 2 4 (30, 40] 0 3 5 13 12 16 15 3 0 3 (40, 50] 0 1 3 5 2 5 5 0 0 3 (50, 60] 0 0 0 0 4 2 1 3 0 1 (60, 70] 0 0 0 1 1 1 2 2 1 1 (70, 80] 0 0 0 0 1 4 2 4 1 1 (80, 90] 0 0 0 0 0 0 6 2 3 2 (90, 100] 0 0 0 0 1 2 1 4 3 7
I’m trying to find a way to do this entirely in MySQL, without saving the table to a Pandas DataFrame first. Is that possible?
Thanks!!
Advertisement
Answer
Here’s the final code. To exactly match the Pandas output, I needed to force Precision values of 0 up slightly to put those values into the first bucket of >0-10
select (ceiling(case when Precision = 0 then 0.1 else Precision end) / 10) * 10 as Precision, sum(case when Recall >= 0 and Recall <= 10 then 1 else 0 end) as recall_0_10, sum(case when Recall > 10 and Recall <= 20 then 1 else 0 end) as recall_10_20, sum(case when Recall > 20 and Recall <= 30 then 1 else 0 end) as recall_20_30, sum(case when Recall > 30 and Recall <= 40 then 1 else 0 end) as recall_30_40, sum(case when Recall > 40 and Recall <= 50 then 1 else 0 end) as recall_40_50, sum(case when Recall > 50 and Recall <= 60 then 1 else 0 end) as recall_50_60, sum(case when Recall > 60 and Recall <= 70 then 1 else 0 end) as recall_60_70, sum(case when Recall > 70 and Recall <= 80 then 1 else 0 end) as recall_70_80, sum(case when Recall > 80 and Recall <= 90 then 1 else 0 end) as recall_80_90, sum(case when Recall > 90 and Recall <= 100 then 1 else 0 end) as recall_90_100 from scores_table group by 1 order by Precision;