Skip to content
Advertisement

How to create a crosstab table of percentages bucketed into deciles in mysql

I’ve got the following table:

SELECT key, Precision, Recall, F1_score, Accuracy
FROM scores_table
LIMIT 5;

key             Precision   Recall  F1_score    Accuracy
2500004088-783  13.33       18.18   15.38       77.32
2500004086-918  11.76       22.22   15.38       77.32
2500010000-768  100         100     100         100
2500010000-908  100         71.43   83.33       97.92
2500010000-305  85.71       92.31   88.89       96.88

I can bucket those percentages into deciles and create a crosstab table in Pandas like this:

pd.crosstab(pd.cut(df.Precision, bins=range(0, 101, 10), include_lowest=True), pd.cut(df.Recall, bins=range(0, 101, 10), include_lowest=True))


Recall  (0, 10] (10, 20]    (20, 30]    (30, 40]    (40, 50]    (50, 60]    (60, 70]    (70, 80]    (80, 90]    (90, 100]
Precision                                       
(0, 10]     593 178 56  39  20  3   2   1   0   4
(10, 20]    13  107 149 121 90  14  14  7   2   12
(20, 30]    0   14  36  54  44  13  14  4   2   4
(30, 40]    0   3   5   13  12  16  15  3   0   3
(40, 50]    0   1   3   5   2   5   5   0   0   3
(50, 60]    0   0   0   0   4   2   1   3   0   1
(60, 70]    0   0   0   1   1   1   2   2   1   1
(70, 80]    0   0   0   0   1   4   2   4   1   1
(80, 90]    0   0   0   0   0   0   6   2   3   2
(90, 100]   0   0   0   0   1   2   1   4   3   7

I’m trying to find a way to do this entirely in MySQL, without saving the table to a Pandas DataFrame first. Is that possible?

Thanks!!

Advertisement

Answer

Here’s the final code. To exactly match the Pandas output, I needed to force Precision values of 0 up slightly to put those values into the first bucket of >0-10

select (ceiling(case when Precision = 0 then 0.1 else Precision end) / 10) * 10 as Precision,
       sum(case when Recall >= 0 and Recall <= 10 then 1 else 0 end) as recall_0_10,
       sum(case when Recall > 10 and Recall <= 20 then 1 else 0 end) as recall_10_20,
       sum(case when Recall > 20 and Recall <= 30 then 1 else 0 end) as recall_20_30,
       sum(case when Recall > 30 and Recall <= 40 then 1 else 0 end) as recall_30_40,
       sum(case when Recall > 40 and Recall <= 50 then 1 else 0 end) as recall_40_50,
       sum(case when Recall > 50 and Recall <= 60 then 1 else 0 end) as recall_50_60,
       sum(case when Recall > 60 and Recall <= 70 then 1 else 0 end) as recall_60_70,
       sum(case when Recall > 70 and Recall <= 80 then 1 else 0 end) as recall_70_80,
       sum(case when Recall > 80 and Recall <= 90 then 1 else 0 end) as recall_80_90,
       sum(case when Recall > 90 and Recall <= 100 then 1 else 0 end) as recall_90_100
from scores_table
group by 1
order by Precision;
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement