I have data with the following structure:
ColumnA ColumnB ColumnC
-------------------------
1 1 Test
1 1 FAIL
1 1 FAIL
1 2 FAIL
1 2 FAIL
1 2 FAIL
2 1 TEST
2 1 FAIL
As the title suggests, I want to count occurrences of 'FAIL' in ColumnC after grouping by ColumnA and ColumnB, and while counting, I want to count only one 'FAIL' per group.
For example, for the sample data above, the result should be:
ColumnA ColumnB count
---------------------------
1 1 1 (not 2)
1 2 1 (not 3)
2 1 1
The SQL I have so far:
SELECT
SUM((CASE ColumnC WHEN 'FAIL' THEN 1 WHEN 'TEST' THEN 0 WHEN 'Test' THEN 0 END))
FROM
table
GROUP BY
ColumnA, ColumnB
The above query counts every single 'FAIL' in the group (the example above gives 4 for 1 in ColumnA), but I want to count only one occurrence per group. How should I tackle this?
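To make the over-counting concrete, here is a minimal sketch that runs the query on the sample data using Python's built-in sqlite3 module. The table name `t` is an assumption (the question's `table` is a reserved word), and ColumnA and ColumnB are added to the SELECT list only so the groups are visible in the output.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1, 1, "Test"), (1, 1, "FAIL"), (1, 1, "FAIL"),
    (1, 2, "FAIL"), (1, 2, "FAIL"), (1, 2, "FAIL"),
    (2, 1, "TEST"), (2, 1, "FAIL"),
]

conn = sqlite3.connect(":memory:")
# `t` is a placeholder table name chosen for this sketch.
conn.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB INTEGER, ColumnC TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# The question's SUM(CASE ...) adds 1 for every FAIL row in each group.
over_counted = conn.execute(
    """
    SELECT ColumnA, ColumnB,
           SUM(CASE ColumnC WHEN 'FAIL' THEN 1
                            WHEN 'TEST' THEN 0
                            WHEN 'Test' THEN 0 END) AS fails
    FROM t
    GROUP BY ColumnA, ColumnB
    ORDER BY ColumnA, ColumnB
    """
).fetchall()

print(over_counted)  # [(1, 1, 2), (1, 2, 3), (2, 1, 1)]
```

Each group reports the number of FAIL rows it contains rather than a single 1, which is the behavior the question wants to avoid.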
In pandas, I could do something like append a sequence number to each value using cumcount, and then count only 'FAIL1' after a GROUP BY.
Is it possible to do it in SQL?
Answer
This answers the original version of the question.
I think you want count(distinct):
select columnA,
count(distinct case when columnC = 'FAIL' then columnB end)
from t
group by columnA;
Or more concisely:
select columnA, count(distinct columnB)
from t
where columnC = 'FAIL'
group by columnA;
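As a rough check, the sketch below runs the concise query on the question's sample data with Python's built-in sqlite3 module. It also tries a variant for the question as currently worded, where the grouping is by both ColumnA and ColumnB. That variant, using MAX over a CASE expression, is an assumption on my part and not part of the original answer. SQLite compares strings case-sensitively by default, so the literal is written as 'FAIL' to match the data.

```python
import sqlite3

# Sample rows copied from the question; the table name `t` follows the answer.
rows = [
    (1, 1, "Test"), (1, 1, "FAIL"), (1, 1, "FAIL"),
    (1, 2, "FAIL"), (1, 2, "FAIL"), (1, 2, "FAIL"),
    (2, 1, "TEST"), (2, 1, "FAIL"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB INTEGER, ColumnC TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# The answer's query: per ColumnA, the number of distinct ColumnB values
# that have at least one FAIL row.
per_a = conn.execute(
    """
    SELECT ColumnA, COUNT(DISTINCT ColumnB)
    FROM t
    WHERE ColumnC = 'FAIL'
    GROUP BY ColumnA
    ORDER BY ColumnA
    """
).fetchall()

# Assumed variant: one row per (ColumnA, ColumnB), 1 if the group has any
# FAIL row and 0 otherwise.
per_ab = conn.execute(
    """
    SELECT ColumnA, ColumnB,
           MAX(CASE WHEN ColumnC = 'FAIL' THEN 1 ELSE 0 END) AS cnt
    FROM t
    GROUP BY ColumnA, ColumnB
    ORDER BY ColumnA, ColumnB
    """
).fetchall()

print(per_a)   # [(1, 2), (2, 1)]
print(per_ab)  # [(1, 1, 1), (1, 2, 1), (2, 1, 1)]
```

The second result matches the expected table in the question. Unlike a WHERE filter, the MAX form keeps groups that contain no FAIL at all and reports 0 for them, which may or may not be what is wanted.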