I have data with the following structure:
ColumnA  ColumnB  ColumnC
-------------------------
1        1        Test
1        1        FAIL
1        1        FAIL
1        2        FAIL
1        2        FAIL
1        2        FAIL
2        1        TEST
2        1        FAIL
As the title suggests, I want to count occurrences of ‘FAIL’ in ColumnC after grouping by ColumnA and ColumnB, and while counting, I want to count only one ‘FAIL’ per group.
For example, counting over the sample data above should give:
ColumnA  ColumnB  count
---------------------------
1        1        1 (not 2)
1        2        1 (not 3)
2        1        1
The SQL I have so far:
SELECT SUM((CASE ColumnC
              WHEN 'FAIL' THEN 1
              WHEN 'TEST' THEN 0
              WHEN 'Test' THEN 0
            END))
FROM table
GROUP BY ColumnA, ColumnB
The query above counts every single ‘FAIL’ in the group (the example above gives 5 in total for ColumnA = 1), but I want to count only one occurrence per group. How should I tackle this?
In pandas, I could use cumcount to append a sequence number to each value and then count only ‘FAIL1’ after a GROUP BY.
Is it possible to do it in SQL?
Answer
This answers the original version of the question.
I think you want count(distinct):
select columnA,
       count(distinct case when columnC = 'FAIL' then columnB end)
from t
group by columnA;
Or more concisely:
select columnA, count(distinct columnB)
from t
where columnC = 'FAIL'
group by columnA;
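To see what this returns on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name t follows the answer; the in-memory database, the 'FAIL' spelling (chosen to match the sample data, since SQLite compares text case-sensitively by default), and the fail_count alias are assumptions made for illustration. The second query is not from the answer: it is one possible variant that keeps one row per (ColumnA, ColumnB) group, using MAX over a 0/1 flag so that each group contributes at most 1.

```python
import sqlite3

# Sample rows from the question: (ColumnA, ColumnB, ColumnC).
rows = [
    (1, 1, "Test"), (1, 1, "FAIL"), (1, 1, "FAIL"),
    (1, 2, "FAIL"), (1, 2, "FAIL"), (1, 2, "FAIL"),
    (2, 1, "TEST"), (2, 1, "FAIL"),
]

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB INTEGER, ColumnC TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# The answer's query: per ColumnA, count the distinct ColumnB values
# that have at least one FAIL row.
per_a = conn.execute(
    """
    SELECT ColumnA, COUNT(DISTINCT ColumnB)
    FROM t
    WHERE ColumnC = 'FAIL'
    GROUP BY ColumnA
    ORDER BY ColumnA
    """
).fetchall()

# Variant (an assumption, not from the answer): one row per
# (ColumnA, ColumnB), flagged 1 if the group contains any FAIL, else 0.
per_group = conn.execute(
    """
    SELECT ColumnA, ColumnB,
           MAX(CASE WHEN ColumnC = 'FAIL' THEN 1 ELSE 0 END) AS fail_count
    FROM t
    GROUP BY ColumnA, ColumnB
    ORDER BY ColumnA, ColumnB
    """
).fetchall()

print(per_a)      # [(1, 2), (2, 1)]
print(per_group)  # [(1, 1, 1), (1, 2, 1), (2, 1, 1)]
```

The first result counts groups with a failure per ColumnA; the second reproduces the per-group table shown in the question.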