
How do I write a SQL query that counts occurrences of a value in a column after GROUP BY, counting each group only once if it contains the value?

I have data with the following structure:

ColumnA  ColumnB  ColumnC
-------------------------
  1         1      Test
  1         1      FAIL
  1         1      FAIL

  1         2      FAIL
  1         2      FAIL
  1         2      FAIL

  2         1      TEST
  2         1      FAIL

As the title suggests, I want to count occurrences of ‘FAIL’ in ColumnC after grouping by ColumnA and ColumnB, but each group should contribute at most one ‘FAIL’ to the count, no matter how many times it appears in that group.

For example, for the sample data above, the result should be:

ColumnA ColumnB count
---------------------------
   1      1       1 (not 2)
   1      2       1 (not 3)
   2      1       1

The SQL I have so far:

SELECT
    SUM((CASE ColumnC WHEN 'FAIL' THEN 1 WHEN 'TEST' THEN 0 WHEN 'Test' THEN 0 END))
FROM
    table
GROUP BY 
    ColumnA, ColumnB

The query above counts every ‘FAIL’ in a group (for the example data it returns 2 and 3 for the two groups where ColumnA is 1), but I want each group to count at most one occurrence. How should I tackle this?

In pandas, I could use cumcount to append a sequence number to each value within a group (so the first ‘FAIL’ becomes ‘FAIL1’), then count only ‘FAIL1’ after the GROUP BY.

Is it possible to do it in SQL?


Answer

This answers the original version of the question.

I think you want count(distinct):

select columnA,
       count(distinct case when columnC = 'FAIL' then columnB end)
from t
group by columnA;

Or more concisely:

select columnA, count(distinct columnB)
from t
where columnC = 'FAIL'
group by columnA;
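
To check this against the sample data, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types, the aliases fail_groups and has_fail, and the second query are assumptions rather than part of the answer above. The second query is one possible way to get a 0/1 flag per (ColumnA, ColumnB) group, which is what the current version of the question asks for. The literal is written 'FAIL' to match the sample data exactly, since SQLite compares strings case-sensitively by default.

```python
import sqlite3

# Sample rows copied from the question: (ColumnA, ColumnB, ColumnC).
ROWS = [
    (1, 1, "Test"), (1, 1, "FAIL"), (1, 1, "FAIL"),
    (1, 2, "FAIL"), (1, 2, "FAIL"), (1, 2, "FAIL"),
    (2, 1, "TEST"), (2, 1, "FAIL"),
]

conn = sqlite3.connect(":memory:")
# The table name t follows the answer; the column types are assumptions.
conn.execute("CREATE TABLE t (ColumnA INTEGER, ColumnB INTEGER, ColumnC TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Per ColumnA: how many distinct ColumnB groups contain at least one FAIL.
per_a = conn.execute(
    """
    select ColumnA, count(distinct ColumnB) as fail_groups
    from t
    where ColumnC = 'FAIL'
    group by ColumnA
    order by ColumnA
    """
).fetchall()

# Per (ColumnA, ColumnB): 1 if the group contains any FAIL, otherwise 0.
per_pair = conn.execute(
    """
    select ColumnA, ColumnB,
           max(case when ColumnC = 'FAIL' then 1 else 0 end) as has_fail
    from t
    group by ColumnA, ColumnB
    order by ColumnA, ColumnB
    """
).fetchall()

print(per_a)     # [(1, 2), (2, 1)]
print(per_pair)  # [(1, 1, 1), (1, 2, 1), (2, 1, 1)]
```

The max(case ...) form keeps groups that have no FAIL at all (they come back with 0), whereas filtering in the where clause drops them from the result.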
User contributions licensed under: CC BY-SA