Skip to content
Advertisement

BigQuery – How to find count of Unique overlapping values in 1 or or more categories (Count of categorical values)?

I am very new to BigQuery and standard SQL. I might not be able to figure out the correct approach to solve a problem. Please help me out.

Please help me change this code to get the desired output.

I have a Color Column and a ID column. Example shown below:

Color | ID
Blue | id_1
Blue | id_5
Blue | id_6
Blue | id_2
Blue | id_7
Green | id_1
Green | id_2
Green | id_3
Red | id_3
Red | id_5
Red | id_2
Red | id_4
Red | id_7
Yellow | id_3
Yellow | id_4

I want a resulting output as mentioned below:

  • I want to look at how many times id_1 has repeated across multiple colors and capture the number of colors as a bucket.
  • I am not looking for ‘Color’ names instead, i am looking at the Number of Colors

Reference for input and Expected output/results.

Sample output:

Number of Brands || 1 Color :|: 2 Colors :|: 3 Colors :|: 4 Colors
Count of Unique Ids || 1 | 5 | 1 | 0 |

Thank you for your patience and time in advance.

The difficulty I have is that there is of course overlap so a GROUP BY or a DISTINCT to continuously populate 1 Color, 2 colors, 3 colors and so on dynamically.

SELECT id, Color, COUNT(1) number
FROM project.dataset.table
GROUP BY id, Color

Advertisement

Answer

I would do this on separate rows rather than columns:

select cnt, count(*) as num_colors
from (select id, count(*) as cnt
      from t
      group by id
     ) i
group by cnt
order by cnt

If you want this by columns, you can use conditional aggregation:

select countif(cnt = 1),
       countif(cnt = 2),
       countif(cnt = 3),
       countif(cnt = 4)
from (select id, count(*) as cnt
      from t
      group by id
     ) i;

Note: These assumes that the id/color rows are unique in the original data. Otherwise, use count(distinct color) as cnt.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement