I am very new to BigQuery and standard SQL. I might not be able to figure out the correct approach to solve a problem. Please help me out.
Please help me change this code to get the desired output.
I have a Color Column and a ID column. Example shown below:
Color | ID
Blue | id_1
Blue | id_5
Blue | id_6
Blue | id_2
Blue | id_7
Green | id_1
Green | id_2
Green | id_3
Red | id_3
Red | id_5
Red | id_2
Red | id_4
Red | id_7
Yellow | id_3
Yellow | id_4
I want a resulting output as mentioned below:
- I want to look at how many times id_1 has repeated across multiple colors and capture the number of colors as a bucket.
- I am not looking for ‘Color’ names instead, i am looking at the Number of Colors
Reference for input and Expected output/results.
Sample output:
Number of Brands || 1 Color :|: 2 Colors :|: 3 Colors :|: 4 Colors
Count of Unique Ids || 1 | 5 | 1 | 0 |
Thank you for your patience and time in advance.
The difficulty I have is that there is of course overlap so a GROUP BY or a DISTINCT to continuously populate 1 Color, 2 colors, 3 colors and so on dynamically.
SELECT id, Color, COUNT(1) number FROM project.dataset.table GROUP BY id, Color
Advertisement
Answer
I would do this on separate rows rather than columns:
select cnt, count(*) as num_colors from (select id, count(*) as cnt from t group by id ) i group by cnt order by cnt
If you want this by columns, you can use conditional aggregation:
select countif(cnt = 1), countif(cnt = 2), countif(cnt = 3), countif(cnt = 4) from (select id, count(*) as cnt from t group by id ) i;
Note: These assumes that the id/color rows are unique in the original data. Otherwise, use count(distinct color) as cnt
.