I am new to SQL. I have a table that has billion of records with multiple columns, let’s say a,b,c,d,e,f.
I would like to create a query for number of records for each column with certain condition and the next column is the percentage of that result from total number of records, then column b, c,d and e with the same condition. I would like to get the output of my query in a single row.
So, for column a, the query that I would like is something like this:
select count(a) from table 1 where a >0 and date > '2020-01-01'
the next column would the percentage of the above results relative to total number of records using query like this
select count(*) from table 1 where date >'2020-01-01'
Column 3 will be the result of this query
select count(b) from table 1 where a >0 and date > '2020-01-01'
column 4 will be the percentage of the number of records of column b with the avove condition relative to total number records
and the same thing for column c,d, and e.
The expected results would be something like this
number of records for column a with condition, percentage of a with condition relative to total records, number of records for column b with condition, percentage of b with condition relative to total records, …………………
How should the query be written correctly?
Advertisement
Answer
Just use conditional aggregation. Here is one method:
select avg(case when a > 0 then 1.0 else 0 end) from table where date >'2020-01-01'
Or, what is a wee bit shorter in BigQuery:
select countif(a > 0) / count(*) from table where date >'2020-01-01' ;
You can repeat either of these for the rest of the columns.