I am looking for a way to calculate the distribution of column values in my table.
e.g. if I have two rows containing “red” and “blue”, each should have 50%.
I want to count the occurence of col
and compare that to the overall amount of rows.
My attempt:
SELECT log_domain, count(log_domain), count(log_domain) over(), ROUND( COUNT(log_domain) / COUNT(*) OVER() ,2) AS percentage FROM logs GROUP BY log_domain
Any help? Thank you!
Advertisement
Answer
Just pay attention to integer division. I often just multiply by 1.0:
SELECT log_domain, COUNT(*), COUNT(*) OVER (), ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), 2) as ratio FROM logs GROUP BY log_domain;
I also notice that the denominator needs to be SUM(COUNT(*))
rather than COUNT(*)
. Your version just divides by the number of rows in the result set — that is, the number of values of log_domain
.