I am looking for a way to calculate the distribution of column values in my table.
e.g. if I have two rows containing “red” and “blue”, each should have 50%.
I want to count the occurence of col
and compare that to the overall amount of rows.
My attempt:
x
SELECT
log_domain,
count(log_domain),
count(log_domain) over(),
ROUND(
COUNT(log_domain)
/
COUNT(*) OVER()
,2) AS percentage
FROM logs
GROUP BY log_domain
Any help? Thank you!
Advertisement
Answer
Just pay attention to integer division. I often just multiply by 1.0:
SELECT log_domain, COUNT(*), COUNT(*) OVER (),
ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), 2) as ratio
FROM logs
GROUP BY log_domain;
I also notice that the denominator needs to be SUM(COUNT(*))
rather than COUNT(*)
. Your version just divides by the number of rows in the result set — that is, the number of values of log_domain
.