Skip to content
Advertisement

Percentage distribution of column occurence?

I am looking for a way to calculate the distribution of column values in my table. e.g. if I have two rows containing “red” and “blue”, each should have 50%. I want to count the occurence of col and compare that to the overall amount of rows.

My attempt:

SELECT
    log_domain,
    count(log_domain),
    count(log_domain) over(),
    ROUND(
        COUNT(log_domain)
        /
        COUNT(*) OVER()
    ,2) AS percentage

FROM logs

GROUP BY log_domain


Any help? Thank you!

Advertisement

Answer

Just pay attention to integer division. I often just multiply by 1.0:

SELECT log_domain, COUNT(*), COUNT(*) OVER (),
        ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), 2) as ratio
FROM logs
GROUP BY log_domain;

I also notice that the denominator needs to be SUM(COUNT(*)) rather than COUNT(*). Your version just divides by the number of rows in the result set — that is, the number of values of log_domain.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement