Skip to content
Advertisement

How to select data with percentages from Clickhouse?

Given the following table:

I’m trying to fetch all statuses by host, with their relative percentage. In order to do that, I need to count results grouped by host and status, and divide each counted field by its total host fields count.

For example, this query would have worked in MySQL:

But ClickHouse throws:

DB::Exception: Unknown identifier: host_total; there are columns: host_id, status_code, count(): While processing host_id, status_code, count() AS status_count, (count() / host_total) * 100 AS percent. (UNKNOWN_IDENTIFIER)

Probably because correlated (dependent) subqueries are not supported.

I was suggested to use CTE, so I tried this:

But still no luck:

DB::Exception: Missing columns: ‘main.host_id’ while processing query: ‘SELECT host_total FROM sub WHERE host_id = main.host_id’, required columns: ‘host_total’ ‘host_id’ ‘main.host_id’ ‘host_total’ ‘host_id’ ‘main.host_id’: While processing (SELECT host_total FROM sub WHERE sub.host_id = main.host_id) AS _subquery20: While processing count() / ((SELECT host_total FROM sub WHERE sub.host_id = main.host_id) AS _subquery20): While processing (count() / ((SELECT host_total FROM sub WHERE sub.host_id = main.host_id) AS _subquery20)) * 100 AS percent. (UNKNOWN_IDENTIFIER)

Advertisement

Answer

CH throws an incorrect error. https://github.com/ClickHouse/ClickHouse/issues/4567

host_total should be in groupby section or under aggregate function

But there is better ways to solve it:

window functions

arrays

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement