Need to analyze values’ length in DB column and get the % of the Number of the values with the same length.
Desire result:
Same length values in COL1 = 70% with LENGTH = 10 chars
It’s not a ‘find most frequent value and calculate its length’, because if we have a KEY or ID column with high cardinality – all values will be different.
Need some fast-working SQL (DB2 dialect is prefered) – not to overload the DB engine (billions of rows).
Example 1
COL1 (VARCHAR 10) ------------------ X01 X02 X03 X04 X05
Result:
100%, 3
Example 2
COL1(VARCHAR 20) ------------------------- New York London Los Angeles Paris San Francisco
Result:
20%, 5 (or 20%, 13 - does not matter because values are different)
Advertisement
Answer
A single SELECT statement using GROUP BY GROUPING SETS
operator for any number of columns. The example below presumes, that the constants are result of the corresponding length(varchar_col).
with tab as ( select length(a) a , length(b) b , count(1) cnt , grouping(length(a)) a_grp , grouping(length(b)) b_grp from table(values ('X01', 'New York') , ('X02', 'London') , ('X03', 'Los Angeles') , ('X04', 'Paris') , ('X05', 'San Francisco') ) t (a, b) group by grouping sets ((length(a)), (length(b)), ()) ) , row_count as (select cnt from tab where a_grp + b_grp = 2) , top as ( select a, b, cnt, rownumber() over(partition by a_grp, b_grp order by cnt desc) rn_ from tab where a_grp + b_grp = 1 -- number of columns - 1 ) select a, b, cnt, 100*cnt/nullif((select cnt from row_count), 0) pst from top where rn_=1; A B CNT PST -- -- --- --- 3 - 5 100 - 5 1 20