Need to analyze values’ length in DB column and get the % of the Number of the values with the same length.
Desire result:
x
Same length values in COL1 = 70% with LENGTH = 10 chars
It’s not a ‘find most frequent value and calculate its length’, because if we have a KEY or ID column with high cardinality – all values will be different.
Need some fast-working SQL (DB2 dialect is prefered) – not to overload the DB engine (billions of rows).
Example 1
COL1 (VARCHAR 10)
------------------
X01
X02
X03
X04
X05
Result:
100%, 3
Example 2
COL1(VARCHAR 20)
-------------------------
New York
London
Los Angeles
Paris
San Francisco
Result:
20%, 5
(or 20%, 13 - does not matter because values are different)
Advertisement
Answer
A single SELECT statement using GROUP BY GROUPING SETS
operator for any number of columns. The example below presumes, that the constants are result of the corresponding length(varchar_col).
with tab as (
select
length(a) a
, length(b) b
, count(1) cnt
, grouping(length(a)) a_grp
, grouping(length(b)) b_grp
from table(values
('X01', 'New York')
, ('X02', 'London')
, ('X03', 'Los Angeles')
, ('X04', 'Paris')
, ('X05', 'San Francisco')
) t (a, b)
group by grouping sets ((length(a)), (length(b)), ())
)
, row_count as (select cnt from tab where a_grp + b_grp = 2)
, top as (
select a, b, cnt, rownumber() over(partition by a_grp, b_grp order by cnt desc) rn_
from tab
where a_grp + b_grp = 1 -- number of columns - 1
)
select a, b, cnt, 100*cnt/nullif((select cnt from row_count), 0) pst
from top
where rn_=1;
A B CNT PST
-- -- --- ---
3 - 5 100
- 5 1 20