Skip to content
Advertisement

SQL: How to determine the most frequent data length in the DB column?

Need to analyze values’ length in DB column and get the % of the Number of the values with the same length.

Desire result:

            Same length values in COL1 = 70%  with LENGTH = 10 chars

It’s not a ‘find most frequent value and calculate its length’, because if we have a KEY or ID column with high cardinality – all values will be different.

Need some fast-working SQL (DB2 dialect is prefered) – not to overload the DB engine (billions of rows).

Example 1

         COL1 (VARCHAR 10) 
         ------------------
                     X01   
                     X02   
                     X03   
                     X04   
                     X05   

Result:

            100%, 3

Example 2

           COL1(VARCHAR 20)
         -------------------------
                    New York
                    London
                    Los Angeles
                    Paris
                    San Francisco

Result:

            20%, 5 
           (or 20%, 13 - does not matter because values are different)

Advertisement

Answer

A single SELECT statement using GROUP BY GROUPING SETS operator for any number of columns. The example below presumes, that the constants are result of the corresponding length(varchar_col).

with tab as (
select
  length(a) a
, length(b) b
, count(1) cnt
, grouping(length(a)) a_grp
, grouping(length(b)) b_grp
from table(values
  ('X01', 'New York')     
, ('X02', 'London')       
, ('X03', 'Los Angeles')  
, ('X04', 'Paris')        
, ('X05', 'San Francisco')
) t (a, b)
group by grouping sets ((length(a)), (length(b)), ())
)
, row_count as (select cnt from tab where a_grp + b_grp = 2)
, top as (
select a, b, cnt, rownumber() over(partition by a_grp, b_grp order by cnt desc) rn_
from tab
where a_grp + b_grp  = 1 -- number of columns - 1
)
select a, b, cnt, 100*cnt/nullif((select cnt from row_count), 0) pst
from top
where rn_=1;

 A  B CNT PST
-- -- --- ---
 3  -   5 100
 -  5   1  20
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement