Skip to content
Advertisement

Sorting out similar values in the same column SQL

I’m trying to categorize data that seem to similar from my database table.

Here’s how the table looks like

id account
1 011
2 11
3 023
4 23
5 456

I need a select query that will categorize my data as in the table below ignoring the unique values.

I can achieve this in pandas but I would appreciate if I also learn how to achieve the same in SQL

id account1 account2
1 011 11
2 023 23

Advertisement

Answer

Well the simple answer is properly store account as data type numeric or integer, then just check for duplicates, remove them, and finally put a unique constraint on account – done problem solved. But assuming I will hear I can not change the database becomes a different story. The first thing you need is filter out any non-numeric account values (may just be me but I never trust text columns to actually contain numeric data). Then with that out-of-the-way do a self on the remaining and assign the new id. The result becomes: (see demo)

with num_acct (id, account) as 
     (select *
        from test 
       where account ~ '^[0-9]+$'
     )     
select row_number() over() as id  
     , na1.account as account1 
     , na2.account as account2 
  from num_acct na1
  join num_acct na2
    on (    na1.account::integer = na2.account::integer 
        and na1.id < na2.id
       );   
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement