I’m trying to categorize data that seem to similar from my database table.
Here’s how the table looks like
id | account |
---|---|
1 | 011 |
2 | 11 |
3 | 023 |
4 | 23 |
5 | 456 |
I need a select query that will categorize my data as in the table below ignoring the unique values.
I can achieve this in pandas but I would appreciate if I also learn how to achieve the same in SQL
id | account1 | account2 |
---|---|---|
1 | 011 | 11 |
2 | 023 | 23 |
Advertisement
Answer
Well the simple answer is properly store account
as data type numeric or integer, then just check for duplicates, remove them, and finally put a unique constraint on account
– done problem solved. But assuming I will hear I can not change the database becomes a different story. The first thing you need is filter out any non-numeric account values (may just be me but I never trust text columns to actually contain numeric data). Then with that out-of-the-way do a self on the remaining and assign the new id. The result becomes: (see demo)
with num_acct (id, account) as (select * from test where account ~ '^[0-9]+$' ) select row_number() over() as id , na1.account as account1 , na2.account as account2 from num_acct na1 join num_acct na2 on ( na1.account::integer = na2.account::integer and na1.id < na2.id );