Skip to content
Advertisement

SQL: Flattening multiple rows in ugly data

I’m working on a fairly odd scenario in which I’ve got to compensate for a client’s database that isn’t very well-ordered. The data (example below) concerns some products that have a main serial number (serial) and between 0 and 4 additional alternate serials (alt_serial). Unfortunately, neither of these columns is a primary key, and each alternate serial number produces an extra row duplicating the first.

test_table:
id | serial     | type           | alt_serial
----+------------+----------------+-----------------
0 | XL00007    | AA             | XL700001
1 | XL00007    | AB             | XL700002
2 | MARF665    | AC             | XTRA0001
3 | MARF665    | AD             | XTRA0002
4 | MARF665    | AE             | XTRA0003
5 | GLOMP12    | AF             | GLOMPX01
6 | GLOMP12    | AG             | GLOMPX02
7 | GLOMP12    | AH             | GLOMPX03
8 | SLONK15    | AI             | SLONKX01
9 | SLONK15    | AJ             | SLONKX02

The goal is to produce a single query (unions are ok) that essentially flattens this data into one row per primary serial, with the contents of the alternate serials condensed (e.g. using CONCAT) into a string in a single column. For example, given the table above, the ideal result of this query would be:

serial      | alt_serials
------------+-----------------------------------------------------
XL00007     | XL700001|XL700002
MARF665     | XTRA0001|XTRA0002|XTRA0003
GLOMP12     | GLOMPX01|GLOMPX02|GLOMPX03
SLONK15     | SLONKX01|SLONKX02

I realize that the ideal solution here is to fix the poorly formed table. Unfortunately, I do not have any kind of access or permissions to modify the data – I’m stuck with the existing table unchanged. For reasons specific to this implementation, I’m also not able to create additional databases or tables.

I’m not a DBA, but I dabble in SQL. I’ve tried a couple of approaches, but the complexity of my queries very quickly explodes as I try to account for this data, and I imagine that there are better ways. Rather than post any of the unsuccessful attempts I’ve come up with, I wanted to ask Stack Overflow’s SQL Gurus: If you were stuck with this unfortunate data, how would you approach the problem?

Advertisement

Answer

  • you can use the string_agg function to concat all of them into 1
select
serial,
string_agg(alt_serial,',') as all_serials
from test_table
group by 1
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement