I have a table with a single column in a Postgres 13.1 database. It consists of many rows with comma-separated values – around 20 elements at most.
I want to split the data into multiple columns. But I have only a limited number of columns say 5 and more than 5 CSV values in a single row, so excess values must be shifted to new/next row). How to do this?
Example:
a1, b1, c1 a2, b2, c2, d2, e2, f2 a3, b3, c3, d3, e3, f3, g3, h3, i3, j3 a4 a5, b5, c5 ' ' '
Columns are only 5, so the output would be like:
c1 c2 c3 c4 c5 --------------- a1 b1 c1 a2 b2 c2 d2 e2 f2 a3 b3 c3 d3 e3 f3 g3 h3 i3 j3 a4 a5 b5 c5 ' ' '
Advertisement
Answer
It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.
While stuck with your current situation …
For known small maximum number of elements
A simple solution without trickery or recursion will do:
SELECT id, 1 AS rnk , split_part(csv, ', ', 1) AS c1 , split_part(csv, ', ', 2) AS c2 , split_part(csv, ', ', 3) AS c3 , split_part(csv, ', ', 4) AS c4 , split_part(csv, ', ', 5) AS c5 FROM tbl WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows UNION ALL SELECT id, 2 , split_part(csv, ', ', 6) , split_part(csv, ', ', 7) , split_part(csv, ', ', 8) , split_part(csv, ', ', 9) , split_part(csv, ', ', 10) FROM tbl WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows -- three more blocks to cover a maximum "around 20" ORDER BY id, rnk;
db<>fiddle here
id
being the PK of the original table.
This assumes ‘, ‘ as separator, obviously.
You can adapt easily.
Related:
For unknown number of elements
Various ways. One way use regexp_replace()
to replace every fifth separator before unnesting …
-- for any number of elements SELECT t.id, c.rnk , split_part(c.csv5, ', ', 1) AS c1 , split_part(c.csv5, ', ', 2) AS c2 , split_part(c.csv5, ', ', 3) AS c3 , split_part(c.csv5, ', ', 4) AS c4 , split_part(c.csv5, ', ', 5) AS c5 FROM tbl t , unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk) ORDER BY t.id, c.rnk;
db<>fiddle here
This assumes that the chosen separator ;
never appears in your strings. (Just like ,
can never appear.)
The regular expression pattern is the key: '((?:.*?,){4}.*?),'
(?:)
… “non-capturing” set of parentheses
()
… “capturing” set of parentheses
*?
… non-greedy quantifier
{4}?
… sequence of exactly 4 matches
The replacement '1;'
contains the back-reference 1
.
'g'
as fourth function parameter is required for repeated replacement.
Further reading:
- PostgreSQL & regexp_split_to_array + unnest
- Apply `trim()` and `regexp_replace()` on text array
- PostgreSQL unnest() with element number
Other ways to solve this include a recursive CTE or a set-returning function …
Fill from right to left
(Like you added in How to put values starting from the right side into columns?)
Simply count down numbers like:
SELECT t.id, c.rnk , split_part(c.csv5, ', ', 5) AS c1 , split_part(c.csv5, ', ', 4) AS c2 , split_part(c.csv5, ', ', 3) AS c3 , split_part(c.csv5, ', ', 2) AS c4 , split_part(c.csv5, ', ', 1) AS c5 FROM ...
db<>fiddle here