I have a table with a single column in a Postgres 13.1 database. It consists of many rows with comma-separated values – around 20 elements at most.
I want to split the data into multiple columns. But I have only a limited number of columns say 5 and more than 5 CSV values in a single row, so excess values must be shifted to new/next row). How to do this?
Example:
a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
'
'
'
Columns are only 5, so the output would be like:
c1 c2 c3 c4 c5
---------------
a1 b1 c1
a2 b2 c2 d2 e2
f2
a3 b3 c3 d3 e3
f3 g3 h3 i3 j3
a4
a5 b5 c5
'
'
'
Advertisement
Answer
It is typically bad design to store CSV values in a single column. If at all possible, use an array or a properly normalized design instead.
While stuck with your current situation …
For known small maximum number of elements
A simple solution without trickery or recursion will do:
SELECT id, 1 AS rnk
, split_part(csv, ', ', 1) AS c1
, split_part(csv, ', ', 2) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 4) AS c4
, split_part(csv, ', ', 5) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows
UNION ALL
SELECT id, 2
, split_part(csv, ', ', 6)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 10)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows
-- three more blocks to cover a maximum "around 20"
ORDER BY id, rnk;
db<>fiddle here
id
being the PK of the original table.
This assumes ‘, ‘ as separator, obviously.
You can adapt easily.
Related:
For unknown number of elements
Various ways. One way use regexp_replace()
to replace every fifth separator before unnesting …
-- for any number of elements
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 1) AS c1
, split_part(c.csv5, ', ', 2) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 4) AS c4
, split_part(c.csv5, ', ', 5) AS c5
FROM tbl t
, unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER BY t.id, c.rnk;
db<>fiddle here
This assumes that the chosen separator ;
never appears in your strings. (Just like ,
can never appear.)
The regular expression pattern is the key: '((?:.*?,){4}.*?),'
(?:)
… “non-capturing” set of parentheses
()
… “capturing” set of parentheses
*?
… non-greedy quantifier
{4}?
… sequence of exactly 4 matches
The replacement '1;'
contains the back-reference 1
.
'g'
as fourth function parameter is required for repeated replacement.
Further reading:
- PostgreSQL & regexp_split_to_array + unnest
- Apply `trim()` and `regexp_replace()` on text array
- PostgreSQL unnest() with element number
Other ways to solve this include a recursive CTE or a set-returning function …
Fill from right to left
(Like you added in How to put values starting from the right side into columns?)
Simply count down numbers like:
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 5) AS c1
, split_part(c.csv5, ', ', 4) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 2) AS c4
, split_part(c.csv5, ', ', 1) AS c5
FROM
db<>fiddle here