I currently working on a table on Oracle and I need to split one its field (RH02X) into multiple rows when it the field exceeds 70 characters but I also need to make sure that no word would be cut. I am only allowed to use SQL without PL/SQL.
For example, I have the following data from the table test_remarks
REFNO, RH01D, RH02X, RH03X 0001156, 08-DEC-10, Document delete Description: FW Test Payment DHIS ID: 10616132 DOCL.source: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898, DOCD 0001156, 08-DEC-10, Document delete Description: RE Test Payment DHIS ID: 10619470 DOCL.source: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538, DOCD
What I need to accomplish looks like the table below.
REFNO, RH01D, RH02X, RH03X, SEQNO, LENGTH_RH02X 0001156, 08-DEC-10, Document delete Description: FW Test Payment DHIS ID: 10616132, DOCD, 1, 62 0001156, 08-DEC-10, DOCL.source: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898, DOCD, 2, 64 0001156, 08-DEC-10, Document delete Description: RE Test Payment DHIS ID: 10619470, DOCD, 3, 62 0001156, 08-DEC-10, DOCL.source: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538, DOCD, 4, 64
From the expected rows, as you can see, I also need to create a sequence for every unique occurrence of REFNO, RH01D and RH03X.
On my query, I am able to split the characters of the RH02X field by 70 but I can’t do the logical part of the code wherein there should be no word that be cut when doing the split and assigning the correct sequence number when all the fields being used for seqno derivation is same.
I am using the code below:
SELECT y.*, length(y.rh02x) AS length_rh02x FROM ( SELECT x.refno, x.rh01d, SUBSTR(x.rh02x, 70 * (lvl.column_value - 1) + 1, 70) AS rh02x, x.rh03x, lvl.column_value FROM test_remarks x CROSS JOIN TABLE(CAST(MULTISET(SELECT LEVEL FROM dual CONNECT BY LEVEL <= CEIL(LENGTH(x.rh02x)/70)) AS sys.odcinumberlist)) lvl ) y;
REFNO | RH01D | RH02X | RH03X | SEQNO | LENGTH_RH02X |
---|---|---|---|---|---|
0001156 | 08-DEC-10 | Document delete Description: FW Test Payment DHIS ID: 10616132 DOCL.so | DOCD | 1 | 70 |
0001156 | 08-DEC-10 | urce: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898 | DOCD | 2 | 57 |
0001156 | 08-DEC-10 | Document delete Description: RE Test Payment DHIS ID: 10619470 DOCL.so | DOCD | 1 | 70 |
0001156 | 08-DEC-10 | urce: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538 | DOCD | 2 | 57 |
From my actual result above, you can see that ‘DOCL.source:’ was split between the 1st and 2nd (1st and 2nd row) (3rd and 4th). Hoping you could help me on this.
Advertisement
Answer
You can use recursive subquery factoring to identify the first point to split the string (if needed) in the anchor member, and pass the remainder of the string to the recursive member to repeat the process.
This uses instr()
to find the last space character in the first 70 characters, as pos
– if there is no space or the string is less than 70 characters that will be zero. The pos
value is then used to split the string if it’s long enough and can be split, or take up to 70 characters otherwise.
with rcte (refno, rh01d, rh02x, rh03x, rn1, rn2, line_num, remaining) as ( select refno, rh01d, substr(rh02x, 1, case when pos = 0 then 70 else pos - 1 end), rh03x, row_number() over (order by refno, rh01d, rh03x), -- preferably also some unique column? dense_rank() over (order by refno, rh01d, rh03x), -- preferably also some unique column? 1, substr(rh02x, case when pos = 0 then 70 else pos end + 1) from test_remarks cross apply ( select case when length(rh02x) <= 70 then 0 else instr(substr(rh02x, 1, 70), ' ', -1) end as pos from dual ) union all select refno, rh01d, substr(remaining, 1, case when pos = 0 then 70 else pos - 1 end), rh03x, rn1, rn2, line_num + 1, substr(remaining, case when pos = 0 then 70 else pos end + 1) from rcte cross apply ( select case when length(remaining) <= 70 then 0 else instr(substr(remaining, 1, 70), ' ', -1) end as pos from dual ) where remaining is not null ) cycle remaining set is_cycle to 'Y' default 'N' select refno, rh01d, rh02x, rh03x, dense_rank() over (partition by rn2 order by rn1, line_num) as seq, length(rh02x) as length_rh02x, rn1, rn2, line_num -- for info only from rcte order by rn1, line_num;
Oracle 12c is required because of the cross apply
; it will be (even) messier on earlier versions.
The recursive member is essentially doing the same as the anchor member, except the anchor is manipulating the original string from the table while the recursive is manipulating the remainder from the previous iteration.
The rn1
, rn2
and line_num
are used to track which original value is being referred to and which segment this is. They are set in the anchor member, and kept (rn1
and rn2
) or incremented (line_num
) in the recursive member. Those are used to order the output so related lines are together and in the same order; and also used to work out the seq
value. I’ve left those values in the output so you can see what’s happening, but they don’t need to be included in your final query.
REFNO | RH01D | RH02X | RH03X | SEQ | LENGTH_RH02X | RN1 | RN2 | LINE_NUM |
---|---|---|---|---|---|---|---|---|
0001156 | 08-DEC-10 | Document delete Description: FW Test Payment DHIS ID: 10616132 | DOCD | 1 | 62 | 1 | 1 | 1 |
0001156 | 08-DEC-10 | DOCL.source: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898 | DOCD | 2 | 64 | 1 | 1 | 2 |
0001156 | 08-DEC-10 | Document delete Description: RE Test Payment DHIS ID: 10619470 | DOCD | 3 | 62 | 2 | 1 | 1 |
0001156 | 08-DEC-10 | DOCL.source: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538 | DOCD | 4 | 64 | 2 | 1 | 2 |
db<>fiddle with some extra data to show edge cases, and some of the intermediate steps which might clarify things a bit.
The cycle clause probably shouldn’t be necessary here, as there are no cycles (as far as I can see) and it evaluates to N for all rows. But it errors without it, throwing “ORA-32044: cycle detected while executing recursive WITH query”. This may be related to ANSI behaviour, as in this recent question.