Skip to content
Advertisement

Oracle SQL – Splitting string every 70 characters without splitting a word and deriving sequence number

I currently working on a table on Oracle and I need to split one its field (RH02X) into multiple rows when it the field exceeds 70 characters but I also need to make sure that no word would be cut. I am only allowed to use SQL without PL/SQL.

For example, I have the following data from the table test_remarks

REFNO, RH01D, RH02X, RH03X
0001156, 08-DEC-10, Document delete Description: FW Test Payment DHIS ID: 10616132 DOCL.source: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898, DOCD
0001156, 08-DEC-10, Document delete Description: RE Test Payment DHIS ID: 10619470 DOCL.source: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538, DOCD

What I need to accomplish looks like the table below.

REFNO, RH01D, RH02X, RH03X, SEQNO, LENGTH_RH02X
0001156, 08-DEC-10, Document delete Description: FW Test Payment DHIS ID: 10616132, DOCD, 1, 62
0001156, 08-DEC-10, DOCL.source: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898, DOCD, 2, 64
0001156, 08-DEC-10, Document delete Description: RE Test Payment DHIS ID: 10619470, DOCD, 3, 62
0001156, 08-DEC-10, DOCL.source: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538, DOCD, 4, 64

From the expected rows, as you can see, I also need to create a sequence for every unique occurrence of REFNO, RH01D and RH03X.

On my query, I am able to split the characters of the RH02X field by 70 but I can’t do the logical part of the code wherein there should be no word that be cut when doing the split and assigning the correct sequence number when all the fields being used for seqno derivation is same.

I am using the code below:

SELECT y.*, length(y.rh02x) AS length_rh02x FROM
(
    SELECT 
    x.refno,
    x.rh01d,
    SUBSTR(x.rh02x, 70 * (lvl.column_value - 1) + 1, 70) AS rh02x,
    x.rh03x,
    lvl.column_value
    FROM test_remarks x
    CROSS JOIN TABLE(CAST(MULTISET(SELECT LEVEL FROM dual
        CONNECT BY LEVEL <= CEIL(LENGTH(x.rh02x)/70)) AS sys.odcinumberlist)) lvl
) y;
REFNO RH01D RH02X RH03X SEQNO LENGTH_RH02X
0001156 08-DEC-10 Document delete Description: FW Test Payment DHIS ID: 10616132 DOCL.so DOCD 1 70
0001156 08-DEC-10 urce: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898 DOCD 2 57
0001156 08-DEC-10 Document delete Description: RE Test Payment DHIS ID: 10619470 DOCL.so DOCD 1 70
0001156 08-DEC-10 urce: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538 DOCD 2 57

From my actual result above, you can see that ‘DOCL.source:’ was split between the 1st and 2nd (1st and 2nd row) (3rd and 4th). Hoping you could help me on this.

Advertisement

Answer

You can use recursive subquery factoring to identify the first point to split the string (if needed) in the anchor member, and pass the remainder of the string to the recursive member to repeat the process.

This uses instr() to find the last space character in the first 70 characters, as pos – if there is no space or the string is less than 70 characters that will be zero. The pos value is then used to split the string if it’s long enough and can be split, or take up to 70 characters otherwise.

with rcte (refno, rh01d, rh02x, rh03x, rn1, rn2, line_num, remaining) as (
  select refno,
    rh01d,
    substr(rh02x, 1, case when pos = 0 then 70 else pos - 1 end),
    rh03x,
    row_number() over (order by refno, rh01d, rh03x), -- preferably also some unique column?
    dense_rank() over (order by refno, rh01d, rh03x), -- preferably also some unique column?
    1,
    substr(rh02x, case when pos = 0 then 70 else pos end + 1)
  from test_remarks
  cross apply (
    select case when length(rh02x) <= 70 then 0
      else instr(substr(rh02x, 1, 70), ' ', -1) end as pos
    from dual
  )
  union all
  select refno,
    rh01d,
    substr(remaining, 1, case when pos = 0 then 70 else pos - 1 end),
    rh03x,
    rn1,
    rn2,
    line_num + 1,
    substr(remaining, case when pos = 0 then 70 else pos end + 1)
  from rcte
  cross apply (
    select case when length(remaining) <= 70 then 0
      else instr(substr(remaining, 1, 70), ' ', -1) end as pos
    from dual
  )
  where remaining is not null
)
cycle remaining set is_cycle to 'Y' default 'N'
select refno, rh01d, rh02x, rh03x,
  dense_rank() over (partition by rn2 order by rn1, line_num) as seq,
  length(rh02x) as length_rh02x, rn1, rn2, line_num -- for info only
from rcte
order by rn1, line_num;

Oracle 12c is required because of the cross apply; it will be (even) messier on earlier versions.

The recursive member is essentially doing the same as the anchor member, except the anchor is manipulating the original string from the table while the recursive is manipulating the remainder from the previous iteration.

The rn1, rn2 and line_num are used to track which original value is being referred to and which segment this is. They are set in the anchor member, and kept (rn1 and rn2) or incremented (line_num) in the recursive member. Those are used to order the output so related lines are together and in the same order; and also used to work out the seq value. I’ve left those values in the output so you can see what’s happening, but they don’t need to be included in your final query.

REFNO RH01D RH02X RH03X SEQ LENGTH_RH02X RN1 RN2 LINE_NUM
0001156 08-DEC-10 Document delete Description: FW Test Payment DHIS ID: 10616132 DOCD 1 62 1 1 1
0001156 08-DEC-10 DOCL.source: EMAIL DOCL.sourcid: 10616132 DOCL.imageid: 26724898 DOCD 2 64 1 1 2
0001156 08-DEC-10 Document delete Description: RE Test Payment DHIS ID: 10619470 DOCD 3 62 2 1 1
0001156 08-DEC-10 DOCL.source: EMAIL DOCL.sourcid: 10619470 DOCL.imageid: 26733538 DOCD 4 64 2 1 2

db<>fiddle with some extra data to show edge cases, and some of the intermediate steps which might clarify things a bit.


The cycle clause probably shouldn’t be necessary here, as there are no cycles (as far as I can see) and it evaluates to N for all rows. But it errors without it, throwing “ORA-32044: cycle detected while executing recursive WITH query”. This may be related to ANSI behaviour, as in this recent question.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement