I need to split a string like this
RANDOM(NUL)THIS_SHOULD_BE_SPLIT(NUL)~THIS_IS_NEW(NUL)STRING(NUL)~THIS_IS_ANOTHER_STRING(NUL)
(NUL)
means a character with ASCII value 0
. Don’t know how to type it here.
I need to split this when (NUL)~
occurs.
So the expected result after split is
RANDOM(NUL)THIS_SHOULD_BE_SPLIT
THIS_IS_NEW(NUL)STRING
THIS_IS_ANOTHER_STRING(NUL)
Using OracleDB not sure how it handles NULL characters and if it is possible to split it.
To split it using ~ I am using
SELECT REGEXP_SUBSTR(stringval,'[^~]+', 1, LEVEL) AS error_code FROM DUAL CONNECT BY REGEXP_SUBSTR(stringval, '[^~]+',1,LEVEL) IS NOT NULL;
Have no idea on how to incorporate NULL character here.
Advertisement
Answer
You cannot directly match the ASCII 0 (NUL)
character in a regular expression as the regular expression parser will treat an ASCII 0 (NULL)
character as a string terminator and end the regular expression pattern rather than using it as a character to match. So, if you want to use regular expressions, you will need to replace all occurrences of the ASCII 0 (NUL)
character with a placeholder string and then match that when splitting and then restore the ASCII 0 (NUL)
characters that were not split upon afterwards. But you will need to ensure that your placeholder is never going to occur elsewhere in your string.
Instead, you can use CHR(0)
to match the NUL
character. This method uses simple string functions (rather than slow regular expressions that cannot directly handle ASCII 0 (NUL)
characters) and can handle multiple input rows:
WITH data ( value ) AS ( SELECT 'RANDOM' || CHR(0) || 'THIS_SHOULD_BE_SPLIT' || CHR(0) || '~THIS_IS_NEW' || CHR(0) || 'STRING' || CHR(0) || '~THIS_IS_ANOTHER_STRING' || CHR(0) FROM DUAL UNION ALL SELECT '12345' || CHR(0) || '67890' || CHR(0) || '~23456' FROM DUAL ), split_positions ( value, start_pos, end_pos ) AS ( SELECT value, 1, INSTR( value, CHR(0) || '~' ) FROM data UNION ALL SELECT value, end_pos + 2, INSTR( value, CHR(0) || '~', end_pos + 2 ) FROM split_positions WHERE end_pos > 0 ) SELECT CASE end_pos WHEN 0 THEN SUBSTR( value, start_pos ) ELSE SUBSTR( value, start_pos, end_pos - start_pos ) END AS value FROM split_positions;
Which outputs:
| VALUE | | :------------------------------ | | RANDOM(NUL)THIS_SHOULD_BE_SPLIT | | 12345(NUL)67890 | | THIS_IS_NEW(NUL)STRING | | 23456 | | THIS_IS_ANOTHER_STRING(NUL) |
(Note: the NUL
character has again been replaced by (NUL)
string in the output as db<>fiddle did not like displaying that character.)
db<>fiddle here