In a sense I’ve answered my own question, but I’m trying to understand the answer better:
When using regexp_substr (in oracle) to extract the first occurrence of a number (either single or multi digits), how/why do the modifiers *
and +
impact the results? Why does +
provide the behavior I’m looking for and *
does not? *
is my default usage in most regular expressions so I was surprised it didn’t suit my need.
For example, in the following:
select test, regexp_substr(TEST,'d') Pattern1, regexp_substr(TEST,'d*') Pattern2, regexp_substr(TEST,'d+') Pattern3 from ( select '123 W' TEST from dual union select 'W 123' TEST from dual );
the use of regexp_substr(TEST,'d*')
returns null value for the input “W 123” – since ‘zero or more’ digits exist in the string, I’m confused by this behavior. I’m also confused why it does work on the string ‘123 W’
my understanding is that *
means zero or more occurrences of the element it follows and +
means 1 or more occurrence of the preceding element. In the example provided for pattern2 [d*]
why does it successfully capture “123” from “123 W” but it does not take 123 from “W 123” as zero or more occurrences of a digit do exist, they just don’t exist in the beginning of the string. Is there additional [implied] logic attached to using *
?
Note: I looked around for a while trying to find similar questions that helped me capture the ‘123’ from ‘W 123’ but the closest i found was variations of regexp_replace which would not meet my needs.
Advertisement
Answer
So the regexp_count indicates there are FOUR substrings that match the d* pattern. The third of those is the ‘123’. The implication is that the first and second are derived from the W and space and what you have is a zero length result that ‘consumes’ one character of the source string.
select test, regexp_count(TEST,'d*') Pattern2_c, regexp_substr(TEST,'d*') Pattern2, regexp_substr(TEST,'d*',1,1) Pattern2_1, regexp_substr(TEST,'d*',1,2) Pattern2_2, regexp_substr(TEST,'d*',1,3) Pattern2_3, regexp_substr(TEST,'d*',1,4) Pattern2_4 from (select '123 W' TEST from dual union select 'W 123' TEST from dual );
Oracle has a weird thing about zero length strings and null.
The result doesn’t “feel” right, but then if you ask a computer deep philosophical questions about how many zero length substrings are contained in a string, I wouldn’t bet on any answer.