I have a reviews
table as follows:
r_id | comment |
---|---|
1 | Weight cannot exceed 40 kg |
2 | You must not make the weight go over 31 k.g |
3 | Don’t excel above 94kg |
4 | Optimal weight is 45 kg |
5 | Don’t excel above 62 kg |
6 | Weight cannot exceed 7000g |
What I want to select is the weight a r_id
‘s cannot exceed. So my desired output is
r_id | max weight |
---|---|
1 | 40 |
2 | 31 |
3 | 94 |
5 | 62 |
As you can see, r_id
4 wasn’t included since it wasn’t taking about the maximum weight and 6
wasn’t included because it is in grams. I am struggling with two things.
- There are multiple phrases, how can I do a
OR
operator check in my regex column. - Sometimes the
kg
number is written like40kg
,40 KG
,40 k.g
or40kilos
. While all things arekilograms
, thekg
is written in different ways. How can I only extract the number (but ensuring thekg
is written in one of the above ways, so I don’t accidentally extract something like4000g
.
SELECT r_id, REGEX_SUBSTR(REGEX_SUBSTR('cannot exceed [0-9]+ kg'), '[0-9]+ kg')) as "max weight" FROM reviews;
My statement only checks for one particular type of sentence and doesn’t check if the number is in kilograms.
Advertisement
Answer
You could just extract the number from the string. There only appears to be one and then check if the string looks like certain patterns:
select regexp_substr(comm, '[0-9]+') from reviews where regexp_like(comm, '(exceed|go over|above).*[0-9]+ ?(kg|k.g)');
Here is a db<>fiddle.