Skip to content
Advertisement

How to regex match multiple items

I have a reviews table as follows:

r_id comment
1 Weight cannot exceed 40 kg
2 You must not make the weight go over 31 k.g
3 Don’t excel above 94kg
4 Optimal weight is 45 kg
5 Don’t excel above 62 kg
6 Weight cannot exceed 7000g

What I want to select is the weight a r_id‘s cannot exceed. So my desired output is

r_id max weight
1 40
2 31
3 94
5 62

As you can see, r_id 4 wasn’t included since it wasn’t taking about the maximum weight and 6 wasn’t included because it is in grams. I am struggling with two things.

  1. There are multiple phrases, how can I do a OR operator check in my regex column.
  2. Sometimes the kg number is written like 40kg, 40 KG, 40 k.g or 40kilos. While all things are kilograms, the kg is written in different ways. How can I only extract the number (but ensuring the kg is written in one of the above ways, so I don’t accidentally extract something like 4000g.

SELECT
  r_id,
  REGEX_SUBSTR(REGEX_SUBSTR('cannot exceed [0-9]+ kg'), '[0-9]+ kg')) as "max weight"
FROM reviews;

My statement only checks for one particular type of sentence and doesn’t check if the number is in kilograms.

Advertisement

Answer

You could just extract the number from the string. There only appears to be one and then check if the string looks like certain patterns:

select regexp_substr(comm, '[0-9]+')
from reviews
where regexp_like(comm, '(exceed|go over|above).*[0-9]+ ?(kg|k.g)');

Here is a db<>fiddle.

Advertisement