tricky SQL with substrings

Question

I have a table (postgres) with a varchar field that has content structured like: The uuid can occur in more than one record. But it must not occur for more than one combination of [givenname];[surname], according to a business rule. That is, if the John Smith example above is present in the table, then if uuid 7c32e9e1.. occurs in any

Accepted Answer

Use regular expressions to extract the UUID and the name from the string. Then aggregate per UUID and either count distinct names or compare minimum and maximum name:select  substring(col, 'uuid=([[:alnum:]]+)') as uuid,  string_agg(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid'), ' | ') as namesfrom mytablegroup by substring(col, 'uuid=([[:alnum:]]+)')having count(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid')) > 1;Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=907a283a754eb7427d4ffbf50c6f0028If you only want to count:select count(*) as cnt_uuids, sum(num_names) as cnt_names, sum(num_rows) as cnt_rowsfrom( select count(*) as num_rows, count(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid')) as num_names from mytable group by substring(col, 'uuid=([[:alnum:]]+)') having count(distinct substring(col, '([[:alnum:]]+;[[:alnum:]]+);uuid')) > 1) flaws;But as has been mentioned already: This is not how a database should be used.

Advertisement

Answer