Skip to content
Advertisement

Count as duplicates similar values

I have following tables:

    Orders 
    order_id
    9
    10
    11

    Order_details 
    order_id, product_id  
    9,        7    
    10,       5
    10,       6
    11,       6
    11,       7

    Products 
    product_id, product_name, price
    5,          potato,       4.99
    6,          potato *,     7.5
    7,          orange,       7.99

I already received feedback how to find orders where product names are duplicated, but now situation is a little more complicated as it turned out duplicated positions have additional sign ‘ *’ after it’s name as showed above.

How to add to this query possibility to count ONLY orders where there is one product without additional characters and other with it?

for example ‘potato’ and ‘potato’ would be ignored, ‘potato *’ and ‘potato *’ would be also ignored but order with ‘potato’ and ‘potato *’ would be in the results

select od.order_id
from order_details od join
     products p
     on od.product_id = p.product_id
group by od.order_id
having count(p.product_name) > count(distinct p.product_name)

Advertisement

Answer

One option might be to just do a simple replacement to remove * from the product name:

SELECT
    od.order_id
FROM order_details od
INNER JOIN products p
    ON od.product_id = p.product_id
GROUP BY
    od.order_id
HAVING
    COUNT(DISTINCT p.product_name) <>
    COUNT(DISTINCT REPLACE(p.product_name, ' *', ''));

Demo

The demo is given for MySQL, but the same query should run on at least several other databases.

Ideally, it would be better to do a regex replacement on the product name, which would avoid the possibility that space followed by * happens to appear elsewhere as a legitimate part of the product name.

Edit:

Since you are using Postgres we can in fact do a more targeted regex replacement:

SELECT
    od.order_id
FROM order_details od
INNER JOIN products p
    ON od.product_id = p.product_id
GROUP BY
    od.order_id
HAVING
    COUNT(DISTINCT p.product_name) <>
    COUNT(DISTINCT REGEXP_REPLACE(p.product_name, ' *$', ''));

Demo

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement