How does this SQL query return results with same id_product?

Question

I am facing a complex SQL query in some code, which is suppose to return products without duplicates (by the use of DISTINCT keywork at the beginning), here is the query: SELECT DISTINCT p.`...

Accepted Answer

SELECT DISTINCT gets all the distinct combinations of all selected fields in your query, not just the first field.Now, you could solve that by using GROUP BY to select only distinct values of id_product specifically, like:SELECT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value`  , s.`name` AS supplier_nameFROM `ps_product` p    INNER JOIN ps_product_shop product_shop        ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1)    LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`)     LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`)     LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product`  AND pl.id_shop = 1 )    LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`)    LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`)    LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`)    LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`)WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND  p.`price` between 0 and 1000  AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1GROUP BY p.`id_product`ORDER BY p.`id_product` ASC LIMIT 1,4However, the problem now is that your query has multiple different values of all the other fields you are selected to choose from, and no deterministic way to pick from them. Even though the id_product is unique in it&#8217;s table, it&#8217;s not unique in the result set because in at least one of your JOINs there is a one-to-many relationship, meaning there are several rows that match the JOIN conditions.On older versions of MySQL, it will just pick the first value it finds in this case, but on SQL Server it will actually error out and tell you that the remaining fields either have to be mentioned in the GROUP BY clause, or they have to be aggregated. So, you&#8217;ve got a few ways you can go from here:You are on an old version of MySQL and you don&#8217;t particularly care which values are returned for the rest of the fields, so leave the query as I&#8217;ve posted and use that. I wouldn&#8217;t recommend this, as it&#8217;s undefined behaviour so in theory it could change at MySQL&#8217;s whim. All the values returned will be from the same result row though.Add aggregate functions, such as MIN() or MAX() to the rest of the remaining fields in the select clause. This will reduce the possible values for the fields down to one, but you will probably end up with a mixture of values from different rows.Remove any one-to-many JOINs from your query so that you only ever get one row back in the result set for each individual id_product. Then, fetch the remaining data you need in a separate query.There may be other alternative solutions, but it depends a lot on which values you want returned for the rest of the rows and what RDBMS you are using. For example, on SQL Server you could potentially make use of PARTITION BY to select the first row for each distinct id_product deterministically.

Advertisement

Answer