I am facing a complex SQL query in some code, which is suppose to return products without duplicates (by the use of DISTINCT keywork at the beginning), here is the query:
SELECT DISTINCT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value` , s.`name` AS supplier_name FROM `ps_product` p INNER JOIN ps_product_shop product_shop ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1) LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`) LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`) LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.id_shop = 1 ) LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`) LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`) LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`) LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`) WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND p.`price` between 0 and 1000 AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1 ORDER BY p.`id_product` ASC LIMIT 1,4
But it returns 4 product with 2 products with same “id_product” (11941)
What I need is to return 4 products but of different ids each.
Anyone ?
Thanks a lot
Aymeric
[EDIT]
The result of this query shows 4 rows, with 2 having the same exact columns values EXCEPT for the id_feature_value column which 36 for one and 38 for the other.
Advertisement
Answer
SELECT DISTINCT
gets all the distinct combinations of all selected fields in your query, not just the first field.
Now, you could solve that by using GROUP BY to select only distinct values of id_product specifically, like:
SELECT p.`id_product`, p.*, product_shop.*, pl.* , m.`name` AS manufacturer_name, x.`id_feature` , x.`id_feature_value` , s.`name` AS supplier_name FROM `ps_product` p INNER JOIN ps_product_shop product_shop ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1) LEFT JOIN `ps_product_attribute` y ON (y.`id_product` = p.`id_product`) LEFT JOIN `ps_product_attribute_combination` ac ON (y.`id_product_attribute` = ac.`id_product_attribute`) LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.id_shop = 1 ) LEFT JOIN `ps_manufacturer` m ON (m.`id_manufacturer` = p.`id_manufacturer`) LEFT JOIN `ps_feature_product` x ON (x.`id_product` = p.`id_product`) LEFT JOIN `ps_supplier` s ON (s.`id_supplier` = p.`id_supplier`) LEFT JOIN `ps_category_product` c ON (c.`id_product` = p.`id_product`) WHERE pl.`id_lang` = 1 AND c.`id_category` = 18 AND p.`price` between 0 and 1000 AND product_shop.`visibility` IN ("both", "catalog") AND product_shop.`active` = 1 GROUP BY p.`id_product` ORDER BY p.`id_product` ASC LIMIT 1,4
However, the problem now is that your query has multiple different values of all the other fields you are selected to choose from, and no deterministic way to pick from them. Even though the id_product is unique in it’s table, it’s not unique in the result set because in at least one of your JOINs there is a one-to-many relationship, meaning there are several rows that match the JOIN conditions.
On older versions of MySQL, it will just pick the first value it finds in this case, but on SQL Server it will actually error out and tell you that the remaining fields either have to be mentioned in the GROUP BY clause, or they have to be aggregated. So, you’ve got a few ways you can go from here:
- You are on an old version of MySQL and you don’t particularly care which values are returned for the rest of the fields, so leave the query as I’ve posted and use that. I wouldn’t recommend this, as it’s undefined behaviour so in theory it could change at MySQL’s whim. All the values returned will be from the same result row though.
- Add aggregate functions, such as MIN() or MAX() to the rest of the remaining fields in the select clause. This will reduce the possible values for the fields down to one, but you will probably end up with a mixture of values from different rows.
- Remove any one-to-many JOINs from your query so that you only ever get one row back in the result set for each individual id_product. Then, fetch the remaining data you need in a separate query.
There may be other alternative solutions, but it depends a lot on which values you want returned for the rest of the rows and what RDBMS you are using. For example, on SQL Server you could potentially make use of PARTITION BY to select the first row for each distinct id_product deterministically.