I’ve noticed that my table has many redundant rows that should be cleaned up. It’s a table that records price changes, so I would like to clean it the following way:
    product | price | date
    ------------------------
    1 | 1 | 1
    1 | 1 | 2
    1 | 1 | 3
    1 | 1 | 4
    2 | 77 | 5
    1 | 1 | 6
    1 | 2 | 7
    1 | 2 | 8
    1 | 1 | 9
    1 | 1 | 10
    1 | 1 | 11
    1 | 1 | 12
    1 | 3 | 13
To this:
    product | price | date
    ------------------------
    1 | 1 | 1
    2 | 77 | 5
    1 | 2 | 7
    1 | 1 | 9
    1 | 3 | 13
Also assume that in this case the column `id` is the same as `date`.
`SELECT DISTINCT ON (product, price)` won’t work because it would ignore either the product `1` change on day `9` or the one on day `1`.
The problem is that I want to group by `product`, `price`, but only in certain intervals according to relevant changes by `date`.
Even if it’s possible to order by `product`, it’s hard to ignore the `date` and `price` change order.
The objective is to delete all ids that are not in the intended result table.
Does anyone have any suggestions?
Answer
This is a gaps-and-islands problem, where you want to group together adjacent rows of the same product having the same price.
Here is one approach, using the difference between row numbers to define the groups:
    select product, price, min(date) date
    from (
        select
            t.*,
            row_number() over(partition by product order by date) rn1,
            row_number() over(partition by product, price order by date) rn2
        from mytable t
    ) t
    group by product, price, rn1 - rn2
    order by min(date)
product | price | date
------: | ----: | ---:
1 | 1 | 1
2 | 77 | 5
1 | 2 | 7
1 | 1 | 9
1 | 3 | 13
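Since the stated goal is to delete every row that is not in the intended result, the same grouping can be reused inside a `DELETE`. The following is only a sketch, assuming PostgreSQL, the table name `mytable` used in the query above, and that `date` is non-null and unique per row (the question says `id` is the same as `date`). The first statement is optional and just exposes the two row numbers, so you can see why their difference stays constant within a run of equal prices.

```sql
-- Optional: inspect the row numbers and the derived group key.
-- With the sample data, product 1 gets grp = 0 for dates 1-4 and 6,
-- grp = 5 for dates 7-8, grp = 2 for dates 9-12 and grp = 11 for date 13.
SELECT product, price, date, rn1, rn2, rn1 - rn2 AS grp
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY product ORDER BY date) AS rn1,
           row_number() OVER (PARTITION BY product, price ORDER BY date) AS rn2
    FROM mytable t
) s
ORDER BY date;

-- Delete every row that is not the first row of its island.
-- Assumption: date is NOT NULL and identifies a row uniquely.
BEGIN;

DELETE FROM mytable
WHERE date NOT IN (
    SELECT min(date)
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY product ORDER BY date) AS rn1,
               row_number() OVER (PARTITION BY product, price ORDER BY date) AS rn2
        FROM mytable t
    ) s
    GROUP BY product, price, rn1 - rn2
);

-- Check what is left before making the change permanent.
SELECT * FROM mytable ORDER BY date;

-- COMMIT;   -- if the remaining rows look right
-- ROLLBACK; -- otherwise
```

Running the `DELETE` inside a transaction lets you compare the remaining rows with the expected result and roll back if they differ. If your table has a separate `id` column, it is probably safer to select and compare on that instead of `date`.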