SQL: clean repeated intermediate values from a history table

I’ve noticed that my table has many redundant rows that should be cleaned up. It records price changes, so I would like to reduce this:

product | price | date
------------------------
1       | 1     | 1
1       | 1     | 2
1       | 1     | 3
1       | 1     | 4
2       | 77    | 5
1       | 1     | 6
1       | 2     | 7
1       | 2     | 8
1       | 1     | 9
1       | 1     | 10
1       | 1     | 11
1       | 1     | 12
1       | 3     | 13

To this:

product | price | date
------------------------
1       | 1     | 1
2       | 77    | 5
1       | 2     | 7
1       | 1     | 9
1       | 3     | 13

Also assume that, in this case, the id column has the same values as date.

SELECT DISTINCT ON (product, price) won’t work because it would drop the product 1 change on either day 1 or day 9. The problem is that I want to group by product and price, but only within the intervals delimited by the relevant changes over time.

Even though it’s possible to order by product, the order of the dates and price changes can’t be ignored.

The objective is to delete all ids that are not in the intended result table.

Does anyone have any suggestions?

Answer

This is a gaps-and-islands problem, where you want to group together adjacent rows of the same product having the same price.

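Here is one approach that uses the difference between two row numbers to define the groups. Within a run of consecutive rows that share a price, both row numbers increase by one per row, so their difference stays constant; it changes as soon as a different price interrupts the run.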

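select product, price, min(date) as date
from (
    select
        t.*,
        -- position of the row within its product, ordered by date
        row_number() over(partition by product order by date) as rn1,
        -- position of the row among rows with the same product and price
        row_number() over(partition by product, price order by date) as rn2
    from mytable t
) t
-- rn1 - rn2 is constant within each run of consecutive equal prices
group by product, price, rn1 - rn2
order by min(date)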
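
To see why the difference identifies each run, it can help to run the inner part on its own and look at the two row numbers side by side. The sketch below does that for product 1 only; the alias grp is a name introduced here for illustration and is not part of the query above.

```sql
-- Illustration only: expose both row numbers and their difference.
-- "grp" is a hypothetical alias used just for this walkthrough.
select
    price,
    date,
    row_number() over (partition by product order by date) as rn1,
    row_number() over (partition by product, price order by date) as rn2,
    row_number() over (partition by product order by date)
      - row_number() over (partition by product, price order by date) as grp
from mytable
where product = 1
order by date;
```

Worked out by hand from the sample data in the question, this gives:

price | date | rn1 | rn2 | grp
----: | ---: | --: | --: | --:
    1 |    1 |   1 |   1 |   0
    1 |    2 |   2 |   2 |   0
    1 |    3 |   3 |   3 |   0
    1 |    4 |   4 |   4 |   0
    1 |    6 |   5 |   5 |   0
    2 |    7 |   6 |   1 |   5
    2 |    8 |   7 |   2 |   5
    1 |    9 |   8 |   6 |   2
    1 |   10 |   9 |   7 |   2
    1 |   11 |  10 |   8 |   2
    1 |   12 |  11 |   9 |   2
    3 |   13 |  12 |   1 |  11

The two runs at price 1 get different grp values (0 and 2), which is why grouping on product, price and the difference keeps them apart.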

Demo on DB Fiddle (result of the grouped query):

product | price | date
------: | ----: | ---:
      1 |     1 |    1
      2 |    77 |    5
      1 |     2 |    7
      1 |     1 |    9
      1 |     3 |   13
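
Since the stated goal is to delete the redundant rows rather than just select the survivors, one possible follow-up is sketched below. It uses a different technique from the row-number difference: lag() compares each row with the previous row of the same product, and a row is treated as redundant when its price did not change. It assumes PostgreSQL, a unique id column on mytable (the question mentions an id but does not show it), and non-null prices.

```sql
-- Sketch under the assumptions above: remove every row whose price
-- equals the price of the previous row (by date) for the same product.
delete from mytable
where id in (
    select id
    from (
        select
            id,
            price,
            -- price of the preceding row for this product; null for the first row
            lag(price) over (partition by product order by date) as prev_price
        from mytable
    ) s
    -- the first row of each product has prev_price = null, so it is never matched
    where price = prev_price
);
```

On the sample data this would leave the rows dated 1, 5, 7, 9 and 13. Running the inner select on its own first is a cheap way to review which ids would be removed before deleting anything.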
User contributions licensed under: CC BY-SA