Skip to content
Advertisement

Gaps And Islands: Splitting Islands Based On External Table

My scenario started off similar to a Island and Gaps problem, where I needed to find consecutive days of work. My current SQL query answers “ProductA was produced at LocationA from DateA through DateB, totaling X quantity”.

However, this does not suffice when I needed to throw prices into the mix. Prices are in a separate table and handled in C# after the fact. Price changes are essentially a list of records that say “ProductA from LocationA is now Y value per unit effective DateC”.

The end result is it works as long as the island does not overlap with a price-change date, but if it does overlap, I get a “close” answer, but it’s not precise.

The C# code can handle applying the prices efficiently, what I need to do though is split the islands based on price changes. My goal is to make the SQL’s partioning take into account the ranking of days from the other table, but I’m having trouble applying what I want to do.


The current SQL that generates my island is as follows

SELECT MIN(ScheduledDate) as StartDate, MAX(ScheduledDate) as 
EndDate, ProductId, DestinationId, SUM(Quantity) as TotalQuantity
FROM (
    SELECT ScheduledDate, DestinationId, ProductId, PartitionGroup = DATEADD(DAY ,-1 * DENSE_RANK() OVER (ORDER BY ScheduledDate), ScheduledDate), Quantity
    FROM History
) tmp
GROUP BY PartitionGroup, DestinationId, ProductId;

The current SQL that takes from the PriceChange table and ranks the dates is as follows

DECLARE @PriceChangeDates TABLE(Rank int, SplitDate Date);
INSERT INTO @PriceChangeDates
SELECT DENSE_RANK() over (ORDER BY EffectiveDate) as Rank, EffectiveDate as SplitDate
FROM ProductPriceChange
GROUP BY EffectiveDate;

My thought is to somehow update the first queries inner SELECT statement to somehow take advantage of the @PriceChangeDates table created by the second query. I would think we can multiply the DATEADD’s increment parameter by the rank from the declared table, but I am struggling to write it.

If I was to somehow do this with loops, my thought process would be to determine which rank the ScheduledDate would be from the @PriceChangeDates table, where its rank is the rank of the closest Date that is smaller than itself it can find. Then take whatever rank that gives and, I would think, multiply it by the increment parameter being passed in (or some math, for example doing a *@PriceChangeDates.Count() on the existing parameter and then adding in the new rank to avoid collisions). However, that’s “loop” logic not “set” logic, and in SQL I need to think in sets.


Any and all help/advice is greatly appreciated. Thank you 🙂


UPDATE:

Sample data & example on SQLFiddle: http://www.sqlfiddle.com/#!18/af568/1

Where the data is:

CREATE TABLE History
(
ProductId int,
DestinationId int,
ScheduledDate date,
Quantity float
);

INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
  (0, 1000, '20180401', 5),
  (0, 1000, '20180402', 10),
  (0, 1000, '20180403', 7),
  (3, 5000, '20180507', 15),
  (3, 5000, '20180508', 23),
  (3, 5000, '20180509', 52),
  (3, 5000, '20180510', 12),
  (3, 5000, '20180511', 14);

CREATE TABLE PriceChange
(
  ProductId int,
  DestinationId int,
  EffectiveDate date,
  Price float
);

INSERT INTO PriceChange (ProductId, DestinationId, EffectiveDate, Price)
VALUES
  (0, 1000, '20180201', 1),
  (0, 1000, '20180402', 2),
  (3, 5000, '20180101', 5),
  (3, 5000, '20180510', 20);

The desired results would be to have a SQL statement that generates the result:

StartDate   EndDate     ProductId   DestinationId   TotalQuantity
2018-04-01  2018-04-01  0           1000            5
2018-04-02  2018-04-03  0           1000            17
2018-05-07  2018-05-09  3           5000            90
2018-05-10  2018-05-11  3           5000            26

To clarify, the end result does need the TotalQuantity of each split amount, so the procedural code that manipulates the results and applies the pricing knows how much of each product was one on each side of the price change to accurately determine the values.

Advertisement

Answer

The straight-forward method is to fetch the effective price for each row of History and then generate gaps and islands taking price into account.

It is not clear from the question what is the role of DestinationID. Sample data is of no help here. I’ll assume that we need to join and partition on both ProductID and DestinationID.

The following query returns effective Price for each row from History. You need to add index to the PriceChange table

CREATE NONCLUSTERED INDEX [IX] ON [dbo].[PriceChange]
(
    [ProductId] ASC,
    [DestinationId] ASC,
    [EffectiveDate] DESC
)
INCLUDE ([Price])

for this query to work efficiently.

Query for Prices

SELECT
    History.ProductId
    ,History.DestinationId
    ,History.ScheduledDate
    ,History.Quantity
    ,A.Price
FROM
    History
    OUTER APPLY
    (
        SELECT TOP(1)
            PriceChange.Price
        FROM
            PriceChange
        WHERE
            PriceChange.ProductID = History.ProductID
            AND PriceChange.DestinationId = History.DestinationId
            AND PriceChange.EffectiveDate <= History.ScheduledDate
        ORDER BY
            PriceChange.EffectiveDate DESC
    ) AS A
ORDER BY ProductID, ScheduledDate;

For each row from History there will be one seek in this index to pick the correct price.

This query returns:

Prices

+-----------+---------------+---------------+----------+-------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price |
+-----------+---------------+---------------+----------+-------+
|         0 |          1000 | 2018-04-01    |        5 |     1 |
|         0 |          1000 | 2018-04-02    |       10 |     2 |
|         0 |          1000 | 2018-04-03    |        7 |     2 |
|         3 |          5000 | 2018-05-07    |       15 |     5 |
|         3 |          5000 | 2018-05-08    |       23 |     5 |
|         3 |          5000 | 2018-05-09    |       52 |     5 |
|         3 |          5000 | 2018-05-10    |       12 |    20 |
|         3 |          5000 | 2018-05-11    |       14 |    20 |
+-----------+---------------+---------------+----------+-------+

Now a standard gaps-and-island step to collapse consecutive days with the same price together. I use a difference of two row number sequences here.

I’ve added some more rows to your sample data to see the gaps within the same ProductId.

INSERT INTO History (ProductId, DestinationId, ScheduledDate, Quantity)
VALUES
  (0, 1000, '20180601', 5),
  (0, 1000, '20180602', 10),
  (0, 1000, '20180603', 7),
  (3, 5000, '20180607', 15),
  (3, 5000, '20180608', 23),
  (3, 5000, '20180609', 52),
  (3, 5000, '20180610', 12),
  (3, 5000, '20180611', 14);

If you run this intermediate query you’ll see how it works:

WITH
CTE_Prices
AS
(
    SELECT
        History.ProductId
        ,History.DestinationId
        ,History.ScheduledDate
        ,History.Quantity
        ,A.Price
    FROM
        History
        OUTER APPLY
        (
            SELECT TOP(1)
                PriceChange.Price
            FROM
                PriceChange
            WHERE
                PriceChange.ProductID = History.ProductID
                AND PriceChange.DestinationId = History.DestinationId
                AND PriceChange.EffectiveDate <= History.ScheduledDate
            ORDER BY
                PriceChange.EffectiveDate DESC
        ) AS A
)
,CTE_rn
AS
(
    SELECT
        ProductId
        ,DestinationId
        ,ScheduledDate
        ,Quantity
        ,Price
        ,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
        ,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
    FROM
        CTE_Prices
)
SELECT *
    ,rn2-rn1 AS Diff
FROM CTE_rn

Intermediate result

+-----------+---------------+---------------+----------+-------+-----+------+------+
| ProductId | DestinationId | ScheduledDate | Quantity | Price | rn1 | rn2  | Diff |
+-----------+---------------+---------------+----------+-------+-----+------+------+
|         0 |          1000 | 2018-04-01    |        5 |     1 |   1 | 6665 | 6664 |
|         0 |          1000 | 2018-04-02    |       10 |     2 |   1 | 6666 | 6665 |
|         0 |          1000 | 2018-04-03    |        7 |     2 |   2 | 6667 | 6665 |
|         0 |          1000 | 2018-06-01    |        5 |     2 |   3 | 6726 | 6723 |
|         0 |          1000 | 2018-06-02    |       10 |     2 |   4 | 6727 | 6723 |
|         0 |          1000 | 2018-06-03    |        7 |     2 |   5 | 6728 | 6723 |
|         3 |          5000 | 2018-05-07    |       15 |     5 |   1 | 6701 | 6700 |
|         3 |          5000 | 2018-05-08    |       23 |     5 |   2 | 6702 | 6700 |
|         3 |          5000 | 2018-05-09    |       52 |     5 |   3 | 6703 | 6700 |
|         3 |          5000 | 2018-05-10    |       12 |    20 |   1 | 6704 | 6703 |
|         3 |          5000 | 2018-05-11    |       14 |    20 |   2 | 6705 | 6703 |
|         3 |          5000 | 2018-06-07    |       15 |    20 |   3 | 6732 | 6729 |
|         3 |          5000 | 2018-06-08    |       23 |    20 |   4 | 6733 | 6729 |
|         3 |          5000 | 2018-06-09    |       52 |    20 |   5 | 6734 | 6729 |
|         3 |          5000 | 2018-06-10    |       12 |    20 |   6 | 6735 | 6729 |
|         3 |          5000 | 2018-06-11    |       14 |    20 |   7 | 6736 | 6729 |
+-----------+---------------+---------------+----------+-------+-----+------+------+

Now simply group by the Diff to get one row per interval.

Final query

WITH
CTE_Prices
AS
(
    SELECT
        History.ProductId
        ,History.DestinationId
        ,History.ScheduledDate
        ,History.Quantity
        ,A.Price
    FROM
        History
        OUTER APPLY
        (
            SELECT TOP(1)
                PriceChange.Price
            FROM
                PriceChange
            WHERE
                PriceChange.ProductID = History.ProductID
                AND PriceChange.DestinationId = History.DestinationId
                AND PriceChange.EffectiveDate <= History.ScheduledDate
            ORDER BY
                PriceChange.EffectiveDate DESC
        ) AS A
)
,CTE_rn
AS
(
    SELECT
        ProductId
        ,DestinationId
        ,ScheduledDate
        ,Quantity
        ,Price
        ,ROW_NUMBER() OVER (PARTITION BY ProductId, DestinationId, Price ORDER BY ScheduledDate) AS rn1
        ,DATEDIFF(day, '20000101', ScheduledDate) AS rn2
    FROM
        CTE_Prices
)
SELECT
    ProductId
    ,DestinationId
    ,MIN(ScheduledDate) AS StartDate
    ,MAX(ScheduledDate) AS EndDate
    ,SUM(Quantity) AS TotalQuantity
    ,Price
FROM
    CTE_rn
GROUP BY
    ProductId
    ,DestinationId
    ,Price
    ,rn2-rn1
ORDER BY
    ProductID
    ,DestinationId
    ,StartDate
;

Final result

+-----------+---------------+------------+------------+---------------+-------+
| ProductId | DestinationId | StartDate  |  EndDate   | TotalQuantity | Price |
+-----------+---------------+------------+------------+---------------+-------+
|         0 |          1000 | 2018-04-01 | 2018-04-01 |             5 |     1 |
|         0 |          1000 | 2018-04-02 | 2018-04-03 |            17 |     2 |
|         0 |          1000 | 2018-06-01 | 2018-06-03 |            22 |     2 |
|         3 |          5000 | 2018-05-07 | 2018-05-09 |            90 |     5 |
|         3 |          5000 | 2018-05-10 | 2018-05-11 |            26 |    20 |
|         3 |          5000 | 2018-06-07 | 2018-06-11 |           116 |    20 |
+-----------+---------------+------------+------------+---------------+-------+
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement