The schema
Simplified, a sales table with non-unique sales IDs. Within each ID there will be a sale of an element S, which can be followed by a sale of up to two elements L linked to the element S. For example:
CREATE TABLE Sales ([id] int, [element] varchar(2)) ; INSERT INTO Sales ([id], [element]) VALUES (100, 'S'), (100, 'S'), (100, 'L'), (100, 'L'), (100, 'S'), (100, 'L'), (101, 'S'), (101, 'L'), (101, 'L'), (101, 'S'), (101, 'L') ;
The expected outcome
Pivot the sales data with the expected outcome being:
id | S element | L1 | L1 ---+-----------+------+---- 100| S | | 100| S | L | L 100| S | L | 101| S | L | 101| S | L | L
My attempt
I’ve attempted to use ROW_NUMBER() OVER
to count instances of L-elements assigned to each S-element but I don’t know how to reset the count after each new S-element. My query
SELECT ROW_NUMBER() OVER (Order by Id) AS rows, id, element, CASE element WHEN 'S' THEN 0 ELSE ROW_NUMBER() OVER (PARTITION BY element ORDER BY id) END AS cnt FROM Sales ORDER BY rows
Returns
rs id el cnt 1 100 S 0 2 100 S 0 3 100 L 1 4 100 L 2 5 100 S 0 6 100 L 3 7 101 S 0 8 101 L 4 9 101 S 0 10 101 L 5
Where I would like to see instead something like:
rs id el cnt 1 100 S 0 2 100 S 0 3 100 L 1 4 100 L 2 5 100 S 0 6 100 L 1 7 101 S 0 8 101 L 1 9 101 S 0 10 101 L 1
Which could be pivoted using the value in the cnt column.
Any help with the problem, or alternative approaches!, would be appreciated.
Advertisement
Answer
You really need something else than just non-unique ID
to order by. Having just these two columns id
and element
is not enough. Tables in SQL don’t have any intrinsic order that you could rely on.
The code below that populates a temp table generates IDENTITY
values in arbitrary order. In this specific example IDENTITY
values happen to be generated in the same order as rows are written in the VALUES
clause, but it is not guaranteed at all. This behaviour is undefined. You should never rely on this. Your data must have some sort of RowID
.
Sample data
create table #SalesWithId ( rowId int identity(1,1) , salesId int not null , element char(1) not null ) insert into #SalesWithId ( salesId, [element]) values (100, 'S'), (100, 'S'), (100, 'L'), (100, 'L'), (100, 'S'), (100, 'L'), (101, 'S'), (101, 'L'), (101, 'L'), (101, 'S'), (101, 'L');
Query
Once you do have some unique RowID
to order by, the query becomes a variation of gaps-and-islands. Your code was very close. You just need to subtract two sets of row-numbers.
WITH CTE AS ( SELECT * ,ROW_NUMBER() OVER (PARTITION BY SalesID ORDER BY RowID) AS rn1 ,ROW_NUMBER() OVER (PARTITION BY SalesID, Element ORDER BY RowID) AS rn2 FROM #SalesWithId ) SELECT * ,rn1-rn2 as GroupID ,CASE element WHEN 'S' THEN 0 ELSE ROW_NUMBER() OVER (PARTITION BY SalesID, Element, rn1-rn2 ORDER BY RowID) END AS Cnt FROM CTE ORDER BY RowID; DROP TABLE #SalesWithId;
Result
+-------+---------+---------+-----+-----+---------+-----+ | rowId | salesId | element | rn1 | rn2 | GroupID | Cnt | +-------+---------+---------+-----+-----+---------+-----+ | 1 | 100 | S | 1 | 1 | 0 | 0 | | 2 | 100 | S | 2 | 2 | 0 | 0 | | 3 | 100 | L | 3 | 1 | 2 | 1 | | 4 | 100 | L | 4 | 2 | 2 | 2 | | 5 | 100 | S | 5 | 3 | 2 | 0 | | 6 | 100 | L | 6 | 3 | 3 | 1 | | 7 | 101 | S | 1 | 1 | 0 | 0 | | 8 | 101 | L | 2 | 1 | 1 | 1 | | 9 | 101 | L | 3 | 2 | 1 | 2 | | 10 | 101 | S | 4 | 2 | 2 | 0 | | 11 | 101 | L | 5 | 3 | 2 | 1 | +-------+---------+---------+-----+-----+---------+-----+