This is a simplified version, stripped down to my core problem. I have a ContactData table with millions of rows, with each contact record broken up into categories identified by a ReferenceID. I now have to assign a new UpdatedValue to each contact record, based on counts from a separate NewValues table that is also keyed by ReferenceID. It doesn’t matter which records are assigned to each group, it could be random or otherwise, as long as the count of records per group is correct.

To illustrate with the sample data below: if there are 800 records in #ContactData with ReferenceID=1, then using the RecordTotal counts in #NewValues, I want to assign 200 to Group1, 350 to Group2 and 250 to Group3.

I could do this with nested loops and updates. But the data in NewValues will change regularly, and the resulting assignment of Groups to Contacts will change with it. Also, the updated contact data will be dumped into a separate third table rather than written back to the original ContactData table. So I’m hoping there is an easier way to assign this value on the fly while selecting the data into the third table. Below are the sample tables and data to illustrate. Any help would be greatly appreciated.
DROP TABLE IF EXISTS #ContactData
CREATE TABLE #ContactData (
    RowId INT IDENTITY(1,1) NOT NULL,
    ReferenceID INT,
    FirstName VARCHAR(10))
GO

INSERT INTO #ContactData (ReferenceID, FirstName)
VALUES (1,'John'), (1,'Mary'), (1,'Dan'), (2,'Sue'), (2,'Harvey'), (3,'Frank'), (3,'Mike')
GO

DROP TABLE IF EXISTS #NewValues
CREATE TABLE #NewValues (
    RowId INT IDENTITY(1,1) NOT NULL,
    ReferenceID INT,
    RecordTotal DECIMAL(10,4),
    UpdatedValue NVARCHAR(20)
)
GO

INSERT INTO #NewValues (ReferenceID, RecordTotal, UpdatedValue)
VALUES (1,200,'Group1'), (1,350,'Group2'), (1,250,'Group3'), (2,500,'Group4'), (2,300,'Group5'), (3,150,'Group6'), (3,850,'Group7')
GO
Answer
Avoid using loops for tasks like this; window functions are a much better fit for these kinds of problems.
0 – I’ve created a larger sample data set by running your insert in a loop:
declare @i int = 1
while (@i < 200)
begin
    INSERT INTO #ContactData (ReferenceID, FirstName)
    VALUES (1,'John'), (1,'Mary'), (1,'Dan'), (2,'Sue'), (2,'Harvey'), (3,'Frank'), (3,'Mike')
    set @i = @i + 1
end
1 – Calculate the min and max row boundaries of each bucket within each ReferenceID, using a running total of RecordTotal:
select f1.*
    , sum(RecordTotal) over (partition by ReferenceID order by RowId asc) - RecordTotal as minValue
    , sum(RecordTotal) over (partition by ReferenceID order by RowId asc) as maxValue
from #NewValues f1
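With the sample #NewValues data, ReferenceID = 1 produces these boundaries:

RowId 1: minValue = 0,   maxValue = 200   (Group1)
RowId 2: minValue = 200, maxValue = 550   (Group2)
RowId 3: minValue = 550, maxValue = 800   (Group3)

Each bucket is therefore a contiguous range of row positions whose width equals its RecordTotal.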
2 – Then assign a running row count to each record, partitioned by ReferenceID and ordered by any column:
select *
    , sum(1) over (partition by ReferenceID order by RowId asc) as rn
from #ContactData f1
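As an aside, because RowId is unique, the running sum(1) is just a row number; ROW_NUMBER() produces the same rn and states the intent more directly:

select *
    , row_number() over (partition by ReferenceID order by RowId) as rn
from #ContactData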
3 – By using the rn calculated in step 2 you can assign your records to buckets dynamically. Note the boundary comparison: since rn starts at 1, the join matches rn > minValue and rn <= maxValue, so each bucket receives exactly RecordTotal rows. Here is the complete code:
select g1.*
    , g2.UpdatedValue as Bucket
from (
    select *
        , sum(1) over (partition by ReferenceID order by RowId asc) as rn
    from #ContactData f1
) g1
inner join (
    select f1.*
        , sum(RecordTotal) over (partition by ReferenceID order by RowId asc) - RecordTotal as minValue
        , sum(RecordTotal) over (partition by ReferenceID order by RowId asc) as maxValue
    from #NewValues f1
) g2
    on  g1.ReferenceID = g2.ReferenceID
    -- rn starts at 1, so match on the half-open range (minValue, maxValue];
    -- using rn >= minValue and rn < maxValue would short the first bucket by one row
    and g1.rn >  g2.minValue
    and g1.rn <= g2.maxValue
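Since the question says the result should be dumped into a separate third table rather than updating #ContactData, the same query can feed a SELECT ... INTO directly. This is a minimal sketch; the target name #ContactBuckets is just a placeholder, and the final aggregate is only there to confirm the per-group counts:

-- dump the assignment into a new table (placeholder name #ContactBuckets)
DROP TABLE IF EXISTS #ContactBuckets

select g1.*
    , g2.UpdatedValue as Bucket
into #ContactBuckets
from (
    select *
        , sum(1) over (partition by ReferenceID order by RowId asc) as rn
    from #ContactData
) g1
inner join (
    select f1.*
        , sum(RecordTotal) over (partition by ReferenceID order by RowId asc) - RecordTotal as minValue
        , sum(RecordTotal) over (partition by ReferenceID order by RowId asc) as maxValue
    from #NewValues f1
) g2
    on  g1.ReferenceID = g2.ReferenceID
    and g1.rn >  g2.minValue
    and g1.rn <= g2.maxValue

-- verify that each bucket received the expected number of rows
select ReferenceID, Bucket, count(*) as AssignedRows
from #ContactBuckets
group by ReferenceID, Bucket
order by ReferenceID, Bucket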