Skip to content
Advertisement

Split records into buckets based on counts in reference table

This is a simplified version, stripped down to my core problem. I have a ContactData table with millions of rows of data, with each contact record broken up into categories with a ReferenceID. I now have to assign a new UpdatedValue to each contact record, based on counts from the separate NewValues table also by ReferenceID. It doesn’t matter which records are assigned to each group, it could be random or otherwise, as long as the count of records per group is correct.

To use an example from below to illustrate: If there are 800 records in #ContactData with ReferenceID=1, then using the RecordTotal counts in #NewValues, I want to assign 200 to Group1, 350 to Group2 and 250 to Group3.

I could do this with nested loops and updates. But the data in NewValues will be changing regularly and the resulting assignment of Groups to Contacts will change with it. Plus the resulting updated contact data will be dumped into a separate third table rather than updating the original ContactData table. So I’m hoping there is any easier way to assign this value on the fly while selecting the data into the third table. Below are the sample tables and data to illustrate. Any help would be greatly appreciated.

Advertisement

Answer

Avoid using loops for such things. Windowed functions are very useful for these kind of issues.

0 – I’ve created a sample data set using your code in a loop:

1 – Calculate the max and min row values for each ReferenceID:

2 – Then you need to count each ReferenceID ordering by any column:

3 – by using rn calculated in Step 2 you can assign your records to buckets dynamically. Here is the complete code:

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement