
Merge tables in R and update rows where dates overlap

I hope this makes sense – it’s my first post here so I’m sorry if the question is badly formed.

I have tables OldData and NewData:

I need to merge these tables as below. Where IDs match, dates overlap, and Priority is higher in NewData, I need to update the dates in OldData to reflect NewData.

I first tried to run nested for loops through each table, matching criteria and making changes one at a time, but I’m sure there is a much better way, for example using SQL in R?


Answer

In general, I interpret this to be an rbind operation with some cleanup: per-ID, if there is any overlap in the date ranges, then the lower-priority date range is truncated to match. Though not shown in the data, if you have situations where two higher-priority rows may completely negate a middle row, then you might need to add to the logic (it might then turn into an iterative process).

tidyverse
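The following is a minimal dplyr sketch of the truncation idea, not the original code from this answer. The column names (ID, Start, End, Priority) and the sample rows are made up for illustration, a larger Priority value is assumed to mean higher priority, and each ID is assumed to appear at most once in NewData. It only trims the head or tail of an OldData range; a NewData range that falls strictly inside an OldData range, or covers it entirely, would need extra logic.

```r
library(dplyr)

# Hypothetical sample data: column names and values are assumptions.
OldData <- data.frame(
  ID       = c(1, 2, 3),
  Start    = as.Date(c("2020-01-01", "2020-03-10", "2020-04-01")),
  End      = as.Date(c("2020-01-31", "2020-03-31", "2020-04-30")),
  Priority = c(1, 1, 1)
)
NewData <- data.frame(
  ID       = c(1, 2),
  Start    = as.Date(c("2020-01-20", "2020-03-01")),
  End      = as.Date(c("2020-02-10", "2020-03-15")),
  Priority = c(2, 2)
)

updated_old <- OldData %>%
  # Attach the NewData range for the same ID (NA where there is none).
  left_join(NewData, by = "ID", suffix = c("", ".new")) %>%
  mutate(
    # TRUE when a higher-priority NewData range overlaps this row.
    hit = !is.na(Priority.new) & Priority.new > Priority &
      Start <= End.new & End >= Start.new,
    # NewData starts later: cut the tail of the OldData range.
    End = if_else(hit & Start.new > Start, Start.new - 1, End),
    # NewData starts earlier (or on the same day): cut the head instead.
    Start = if_else(hit & Start.new <= Start, End.new + 1, Start)
  ) %>%
  select(ID, Start, End, Priority)

# Stack the trimmed OldData rows with the untouched NewData rows.
result <- bind_rows(updated_old, NewData) %>%
  arrange(ID, Start)

print(result)
```

With this sample data, the OldData row for ID 1 loses its tail, the row for ID 2 loses its head, and the row for ID 3 is left alone because NewData has nothing for that ID.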

data.table

I am using magrittr here in order to break out the flow in a readable fashion, but it is not required. If you’re comfortable with data.table by itself, then translating from the magrittr::%>% pipe to native data.table chaining should be straightforward.
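As an illustration of what native chaining could look like, here is a minimal data.table sketch that uses chained `[` calls instead of a pipe. It is not the original code from this answer. The column names (ID, Start, End, Priority) and the sample rows are hypothetical, a larger Priority value is assumed to mean higher priority, and each ID is assumed to appear at most once in NewData. As with any simple truncation, a NewData range that sits strictly inside an OldData range, or covers it completely, is not handled.

```r
library(data.table)

# Hypothetical sample data: column names and values are assumptions.
OldData <- data.frame(
  ID       = c(1, 2, 3),
  Start    = as.Date(c("2020-01-01", "2020-03-10", "2020-04-01")),
  End      = as.Date(c("2020-01-31", "2020-03-31", "2020-04-30")),
  Priority = c(1, 1, 1)
)
NewData <- data.frame(
  ID       = c(1, 2),
  Start    = as.Date(c("2020-01-20", "2020-03-01")),
  End      = as.Date(c("2020-02-10", "2020-03-15")),
  Priority = c(2, 2)
)

# as.data.table() makes copies, so the original frames stay plain data.frames.
old <- as.data.table(OldData)
new <- as.data.table(NewData)

# Rename the NewData columns on a copy so they do not clash after the merge.
new_renamed <- copy(new)
setnames(new_renamed,
         c("Start", "End", "Priority"),
         c("Start.new", "End.new", "Priority.new"))

updated_old <- merge(old, new_renamed, by = "ID", all.x = TRUE)[
  # Flag rows overlapped by a higher-priority NewData range.
  , hit := !is.na(Priority.new) & Priority.new > Priority &
      Start <= End.new & End >= Start.new
][
  # NewData starts later: cut the tail of the OldData range.
  hit & Start.new > Start, End := Start.new - 1
][
  # NewData starts earlier (or on the same day): cut the head instead.
  hit & Start.new <= Start, Start := End.new + 1
][
  , .(ID, Start, End, Priority)
]

# Stack the trimmed OldData rows with the untouched NewData rows.
result <- rbindlist(list(updated_old, new))[order(ID, Start)]

print(result)
```

Each `[...]` step works on the result of the previous one, which is the role the pipe plays in the magrittr version.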

Also, I am using as.data.table instead of the often-preferred side-effect setDT, primarily so that you don’t use it on your production frame and not realize that many data.frame operations in R (on those two frames) now behave somewhat differently. If you’re up for using data.table, then feel free to step around this precaution.
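To make the difference concrete, here is a small sketch with two throwaway frames (the names `df_copy` and `df_inplace` are made up for this example). It shows that as.data.table returns a converted copy, while setDT changes the object itself, after which ordinary indexing behaves differently.

```r
library(data.table)

# as.data.table() returns a new object; the original frame is unchanged.
df_copy <- data.frame(x = 1:3, y = c("a", "b", "c"))
dt <- as.data.table(df_copy)
class(df_copy)       # "data.frame"
is.vector(df_copy[, 1])   # TRUE: data.frame indexing drops to a vector

# setDT() converts by reference; the frame itself becomes a data.table.
df_inplace <- data.frame(x = 1:3, y = c("a", "b", "c"))
setDT(df_inplace)
class(df_inplace)    # "data.table" "data.frame"
is.data.table(df_inplace[, 1])   # TRUE: a one-column data.table, not a vector
```

Code elsewhere that relies on `df[, 1]` returning a vector is one example of what can change silently after setDT.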


Data:

User contributions licensed under: CC BY-SA