Merge tables in R and update rows where dates overlap

Question

I hope this makes sense - it's my first post here so I'm sorry if the question is badly formed. I have tables OldData and NewData: I need merge these tables as below. Where IDs match, dates overlap, and Priority is higher in NewData, I need to update the dates in OldData to reflect NewData. I first tried to run

Accepted Answer

In general, I interpret this to be an rbind operation with some cleanup: per-ID, if there is any overlap in the date ranges, then the lower-priority date range is truncated to match. Though not shown in the data, if you have situations where two higher-priority rows may completely negate a middle row, then you might need to add to the logic (it might then turn into an iterative process).tidyverselibrary(dplyr)out_tidyverse <- bind_rows(OldData, NewData) %>% arrange(ID, DateFrom) %>% group_by(ID) %>% mutate( DateTo = if_else(row_number() < n() & DateTo > lead(DateFrom) & Priority < lead(Priority), lead(DateFrom), DateTo), DateFrom = if_else(row_number() > 1 & DateFrom < lag(DateTo) & Priority < lag(Priority), lag(DateTo), DateFrom) ) %>% ungroup()out_tidyverse# # A tibble: 6 x 4# ID DateFrom DateTo Priority# # 1 1 2018-11-01 2018-11-13 5# 2 1 2018-11-13 2018-12-01 6# 3 1 2018-12-01 2019-02-01 5# 4 2 2017-06-01 2018-03-01 5# 5 2 2018-03-01 2018-03-21 5# 6 2 2018-03-21 2018-05-01 6### confirm it is the same as your expected outputall(mapply(`==`, FinData, out_tidyverse))# [1] TRUEdata.tableI am using magrittr here in order to break out the flow in a readable fashion, but it is not required. If you’re comfortable with data.table by itself, then translating from the magrittr::%>% to a native data.table piping should be straight-forward.Also, I am using as.data.table instead of the often-preferred side-effect setDT, primarily so that you don’t use it on your production frame and not realize that many data.frame operations in R (on those two frames) now behave somewhat differently. If you’re up for using data.table, then feel free to step around this precaution.library(data.table)library(magrittr)OldDT <- as.data.table(OldData)NewDT <- as.data.table(NewData)out_DT <- rbind(OldDT, NewDT) %>% .[ order(ID, DateFrom), ] %>% .[, .i := seq_len(.N), by = .(ID) ] %>% .[, DateTo := fifelse(.i < .N & DateTo > shift(DateFrom, type = "lead") & Priority < shift(Priority, type = "lead"), shift(DateFrom, type = "lead"), DateTo), by = .(ID) ] %>% .[, DateFrom := fifelse(.i > 1 & DateFrom < shift(DateTo) & Priority < shift(Priority), shift(DateTo), DateFrom), by = .(ID) ] %>% .[, .i := NULL ]out_DT[]# ID DateFrom DateTo Priority# 1: 1 2018-11-01 2018-11-13 5# 2: 1 2018-11-13 2018-12-01 6# 3: 1 2018-12-01 2019-02-01 5# 4: 2 2017-06-01 2018-03-01 5# 5: 2 2018-03-01 2018-03-21 5# 6: 2 2018-03-21 2018-05-01 6all(mapply(`==`, FinData, out_DT))# [1] TRUEData:OldData <- read.table(header = TRUE, text="ID DateFrom DateTo Priority1 2018-11-01 2018-12-01 51 2018-12-01 2019-02-01 52 2017-06-01 2018-03-01 52 2018-03-01 2018-04-05 5")NewData <- read.table(header = TRUE, text="ID DateFrom DateTo Priority1 2018-11-13 2018-12-01 62 2018-03-21 2018-05-01 6")FinData <- read.table(header = TRUE, text="ID DateFrom DateTo Priority1 2018-11-01 2018-11-13 51 2018-11-13 2018-12-01 61 2018-12-01 2019-02-01 52 2017-06-01 2018-03-01 52 2018-03-01 2018-03-21 52 2018-03-21 2018-05-01 6")

Advertisement

Answer

tidyverse

data.table