Hi all, I'm writing a SQL query in R using sqldf and have hit a roadblock. I have a table with an ID column, two date columns, and a grouping column (Branch).
AlertDate  AppointmentDate  ID  Branch
01/01/20   04/01/20         1   W1
01/01/20   09/01/20         1   W1
08/01/20   09/01/20         1   W2
01/01/20   23/01/20         1   W1
The query I’m writing is
sqldf('select Branch,count(ID) from df where AlertDate <= AppointmentDate and AppointmentDate <AlertDate+7 group by Branch')
From this query the result I’m getting is
Branch  Count
W1      1
W2      1
Which is correct based on the query. What I want is this: when the second condition is false, i.e. AppointmentDate is not less than AlertDate + 7, the row should not be dropped. Instead, it should be counted in a later group determined by the gap between the two dates. For example, if AlertDate is 01/01/20 and AppointmentDate is 23/01/20, the row should be counted in W4, using ceil((AppointmentDate - AlertDate) / 7). So in the end I want the result to be:
Branch  Count
W1      1
W2      2
W4      1
The second row should be counted in W2 and the fourth in W4, rather than being discarded. I was trying to achieve this in SQL using sqldf in R, but any solution in R or SQL would work for me.
Output of dput(test)
structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class = "Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems = structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null", file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list( cols = list(AlertDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))
Answer
Here is one way to do it using data.table
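df <- structure(list(AlertDate = structure(c(18262, 18262, 18269, 18262), class = "Date"), AppointmentDate = structure(c(18265, 18270,18270, 18284), class = "Date"), ID = c(1, 1, 1, 1), Branch = c("W1","W1", "W2", "W1")), class = c("spec_tbl_df", "tbl_df", "tbl","data.frame"), row.names = c(NA, -4L), problems = structure(list( row = 4L, col = "Branch", expected = "", actual = "embedded null", file = "'C:/Users/FRssarin/Desktop/test.txt'"), row.names = c(NA,-1L), class = c("tbl_df", "tbl", "data.frame")), spec = structure(list( cols = list(AlertDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), AppointmentDate = structure(list(format = "%d/%m/%y"), class = c("collector_date", "collector")), ID = structure(list(), class = c("collector_double", "collector")), Branch = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1), class = "col_spec"))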
I am converting this to data.table and creating a new column for your logic.
library(data.table)
df <- data.table(df)
df <- df[AlertDate <= AppointmentDate]
df[, new_branch := ifelse(as.numeric(AppointmentDate - AlertDate) >= 7,
                          paste0("W", as.character(ceiling(as.numeric(AppointmentDate - AlertDate) / 7))),
                          Branch)]
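To make the bucketing rule easier to follow, here is a minimal base-R sketch of the same arithmetic applied to the four sample rows. The vector names (alert, appoint, gap_days, week_no) are just illustrative and are not part of the original code.

```r
# Sample data from the question, written out as plain vectors
alert   <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01"))
appoint <- as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23"))
branch  <- c("W1", "W1", "W2", "W1")

# Gap between the two dates, in days
gap_days <- as.numeric(appoint - alert)

# Week bucket: ceiling of the gap divided by 7
week_no <- ceiling(gap_days / 7)

# Keep the original branch when the gap is under 7 days,
# otherwise relabel the row with its week bucket
new_branch <- ifelse(gap_days >= 7, paste0("W", week_no), branch)

print(data.frame(gap_days, week_no, branch, new_branch))
```

Rows with a gap under 7 days keep their original Branch, so only the second and fourth rows are relabelled.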
Here is the resulting table
    AlertDate AppointmentDate ID Branch new_branch
1: 2020-01-01      2020-01-04  1     W1         W1
2: 2020-01-01      2020-01-09  1     W1         W2
3: 2020-01-08      2020-01-09  1     W2         W2
4: 2020-01-01      2020-01-23  1     W1         W4
This is the group-by result:
df[, .(.N, alert = head(AlertDate, 1), appoint = head(AppointmentDate, 1)), by = list(new_branch)]

   new_branch N      alert    appoint
1:         W1 1 2020-01-01 2020-01-04
2:         W2 2 2020-01-01 2020-01-09
3:         W4 1 2020-01-01 2020-01-23
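Since the question asked for sqldf specifically, the same logic could probably be written in SQL as well. The sketch below is an assumption-laden alternative, not part of the data.table answer: it relies on sqldf's default SQLite backend, where Date columns arrive as day counts, and it replaces ceil() with integer arithmetic ((gap + 6) / 7) because ceil() may not be available in every SQLite build. The names res, new_branch and n are illustrative.

```r
library(sqldf)

# Sample data from the question as a plain data.frame
df <- data.frame(
  AlertDate       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-08", "2020-01-01")),
  AppointmentDate = as.Date(c("2020-01-04", "2020-01-09", "2020-01-09", "2020-01-23")),
  ID              = c(1, 1, 1, 1),
  Branch          = c("W1", "W1", "W2", "W1"),
  stringsAsFactors = FALSE
)

# Inner query: relabel rows whose gap is 7 days or more with their week bucket.
# For a whole-number gap, CAST(gap + 6 AS INTEGER) / 7 equals ceiling(gap / 7).
# Outer query: count rows per resulting label.
res <- sqldf("
  SELECT new_branch, COUNT(ID) AS n
  FROM (
    SELECT ID,
           CASE WHEN AppointmentDate - AlertDate >= 7
                THEN 'W' || (CAST(AppointmentDate - AlertDate + 6 AS INTEGER) / 7)
                ELSE Branch
           END AS new_branch
    FROM df
    WHERE AlertDate <= AppointmentDate
  )
  GROUP BY new_branch
  ORDER BY new_branch
")

print(res)
```

If a different backend is configured for sqldf, the date arithmetic and the string concatenation operator may need adjusting.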