Each month I run a script in RStudio, on the 9th of the month. I need to manually append the latest DATAyymm sql data set onto the my_data vector, before running the script.
Is there a way I can have the program automatically create this vector? Maybe it could connect to SQL Server and find what the latest available DATAyymm was for the 9th of each month of the year so far?
So if I’m running the script on the 9th of July I would want it to create a vector like this:
my_data <- paste0('reference.dbo.', c('DATA20NOV', 'DATA20NOV', 'DATA21FEB', 'DATA21FEB','DATA21MAY','DATA21MAY'))
So each of the SQL data sets used represents the data used for each month of the year. In the above case January to June.
Advertisement
Answer
You can get the list of tables fairly easily with
tblnames <- DBI::dbListTables(con) tblnames <- unique(grep("^DATA", tblnames, value = TRUE)) tblnames # [1] "DATA20NOV" "DATA21FEB" "DATA21MAY"
With this, we can find the most recent:
gsub("^DATA", "", tblnames) # [1] "20NOV" "21FEB" "21MAY" as.Date(paste0(gsub("^DATA", "", tblnames), "01"), format = "%y%b%d") # [1] "2020-11-01" "2021-02-01" "2021-05-01" tblnames[order(as.Date(paste0(gsub("^DATA", "", tblnames), "01"), format = "%y%b%d"))] # [1] "DATA20NOV" "DATA21FEB" "DATA21MAY"
This created a vector of the tables ordered (ascending) by its date. With this, you can easily capture the most-recent table created.
To append this month’s data, you can use
tblnames <- c(tblnames, toupper(format(Sys.Date(), format = "DATA%y%b")))