Skip to content
Advertisement

Find the latest version of a dataset available in SQL Server on specified dates, using RStudio

Each month I run a script in RStudio, on the 9th of the month. I need to manually append the latest DATAyymm sql data set onto the my_data vector, before running the script.

Is there a way I can have the program automatically create this vector? Maybe it could connect to SQL Server and find what the latest available DATAyymm was for the 9th of each month of the year so far?

So if I’m running the script on the 9th of July I would want it to create a vector like this:

my_data <- paste0('reference.dbo.', c('DATA20NOV', 'DATA20NOV', 'DATA21FEB', 'DATA21FEB','DATA21MAY','DATA21MAY'))

So each of the SQL data sets used represents the data used for each month of the year. In the above case January to June.

Advertisement

Answer

You can get the list of tables fairly easily with

tblnames <- DBI::dbListTables(con)
tblnames <- unique(grep("^DATA", tblnames, value = TRUE))
tblnames
# [1] "DATA20NOV" "DATA21FEB" "DATA21MAY"

With this, we can find the most recent:

gsub("^DATA", "", tblnames)
# [1] "20NOV" "21FEB" "21MAY"
as.Date(paste0(gsub("^DATA", "", tblnames), "01"), format = "%y%b%d")
# [1] "2020-11-01" "2021-02-01" "2021-05-01"

tblnames[order(as.Date(paste0(gsub("^DATA", "", tblnames), "01"), format = "%y%b%d"))]
# [1] "DATA20NOV" "DATA21FEB" "DATA21MAY"

This created a vector of the tables ordered (ascending) by its date. With this, you can easily capture the most-recent table created.

To append this month’s data, you can use

tblnames <- c(tblnames, toupper(format(Sys.Date(), format = "DATA%y%b")))
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement