I’m working alongside a SQL tutorial using queryparser and tidyquery in R. This has been going well until I was asked to do:
SELECT ... SUM(new_vaccinations) OVER (PARTITION BY location) as vaccinations_to_date
Tidyquery reported that it did not support
OVER functions so I am trying to replicate the
OVER (PARTITION BY...) function with dplyr.
This led me to
with_order(order_by =... in dplyr. Now I’m struggling with getting the
fun = to allow me to create a cumulative sum column.
library(tidyverse) library(queryparser) library(tidyquery) mydf <- data.frame(date = as.Date(c("2021-06-01", '2021-06-02','2021-06-03','2021-06-04', '2021-6-01','2021-6-02','2021-6-03','2021-6-04', '2021-6-01','2021-6-02','2021-6-03','2021-6-04')), location = c('United States','United States','United States','United States', 'Canada','Canada','Canada','Canada','Mexico','Mexico','Mexico','Mexico'), new_vaccinations = c(100,98,32,50,99,34,97,53,35,101,97,56)) test <- mydf %>% mutate (total_vax = with_order(order_by = location, fun = cumsum(new_vaccinations), x = desc(location)))
This gives me the error
could not find function "fun"
Am I looking down the wrong rabbit hole when it comes to how to recreating
OVER(PARTITION BY...)? If so what is a better option?
Or am I missing how to properly use
If it is not clear, my goal is to create a new column that keeps a running total of vaccinations for each separate location.
PARTITION BY aspect of SQL can often be done in dplyr using
ORDER BY aspect of SQL can often be done in dplyr using
Consider this R code:
library(dplyr) data(mtcars) mtcars %>% select(mpg, cyl) %>% group_by(cyl) %>% arrange(mpg) %>% mutate(new = cumsum(mpg)) %>% arrange(cyl, mpg) %>% select(cyl, mpg, new)
Is equivalent to this SQL:
SELECT cyl ,mpg ,SUM(mpg) OVER (PARTITON BY cyl ORDER BY mpg) AS new FROM mtcars