I need to apply a procedure in SQL that is easy for me since R, but has been really tortuous in SQL.
I need to sort the data from highest to lowest by two variables, group based on another variable, and select the first item in each group.
I leave the code that I am trying to pass from R to SQL. Unfortunately the dbplyr package throws me an error when trying to convert one language to another: Error: first()
is only available in a windowed (mutate()
) context
library(tidyverse) library(dbplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2 mtcars2 %>% arrange(-mpg,-disp) %>% group_by(cyl) %>% summarise(hp = first(hp)) %>% show_query()
It seems to me that the DISTINCT ON function could help me.
Thanks for your help.
Advertisement
Answer
Maybe the following?
library(tidyverse) library(dbplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2 %>% arrange(-mpg,-disp) %>% group_by(cyl) %>% mutate(hp = first(hp)) %>% select(cyl, hp) %>% distinct %>% show_query #> <SQL> #> SELECT DISTINCT `cyl`, FIRST_VALUE(`hp`) OVER (PARTITION BY `cyl` ORDER BY -`mpg`, -`disp`) AS `hp` #> FROM `mtcars` #> ORDER BY -`mpg`, -`disp`