I’m having a hard time understanding the behavior of these three packages: RSQLite
, DBI
, and dbplyr
.
I’m using the mtcars
dataframe and I’m loading it with the tibble
package to add the car name as one of the rows.
library(tibble) mtcars <- tibble::as_tibble(mtcars, rownames = 'car')
I also created my own table that has information about where all these cars were built:
car <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280", "Merc 280C", "Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood", "Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic", "Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin", "Camaro Z28", "Pontiac Firebird", "Fiat X1-9", "Porsche 914-2", "Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora", "Volvo 142E") origin <- c("Japan", "Japan", "Japan", "United States", "United States", "United States", "United States", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "United States", "United States", "United States", "Italy", "Japan", "Japan", "Japan", "United States", "United States", "United States", "United States", "Italy", "Germany", "British", "United States", "Italy", "Italy", "Sweden") cars_origin_tbl <- tibble(car, origin)
By creating these two tables I can practice joins using R. Here is an example of two different ways to write a semi_join
:
mtcars %>% dplyr::semi_join(cars_origin_tbl, by = 'car') -> mtcars_semi_join # another method using value matching mtcars_semi_join <- mtcars %>% filter(car %in% cars_origin_tbl$car)
But I really want to feel more comfortable making joins using SQL syntax. I have an upcoming job that will involve more SQL and I really want to study before I start. For me, the best way to study is by doing this stuff. Watching example online without doing it myself just doesn’t stick for me.
So here I am trying to create a SQL version of stuff using these R packages.
library(dbplyr) library(RSQLite) library(DBI)
This YouTube video from Andrew Couch has been very helpful. Here is the link.
sql_mtcars <- mtcars con <- RSQLite::dbConnect(SQLite(), ":memory:") dplyr::copy_to(con, sql_mtcars) sql_mtcars_db <- tbl(con, "sql_mtcars") sql_mtcars_db %>% dplyr::select(car, mpg, wt) %>% dplyr::show_query()
This show_query
thing is interesting. It shows the SQL version of the dplyr
code.
Another SO post here was answered and showed me how to find the tables listed using dbListTables
.
dbListTables(con)
This dbGetQuery
will run the code and get the expected output.
DBI::dbGetQuery(con, ' SELECT sql_mtcars.mpg, sql_mtcars.wt FROM sql_mtcars ')
Here is where I’m confused. I’m trying to put the cars_origin_tbl
into RSQLite
and then practice joining two different tables.
cars_origin_tbl <- cars_origin_tbl con <- dbConnect(SQLite(), ":memory:") copy_to(con, cars_origin_tbl) cars_origin_tbl_db <- tbl(con, "cars_origin_tbl") dbListTables(con)
After I run dbListTables()
it’s like my first sql_mtcars
was forgotten and disappeared.
Now this dbGetQuery
works fine because I’m just calling the newer table.
DBI::dbGetQuery(con, ' SELECT cars_origin_tbl.car FROM cars_origin_tbl ')
But this other dbGetQuery
doesn’t work because I’m referring to both tables, which I thought would both be in RSQLite
at this point. I get a message saying there is no sql_mtcars
.
DBI::dbGetQuery(con, ' SELECT sql_mtcars.mpg, sql_mtcars.wt FROM sql_mtcars LEFT JOIN cars_origin_tbl ON sql_mtcars.car = cars_origin_tbl.car; ') Error: no such table: sql_mtcars
I’m basically trying to get two tables into the connection so I can practice joining them, but I’m not sure how to get both tables in there without one disappearing.
con <- RSQLite::dbConnect(SQLite(), ":memory:") con2 <- RSQLite::dbConnect(SQLite(), ":memory:") dplyr::copy_to(c(con, con2), cars_origin_tbl) Error in UseMethod("copy_to") : no applicable method for 'copy_to' applied to an object of class "list"
Advertisement
Answer
The solution is to use the same connection for both tables. The following works:
# create data base connection con <- RSQLite::dbConnect(SQLite(), ":memory:") # copy tables to data base dplyr::copy_to(con, sql_mtcars) dplyr::copy_to(con, cars_origin_tbl) # check contents of data base dbListTables(con) # create local pointer to remote table sql_mtcars_db <- tbl(con, "sql_mtcars") cars_origin_tbl_db <- tbl(con, "cars_origin_tbl")
You can now semi-join these two tables:
mtcars_semi_join = sql_mtcars_db %>% semi_join(cars_origin_tbl_db, by = 'car') show_query(mtcars_semi_join)
Producing the following SQL:
SELECT * FROM `sql_mtcars` AS `LHS` WHERE EXISTS ( SELECT 1 FROM `cars_origin_tbl` AS `RHS` WHERE (`LHS`.`car` = `RHS`.`car`)
As a side note, sql_mtcars
and sql_mtcars_db
are different R objects. The first is a table in local R memory. The second is a local access point to a remote table stored in a database.