Skip to content
Advertisement

R using RSQLite, DBI and dbplyr, can’t have more than one table in the RSQLite in order to make a join

I’m having a hard time understanding the behavior of these three packages: RSQLite, DBI, and dbplyr.

I’m using the mtcars dataframe and I’m loading it with the tibble package to add the car name as one of the rows.

library(tibble)

mtcars <- tibble::as_tibble(mtcars, rownames = 'car')  

I also created my own table that has information about where all these cars were built:

car <- c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230", "Merc 280", "Merc 280C", "Merc 450SE", "Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood", "Lincoln Continental", "Chrysler Imperial", "Fiat 128", "Honda Civic", "Toyota Corolla", "Toyota Corona", "Dodge Challenger", "AMC Javelin", "Camaro Z28", "Pontiac Firebird", "Fiat X1-9", "Porsche 914-2", "Lotus Europa", "Ford Pantera L", "Ferrari Dino", "Maserati Bora", "Volvo 142E")

origin <- c("Japan", "Japan", "Japan", "United States", "United States", "United States", "United States", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "United States", "United States", "United States", "Italy", "Japan", "Japan", "Japan", "United States", "United States", "United States", "United States", "Italy", "Germany", "British", "United States", "Italy", "Italy", "Sweden")

cars_origin_tbl <- tibble(car, origin)

By creating these two tables I can practice joins using R. Here is an example of two different ways to write a semi_join:

mtcars %>% dplyr::semi_join(cars_origin_tbl, by = 'car') -> mtcars_semi_join

# another method using value matching

mtcars_semi_join <- mtcars %>%
  filter(car %in% cars_origin_tbl$car)

But I really want to feel more comfortable making joins using SQL syntax. I have an upcoming job that will involve more SQL and I really want to study before I start. For me, the best way to study is by doing this stuff. Watching example online without doing it myself just doesn’t stick for me.

So here I am trying to create a SQL version of stuff using these R packages.

library(dbplyr)
library(RSQLite)
library(DBI)

This YouTube video from Andrew Couch has been very helpful. Here is the link.

sql_mtcars <- mtcars
con <- RSQLite::dbConnect(SQLite(), ":memory:")
dplyr::copy_to(con, sql_mtcars)

sql_mtcars_db <- tbl(con, "sql_mtcars")

sql_mtcars_db %>%
  dplyr::select(car, mpg, wt) %>%
  dplyr::show_query()

This show_query thing is interesting. It shows the SQL version of the dplyr code.

enter image description here

Another SO post here was answered and showed me how to find the tables listed using dbListTables.

dbListTables(con)

enter image description here

This dbGetQuery will run the code and get the expected output.

DBI::dbGetQuery(con, '
SELECT sql_mtcars.mpg, sql_mtcars.wt
FROM sql_mtcars
  ')

enter image description here

Here is where I’m confused. I’m trying to put the cars_origin_tbl into RSQLite and then practice joining two different tables.

cars_origin_tbl <- cars_origin_tbl
con <- dbConnect(SQLite(), ":memory:")
copy_to(con, cars_origin_tbl)

cars_origin_tbl_db <- tbl(con, "cars_origin_tbl")

dbListTables(con)

After I run dbListTables() it’s like my first sql_mtcars was forgotten and disappeared.

enter image description here

Now this dbGetQuery works fine because I’m just calling the newer table.

DBI::dbGetQuery(con, '
SELECT cars_origin_tbl.car
FROM cars_origin_tbl
  ')

But this other dbGetQuery doesn’t work because I’m referring to both tables, which I thought would both be in RSQLite at this point. I get a message saying there is no sql_mtcars.

DBI::dbGetQuery(con, '
SELECT sql_mtcars.mpg, sql_mtcars.wt
FROM sql_mtcars
LEFT JOIN cars_origin_tbl 
ON sql_mtcars.car = cars_origin_tbl.car;
  ')

Error: no such table: sql_mtcars

I’m basically trying to get two tables into the connection so I can practice joining them, but I’m not sure how to get both tables in there without one disappearing.

con <- RSQLite::dbConnect(SQLite(), ":memory:")
con2 <- RSQLite::dbConnect(SQLite(), ":memory:")
dplyr::copy_to(c(con, con2), cars_origin_tbl)

Error in UseMethod("copy_to") : 
  no applicable method for 'copy_to' applied to an object of class "list"

Advertisement

Answer

The solution is to use the same connection for both tables. The following works:

# create data base connection
con <- RSQLite::dbConnect(SQLite(), ":memory:")

# copy tables to data base
dplyr::copy_to(con, sql_mtcars)
dplyr::copy_to(con, cars_origin_tbl)

# check contents of data base
dbListTables(con)

# create local pointer to remote table
sql_mtcars_db <- tbl(con, "sql_mtcars")
cars_origin_tbl_db <- tbl(con, "cars_origin_tbl")

You can now semi-join these two tables:

mtcars_semi_join = sql_mtcars_db %>%
  semi_join(cars_origin_tbl_db, by = 'car')

show_query(mtcars_semi_join)

Producing the following SQL:

SELECT * FROM `sql_mtcars` AS `LHS`
WHERE EXISTS (
  SELECT 1 FROM `cars_origin_tbl` AS `RHS`
  WHERE (`LHS`.`car` = `RHS`.`car`)

As a side note, sql_mtcars and sql_mtcars_db are different R objects. The first is a table in local R memory. The second is a local access point to a remote table stored in a database.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement