Can you name dbplyr’s simulated lazy tables?

Question

dbplyr has some very useful-looking simulation functions so you can write queries while not connected to any real database, but I can’t seem to get actual table names into any of the queries I write …

Accepted Answer

I am not aware of any way to rename the simulated tables. According to the documentation, the point of the simulate_* functions is to test database translation without actually connecting to a database.

When connected to a remote table, dbplyr uses the database, schema, and table name defined using tbl(). It also fetches the column names. Because of this, I would recommend developing in an environment where R can connect to the database. Consider the following:

    library(dplyr)
    library(dbplyr)

    # simulated
    df_sim = tbl_lazy(mtcars, con = simulate_mssql())
    df_sim %>% head(5) %>% show_query()
    # output:
    # SELECT TOP(5) *
    # FROM `df`

    # actual (the connection object is the first argument to tbl())
    df = tbl(database_connection_object, 'db_table_name')
    df %>% head(5) %>% show_query()
    # output:
    # SELECT TOP(5) col1, col2, col3
    # FROM "database"."db_table_name"

Not only does df get replaced by the table name, but the * in the simulated query is replaced by column names in the second query.

If it is important to generate SQL scripts via simulation, one option is to render the query to text, replace the placeholder table name, and convert back. For example:

    df_sim = tbl_lazy(mtcars, con = simulate_mssql())

    # render the lazy query to SQL, then coerce it to a plain string
    query = df_sim %>% head(5) %>% sql_render() %>% as.character()
    query = gsub("`df`", "[db].[schema].[table]", query)

    # write query out to file
    writeLines(query, "file.sql")

    # OR create a remote connection
    remote_table = tbl(db_connection, sql(query))
    remote_table %>% show_query()
    # output:
    # SELECT TOP(5) *
    # FROM [db].[schema].[table]
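If you do this replacement in more than one place, it may help to wrap it in a small function. The following is only a sketch: rename_sim_table() is a hypothetical helper, not part of dbplyr, and sim_sql is a hand-written stand-in for the text you would get by rendering a simulated query (for example with sql_render() followed by as.character()). It assumes the simulated table is rendered as `df`, as in the output shown above.

```r
# Hypothetical helper (not a dbplyr function): swap the placeholder name
# that a simulated lazy table renders as for a real, fully qualified name.
rename_sim_table <- function(sql_text, new_name, placeholder = "`df`") {
  # fixed = TRUE matches the placeholder literally instead of as a regex
  gsub(placeholder, new_name, sql_text, fixed = TRUE)
}

# Stand-in for the rendered SQL of the simulated query (assumed text)
sim_sql <- "SELECT TOP(5) *\nFROM `df`"

real_sql <- rename_sim_table(sim_sql, "[db].[schema].[table]")
cat(real_sql, "\n")
```

Using fixed = TRUE is a deliberate choice here: table names often contain brackets or dots, and literal matching avoids having to escape them if the placeholder ever changes. Note that this only swaps the table name; the simulated query still selects * rather than listing column names.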

Answer