Skip to content
Advertisement

Importing JSON data from SQL DB to an R dataframe

I would like to know whether there is a way of importing JSON data from a MySQL DB to an R dataframe.

I have a table like this:

id  created_at   json
1   2020-07-01   {"name":"Dent, Arthur","group":"Green","age (y)":43,"height (cm)":187,"wieght (kg)":89,"sensor":34834834}
2   2020-07-01   {"name":"Doe, Jane","group":"Blue","age (y)":23,"height (cm)":172,"wieght (kg)":67,"sensor":12342439}
3   2020-07-01   {"name":"Curt, Travis","group":"Red","age (y)":13,"height (cm)":128,"wieght (kg)":47,"sensor":83287699}

I would like to get the columns ‘id’ and ‘json’. I am using RMySQL package for getting the data from the db to an R dataframe but this gives me only the column ‘id’, the column ‘json’ contains only NAs in each row.

Is there any way how to import/load the data and get the json column displayed? And possibly to extract the “sensor” part of the json values?

The result would be a dataframe (df) like this:

id   json
1    {"name":"Dent, Arthur","group":"Green","age (y)":43,"height (cm)":187,"wieght (kg)":89,"sensor":34834834}
2    {"name":"Doe, Jane","group":"Blue","age (y)":23,"height (cm)":172,"wieght (kg)":67,"sensor":12342439}
3    {"name":"Curt, Travis","group":"Red","age (y)":13,"height (cm)":128,"wieght (kg)":47,"sensor":83287699}

Or with with the extracted value:

id   sensor
1    "sensor":34834834
2    "sensor":12342439
3    "sensor":83287699

Thank you very much for any suggestions.

Advertisement

Answer

Using unnest_wider from tidyr

library(dplyr)

con <- DBI::dbConnect(RMySQL::MySQL(), 'db_name', user = 'user', password = 'pass', host = 'hostname')

t <- tbl(con, 'table_name')
  
t %>% 
  as_tibble() %>% 
  transmute(j = purrr::map(json, jsonlite::fromJSON)) %>%
  tidyr::unnest_wider(j)


DBI::dbDisconnect(con)

Result:

# A tibble: 3 x 6
  name         group `age (y)` `height (cm)` `wieght (kg)`   sensor
  <chr>        <chr>     <int>         <int>         <int>    <int>
1 Dent, Arthur Green        43           187            89 34834834
2 Doe, Jane    Blue         23           172            67 12342439
3 Curt, Travis Red          13           128            47 83287699

If you want to only retrieve data from the last 24 hours (as the OP requested) change the tbl(con, 'table_name') statement to:

t <- DBI::dbGetQuery(con, 'SELECT * FROM `table_name` WHERE DATE(`created_at`) > NOW() - INTERVAL 1 DAY')
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement