I would like to know whether there is a way of importing JSON data from a MySQL DB to an R dataframe.
I have a table like this:
id created_at json
1 2020-07-01 {"name":"Dent, Arthur","group":"Green","age (y)":43,"height (cm)":187,"wieght (kg)":89,"sensor":34834834}
2 2020-07-01 {"name":"Doe, Jane","group":"Blue","age (y)":23,"height (cm)":172,"wieght (kg)":67,"sensor":12342439}
3 2020-07-01 {"name":"Curt, Travis","group":"Red","age (y)":13,"height (cm)":128,"wieght (kg)":47,"sensor":83287699}
I would like to get the columns ‘id’ and ‘json’. I am using RMySQL package for getting the data from the db to an R dataframe but this gives me only the column ‘id’, the column ‘json’ contains only NAs in each row.
Is there any way how to import/load the data and get the json column displayed? And possibly to extract the “sensor” part of the json values?
The result would be a dataframe (df) like this:
id json
1 {"name":"Dent, Arthur","group":"Green","age (y)":43,"height (cm)":187,"wieght (kg)":89,"sensor":34834834}
2 {"name":"Doe, Jane","group":"Blue","age (y)":23,"height (cm)":172,"wieght (kg)":67,"sensor":12342439}
3 {"name":"Curt, Travis","group":"Red","age (y)":13,"height (cm)":128,"wieght (kg)":47,"sensor":83287699}
Or with with the extracted value:
id sensor
1 "sensor":34834834
2 "sensor":12342439
3 "sensor":83287699
Thank you very much for any suggestions.
Advertisement
Answer
Using unnest_wider
from tidyr
library(dplyr)
con <- DBI::dbConnect(RMySQL::MySQL(), 'db_name', user = 'user', password = 'pass', host = 'hostname')
t <- tbl(con, 'table_name')
t %>%
as_tibble() %>%
transmute(j = purrr::map(json, jsonlite::fromJSON)) %>%
tidyr::unnest_wider(j)
DBI::dbDisconnect(con)
Result:
# A tibble: 3 x 6
name group `age (y)` `height (cm)` `wieght (kg)` sensor
<chr> <chr> <int> <int> <int> <int>
1 Dent, Arthur Green 43 187 89 34834834
2 Doe, Jane Blue 23 172 67 12342439
3 Curt, Travis Red 13 128 47 83287699
If you want to only retrieve data from the last 24 hours (as the OP requested) change the tbl(con, 'table_name')
statement to:
t <- DBI::dbGetQuery(con, 'SELECT * FROM `table_name` WHERE DATE(`created_at`) > NOW() - INTERVAL 1 DAY')