I have a database consisting of three tables like this:
I want to make a machine learning model in R using that database, and the data I need is like this:
I can use one hot encoding to convert categorical variable from t_pengolahan (such as “Pengupasan, Fermentasi, etc”) into attributes. But, how to set flag (yes or no) to the data value based on “result (using SQL query)” data above?
Advertisement
Answer
We can combine two answers to previous related questions, each of which provides half of the solution; those answers are found here and here:
x
library(dplyr) ## dplyr and tidyr loaded for wrangling
library(tidyr)
options(dplyr.width = Inf) ## we want to show all columns of result
yes_fun <- function(x) { ## helps with pivot_wider() below
if ( length(x) > 0 ) {
return("yes")
}
}
sql_result %>%
separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan
pivot_wider(names_from = pengolahan, ## spread to yes/no indicators
values_from = pengolahan,
values_fill = list(pengolahan = "no"),
values_fn = list(pengolahan = yes_fun))
Data
id_pangan <- 1:3
kategori <- c("Daging", "Buah", "Susu")
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
"Fermentasi, Sterilisasi")
batas <- c(100, 50, 200)
sql_result <- data.frame(id_pangan, kategori, pengolahan, batas)
# A tibble: 3 x 8
id_pangan kategori batas Penggilingan Perebusan Pengupasan
<int> <fct> <dbl> <chr> <chr> <chr>
1 1 Daging 100 yes yes no
2 2 Buah 50 no no yes
3 3 Susu 200 no no no
Fermentasi Sterilisasi
<chr> <chr>
1 no no
2 no no
3 yes yes