I have a database consisting of three tables like this:
I want to make a machine learning model in R using that database, and the data I need is like this:
I can use one hot encoding to convert categorical variable from t_pengolahan (such as “Pengupasan, Fermentasi, etc”) into attributes. But, how to set flag (yes or no) to the data value based on “result (using SQL query)” data above?
Advertisement
Answer
We can combine two answers to previous related questions, each of which provides half of the solution; those answers are found here and here:
library(dplyr) ## dplyr and tidyr loaded for wrangling library(tidyr) options(dplyr.width = Inf) ## we want to show all columns of result yes_fun <- function(x) { ## helps with pivot_wider() below if ( length(x) > 0 ) { return("yes") } } sql_result %>% separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan pivot_wider(names_from = pengolahan, ## spread to yes/no indicators values_from = pengolahan, values_fill = list(pengolahan = "no"), values_fn = list(pengolahan = yes_fun))
Data
id_pangan <- 1:3 kategori <- c("Daging", "Buah", "Susu") pengolahan <- c("Penggilingan, Perebusan", "Pengupasan", "Fermentasi, Sterilisasi") batas <- c(100, 50, 200) sql_result <- data.frame(id_pangan, kategori, pengolahan, batas) # A tibble: 3 x 8 id_pangan kategori batas Penggilingan Perebusan Pengupasan <int> <fct> <dbl> <chr> <chr> <chr> 1 1 Daging 100 yes yes no 2 2 Buah 50 no no yes 3 3 Susu 200 no no no Fermentasi Sterilisasi <chr> <chr> 1 no no 2 no no 3 yes yes