Skip to content
Advertisement

How to set flag value based on data that use one-hot-encoding

I have a database consisting of three tables like this:

database

I want to make a machine learning model in R using that database, and the data I need is like this:

Required data

I can use one hot encoding to convert categorical variable from t_pengolahan (such as “Pengupasan, Fermentasi, etc”) into attributes. But, how to set flag (yes or no) to the data value based on “result (using SQL query)” data above?

Advertisement

Answer

We can combine two answers to previous related questions, each of which provides half of the solution; those answers are found here and here:

library(dplyr) ## dplyr and tidyr loaded for wrangling
library(tidyr)
options(dplyr.width = Inf) ## we want to show all columns of result
yes_fun <- function(x) { ## helps with pivot_wider() below
    if ( length(x) > 0 ) {
        return("yes")
    }
}
sql_result %>%
    separate_rows(pengolahan) %>% ## add rows for unique words in pengolahan
    pivot_wider(names_from = pengolahan, ## spread to yes/no indicators
                values_from = pengolahan,
                values_fill = list(pengolahan = "no"),
                values_fn = list(pengolahan = yes_fun))

Data

id_pangan  <- 1:3
kategori   <- c("Daging", "Buah", "Susu")
pengolahan <- c("Penggilingan, Perebusan", "Pengupasan",
                "Fermentasi, Sterilisasi")
batas      <- c(100, 50, 200)
sql_result <- data.frame(id_pangan, kategori, pengolahan, batas)

# A tibble: 3 x 8
  id_pangan kategori batas Penggilingan Perebusan Pengupasan
      <int> <fct>    <dbl> <chr>        <chr>     <chr>     
1         1 Daging     100 yes          yes       no        
2         2 Buah        50 no           no        yes       
3         3 Susu       200 no           no        no        
  Fermentasi Sterilisasi
  <chr>      <chr>      
1 no         no         
2 no         no         
3 yes        yes  
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement