I am using R. In a previous post (R: Loop Producing the Following Error: Argument 1 must have names), I learned how to make a function (“create_data”) for my code.
Now, I am trying to modify this function.
First, I create some data to be used for this example:
#load library
library(dplyr)
set.seed(123)
# create some data for this example
a1 = rnorm(1000, 100, 10)
b1 = rnorm(1000, 100, 5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1, b1, c1)
Here is the modified version of the function:
create_data <- function() {
  #generate random numbers
  random_1 = runif(1, 80, 120)
  random_2 = runif(1, random_1, 120)
  random_3 = runif(1, 85, 120)
  random_4 = runif(1, random_3, 120)
  #bin data according to random criteria
  train_data <- train_data %>%
    mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a",
                        ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
  train_data$cat = as.factor(train_data$cat)
  #new splits
  a_table = train_data %>% select(a1, b1, c1) %>% filter(cat == "a")
  b_table = train_data %>% select(a1, b1, c1) %>% filter(cat == "b")
  c_table = train_data %>% select(a1, b1, c1) %>% filter(cat == "c")
  split_1 = runif(1, 0, 1)
  split_2 = runif(1, 0, 1)
  split_3 = runif(1, 0, 1)
  #calculate the quantile of "c1" at a random probability ("quant") for each bin
  table_a = data.frame(a_table %>% group_by(cat) %>% mutate(quant = quantile(c1, prob = split_1)))
  table_b = data.frame(b_table %>% group_by(cat) %>% mutate(quant = quantile(c1, prob = split_2)))
  table_c = data.frame(c_table %>% group_by(cat) %>% mutate(quant = quantile(c1, prob = split_3)))
  #create a new variable ("diff") that measures if the quantile is bigger than the value of "c1"
  table_a$diff = ifelse(table_a$quant > table_a$c1, 1, 0)
  table_b$diff = ifelse(table_b$quant > table_b$c1, 1, 0)
  table_c$diff = ifelse(table_c$quant > table_c$c1, 1, 0)
  #group all tables
  final_table = rbind(table_a, table_b, table_c)
  #create a table: for each bin, calculate the average of "diff"
  final_table_2 = data.frame(final_table %>% group_by(cat) %>% summarize(mean = mean(diff)))
  #add "total mean" to this table
  final_table_2 = data.frame(final_table_2 %>% add_row(cat = "total", mean = mean(final_table$diff)))
  #format this table: add the random criteria to this table for reference
  final_table_2$random_1 = random_1
  final_table_2$random_2 = random_2
  final_table_2$random_3 = random_3
  final_table_2$random_4 = random_4
  final_table_2$split_1 = split_1
  final_table_2$split_2 = split_2
  final_table_2$split_3 = split_3
  final_table$iteration_number = i
}
I get the following error when I call the function:
Error: Problem with `filter()` input `..1`.
i Input `..1` is `cat == "a"`.
x comparison (1) is possible only for atomic and list types
I suspect the error occurs here:
a_table = train_data %>% select(a1, b1, c1) %>% filter(cat == "a")
I tried to replace this dplyr code with a non-dplyr (base R) version:
a_table <- train_data[cat == "a", ]
But this also produces an error:
Error in cat == "a" : comparison (1) is possible only for atomic and list types
Can someone please show me what I am doing wrong?
Thanks
Answer
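You are selecting only 3 columns here, which do not include the cat column, hence you get the error. Once cat is no longer a column of the data, the name cat resolves to base R's cat() function instead, and comparing a function with "a" is what raises "comparison (1) is possible only for atomic and list types".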
a_table = train_data %>% select(a1, b1, c1) %>% filter(cat == "a")
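To see this lookup in action, here is a small base R sketch on a made-up data frame (not the asker's data): after the cat column is dropped, the name cat falls through to base R's cat() function, and comparing that function with a string is what fails. The exact wording of the error message may differ between R versions.

```r
# Made-up example data (not the asker's data)
toy <- data.frame(a1 = c(90, 110), b1 = c(95, 105), c1 = c(10, 20),
                  cat = c("a", "b"))

# Keeping only a1, b1 and c1 drops the cat column
selected <- toy[, c("a1", "b1", "c1")]
has_cat <- "cat" %in% names(selected)
print(has_cat)            # FALSE

# Outside the data, the name `cat` resolves to base R's cat() function
print(is.function(cat))   # TRUE

# Comparing that function with "a" raises the error seen in the question
msg <- tryCatch(cat == "a", error = function(e) conditionMessage(e))
print(msg)
```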
Instead, you can first filter and then select.
a_table = train_data %>% filter(cat == "a") %>% select(a1, b1, c1)
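The base R attempt from the question fails for the same reason: inside [ the name cat is not looked up among the columns, so it again refers to the cat() function. A minimal sketch of the same filter-then-select step in base R, on a small made-up data frame, refers to the column through the data frame (train_data$cat); base R's subset() is shown as an alternative that does evaluate cat inside the data.

```r
# Small made-up data frame with a cat column (not the asker's data)
train_data <- data.frame(a1 = c(90, 110, 95), b1 = c(95, 105, 99),
                         c1 = c(10, 20, 30), cat = c("a", "b", "a"))

# Filter rows via train_data$cat, then keep only a1, b1 and c1
a_table <- train_data[train_data$cat == "a", c("a1", "b1", "c1")]
print(a_table)

# Equivalent with subset(), which evaluates cat as a column of train_data
a_table2 <- subset(train_data, cat == "a", select = c(a1, b1, c1))
print(a_table2)
```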
The same should be applied to b_table and c_table.
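For reference, here is a sketch of the whole function with the fix applied, assuming dplyr is installed. A few further changes are assumptions about the intent rather than part of the original code: the later group_by(cat) and add_row(cat = "total", ...) steps still need the cat column, so each bin keeps cat in its select(); i is not defined inside the function, so the iteration number is passed in as a hypothetical iteration argument; the per-bin steps are folded into a hypothetical helper, bin_table(); and the summary table is returned explicitly.

```r
suppressPackageStartupMessages(library(dplyr))

# Sketch of a corrected create_data(); `iteration` and bin_table() are
# hypothetical names introduced for this example
create_data <- function(train_data, iteration = 1) {
  # generate random cut-offs and one random quantile probability per bin
  random_1 <- runif(1, 80, 120)
  random_2 <- runif(1, random_1, 120)
  random_3 <- runif(1, 85, 120)
  random_4 <- runif(1, random_3, 120)
  splits <- c(a = runif(1), b = runif(1), c = runif(1))

  # bin data according to the random criteria
  train_data <- train_data %>%
    mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a",
                        ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))

  # filter first, then select; cat is kept because later steps group by it
  bin_table <- function(bin) {
    train_data %>%
      filter(cat == bin) %>%
      select(a1, b1, c1, cat) %>%
      mutate(quant = quantile(c1, probs = splits[[bin]], names = FALSE),
             diff = ifelse(quant > c1, 1, 0))
  }
  final_table <- bind_rows(bin_table("a"), bin_table("b"), bin_table("c"))

  # average of diff per bin, plus a "total" row and the random settings
  final_table_2 <- final_table %>%
    group_by(cat) %>%
    summarize(mean = mean(diff)) %>%
    add_row(cat = "total", mean = mean(final_table$diff)) %>%
    mutate(random_1 = random_1, random_2 = random_2,
           random_3 = random_3, random_4 = random_4,
           split_1 = splits[["a"]], split_2 = splits[["b"]],
           split_3 = splits[["c"]], iteration_number = iteration)

  as.data.frame(final_table_2)
}
```

For example, create_data(train_data, iteration = 1) should return one summary row per non-empty bin plus a "total" row.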