R : x comparison (1) is possible only for atomic and list types

I am using R. In a previous post (R: Loop Producing the Following Error: Argument 1 must have names), I learned how to make a function (“create_data”) for my code.

Now, I am trying to modify this function.

First, I create some data to be used for this example:

# load library
library(dplyr)
set.seed(123)

# create some data for this example
a1 = rnorm(1000, 100, 10)
b1 = rnorm(1000, 100, 5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1, b1, c1)

Here is the modified version of the function:

create_data <- function() {

#generate random numbers
random_1 = runif(1, 80, 120)
random_2 = runif(1, random_1, 120)
random_3 = runif(1, 85, 120)
random_4 = runif(1, random_3, 120)

#bin data according to random criteria
train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c"))) 

train_data$cat = as.factor(train_data$cat)

#new splits
a_table = train_data %>% 
  select(a1, b1, c1) %>%
  filter(cat == "a")


b_table = train_data %>% 
  select(a1, b1, c1) %>%
  filter(cat == "b")


c_table = train_data %>% 
  select(a1, b1, c1) %>%
  filter(cat == "c")


split_1 = runif(1, 0, 1)
split_2 = runif(1, 0, 1)
split_3 = runif(1, 0, 1)

#calculate a quantile ("quant") for each bin, at the random probabilities split_1, split_2, split_3

table_a = data.frame(a_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_1))) 

table_b = data.frame(b_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_2)))

table_c = data.frame(c_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_3)))




#create a new variable ("diff") that measures if the quantile is bigger than the value of "c1"
table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)

#group all tables

final_table = rbind(table_a, table_b, table_c)

#create a table: for each bin, calculate the average of "diff"
final_table_2 = data.frame(final_table %>% 
  group_by(cat) %>% 
  summarize(
   mean = mean(diff)
  ))

#add "total mean" to this table
final_table_2 = data.frame(final_table_2 %>% add_row(cat = "total", mean = mean(final_table$diff)))

#format this table: add the random criteria to this table for reference
final_table_2$random_1 = random_1

final_table_2$random_2 = random_2

final_table_2$random_3 = random_3

final_table_2$random_4 = random_4

final_table_2$split_1 = split_1

final_table_2$split_2 = split_2

final_table_2$split_3 = split_3

final_table$iteration_number = i

}

The error results when I try to call the function:

 Error: Problem with `filter()` input `..1`.
i Input `..1` is `cat == "a"`.
x comparison (1) is possible only for atomic and list types

I have a feeling that maybe the error is occurring over here:

a_table = train_data %>% 
  select(a1, b1, c1) %>%
  filter(cat == "a")

I tried to replace this “select” with a non-dplyr version:

a_table <- train_data[cat == "a", ]

But this also produces an error:

Error in cat == "a" : 
  comparison (1) is possible only for atomic and list types

Can someone please show me what I am doing wrong?

Thanks

Answer

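You are selecting only three columns here, and cat is not one of them. By the time filter() runs, the cat column no longer exists, so R resolves the name cat to the base function cat(), and comparing a function with "a" is what triggers the error.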

a_table = train_data %>% 
  select(a1, b1, c1) %>%
  filter(cat == "a")

Instead, you can filter first and then select.

a_table = train_data %>% 
  filter(cat == "a") %>%
  select(a1, b1, c1)
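To see why the order matters, here is a minimal sketch on a small made-up data frame (toy and its values are assumptions for illustration, not the asker's data). It checks that the cat column is gone after select(), that the bare name cat then refers to base R's cat() function, and that filtering first avoids the error.

```r
library(dplyr)

# Hypothetical stand-in for train_data after the cat column has been added
toy <- data.frame(
  a1  = c(1, 2, 3, 4),
  b1  = c(5, 6, 7, 8),
  c1  = c(9, 10, 11, 12),
  cat = factor(c("a", "b", "a", "c"))
)

# After select(), only a1, b1 and c1 remain; cat has been dropped
remaining_cols <- names(select(toy, a1, b1, c1))

# With no cat column in the data, the name cat is base R's cat() function
cat_is_function <- is.function(cat)

# select-then-filter, as in the question: capture the error instead of stopping
bad <- tryCatch(
  toy %>% select(a1, b1, c1) %>% filter(cat == "a"),
  error = function(e) e
)

# filter-then-select: cat still exists when filter() runs
good <- toy %>%
  filter(cat == "a") %>%
  select(a1, b1, c1)

print(remaining_cols)
print(cat_is_function)
print(inherits(bad, "error"))
print(good)
```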

The same change should be applied to b_table and c_table.
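The base R attempt in the question, train_data[cat == "a", ], fails for the same reason: inside [ ], the name cat is not looked up in the data frame, so it is the function cat() that gets compared. A sketch of how the three subsets could be written in base R, on made-up data (toy, cols and pieces are hypothetical names used only for this example):

```r
# Hypothetical stand-in for train_data after the cat column has been added
toy <- data.frame(
  a1  = c(1, 2, 3, 4),
  b1  = c(5, 6, 7, 8),
  c1  = c(9, 10, 11, 12),
  cat = factor(c("a", "b", "a", "c"))
)

# Columns to keep in each subset
cols <- c("a1", "b1", "c1")

# Qualify the column as toy$cat so R does not fall back to the cat() function
a_table <- toy[toy$cat == "a", cols]
b_table <- toy[toy$cat == "b", cols]
c_table <- toy[toy$cat == "c", cols]

# Alternative: split once by category and pick each piece by name
pieces <- split(toy[cols], toy$cat)

print(a_table)
print(names(pieces))
```

One thing to watch: the later steps in the question call group_by(cat) on a_table, b_table and c_table, so those steps would need cat to be kept in each table (for example by adding it to the select() call or to cols).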
