I have a data set of record numbers, with multiple values in a column for each record. I would like to produce a table where within each group it returns the minimum non-negative, non-NA value. But if that negative number or NA corresponds to the only row for that record, then I would like it to return NA.
data <- data.frame(record = c(1, 1, 1, 2, 3, 3, 4), value = c(-1, 2, 40, -10, 3, 4, NA)) | record | value | ------------------ | 1 | -1 | | 1 | 2 | | 1 | 40 | | 2 | -10 | | 3 | 3 | | 3 | 4 | | 4 | NA | ------------------
I would like to get
| record | out | ------------------ | 1 | 2 | | 2 | NA | | 3 | 3 | | 4 | NA | ------------------
In R I have tried:
filter(data, value >= 0 | is.na(value)) data %>% group_by(record) %>% summarize(out = min(value))
In MySQL I have tried
select min(value) as out from data where (value >= 0 or value is null))
However, both of these solutions drop record #2 from the table.
How do I include a clause that detects the number of rows per record and doesn’t eliminate single records?
Advertisement
Answer
One option is to replace
those ‘value’ less than 0 to NA
, grouped by ‘record’, slice
the rows where we get the min
of ‘value’ or if all are NA
, get the first row
data %>% mutate(value = replace(value, value < 0, NA)) %>% group_by(record) %>% slice(if(all(is.na(value))) 1 else which.min(value)) # A tibble: 4 x 2 # Groups: record [4] # record value # <dbl> <dbl> #1 1 2 #2 2 NA #3 3 3 #4 4 NA