I have a data set of record numbers, with multiple values in a column for each record. I would like to produce a table where within each group it returns the minimum non-negative, non-NA value. But if that negative number or NA corresponds to the only row for that record, then I would like it to return NA.
x
data <- data.frame(record = c(1, 1, 1, 2, 3, 3, 4), value = c(-1, 2, 40, -10, 3, 4, NA))
| record | value |
------------------
| 1 | -1 |
| 1 | 2 |
| 1 | 40 |
| 2 | -10 |
| 3 | 3 |
| 3 | 4 |
| 4 | NA |
------------------
I would like to get
| record | out |
------------------
| 1 | 2 |
| 2 | NA |
| 3 | 3 |
| 4 | NA |
------------------
In R I have tried:
filter(data, value >= 0 | is.na(value))
data %>% group_by(record) %>% summarize(out = min(value))
In MySQL I have tried
select min(value) as out
from data
where (value >= 0 or value is null))
However, both of these solutions drop record #2 from the table.
How do I include a clause that detects the number of rows per record and doesn’t eliminate single records?
Advertisement
Answer
One option is to replace
those ‘value’ less than 0 to NA
, grouped by ‘record’, slice
the rows where we get the min
of ‘value’ or if all are NA
, get the first row
data %>%
mutate(value = replace(value, value < 0, NA)) %>%
group_by(record) %>%
slice(if(all(is.na(value))) 1 else which.min(value))
# A tibble: 4 x 2
# Groups: record [4]
# record value
# <dbl> <dbl>
#1 1 2
#2 2 NA
#3 3 3
#4 4 NA