Filtering out non-zero values unless the value is the only one within its group (SQL or R)

I have a data set of record numbers, with multiple values in a column for each record. I would like to produce a table where within each group it returns the minimum non-negative, non-NA value. But if that negative number or NA corresponds to the only row for that record, then I would like it to return NA.

data <- data.frame(record = c(1, 1, 1, 2, 3, 3, 4), value = c(-1, 2, 40, -10, 3, 4, NA))

| record | value |
------------------
|   1    |  -1   |
|   1    |   2   |
|   1    |   40  |
|   2    |  -10  |
|   3    |   3   |
|   3    |   4   |
|   4    |   NA  |
------------------

​x
 
data <- data.frame(record = c(1, 1, 1, 2, 3, 3, 4), value = c(-1, 2, 40, -10, 3, 4, NA))​| record | value |------------------|   1    |  -1   ||   1    |   2   ||   1    |   40  ||   2    |  -10  ||   3    |   3   ||   3    |   4   ||   4    |   NA  |------------------​

I would like to get

| record |  out  |
------------------
|   1    |   2   |
|   2    |  NA   |
|   3    |   3   |
|   4    |  NA   |
------------------

 
| record |  out  |------------------|   1    |   2   ||   2    |  NA   ||   3    |   3   ||   4    |  NA   |------------------​

In R I have tried:

filter(data, value >= 0 | is.na(value))
data %>% group_by(record) %>% summarize(out = min(value))

 
filter(data, value >= 0 | is.na(value))data %>% group_by(record) %>% summarize(out = min(value))​

In MySQL I have tried

select min(value) as out
from data
where (value >= 0 or value is null))

 
select min(value) as outfrom datawhere (value >= 0 or value is null))​

However, both of these solutions drop record #2 from the table.

How do I include a clause that detects the number of rows per record and doesn’t eliminate single records?

Answer

One option is to replace those ‘value’ less than 0 to NA, grouped by ‘record’, slice the rows where we get the min of ‘value’ or if all are NA, get the first row

data %>%
   mutate(value = replace(value, value < 0, NA)) %>% 
   group_by(record) %>% 
   slice(if(all(is.na(value))) 1 else which.min(value))
# A tibble: 4 x 2
# Groups:   record [4]
#  record value
#   <dbl> <dbl>
#1      1     2
#2      2    NA
#3      3     3
#4      4    NA

 
data %>%   mutate(value = replace(value, value < 0, NA)) %>%    group_by(record) %>%    slice(if(all(is.na(value))) 1 else which.min(value))# A tibble: 4 x 2# Groups:   record [4]#  record value#   <dbl> <dbl>#1      1     2#2      2    NA#3      3     3#4      4    NA​

Advertisement

Answer