Find total IDs between two dates that satisfy a condition

Question

I have a dataset PosNeg like this. I need to count the IDs that have a pattern like P N P P or N P N N P N, that is, at least one N (negative) between two P's (positive). If this pattern occurs at least once for an ID, count that ID. Date is always in ascending order.

Accepted Answer

Here is one option with str_c/str_detect: group by 'ID', paste the 'Test' elements into a single string, and then check whether the pattern P, followed by one or more N (N+), followed by another P occurs.

    library(stringr)
    library(dplyr)

    df1 %>%
      group_by(ID) %>%
      # collapse the first letter of each result into one string per ID
      summarise(isP = str_detect(str_c(substr(Test, 1, 1), collapse = ""), "PN+P"),
                .groups = 'drop') %>%
      filter(isP)

    # A tibble: 2 × 2
         ID isP
    1     1 TRUE
    2     4 TRUE

Using the OP's new data:

    df2 %>%
      group_by(ID) %>%
      summarise(isP = str_detect(str_c(substr(TEST, 1, 1), collapse = ""), "PN+P"),
                .groups = 'drop') %>%
      filter(isP)

    # A tibble: 1 × 2
         ID isP
    1     1 TRUE

EDIT: added substr to extract the first letter of the 'Test' column, as the original data values are not exactly 'P' or 'N' as shown in the example.

data

    df2 <- structure(list(
      ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
      DATE = c("2020-06-12", "2020-08-20", "2020-10-04", "2020-12-09",
               "2021-01-08", "2021-02-05", "2021-03-26", "2021-05-26",
               "2021-06-30", "2021-07-21", "2021-08-23", "2021-09-16",
               "2021-10-08", "2021-10-18", "2021-10-29"),
      TEST = c("N", "N", "N", "N", "P", "P", "P", "P",
               "N", "N", "N", "N", "N", "N", "P")),
      class = "data.frame", row.names = c(NA, -15L))

ID	Test	Date
1	P	2021-01-02
1	P	2021-01-08
1	N	2021-02-25
1	P	2021-03-26
2	N	2021-02-05
2	P	2021-03-04
2	N	2021-03-30
3	N	2021-01-24
3	P	2021-02-10
4	N	2021-02-15
4	P	2021-02-28
4	N	2021-03-18
4	P	2021-04-11
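
The answer refers to df1 without showing how it is built, and the question asks for a count rather than a list of IDs. As a rough sketch, the table above can be rebuilt as a data frame and counted in base R only. This is not the answerer's code: the use of tapply/grepl instead of dplyr/stringr, the explicit sort by Date, and the names flag, matching_ids and n_ids are assumptions made for illustration.

```r
# Rebuild the first example table as a data frame (values copied from the table above).
df1 <- data.frame(
  ID   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L),
  Test = c("P", "P", "N", "P", "N", "P", "N", "N", "P", "N", "P", "N", "P"),
  Date = as.Date(c("2021-01-02", "2021-01-08", "2021-02-25", "2021-03-26",
                   "2021-02-05", "2021-03-04", "2021-03-30",
                   "2021-01-24", "2021-02-10",
                   "2021-02-15", "2021-02-28", "2021-03-18", "2021-04-11"))
)

# Sort by ID and Date so each ID's results are in chronological order
# (the question says they already are; this just makes the assumption explicit).
df1 <- df1[order(df1$ID, df1$Date), ]

# For each ID, collapse the first letter of every result into one string
# and test for a P, then one or more N, then another P.
flag <- tapply(df1$Test, df1$ID, function(x) {
  grepl("PN+P", paste(substr(x, 1, 1), collapse = ""))
})

matching_ids <- names(flag)[flag]  # IDs (as character) that show the pattern
n_ids <- sum(flag)                 # how many IDs show the pattern

print(matching_ids)
print(n_ids)
```

For this table the matching IDs should be 1 and 4, which agrees with the dplyr output in the accepted answer, so the count is 2.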

ID	DATE	TEST
1	2020-06-12	N
1	2020-08-20	N
1	2020-10-04	N
1	2020-12-09	N
1	2021-01-08	P
1	2021-02-05	P
1	2021-03-26	P
1	2021-05-26	P
1	2021-06-30	N
1	2021-07-21	N
1	2021-08-23	N
1	2021-09-16	N
1	2021-10-08	N
1	2021-10-18	N
1	2021-10-29	P
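
If a regular expression feels opaque, the same check can probably be expressed with run lengths instead. The sketch below is an alternative technique, not what the accepted answer does: it uses base R's rle to compress the results into runs and looks for an N run with a P run on each side. The helper name has_p_n_p is hypothetical, and the code assumes there are no missing values in the test column.

```r
# TEST values for ID 1, copied from the table above (already in date order).
test <- c("N", "N", "N", "N", "P", "P", "P", "P",
          "N", "N", "N", "N", "N", "N", "P")

# Hypothetical helper: TRUE when some run of N has a run of P on both sides.
has_p_n_p <- function(x) {
  runs <- rle(substr(x, 1, 1))$values  # consecutive repeats collapsed to one value
  n <- length(runs)
  if (n < 3) return(FALSE)             # need at least three runs for P, N, P
  inner <- 2:(n - 1)                   # runs that have a neighbour on each side
  any(runs[inner] == "N" & runs[inner - 1] == "P" & runs[inner + 1] == "P")
}

print(has_p_n_p(test))
```

Here the runs are N, P, N, P, so the second N run sits between two P runs and the helper returns TRUE, matching the result of the "PN+P" pattern on the same data.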

Answer

data