Find first occurrence within a group of groups

I have a table with 5 columns.

Country   Flow   %Rec   Date_Received(with timestamp)   Date
DE        DEF    10     2020-03-03 05:05:54             2020-03-03
DE        DEF    15     2020-03-03 07:25:24             2020-03-03
DE        DEF    20     2020-03-03 04:05:54             2020-03-02
DE        ABC    40     2020-03-02 03:05:54             2020-03-02
DE        ABC    50     2020-03-02 07:05:54             2020-03-02
DE        ABC    20     2020-03-01 06:05:54             2020-03-01

For each Country and Flow, I want to find the % received on the latest Date, taking the first occurrence of Date_Received on that date. Required output:

Country   Flow   %Rec   Date_Received(with timestamp)   Date
DE        DEF    10     2020-03-03 05:05:54             2020-03-03
DE        ABC    40     2020-03-02 03:05:54             2020-03-02

Answer

In R, we can group by ‘Country’ and ‘Flow’ and then use slice to keep the first row of each group:

library(dplyr)
df %>%
   group_by(Country, Flow) %>% 
   slice(1)
# A tibble: 2 x 5
# Groups:   Country, Flow [2]
#  Country Flow  `%Rec` `Date_Received(with timestamp)` Date      
#  <chr>   <chr>  <int> <chr>                           <chr>     
#1 DE      ABC       40 2020-03-02 03:05:54             2020-03-02
#2 DE      DEF       10 2020-03-03 05:05:54             2020-03-03

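The above assumes that the rows are already ordered by ‘Date’ within each group, latest first (as they are in the OP’s example). If they are not, convert the column to Date class and use which.max, which returns the index of the first row holding the latest date in each group: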

df %>%
    group_by(Country, Flow) %>%
    slice(which.max(as.Date(Date)))
# A tibble: 2 x 5
# Groups:   Country, Flow [2]
#  Country Flow  `%Rec` `Date_Received(with timestamp)` Date      
#  <chr>   <chr>  <int> <chr>                           <chr>     
#1 DE      ABC       40 2020-03-02 03:05:54             2020-03-02
#2 DE      DEF       10 2020-03-03 05:05:54             2020-03-03
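
If the rows may also be out of order within a single date, which.max on ‘Date’ alone would pick whichever matching row happens to come first. One possible variation is to filter each group to its latest ‘Date’ and then take the earliest timestamp. The following is only a sketch: it rebuilds the example data in reverse row order to show that the result does not depend on row order, and it assumes the timestamps are in UTC, which the question does not state.

```r
library(dplyr)

# Example data from the question, with the row order reversed on purpose
df_rev <- data.frame(
  Country = "DE",
  Flow = c("ABC", "ABC", "ABC", "DEF", "DEF", "DEF"),
  `%Rec` = c(20L, 50L, 40L, 20L, 15L, 10L),
  `Date_Received(with timestamp)` = c(
    "2020-03-01 06:05:54", "2020-03-02 07:05:54", "2020-03-02 03:05:54",
    "2020-03-03 04:05:54", "2020-03-03 07:25:24", "2020-03-03 05:05:54"
  ),
  Date = c("2020-03-01", "2020-03-02", "2020-03-02",
           "2020-03-02", "2020-03-03", "2020-03-03"),
  check.names = FALSE,       # keep the non-syntactic column names as they are
  stringsAsFactors = FALSE
)

result <- df_rev %>%
  mutate(
    Date = as.Date(Date),
    # tz = "UTC" is an assumption; the question gives no time zone
    `Date_Received(with timestamp)` = as.POSIXct(
      `Date_Received(with timestamp)`,
      format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
    )
  ) %>%
  group_by(Country, Flow) %>%
  filter(Date == max(Date)) %>%                          # rows on the latest date
  slice(which.min(`Date_Received(with timestamp)`)) %>%  # earliest timestamp among them
  ungroup()

result
```

On this data the result again holds the `%Rec` values 40 for ABC and 10 for DEF, matching the required output.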

data

df <- structure(list(Country = c("DE", "DE", "DE", "DE", "DE", "DE"
), Flow = c("DEF", "DEF", "DEF", "ABC", "ABC", "ABC"), `%Rec` = c(10L, 
15L, 20L, 40L, 50L, 20L), `Date_Received(with timestamp)` = c("2020-03-03 05:05:54", 
"2020-03-03 07:25:24", "2020-03-03 04:05:54", "2020-03-02 03:05:54", 
"2020-03-02 07:05:54", "2020-03-01 06:05:54"), Date = c("2020-03-03", 
"2020-03-03", "2020-03-02", "2020-03-02", "2020-03-02", "2020-03-01"
)), class = "data.frame", row.names = c(NA, -6L))
User contributions licensed under: CC BY-SA