I have a table with 5 columns.
Country Flow %Rec Date_Received(with timestamp) Date DE DEF 10 2020-03-03 05:05:54 2020-03-03 DE DEF 15 2020-03-03 07:25:24 2020-03-03 DE DEF 20 2020-03-03 04:05:54 2020-03-02 DE ABC 40 2020-03-02 03:05:54 2020-03-02 DE ABC 50 2020-03-02 07:05:54 2020-03-02 DE ABC 20 2020-03-01 06:05:54 2020-03-01
I want to find the % received of the last date and first occurance of date_rec. Output required:
Country Flow %Rec Date_Received(with timestamp) Date DE DEF 10 2020-03-03 05:05:54 2020-03-03 DE ABC 40 2020-03-02 03:05:54 2020-03-02
Advertisement
Answer
In R
, we can do slice
after grouping by ‘Country’, ‘Flow’
library(dplyr) df %>% group_by(Country, Flow) %>% slice(1) # A tibble: 2 x 5 # Groups: Country, Flow [2] # Country Flow `%Rec` `Date_Received(with timestamp)` Date # <chr> <chr> <int> <chr> <chr> #1 DE ABC 40 2020-03-02 03:05:54 2020-03-02 #2 DE DEF 10 2020-03-03 05:05:54 2020-03-03
The above assumes that the ‘Date’ are ordered
(in the OP’s example it is already ordered). If not, then convert to Date
class and use which.max
df %>% group_by(Country, Flow) %>% slice(which.max(as.Date(Date))) # A tibble: 2 x 5 # Groups: Country, Flow [2] # Country Flow `%Rec` `Date_Received(with timestamp)` Date # <chr> <chr> <int> <chr> <chr> #1 DE ABC 40 2020-03-02 03:05:54 2020-03-02 #2 DE DEF 10 2020-03-03 05:05:54 2020-03-03
data
df <- structure(list(Country = c("DE", "DE", "DE", "DE", "DE", "DE" ), Flow = c("DEF", "DEF", "DEF", "ABC", "ABC", "ABC"), `%Rec` = c(10L, 15L, 20L, 40L, 50L, 20L), `Date_Received(with timestamp)` = c("2020-03-03 05:05:54", "2020-03-03 07:25:24", "2020-03-03 04:05:54", "2020-03-02 03:05:54", "2020-03-02 07:05:54", "2020-03-01 06:05:54"), Date = c("2020-03-03", "2020-03-03", "2020-03-02", "2020-03-02", "2020-03-02", "2020-03-01" )), class = "data.frame", row.names = c(NA, -6L))