Suppose I’ve got this data:
key id value ----- ---- -------- 30 1 A 30 2 B 40 1 C 40 2 D 50 1 A 50 2 C 50 3 D 60 1 A 60 2 B 60 3 C 60 4 D
I want a list aggregation by key in R/SQL that would produce this last column:
key id value Class ----- ---- -------- -------- 30 1 A A,B 30 2 B B 40 1 C C,D 40 2 D D 50 1 A A,C,D 50 2 C C,D 50 3 D D 60 1 A A,B,C,D 60 2 B B,C,D 60 3 C C,D 60 4 D D
The value in Class in the current row always includes current and future values within a group (key).
Any Suggestions?
Many thanks!
Advertisement
Answer
In R, you can group by key
and create a sequence from id
to the number of rows in the group and paste the value
together.
library(dplyr) df %>% group_by(key) %>% mutate(Class = purrr::map_chr(id, ~toString(value[.x:n()]))) # If id doesn't represent the row number for each key you can use #mutate(Class = purrr::map_chr(row_number(), ~toString(value[.x:n()]))) # key id value Class # <int> <int> <chr> <chr> # 1 30 1 A A, B # 2 30 2 B B # 3 40 1 C C, D # 4 40 2 D D # 5 50 1 A A, C, D # 6 50 2 C C, D # 7 50 3 D D # 8 60 1 A A, B, C, D # 9 60 2 B B, C, D #10 60 3 C C, D #11 60 4 D D
data
df <- structure(list(key = c(30L, 30L, 40L, 40L, 50L, 50L, 50L, 60L, 60L, 60L, 60L), id = c(1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 4L), value = c("A", "B", "C", "D", "A", "C", "D", "A", "B", "C", "D")), class = "data.frame", row.names = c(NA, -11L))