I’m currently working on a project and I want to summarize a column from a joined table twice. SQL code is this:
SELECT M.date,T.team_long_name AS Home_Team, M.home_team_goal, Te.team_long_name AS Away_Team, M.away_team_goal FROM Match AS M JOIN Team AS T ON T.team_api_id = M.home_team_api_id JOIN Team AS Te ON Te.team_api_id = M.away_team_api_id WHERE match_api_id = 539848;
Database tables are as shown here:
I hope that I have provided all the information needed.
Question: How can I have the same result in R by only using dplyr library?
Table names and structure for the first 10 rows as below:
Match:
structure(list(id = 1:10, country_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), league_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), season = c("2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009"), stage = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 10L), date = c("2008-08-17 00:00:00", "2008-08-16 00:00:00", "2008-08-16 00:00:00", "2008-08-17 00:00:00", "2008-08-16 00:00:00", "2008-09-24 00:00:00", "2008-08-16 00:00:00", "2008-08-16 00:00:00", "2008-08-16 00:00:00", "2008-11-01 00:00:00"), match_api_id = c(492473L, 492474L, 492475L, 492476L, 492477L, 492478L, 492479L, 492480L, 492481L, 492564L), home_team_api_id = c(9987L, 10000L, 9984L, 9991L, 7947L, 8203L, 9999L, 4049L, 10001L, 8342L), away_team_api_id = c(9993L, 9994L, 8635L, 9998L, 9985L, 8342L, 8571L, 9996L, 9986L, 8571L ), home_team_goal = c(1L, 0L, 0L, 5L, 1L, 1L, 2L, 1L, 1L, 4L), away_team_goal = c(1L, 0L, 3L, 0L, 3L, 1L, 2L, 2L, 0L, 1L )), row.names = c(NA, 10L), class = "data.frame")
Team:
structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 614L, 1034L), team_api_id = c(9987L, 9993L, 10000L, 9994L, 9984L, 8635L, 9991L, 9998L, 7947L, 9985L, 8203L, 8342L, 9999L, 8571L, 4049L, 9996L, 10001L, 9986L, 9997L, 9989L), team_long_name = c("KRC Genk", "Beerschot AC", "SV Zulte-Waregem", "Sporting Lokeren", "KSV Cercle Brugge", "RSC Anderlecht", "KAA Gent", "RAEC Mons", "FCV Dender EH", "Standard de Liège", "KV Mechelen", "Club Brugge KV", "KSV Roeselare", "KV Kortrijk", "Tubize", "Royal Excel Mouscron", "KVC Westerlo", "Sporting Charleroi", "Sint-Truidense VV", "Lierse SK" )), row.names = c(NA, 20L), class = "data.frame")
In the desired result I used match_api_id = 539848 but as it is not included in this sample data, use one of your own choice.
The main issue is to be able to have team_long_name twice in the result but for different teams, matching by their team_api_id ‘s.
Advertisement
Answer
Up front, the dbplyr
pipe:
tbl_match <- tbl(fakedb, "Match") tbl_team <- tbl(fakedb, "Team") tbl_match %>% filter(match_api_id == 492477) %>% inner_join(select(tbl_team, home_team_api_id = team_api_id, Home_Team = team_long_name), by = "home_team_api_id") %>% inner_join(select(tbl_team, away_team_api_id = team_api_id, Away_Team = team_long_name), by = "away_team_api_id") %>% select(date, Home_Team, Away_Team) %>% collect()
Edited to include collect()
, since without it the output is not a proper frame and/or may not include all relevant data.
from the corresponding DBI call:
DBI::dbGetQuery(fakedb, some_long_query)
Backfill from your sample data. Note that your data is inconsistent and incomplete, so I had to make some assumptions/translations. For instance, your first structure
, which I’m inferring is Match
, does not match the schema as depicted in your picture: it includes extra columns like season
and *_team_goal
. Also, your queried match_api_id
of 539848
is not in the sample data, so I used one that was present. (In the future, I suggest that your code and sample data should be consistent with regards to things like this.)
Code to generate a fake databsae for the purposes of this answer. Starting with your two structure
s as Match
and Team
.
library(dbplyr) library(dplyr) fakedb <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(fakedb, Match) copy_to(fakedb, Team) some_long_query <- ' SELECT M.date, T.team_long_name AS Home_Team, M.home_team_goal, Te.team_long_name AS Away_Team, M.away_team_goal FROM Match AS M JOIN Team AS T ON T.team_api_id = M.home_team_api_id JOIN Team AS Te ON Te.team_api_id = M.away_team_api_id WHERE match_api_id = 492477;' # 539848 DBI::dbGetQuery(fakedb, some_long_query) # date Home_Team home_team_goal Away_Team away_team_goal # 1 2008-08-16 00:00:00 FCV Dender EH 1 Standard de Liège 3 tbl_match <- tbl(fakedb, "Match") tbl_team <- tbl(fakedb, "Team") tbl_match %>% filter(match_api_id == 492477) %>% inner_join(select(tbl_team, home_team_api_id = team_api_id, Home_Team = team_long_name), by = "home_team_api_id") %>% inner_join(select(tbl_team, away_team_api_id = team_api_id, Away_Team = team_long_name), by = "away_team_api_id") %>% select(date, Home_Team, home_team_goal, Away_Team, away_team_goal) %>% collect() # A tibble: 1 x 5 # date Home_Team home_team_goal Away_Team away_team_goal # <chr> <chr> <int> <chr> <int> # 1 2008-08-16 00:00:00 FCV Dender EH 1 Standard de Liège 3