Skip to content
Advertisement

How can I translate this SQL code to R script using dplyr?

I’m currently working on a project and I want to summarize a column from a joined table twice. SQL code is this:

SELECT M.date,T.team_long_name AS Home_Team, M.home_team_goal, Te.team_long_name AS Away_Team, M.away_team_goal
FROM Match AS M JOIN Team AS T
ON T.team_api_id = M.home_team_api_id
JOIN Team AS Te
ON Te.team_api_id = M.away_team_api_id
WHERE match_api_id = 539848;

…and the result is this:
...and the result is this

Database tables are as shown here:
Database tables are as shown here

I hope that I have provided all the information needed.

Question: How can I have the same result in R by only using dplyr library?

Table names and structure for the first 10 rows as below:

Match:

structure(list(id = 1:10, country_id = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), league_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), season = c("2008/2009", "2008/2009", "2008/2009", 
"2008/2009", "2008/2009", "2008/2009", "2008/2009", "2008/2009", 
"2008/2009", "2008/2009"), stage = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 10L), date = c("2008-08-17 00:00:00", "2008-08-16 00:00:00", 
"2008-08-16 00:00:00", "2008-08-17 00:00:00", "2008-08-16 00:00:00", 
"2008-09-24 00:00:00", "2008-08-16 00:00:00", "2008-08-16 00:00:00", 
"2008-08-16 00:00:00", "2008-11-01 00:00:00"), match_api_id = c(492473L, 
492474L, 492475L, 492476L, 492477L, 492478L, 492479L, 492480L, 
492481L, 492564L), home_team_api_id = c(9987L, 10000L, 9984L, 
9991L, 7947L, 8203L, 9999L, 4049L, 10001L, 8342L), away_team_api_id = c(9993L, 
9994L, 8635L, 9998L, 9985L, 8342L, 8571L, 9996L, 9986L, 8571L
), home_team_goal = c(1L, 0L, 0L, 5L, 1L, 1L, 2L, 1L, 1L, 4L), 
    away_team_goal = c(1L, 0L, 3L, 0L, 3L, 1L, 2L, 2L, 0L, 1L
    )), row.names = c(NA, 10L), class = "data.frame")

Team:

structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 614L, 1034L), team_api_id = c(9987L, 
9993L, 10000L, 9994L, 9984L, 8635L, 9991L, 9998L, 7947L, 9985L, 
8203L, 8342L, 9999L, 8571L, 4049L, 9996L, 10001L, 9986L, 9997L, 
9989L), team_long_name = c("KRC Genk", "Beerschot AC", "SV Zulte-Waregem", 
"Sporting Lokeren", "KSV Cercle Brugge", "RSC Anderlecht", "KAA Gent", 
"RAEC Mons", "FCV Dender EH", "Standard de Liège", "KV Mechelen", 
"Club Brugge KV", "KSV Roeselare", "KV Kortrijk", "Tubize", "Royal Excel Mouscron", 
"KVC Westerlo", "Sporting Charleroi", "Sint-Truidense VV", "Lierse SK"
)), row.names = c(NA, 20L), class = "data.frame")

In the desired result I used match_api_id = 539848 but as it is not included in this sample data, use one of your own choice.

The main issue is to be able to have team_long_name twice in the result but for different teams, matching by their team_api_id ‘s.

Advertisement

Answer

Up front, the dbplyr pipe:

tbl_match <- tbl(fakedb, "Match")
tbl_team <- tbl(fakedb, "Team")
tbl_match %>%
  filter(match_api_id == 492477) %>%
  inner_join(select(tbl_team, home_team_api_id = team_api_id, Home_Team = team_long_name),
             by = "home_team_api_id") %>%
  inner_join(select(tbl_team, away_team_api_id = team_api_id, Away_Team = team_long_name),
             by = "away_team_api_id") %>%
  select(date, Home_Team, Away_Team) %>%
  collect()

Edited to include collect(), since without it the output is not a proper frame and/or may not include all relevant data.

from the corresponding DBI call:

DBI::dbGetQuery(fakedb, some_long_query)

Backfill from your sample data. Note that your data is inconsistent and incomplete, so I had to make some assumptions/translations. For instance, your first structure, which I’m inferring is Match, does not match the schema as depicted in your picture: it includes extra columns like season and *_team_goal. Also, your queried match_api_id of 539848 is not in the sample data, so I used one that was present. (In the future, I suggest that your code and sample data should be consistent with regards to things like this.)

Code to generate a fake databsae for the purposes of this answer. Starting with your two structures as Match and Team.

library(dbplyr)
library(dplyr)
fakedb <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(fakedb, Match)
copy_to(fakedb, Team)

some_long_query <- '
  SELECT
    M.date, T.team_long_name AS Home_Team, M.home_team_goal,
    Te.team_long_name AS Away_Team, M.away_team_goal
  FROM
    Match AS M
    JOIN Team AS T ON T.team_api_id = M.home_team_api_id
    JOIN Team AS Te ON Te.team_api_id = M.away_team_api_id
  WHERE
    match_api_id = 492477;' # 539848

DBI::dbGetQuery(fakedb, some_long_query)
#                  date     Home_Team home_team_goal         Away_Team away_team_goal
# 1 2008-08-16 00:00:00 FCV Dender EH              1 Standard de Liège              3


tbl_match <- tbl(fakedb, "Match")
tbl_team <- tbl(fakedb, "Team")
tbl_match %>%
  filter(match_api_id == 492477) %>%
  inner_join(select(tbl_team, home_team_api_id = team_api_id, Home_Team = team_long_name),
             by = "home_team_api_id") %>%
  inner_join(select(tbl_team, away_team_api_id = team_api_id, Away_Team = team_long_name),
             by = "away_team_api_id") %>%
  select(date, Home_Team, home_team_goal, Away_Team, away_team_goal) %>%
  collect()
# A tibble: 1 x 5
#   date                Home_Team     home_team_goal Away_Team         away_team_goal
#   <chr>               <chr>                  <int> <chr>                      <int>
# 1 2008-08-16 00:00:00 FCV Dender EH              1 Standard de Liège              3
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement