Skip to content
Advertisement

How can I translate this SQL code to R script using dplyr?

I’m currently working on a project and I want to summarize a column from a joined table twice. SQL code is this:

…and the result is this:
...and the result is this

Database tables are as shown here:
Database tables are as shown here

I hope that I have provided all the information needed.

Question: How can I have the same result in R by only using dplyr library?

Table names and structure for the first 10 rows as below:

Match:

Team:

In the desired result I used match_api_id = 539848 but as it is not included in this sample data, use one of your own choice.

The main issue is to be able to have team_long_name twice in the result but for different teams, matching by their team_api_id ‘s.

Advertisement

Answer

Up front, the dbplyr pipe:

Edited to include collect(), since without it the output is not a proper frame and/or may not include all relevant data.

from the corresponding DBI call:


Backfill from your sample data. Note that your data is inconsistent and incomplete, so I had to make some assumptions/translations. For instance, your first structure, which I’m inferring is Match, does not match the schema as depicted in your picture: it includes extra columns like season and *_team_goal. Also, your queried match_api_id of 539848 is not in the sample data, so I used one that was present. (In the future, I suggest that your code and sample data should be consistent with regards to things like this.)

Code to generate a fake databsae for the purposes of this answer. Starting with your two structures as Match and Team.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement