Skip to content
Advertisement

Merge left on load data from BigQuery

I have an input table: input and one or more maptables, where input contains data for multiple identifiers and dates stacked under each other. The schemas are as follows:

What I do now is that I run a pipeline that for each date/(id, id2, id3) combination loads the data from input and applies a left merge in python against one or more maptables (both as a DataFrame). I then stream the results to a third table named output with the schema:

Now I was thinking that this is not really efficient. If I change one value from a maptable, I have to redo all the pipelines for each date/(id, id2, id3) combination.

  • Therefore I was wondering if its possible to apply directly a left merge when loading the data? How would such a Query look like?

  • In the case of multiple maptables and target columns, would it also be beneficial to do the same? Would the query not become too complex or unreadable, in particular as the id columns are not the same?

Advertisement

Answer

How would such a Query look like?

Below is for BigQuery Standard SQL

In the case of multiple maptables and target columns …

If all your map tables are same/similar to two maps in your question – in this case it is just extra LEFT JOIN for each extra map

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement