Skip to content
Advertisement

How to join all keys correctly / How to identify join keys

so I’m having problems with creating tables, join, or at least I thought, when joining multiple tables resulting in duplicate result.

I have one big table which contains all the data. Also, I have Multiple sub-table(partitioned table) that was created from the big table and contains some information of the big table. I want the query which extract from the sub-tables to have the same result.

This is my query of how I created the 3 sub-tables. first table – traffic

Second table – hits page

The third table – hits

Now, the query which I used is the following:

As you can see, I joined all columns which appear on 3 of these sub-table fullVisitorId,visitId,visitStartTime,date,visitNumber except userId, pscampaignid, universal_Id, ps_project_Id which has no data and show no result when connected. This return the following result:

I have made updates to how I created those 3 tables, the results look closer now. I have a feeling that it was due to how I created the table, I will focus on that now.:

But, it should return something like this

I’m having trouble finding out the wrong in this problem. I am not sure if it is because the way I unnest or the join all the columns which appear on all of these 3 tables itself or others. Thank you in advance for all your inputs.

Advertisement

Answer

Alright guys, looks like I found an answer. I’m not sure if this is an absolute answer but it works for me so far, feel free to correct me if im wrong. So, I believe the problem here is not joining all the join keys causing the data to duplicate. I was trying to query the hits level data but did not join hit level key. Therefore, I did connect one more key which is hit.hitNumber to connect all hit level row to the surface. To anybody unfamiliar with this, I think in hit level there are arrays(data within data). It looks something like this. Each row has more than 1 data, so we need to unwrap the data to be usable. Hence, by unnesting and connecting said data with hit.hitNumber, I able to get the correct number compare to google analytics.

array image

By unnesting, it should look like this.

enter image description here

Here is my code that got it working:

Reference: https://towardsdatascience.com/explore-arrays-and-structs-for-better-performance-in-google-bigquery-8978fb00a5bc

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement