I have a datetime partitioned table in BigQuery with a schema like:
timestamp TIMESTAMP REQUIRED customer_id STRING REQUIRED plant_id STRING REQUIRED inverters RECORD REPEATED inverters. id STRING REQUIRED ...
So, if I do a SELECT * FROM
my_table`, it will generate a JSON with a structure like:
[ { "timestamp": "2019-10-20 20:00:00 UTC", "customer_id": "5d3f1aee5f70740d54c416e1", "plant_id": "5d658cb25f70747626153e83", "inverters": [ { "id": "5d77e3225f70746ec02bd8e7", ... }, ... ] }, ... ]
And that format works well in most of my use cases, except for one.
So, if I wanted the id’s of all the inverters i would do
SELECT DISTINCT id FROM `my_table` CROSS JOIN UNNEST(inverters)
But I also need the timestamps grouped by inverter, but if I do, let’s say, a nested query for getting the timestamps by id, like:
SELECT inverter, timestamp FROM ( SELECT DISTINCT id FROM `my_table` CROSS JOIN UNNEST(inverters) ) AS inverter, `my_table`
then the resulting json would be:
[ { "inverter": { "id": "5d791a505f70744c233b819d" }, "timestamp": "2020-01-21 11:15:00 UTC" }, { "inverter": { "id": "5d791a505f70744c233b819d" }, "timestamp": "2020-01-21 00:20:00 UTC" }, { "inverter": { "id": "5d791a505f70744c233b819d" }, "timestamp": "2020-01-21 04:05:00 UTC" }, ... ]
Now, that was a long introduction.
The question is: Is there a way of querying the data so the result of the query looks something like the following?
[ { "inverter": { "id": "5d791a505f70744c233b819d", "timestamp": [ "2020-01-21 11:15:00 UTC", "2020-01-21 00:20:00 UTC", "2020-01-21 04:05:00 UTC", ... ] }, ... ]
Advertisement
Answer
Is this what you want?
SELECT inverter.id, ARRAY_AGG(timestamp) FROM `my_table` t CROSS JOIN UNNEST(t.inverters) inverter GROUP BY inverter.id;
You say that SELECT *
returns JSON. I think this is a function of your API. When you use the BigQuery API, you don’t see JSON.