Skip to content
Advertisement

Select query to group records in output json using BigQuery

I have a datetime partitioned table in BigQuery with a schema like:

timestamp      TIMESTAMP  REQUIRED  
customer_id    STRING     REQUIRED  
plant_id       STRING     REQUIRED  
inverters      RECORD     REPEATED  
inverters. id  STRING     REQUIRED
...

So, if I do a SELECT * FROMmy_table`, it will generate a JSON with a structure like:

[
  {
    "timestamp": "2019-10-20 20:00:00 UTC",
    "customer_id": "5d3f1aee5f70740d54c416e1",
    "plant_id": "5d658cb25f70747626153e83",
    "inverters": [
      {
        "id": "5d77e3225f70746ec02bd8e7",
        ...
      },
      ...
    ]
  },
  ...
]

And that format works well in most of my use cases, except for one.

So, if I wanted the id’s of all the inverters i would do

SELECT DISTINCT id 
FROM `my_table`
CROSS JOIN UNNEST(inverters)

But I also need the timestamps grouped by inverter, but if I do, let’s say, a nested query for getting the timestamps by id, like:

SELECT inverter, timestamp
FROM (
  SELECT DISTINCT id 
  FROM `my_table`
  CROSS JOIN UNNEST(inverters)
) AS inverter, `my_table`

then the resulting json would be:

[
  {
    "inverter": {
      "id": "5d791a505f70744c233b819d"
    },
    "timestamp": "2020-01-21 11:15:00 UTC"
  },
  {
    "inverter": {
      "id": "5d791a505f70744c233b819d"
    },
    "timestamp": "2020-01-21 00:20:00 UTC"
  },
  {
    "inverter": {
      "id": "5d791a505f70744c233b819d"
    },
    "timestamp": "2020-01-21 04:05:00 UTC"
  },
  ...
]

Now, that was a long introduction.

The question is: Is there a way of querying the data so the result of the query looks something like the following?

[
  {
    "inverter": {
      "id": "5d791a505f70744c233b819d",
      "timestamp": [
        "2020-01-21 11:15:00 UTC",
        "2020-01-21 00:20:00 UTC",
        "2020-01-21 04:05:00 UTC",
        ...
      ]
    },
  ...
]

Advertisement

Answer

Is this what you want?

SELECT inverter.id, ARRAY_AGG(timestamp)
FROM `my_table` t CROSS JOIN
     UNNEST(t.inverters) inverter
GROUP BY inverter.id;

You say that SELECT * returns JSON. I think this is a function of your API. When you use the BigQuery API, you don’t see JSON.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement