Skip to content
Advertisement

Unnest Query optimisation for singular record

I’m trying to optimise my query for when an internal customer only want to return one result *(and it’s associated nested dataset). My aim is to reduce the query process size.

However, it appears to be the exact same value regardless of whether I’m querying for 1 record (with unnested 48,000 length array) or the whole dataset (10,000 records with unnest total 514,048,748 in total length of arrays)!

So my table results for one record query:

SELECT test_id, value
FROM <my_table_reference>, unnest(TimeSeries)timeseries
WHERE test_id= "T0003" and SignalName = "Distance";

looks like this:

test_id value
T0003 1.0
T0003 2.0
T0003 3.0
T0003 4.0

(48000 rows)

This will continue until value (Distance) = 48000m (48000 rows) for 1 record: WHERE == 'T0003.

Total process was 3.84GB

For whole table (~10,000 records):

SELECT test_id, value
FROM <my_table_reference>, unnest(TimeSeries)timeseries
WHERE SignalName = "Distance";

looks like this:

test_id value
T0001 1.0
T0001 2.0
T0001 3.0
T0001 4.0

(514,048,748 rows)

Total process was 3.84GB

Why are the process size the same for both queries and how can I optimise this for singular row extractions?

Advertisement

Answer

This is happening because there is still need for a full table scan to find all the test IDs that are equal to the specified one.

It is not clear from your example which columns are part of the timeseries record. In case test_id is not one of them, I would suggest to cluster the table on the test_id column. By clustering, the data will be automatically organized according to the contents of the test_id column.

So, when you query with a filter on that column a full scan won’t be needed to find all values.

Read more about clustered tables here.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement