Skip to content
Advertisement

Improve SQL query to find range between start and end date

I’m working with a database called international_education from the world_bank_intl_education dataset of bigquery-public-data.

My aim is to plot a line graph with countries who have had the biggest and smallest change in Population growth (annual %) (one of the indicator_name values).

I have done this below using two partitions finding the first and last value of the year by each country but I’m rough on my SQL and am wondering if there is a way to optimize this formula.

Convert to pandas dataframe.

Resulting table.

I would then plot this with python as follows.

Advertisement

Answer

You don’t need the CTE and you don’t need the window frame definitions. So this should be equivalent:

Note that LAST_VALUE() is finicky with window frame definitions. So I routinely just use FIRST_VALUE() with the order by reversed.

If you want just one row per country, then you need aggregation. BigQuery doesn’t have “first” and “last” aggregation functions, but they are very easy to do with arrays:

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement