Improve SQL query to find range between start and end date

Question

I'm working with a database called international_education from the world_bank_intl_education dataset of bigquery-public-data. My aim is to plot a line graph with countries who have had the biggest and smallest change in Population growth (annual %) (one of the indicator_name values). I have done this b…

Accepted Answer

You don't need the CTE and you don't need the window frame definitions. So this should be equivalent:

    SELECT country_name, year, value,
           (first_value(value) OVER (PARTITION BY country_name ORDER BY year DESC) -
            first_value(value) OVER (PARTITION BY country_name ORDER BY year)
           ) as total_range
    FROM `bigquery-public-data.world_bank_intl_education.international_education`
    WHERE indicator_name = 'Population growth (annual %)';

Note that LAST_VALUE() is finicky with window frame definitions. So I routinely just use FIRST_VALUE() with the order by reversed.

If you want just one row per country, then you need aggregation. BigQuery doesn't have "first" and "last" aggregation functions, but they are very easy to do with arrays:

    SELECT country_name,
           ((array_agg(value ORDER BY year DESC LIMIT 1))[ordinal(1)] -
            (array_agg(value ORDER BY year LIMIT 1))[ordinal(1)]
           ) as total_range
    FROM `bigquery-public-data.world_bank_intl_education.international_education`
    WHERE indicator_name = 'Population growth (annual %)'
    GROUP BY country_name
    ORDER BY total_range;
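To see why LAST_VALUE() is finicky, here is a minimal sketch that runs the same window expressions against a toy table. It assumes SQLite (via Python's sqlite3 module, with an SQLite build of 3.25 or later for window functions) as a stand-in for BigQuery, and the table name `growth` and all values are made up for illustration. With an ORDER BY and no explicit frame, the frame ends at the current row, so LAST_VALUE() returns the current row's value rather than the partition's last one, while FIRST_VALUE() with the order reversed gives the latest value on every row.

```python
import sqlite3

# Toy data: hypothetical values, not taken from the World Bank dataset.
rows = [
    ("A", 2000, 1.0),
    ("A", 2001, 1.5),
    ("A", 2002, 3.0),
    ("B", 2000, 2.0),
    ("B", 2001, 0.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE growth (country_name TEXT, year INTEGER, value REAL)")
conn.executemany("INSERT INTO growth VALUES (?, ?, ?)", rows)

query = """
SELECT country_name, year,
       -- Default frame stops at the current row, so this is just the current value.
       LAST_VALUE(value) OVER (PARTITION BY country_name ORDER BY year) AS last_default_frame,
       -- Reversing the order makes the latest year the first row of the frame.
       FIRST_VALUE(value) OVER (PARTITION BY country_name ORDER BY year DESC) AS latest_value,
       FIRST_VALUE(value) OVER (PARTITION BY country_name ORDER BY year DESC)
         - FIRST_VALUE(value) OVER (PARTITION BY country_name ORDER BY year) AS total_range
FROM growth
ORDER BY country_name, year
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In this toy run, last_default_frame changes from row to row within a country, whereas latest_value and total_range stay constant per country, which is the behaviour the first query above relies on.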


Answer