I have a dataset containing a list of recorded coronavirus cases in countries across the world through a time period of December 2019 to April 2020. I am trying to write an SQL query to give a a cumulative total for the top ten countries through this time period (i.e. the countries with the highest total at the end).
I have started this by writing a query to give a cumulative total for all countries, and then have tried to write a sub-query to limit this list to only include the data for the top ten countries.
SELECT Virus.dateRep, Country.countriesAndTerritories, SUM(Virus.cases) OVER (PARTITION BY Country.geoId ORDER BY Dates.year, Dates.month, Dates.day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'cume_cases', SUM(Virus.deaths) OVER (PARTITION BY Country.geoId ORDER BY Dates.year, Dates.month, Dates.day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'cume_deaths' FROM Virus INNER JOIN Dates ON (Virus.dateRep = Dates.dateRep) INNER JOIN Country ON (Virus.geoId = Country.geoId) WHERE Virus.geoId IN (SELECT Virus.geoId, sum(Virus.cases) as 'total' FROM Virus GROUP BY Virus.geoId ORDER BY total DESC LIMIT 10) GROUP BY Country.geoId, Virus.dateRep ORDER BY Dates.year, Dates.month, Dates.day, Country.countriesAndTerritories ASC;
However, the sub-query only allows for one column to be queried, and so is not working. Any solutions?
Advertisement
Answer
You can do this with only one column:
WHERE Virus.geoId IN (SELECT v.geoId FROM Virus v GROUP BY v.geoId ORDER BY sum(Virus.cases) DESC LIMIT 10 )