I have a dataset containing a list of recorded coronavirus cases in countries across the world through a time period of December 2019 to April 2020. I am trying to write an SQL query to give a a cumulative total for the top ten countries through this time period (i.e. the countries with the highest total at the end).
I have started this by writing a query to give a cumulative total for all countries, and then have tried to write a sub-query to limit this list to only include the data for the top ten countries.
x
SELECT
Virus.dateRep,
Country.countriesAndTerritories,
SUM(Virus.cases) OVER (PARTITION BY Country.geoId ORDER BY Dates.year, Dates.month, Dates.day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'cume_cases',
SUM(Virus.deaths) OVER (PARTITION BY Country.geoId ORDER BY Dates.year, Dates.month, Dates.day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'cume_deaths'
FROM
Virus
INNER JOIN
Dates ON (Virus.dateRep = Dates.dateRep)
INNER JOIN
Country ON (Virus.geoId = Country.geoId)
WHERE Virus.geoId IN (SELECT
Virus.geoId, sum(Virus.cases) as 'total'
FROM
Virus
GROUP BY
Virus.geoId
ORDER BY
total
DESC
LIMIT 10)
GROUP BY
Country.geoId,
Virus.dateRep
ORDER BY
Dates.year,
Dates.month,
Dates.day,
Country.countriesAndTerritories
ASC;
However, the sub-query only allows for one column to be queried, and so is not working. Any solutions?
Advertisement
Answer
You can do this with only one column:
WHERE Virus.geoId IN (SELECT v.geoId
FROM Virus v
GROUP BY v.geoId
ORDER BY sum(Virus.cases) DESC
LIMIT 10
)