I have a dataset containing a list of recorded coronavirus cases in countries across the world through a time period of December 2019 to April 2020. I am trying to write an SQL query to give a a cumulative total for the top ten countries through this time period (i.e. the countries with the highest total at the end).
I have started this by writing a query to give a cumulative total for all countries, and then have tried to write a sub-query to limit this list to only include the data for the top ten countries.
SELECT
Virus.dateRep,
Country.countriesAndTerritories,
SUM(Virus.cases) OVER (PARTITION BY Country.geoId ORDER BY Dates.year, Dates.month, Dates.day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'cume_cases',
SUM(Virus.deaths) OVER (PARTITION BY Country.geoId ORDER BY Dates.year, Dates.month, Dates.day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'cume_deaths'
FROM
Virus
INNER JOIN
Dates ON (Virus.dateRep = Dates.dateRep)
INNER JOIN
Country ON (Virus.geoId = Country.geoId)
WHERE Virus.geoId IN (SELECT
Virus.geoId, sum(Virus.cases) as 'total'
FROM
Virus
GROUP BY
Virus.geoId
ORDER BY
total
DESC
LIMIT 10)
GROUP BY
Country.geoId,
Virus.dateRep
ORDER BY
Dates.year,
Dates.month,
Dates.day,
Country.countriesAndTerritories
ASC;
However, the sub-query only allows for one column to be queried, and so is not working. Any solutions?
Advertisement
Answer
You can do this with only one column:
WHERE Virus.geoId IN (SELECT v.geoId
FROM Virus v
GROUP BY v.geoId
ORDER BY sum(Virus.cases) DESC
LIMIT 10
)