I am looking to do something similar to what I did in R (below) in SQL:
clim_data %>% select(Year, AverageTemperature, State) %>% group_by(Year,State) %>% summarize(value = mean(AverageTemperature), .groups = 'drop') -> clim_data3 colnames(clim_data3)[2] <- "region" clim_data3$region<-tolower(clim_data3$region) clim_data3 %>% filter(Year==1900) -> clim_data1900 clim_data1900<-clim_data1900[,2:3]
The output of this code looks like this:
region value <chr> <dbl> alabama 17.059167 alaska -5.146500 arizona 15.742917 arkansas 15.893417 california 14.51575
So far in SQL I have managed to output a single state using the following code:
select distinct year, round(avg(AverageTemperature) over (partition by year),2) as avgTemp, Country, State from dbo.landTemps where AverageTemperature is not null and Country = 'United States' and state = 'Alabama' and year = 1900
The output looks like this:
year avgTemp Country State 1900 17.06 United States Alabama
However, I am unable to get the unique avgTemp for each state. When I take it to more than one state, I get the same avgTemp for all states in that query. So if I run a query like this:
select distinct year, round(avg(AverageTemperature) over (partition by year),2) as avgTemp, Country, State from dbo.landTemps where AverageTemperature is not null and Country = 'United States' and state like 'A%' and year = 1900
I get the average for those states.
year avgTemp Country State 1900 15.15 United States Alaska 1900 15.15 United States Arizona 1900 15.15 United States Arkansas 1900 15.15 United States Alabama
I am assuming I need to write a subquery of some sort to iterate through each state and give the avgTemp. I tried to partition over the state as well and it didn’t give me what I wanted. My overall goal is to print out each states avgTemp for the given year.
The dataset contains multiple values for each year and state:
select year, round(avg(AverageTemperature) over (partition by year),2) as avgTemp, Country, State from dbo.landTemps where AverageTemperature is not null and Country = 'United States' and state like 'A%' and year = 1900 output: year avgTemp Country State 1900 15.15 United States Alabama 1900 15.15 United States Alabama 1900 15.15 United States Alabama 1900 15.15 United States Alabama
With more Alabama entries and the other ‘A’ states below it, so I think using distinct year is imperative. I am just stuck and expanding the single state out into multiple without then taking the average of all the states in the query. The data set can be found here: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data?select=GlobalLandTemperaturesByState.csv
Thanks for the help!
Advertisement
Answer
Since you are looking for the average temperature for each state, you should add the State column to the partition by clause.
Try:
select year, round(avg(AverageTemperature) over (partition by year,State),2) as avgTemp, Country, State from dbo.landTemps where AverageTemperature is not null and Country = 'United States' and state like 'A%' and year = 1900