I am looking to do something similar to what I did in R (below) in SQL:
clim_data %>%
select(Year, AverageTemperature, State) %>%
group_by(Year,State) %>%
summarize(value = mean(AverageTemperature), .groups = 'drop') -> clim_data3
colnames(clim_data3)[2] <- "region"
clim_data3$region<-tolower(clim_data3$region)
clim_data3 %>%
filter(Year==1900) -> clim_data1900
clim_data1900<-clim_data1900[,2:3]
The output of this code looks like this:
region value
<chr> <dbl>
alabama 17.059167
alaska -5.146500
arizona 15.742917
arkansas 15.893417
california 14.51575
So far in SQL I have managed to output a single state using the following code:
select distinct year, round(avg(AverageTemperature) over (partition by year),2) as avgTemp, Country, State
from dbo.landTemps
where AverageTemperature is not null and
Country = 'United States' and
state = 'Alabama' and
year = 1900
The output looks like this:
year avgTemp Country State
1900 17.06 United States Alabama
However, I am unable to get the unique avgTemp for each state. When I take it to more than one state, I get the same avgTemp for all states in that query. So if I run a query like this:
select distinct year, round(avg(AverageTemperature) over (partition by year),2) as avgTemp, Country, State
from dbo.landTemps
where AverageTemperature is not null and
Country = 'United States' and
state like 'A%' and
year = 1900
I get the average for those states.
year avgTemp Country State
1900 15.15 United States Alaska
1900 15.15 United States Arizona
1900 15.15 United States Arkansas
1900 15.15 United States Alabama
I am assuming I need to write a subquery of some sort to iterate through each state and give the avgTemp. I tried to partition over the state as well and it didn’t give me what I wanted. My overall goal is to print out each states avgTemp for the given year.
The dataset contains multiple values for each year and state:
select year, round(avg(AverageTemperature) over (partition by year),2) as avgTemp, Country, State
from dbo.landTemps
where AverageTemperature is not null and
Country = 'United States' and
state like 'A%' and
year = 1900
output:
year avgTemp Country State
1900 15.15 United States Alabama
1900 15.15 United States Alabama
1900 15.15 United States Alabama
1900 15.15 United States Alabama
With more Alabama entries and the other ‘A’ states below it, so I think using distinct year is imperative. I am just stuck and expanding the single state out into multiple without then taking the average of all the states in the query. The data set can be found here: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data?select=GlobalLandTemperaturesByState.csv
Thanks for the help!
Advertisement
Answer
Since you are looking for the average temperature for each state, you should add the State column to the partition by clause.
Try:
select year, round(avg(AverageTemperature) over (partition by year,State),2) as avgTemp, Country, State
from dbo.landTemps
where AverageTemperature is not null and
Country = 'United States' and
state like 'A%' and
year = 1900