I have a Snowflake table that contains addresses, states, first names, and last names. I'd like a query that returns only the addresses where people with more than one distinct last name live.
For example, suppose the table contains:

```
address      | fname | lname     | state
-------------+-------+-----------+------
10 lake road | John  | Smith     | FL
10 lake road | Julie | Gallagher | FL
3 gator cove | Jack  | Hoyt      | FL
3 gator cove | Debra | Hoyt      | FL
```
In that example, I'd like the query to return exactly one row, 10 lake road, because it's the only house with more than one distinct last name.
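The requirement can be pinned down with a small plain-Python sketch before writing any SQL. It groups the sample rows by address, collects the set of last names at each address, and keeps the addresses whose set has more than one entry. The helper name `addresses_with_multiple_last_names` is hypothetical and only for illustration; it is not part of Snowflake or the original question.

```python
from collections import defaultdict

# Sample rows from the question: (address, fname, lname, state)
rows = [
    ("10 lake road", "John", "Smith", "FL"),
    ("10 lake road", "Julie", "Gallagher", "FL"),
    ("3 gator cove", "Jack", "Hoyt", "FL"),
    ("3 gator cove", "Debra", "Hoyt", "FL"),
]


def addresses_with_multiple_last_names(rows, state):
    """Return the addresses in `state` that house more than one distinct last name."""
    last_names = defaultdict(set)  # address -> set of distinct last names
    for address, _fname, lname, row_state in rows:
        if row_state == state:
            last_names[address].add(lname)
    return sorted(addr for addr, names in last_names.items() if len(names) > 1)


print(addresses_with_multiple_last_names(rows, "FL"))  # ['10 lake road']
```

Counting distinct last names per address, rather than people per address, is the key idea the SQL needs to express.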
I am currently using:

```sql
SELECT distinct a.address, a.fname, a.lname, a.state
FROM clients_addresses a
WHERE a.state = 'FL'
qualify count(1) over( partition by a.lname) > 1
order by a.address
```
However, this just returns the addresses where more than one person lives. It doesn't account for whether the last name is repeated, and that is exactly what I'm trying to avoid.
I can't work out where the query is going wrong. Snowflake doesn't accept the DISTINCT keyword anywhere other than right after the initial SELECT. Even when I do use it, I get only one occurrence of each address, but the results are still just the addresses with more than one person, including addresses where everyone shares a single last name.
The solution doesn't have to use QUALIFY; I know Snowflake also supports other constructs, such as subselects, that might help here.
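To see why the current query misbehaves, here is a sketch that runs its logic against the four sample rows. It assumes SQLite (via Python's `sqlite3`, version 3.25 or later for window functions) as a local stand-in for Snowflake. SQLite has no QUALIFY, so the window count is filtered in an outer query instead, and the alias `same_lname` is hypothetical. Because the window is partitioned by `lname`, it counts people who share a last name across the whole state, not people at each address, so on this data it returns the Hoyt household instead of 10 lake road.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients_addresses (address TEXT, fname TEXT, lname TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO clients_addresses VALUES (?, ?, ?, ?)",
    [
        ("10 lake road", "John", "Smith", "FL"),
        ("10 lake road", "Julie", "Gallagher", "FL"),
        ("3 gator cove", "Jack", "Hoyt", "FL"),
        ("3 gator cove", "Debra", "Hoyt", "FL"),
    ],
)

# Same window as the question's query: count rows per last name (not per address).
# The outer WHERE plays the role of Snowflake's QUALIFY.
query = """
SELECT address, fname, lname
FROM (
    SELECT a.address, a.fname, a.lname,
           COUNT(1) OVER (PARTITION BY a.lname) AS same_lname
    FROM clients_addresses a
    WHERE a.state = 'FL'
) AS t
WHERE same_lname > 1
ORDER BY address, fname
"""
result = conn.execute(query).fetchall()
print(result)
# [('3 gator cove', 'Debra', 'Hoyt'), ('3 gator cove', 'Jack', 'Hoyt')]
```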
Answer
> I would like the query to return only 1 row in that example: 10 lake road.
That sounds like aggregation: group by address and keep only the groups with more than one distinct last name:
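```sql
-- One row per address; HAVING keeps only addresses with more than one distinct last name
SELECT a.address, count(*)
FROM clients_addresses a
WHERE a.state = 'FL'
GROUP BY a.address
HAVING COUNT(DISTINCT a.lname) > 1;
```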
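As a quick check, the same aggregation can be run on the sample rows. This is a sketch that assumes SQLite (Python's built-in `sqlite3`) as a stand-in for Snowflake; `GROUP BY ... HAVING COUNT(DISTINCT ...)` is standard SQL and behaves the same way here. The alias `people` is hypothetical and only names the `COUNT(*)` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients_addresses (address TEXT, fname TEXT, lname TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO clients_addresses VALUES (?, ?, ?, ?)",
    [
        ("10 lake road", "John", "Smith", "FL"),
        ("10 lake road", "Julie", "Gallagher", "FL"),
        ("3 gator cove", "Jack", "Hoyt", "FL"),
        ("3 gator cove", "Debra", "Hoyt", "FL"),
    ],
)

# Group by address, then keep groups whose distinct last-name count exceeds 1.
# 3 gator cove has two people but only one distinct last name, so it is dropped.
query = """
SELECT a.address, COUNT(*) AS people
FROM clients_addresses a
WHERE a.state = 'FL'
GROUP BY a.address
HAVING COUNT(DISTINCT a.lname) > 1
"""
result = conn.execute(query).fetchall()
print(result)  # [('10 lake road', 2)]
```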
If you want the original rows (which is not what your question asks for), you can use:
```sql
-- Keep every FL row whose address has more than one distinct last name
SELECT a.*
FROM clients_addresses a
WHERE a.state = 'FL'
QUALIFY COUNT(DISTINCT a.lname) OVER (PARTITION BY a.address) > 1;
```
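If a window function isn't an option, the subselect approach the question mentions gives the same rows. This sketch again assumes SQLite (Python's `sqlite3`) as a local stand-in. SQLite has no QUALIFY and does not accept `COUNT(DISTINCT ...)` as a window function, so it swaps in an `IN` subquery built from the aggregation above. The outer query returns the original rows for every qualifying address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients_addresses (address TEXT, fname TEXT, lname TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO clients_addresses VALUES (?, ?, ?, ?)",
    [
        ("10 lake road", "John", "Smith", "FL"),
        ("10 lake road", "Julie", "Gallagher", "FL"),
        ("3 gator cove", "Jack", "Hoyt", "FL"),
        ("3 gator cove", "Debra", "Hoyt", "FL"),
    ],
)

# The subquery finds qualifying addresses; the outer query returns all of their rows.
query = """
SELECT a.*
FROM clients_addresses a
WHERE a.state = 'FL'
  AND a.address IN (
      SELECT b.address
      FROM clients_addresses b
      WHERE b.state = 'FL'
      GROUP BY b.address
      HAVING COUNT(DISTINCT b.lname) > 1
  )
ORDER BY a.fname
"""
result = conn.execute(query).fetchall()
print(result)
# [('10 lake road', 'John', 'Smith', 'FL'), ('10 lake road', 'Julie', 'Gallagher', 'FL')]
```

In Snowflake itself, the QUALIFY version is shorter; the subquery form is mainly useful where window-level `COUNT(DISTINCT ...)` isn't available.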