I have a Snowflake table which includes addresses, state, first names and last names. I would like to get a query that shows me only the addresses where more than 1 individual with a different last name is present.
So for example, assume that I have
address | fname | lname |State 10 lake road| John | Smith |FL 10 lake road| Julie | Gallagher|FL 3 gator cove| Jack | Hoyt |FL 3 gator cove| Debra | Hoyt |FL
I would like the query to return only 1 row in that example: 10 lake road. Because it’s the only house where there is more than 1 unique last name present.
I am currently using
SELECT distinct a.address, a.fname, a.lname, a.state FROM clients_addresses a WHERE a.state = 'FL' qualify count(1) over( partition by a.lname) > 1 order by a.address
However, this is just returning the addresses where there is more than 1 person, it doesn’t care if the last name is repeated. That’s what I’m trying to avoid.
I can’t quite understand where the query is going wrong. Snowflake doesn’t like using any distinct keyword after the initial select, and even if I use it, it only returns 1 occurrence of each address, but it’s still just addresses with more than 1 person, even if there was only 1 last name in the address.
It doesn’t need to involve the keyword “qualify”, I know Snowflake also accepts other things such as subselects that might help with this problem.
Advertisement
Answer
I would like the query to return only 1 row in that example: 10 lake road.
This sounds like aggregation:
SELECT a.address, count(*) FROM clients_addresses a WHERE a.state = 'FL' GROUP BY a.address HAVING COUNT(DISTINCT a.lname) > 1;
If you want the original rows (which is not what your question asks for), you can use:
SELECT a.* FROM clients_addresses a WHERE a.state = 'FL' QUALITY COUNT(DISTINCT a.lname) OVER (PARTITION BY a.address) > 1;