New at PySpark, trying to get a query to run and it seems like it SHOULD run but I get an EOF issue and I’m not sure how to resolve it..
What I’m trying to do is find all rows in blah.table where the value in col “domainname” matches a value from a list of domains. Then I want to grab 2 columns from those rows that contain ID information, and do another search finding all rows in blah.table that contain those pairs of IDs.
So far I have:
df = spark.sql("select * from blah.table where id1,id2 in (select id1,id2 from blah.table where domainname in ('list.com','of.com','domains.com'))")
When I run it I get this error:
mismatched input ',' expecting {<EOF>, ';'}
If I split the query up, this seems to run fine by itself:
df = spark.sql("select id1,id2 from blah.table where domainname in ('list.com','of.com','domains.com')")
How can I make this work?
Advertisement
Answer
select * from blah.table where (id1, id2) in (select id1,id2 from blah.table where domainname in ('list.com','of.com','domains.com'))
Or, use EXISTS
:
select * from blah.table t1 where EXISTS (select * from blah.table t2 where t1.id1 = t2.id1 and t1.id2 = t2.id2 and t2.domainname in ('list.com','of.com','domains.com'))