Tag: apache-zeppelin

Is there a way to compare all rows in one column of a dataframe against all rows in a column of another dataframe (Spark)?

apache-spark apache-zeppelin pyspark scala sql

I have two dataframes in Spark, both with an IP column. One has over 800,000 entries while the other has 4,000. I want to check whether the IPs in the smaller dataframe appear in the IP column of the larger dataframe. At the moment all I can manage is to compare the first row