I want to filter a PySpark DataFrame with a SQL-like IN clause, as in:

    sc = SparkContext()
    sqlc = SQLContext(sc)
    df = sqlc.sql('SELECT * from my_df WHERE field1 IN a')

where a is the tuple (1, 2, 3). …
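For context (not part of the original question): one common way to express this kind of membership test is the DataFrame API's Column.isin, which avoids string interpolation entirely. A minimal sketch, assuming my_df is registered as a table or view and a is the tuple from the question:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    a = (1, 2, 3)
    df = spark.table("my_df")  # assumes my_df exists as a registered table/view
    # IN-style membership test via the DataFrame API instead of a SQL string
    filtered = df.filter(df.field1.isin(*a))

Alternatively, the tuple can be interpolated into the SQL string itself, e.g. spark.sql("SELECT * from my_df WHERE field1 IN {}".format(str(a))), since str((1, 2, 3)) happens to render as a valid SQL list; note this breaks for one-element tuples, where Python renders a trailing comma.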
Tag: apache-spark
Spark SQL filtering (selecting with where clause) with multiple conditions
Hi, I have the following issue: all of the values I want to filter on are literal 'null' strings, not N/A or Null values. I tried these three options:

    numeric_filtered = numeric.filter(numeric['LOW'] != 'null').filter(numeric['HIGH'] != 'null').filter(numeric['NORMAL'] != 'null')

    numeric_filtered = numeric.filter(numeric['LOW'] != 'null' AND numeric['HIGH'] != 'null' AND numeric['NORMAL'] != 'null')

    sqlContext.sql("SELECT * from numeric WHERE LOW != 'null' AND HIGH != 'null' AND NORMAL != 'null'")
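For context (not from the original post): the second attempt is not valid Python, since AND is not a Python operator, and even the lowercase and keyword cannot combine Column objects. In PySpark, multiple conditions are typically combined with the bitwise & operator, with each comparison parenthesized. A minimal sketch, assuming a DataFrame like the question's numeric with LOW, HIGH, and NORMAL string columns:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical stand-in for the question's numeric DataFrame
    numeric = spark.createDataFrame(
        [('1.2', '3.4', '2.0'), ('null', '3.4', '2.0')],
        ['LOW', 'HIGH', 'NORMAL'],
    )
    # Column comparisons are combined with & rather than the keyword and;
    # each comparison needs parentheses because & binds tighter than !=
    numeric_filtered = numeric.filter(
        (col('LOW') != 'null') & (col('HIGH') != 'null') & (col('NORMAL') != 'null')
    )
    numeric_filtered.show()

The parentheses matter: without them, Python would try to evaluate 'null' & col('HIGH') first, because the bitwise operator has higher precedence than the comparison.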