I would like to include null values in an Apache Spark join. By default, Spark does not match rows whose join keys are null. Here is the default behavior: val numbersDf = Seq(("123"), ("456"), (null), …
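The usual fix is the null-safe equality operator <=>, which treats two nulls as equal. A minimal Scala sketch follows, assuming a spark-shell session (so a SparkSession named spark and its implicits are in scope); the lettersDf data and names are invented for illustration, not taken from the question:

import spark.implicits._

val numbersDf = Seq("123", "456", null, "789").toDF("numbers")
val lettersDf = Seq(("123", "abc"), (null, "zzz")).toDF("numbers", "letters")

// A plain equality join drops the null rows, because in SQL
// null = null evaluates to null rather than true.
numbersDf.join(lettersDf, numbersDf("numbers") === lettersDf("numbers")).show()

// The null-safe operator <=> treats two nulls as equal,
// so the (null, "zzz") row survives the join.
numbersDf.join(lettersDf, numbersDf("numbers") <=> lettersDf("numbers")).show()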
What is the difference between cube, rollup and groupBy operators?
I can't find any detailed documentation regarding the differences. I do notice a difference: when I interchange cube and groupBy calls I get different results, and the cube result contains many null values in columns where the groupBy result did not. Answer: These are not intended to work in the same …
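Those nulls are expected: cube and rollup emit extra subtotal rows, and a null marks a grouping column that has been aggregated away in that row. A small Scala sketch of the three operators, with invented sales data and spark-shell implicits assumed:

import spark.implicits._

val sales = Seq(("NY", 2023, 10), ("NY", 2024, 20), ("SF", 2023, 5)).toDF("city", "year", "amount")

// groupBy: one row per (city, year) pair that actually occurs in the data.
sales.groupBy("city", "year").sum("amount").show()

// rollup: the groupBy rows plus hierarchical subtotals, i.e. one row
// per city (year is null) and a grand total (both columns null).
sales.rollup("city", "year").sum("amount").show()

// cube: subtotals for every subset of the grouping columns, so it also
// adds per-year rows (city is null) on top of what rollup produces.
sales.cube("city", "year").sum("amount").show()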
Spark SQL filtering (selecting with a where clause) with multiple conditions
Hi, I have the following issue: all the values that I want to filter on are literal "null" strings, not N/A or actual NULL values. I tried these three options:
numeric_filtered = numeric.filter(numeric['LOW'] != 'null').filter(numeric['HIGH'] != 'null').filter(numeric['NORMAL'] != 'null')
numeric_filtered = numeric.filter(numeric['LOW'] != 'null' AND numeric['HIGH'] != 'null' AND numeric['NORMAL'] != 'null')
sqlContext.sql("SELECT * from numeric WHERE LOW != 'null' …
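For what it's worth, the second option fails because AND is not a Python operator; in PySpark, conditions must be combined with & and each one parenthesized, or chained as separate filters as in option 1. A Scala sketch of the equivalent combined filter, with invented data and spark-shell implicits assumed:

import org.apache.spark.sql.functions.col
import spark.implicits._

// The "null" entries here are literal strings, matching the question, not SQL NULLs.
val numeric = Seq(("1.0", "null", "2.0"), ("null", "3.0", "4.0"), ("5.0", "6.0", "7.0")).toDF("LOW", "HIGH", "NORMAL")

// =!= is Spark's Column "not equal" operator; && combines all three conditions in one filter.
val numericFiltered = numeric.filter(col("LOW") =!= "null" && col("HIGH") =!= "null" && col("NORMAL") =!= "null")
numericFiltered.show()

// If the values were real SQL NULLs rather than the string "null",
// the test would be col("LOW").isNotNull instead of a string comparison.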