Spark SQLContext DataFrame: select query based on an array column

This is my dataframe:

  authors: array (nullable = true)
    element: string (containsNull = true)

I want to select all books where the author is Udo Haiber.

spark.sql("select *  from f  where authors="Udo Haiber" ").show

But of course it didn’t work, because authors is an array.

Answer

You can use array_contains to check if the author is inside the array:

spark.sql("select * from f where array_contains(authors, 'Udo Haiber')")

Use single quotes to quote the author name because you’re using double quotes for the query string.
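
The same filter can also be written with the DataFrame API instead of a SQL string, which avoids the nested-quote issue. Below is a minimal sketch, assuming a local Spark session (for example in spark-shell); the sample rows, the title column, and the variable names books, viaSql and viaApi are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, col}

// Assumption: a local session; in spark-shell this reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-contains-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with an array<string> column named "authors".
val books = Seq(
  ("Book A", Seq("Udo Haiber", "Jane Doe")),
  ("Book B", Seq("John Smith"))
).toDF("title", "authors")

// Register the DataFrame under the view name "f" used in the question.
books.createOrReplaceTempView("f")

// SQL form: single quotes inside the double-quoted query string.
val viaSql = spark.sql("select * from f where array_contains(authors, 'Udo Haiber')")

// DataFrame API form of the same predicate.
val viaApi = books.filter(array_contains(col("authors"), "Udo Haiber"))

viaSql.show()
viaApi.show()
```

With this sample data, both queries should return only the "Book A" row. Note that array_contains matches whole elements exactly, so 'Udo' alone would not match 'Udo Haiber'.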

User contributions licensed under: CC BY-SA