
Filter stop words from a text column – Spark SQL

I’m using Spark SQL and have a DataFrame with user IDs and product reviews. I need to remove stop words from the reviews, and I have a text file containing the stop words to filter out.

I managed to split the reviews into lists of strings, but I don’t know how to filter them.

This is what I tried to do:

Thanks!


Answer

Your question is a little vague, in that it does not mention the flatMap approach, which is more common.
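For readers unfamiliar with it, the flatMap approach could look roughly like the sketch below. It is written in PySpark as an assumption (the question does not say which language binding is used), and the column names `user_id` and `words` as well as both function names are hypothetical. The idea is that each review row is expanded into zero or more (user ID, word) pairs, with stop words dropped along the way.

```python
def row_to_pairs(user_id, words, stop_words):
    """Expand one review into (user_id, word) pairs, skipping stop words.

    `words` is the already-split review (a list of strings, or None).
    `stop_words` is expected to contain lowercase entries.
    """
    return [(user_id, w) for w in (words or []) if w.lower() not in stop_words]


def explode_reviews(df, stop_words, id_col="user_id", words_col="words"):
    """Apply row_to_pairs to every row of a Spark DataFrame via RDD.flatMap.

    Column names are hypothetical defaults; adjust them to your schema.
    """
    # Lowercase once so the membership test in row_to_pairs is case-insensitive.
    stop = frozenset(w.lower() for w in stop_words)
    # flatMap lets each input row yield zero or more output records.
    return df.rdd.flatMap(
        lambda row: row_to_pairs(row[id_col], row[words_col], stop)
    )
```

The result is an RDD of tuples rather than a DataFrame. For a very large stop word list, a broadcast variable would probably be a better fit than capturing the set in the closure.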

Here is an alternative that works directly on the DataFrame column.

returns – and filter out the columns you do not want.

You can see the stop words, and that I converted everything to lower case and stripped some characters out.
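As a minimal sketch of a column-based approach along these lines (not necessarily the code used in this answer), the following PySpark example lowercases the text, strips punctuation, and then removes stop words with Spark ML's `StopWordsRemover`. The column names `review`, `words`, and `filtered`, the helper names, and the exact set of characters stripped are all assumptions; the regex only keeps ASCII letters, digits, and apostrophes, so it presumes English text.

```python
import re


def parse_stop_words(lines):
    """Turn an iterable of lines (e.g. an open file) into a sorted list of
    lowercase stop words, ignoring blank lines and duplicates."""
    return sorted({line.strip().lower() for line in lines if line.strip()})


def normalize(text):
    """Lowercase the text and replace every character that is not an ASCII
    letter, digit, apostrophe, or whitespace with a space. None becomes ''."""
    return re.sub(r"[^a-z0-9'\s]", " ", (text or "").lower())


def filter_stop_words(df, stop_words, text_col="review", out_col="filtered"):
    """Add a tokenized `words` column and a stop-word-free `out_col` column.

    Column names are hypothetical defaults; adjust them to your schema.
    """
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.ml.feature import StopWordsRemover
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    # UDF that cleans the raw text and splits it on whitespace.
    tokenize = F.udf(lambda t: normalize(t).split(), ArrayType(StringType()))
    with_words = df.withColumn("words", tokenize(F.col(text_col)))

    # StopWordsRemover matches case-insensitively by default.
    remover = StopWordsRemover(
        inputCol="words", outputCol=out_col, stopWords=list(stop_words)
    )
    return remover.transform(with_words)
```

The stop word file could be loaded with something like `with open(path) as fh: stop_words = parse_stop_words(fh)`, where `path` points at your own file. If the reviews are already split into lists of strings, the tokenizing step can be skipped and `StopWordsRemover` applied to that column directly.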

User contributions licensed under: CC BY-SA