Create a boolean column in PySpark based on whether another column contains a particular string

I am new to PySpark DataFrames.

I have a PySpark DataFrame with a column whose values look like this:

Col1
a+
b+
a-
b-

I want to create another boolean column (Col2). Its value should be True if the value in Col1 contains a +, and False otherwise.

I tried the code below after some research online, but it raised an "unexpected EOF while parsing" error:

DF = DF.withColumn("col2", F.when(DF.filter(DF.col1.like('+')), True).otherwise(False)

I also tried the code below, but it fails with the error "Condition should be a column":

df = DF.withColumn("col2", F.when(DF.filter("col1 like '%-%'")=="-", True).otherwise(False))

Any help would be appreciated.

Answer

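You don’t need filter here. filter returns a DataFrame, while when expects a Column as its condition, which is why the second attempt fails with "Condition should be a column". (The "unexpected EOF" error in the first attempt is just a missing closing parenthesis.) Instead, use the column itself inside when and match it against the pattern %+. In a like pattern, % matches any sequence of characters, so %+ matches values that end with a + character.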

from pyspark.sql import functions as F
DF = DF.withColumn("col2", F.when(F.col("col1").like("%+"), True).otherwise(False))

This will result in the following DataFrame:

+----+-----+
|col1| col2|
+----+-----+
|  a+| true|
|  b+| true|
|  a-|false|
|  b-|false|
+----+-----+
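
To see why the %+ pattern matches here while a bare + (as in the first attempt) would not, here is a rough plain-Python emulation of LIKE matching. The sql_like helper is hypothetical and written only for illustration; it is not part of PySpark, and it ignores LIKE escape characters.

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Rough emulation of SQL LIKE (hypothetical helper, not a PySpark API).

    % matches any run of characters (including none), _ matches exactly
    one character, everything else must match literally.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # LIKE must match the whole value, hence fullmatch rather than search.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

values = ["a+", "b+", "a-", "b-"]

ends_with_plus = [sql_like(v, "%+") for v in values]   # pattern used in the answer
bare_plus = [sql_like(v, "+") for v in values]         # pattern from the first attempt
contains_plus = [sql_like(v, "%+%") for v in values]   # "+" anywhere in the value

print(ends_with_plus)
print(bare_plus)
print(contains_plus)
```

A bare + only matches a value that is exactly "+", so it is False for every row. Use %+% instead of %+ if the + may appear anywhere in the value rather than only at the end.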

You can read more about when and otherwise in the PySpark documentation for pyspark.sql.functions.when and Column.otherwise.

User contributions licensed under: CC BY-SA