I am new to PySpark DataFrames.
I have a PySpark DataFrame with a column whose values are in the format below:
| Col1 |
|---|
| a+ |
| b+ |
| a- |
| b- |
I want to create another boolean column (Col2). Its value should be True if the value in Col1 contains a +, and False otherwise.
I tried the code below after some research on Googleverse, but it gave an "unexpected EOF while parsing" error:
DF = DF.withColumn("col2", F.when(DF.filter(DF.col1.like('+')), True).otherwise(False)
I also tried the code below, but it also gives an error: "Condition should be a column".
df = DF.withColumn("col2", F.when(DF.filter("col1 like '%-%'")=="-", True).otherwise(False))
Please assist me on this
Answer
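You don’t need to use filter to scan each row of col1. filter returns a DataFrame, while when expects a Column condition, which is why the second attempt fails with "Condition should be a column" (the first attempt is also missing a closing parenthesis, hence the EOF error). Instead, use the column itself inside when and match it against the pattern %+, where % stands for any sequence of characters, so the pattern matches values with a + character at the very end of the string.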
from pyspark.sql import functions as F
DF = DF.withColumn("col2", F.when(F.col("col1").like("%+"), True).otherwise(False))
This will result in the following DataFrame:
+----+-----+
|col1| col2|
+----+-----+
|  a+| true|
|  b+| true|
|  a-|false|
|  b-|false|
+----+-----+
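To see why the bare '+' pattern from the question matches nothing while '%+' works, here is a rough plain-Python emulation of LIKE matching. It is only a sketch for illustration: sql_like is a hypothetical helper, not part of PySpark, and it ignores escape characters.

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Rough emulation of SQL LIKE: % = any run of characters, _ = exactly one.

    Hypothetical helper for illustration only; escape characters are not handled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches zero or more characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal, including +
    # LIKE has to match the whole string, so use fullmatch rather than search.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

values = ["a+", "b+", "a-", "b-"]
for pattern in ["+", "%+", "%+%"]:
    # "+" only equals the one-character string "+", so it matches none of these.
    print(pattern, [sql_like(v, pattern) for v in values])
```

If the + can appear anywhere in the value rather than only at the end, the pattern would be %+% instead. PySpark columns also have contains and endswith methods, which express the same checks without a pattern.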
You can study more about the when/otherwise functionality in the PySpark documentation for pyspark.sql.functions.when and Column.otherwise.