I am new to PySpark DataFrames.
I have a PySpark DataFrame with a column that contains values in the following format:
Col1 |
---|
a+ |
b+ |
a- |
b- |
I want to create another boolean column (Col2). Its value should be True if the value in Col1 contains a +, and False otherwise.
I tried the code below after some research, but it gave an "unexpected EOF while parsing" error:
DF = DF.withColumn("col2", F.when(DF.filter(DF.col1.like('+')), True).otherwise(False)
I also tried the code below, but it also fails, with the error "Condition should be a column":
df = DF.withColumn("col2", F.when(DF.filter("col1 like '%-%'")=="-", True).otherwise(False))
Please assist me with this.
Answer
You don’t need to use filter to scan each row of col1. You can just use the column’s value inside when and try to match it against the %+ literal, which indicates that you are searching for a + character at the very end of the string.
DF = DF.withColumn("col2", F.when(F.col("col1").like("%+"), True).otherwise(False))
This will result in the following DataFrame:
+----+-----+
|col1| col2|
+----+-----+
|  a+| true|
|  b+| true|
|  a-|false|
|  b-|false|
+----+-----+
You can study more about the when/otherwise functionality here and here.
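To see why the pattern matters, the following plain-Python sketch mimics how SQL LIKE wildcards behave. It is only an illustration and not Spark's implementation: the helpers like_to_regex and sql_like are hypothetical names, and escape characters are deliberately not handled. It shows that a pattern without wildcards, such as the '+' used in the question, only matches the exact string "+", while "%+" matches values ending in + and "%+%" matches a + anywhere.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regex (simplified: no escape handling)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters, including none
        elif ch == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)

def sql_like(value: str, pattern: str) -> bool:
    # LIKE must match the whole value, hence fullmatch rather than search.
    return like_to_regex(pattern).fullmatch(value) is not None

# Compare three patterns: ends with "+", contains "+", and exactly "+".
for value in ["a+", "b+", "a-", "+a"]:
    print(value, sql_like(value, "%+"), sql_like(value, "%+%"), sql_like(value, "+"))
```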