I have a column called name in a table in Databricks.
I want to find a way to select only those rows from the table that contain at least one alphabetic character in the name column.
Example values in the column:
12243 #123-(23) $ank ada124$% () !asd 122acs# gmgd32
Expected: I need to pick only those values which contain at least one alphabetic character. In other words, I need a way to exclude all the rows that contain only numbers and special characters.
So the expected output should be as below:
$ank ada124$% !asd 122acs# gmgd32
because these values each contain at least one alphabetic character.
I am using PySpark SQL in Databricks.
Answer
You can use rlike with a regex:
import pyspark.sql.functions as F
# keep only rows whose name contains at least one letter (a-z or A-Z)
df.filter(F.col("name").rlike(".*[a-zA-Z]+.*")).show()
#+--------+
#| name|
#+--------+
#| $ank|
#|ada124$%|
#| !asd|
#| 122acs#|
#| gmgd32|
#+--------+
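Since rlike returns true if the pattern matches anywhere in the string, the leading and trailing .* are not strictly required; rlike("[a-zA-Z]") behaves the same way. For reference, here is a minimal, self-contained sketch of the same filter, assuming each sample value from the question is a separate row in a single-column DataFrame named df:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
# assumed reconstruction of the sample data: one value per row in a "name" column
values = ["12243", "#123-(23)", "$ank", "ada124$%", "()", "!asd", "122acs#", "gmgd32"]
df = spark.createDataFrame([(v,) for v in values], ["name"])
# keep rows containing at least one letter
df.filter(F.col("name").rlike(".*[a-zA-Z]+.*")).show()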
Spark SQL equivalent query:
SELECT * FROM df WHERE name RLIKE '.*[a-zA-Z]+.*'
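To run that SQL from PySpark in a Databricks notebook, one option (a sketch, assuming your DataFrame is named df) is to register it as a temporary view and query the view:
df.createOrReplaceTempView("df")
spark.sql("SELECT * FROM df WHERE name RLIKE '.*[a-zA-Z]+.*'").show()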