I have a column called name in a table in Databricks.
I want to select only those rows from the table that contain at least one alphabetic character in the name
column.
Example values in the column:
12243
#123-(23)
$ank
ada124$%
()
!asd
122acs#
gmgd32
Expected: I need to pick only those values that contain at least one alphabetic character. In other words, I need a way to exclude all the rows that contain only numbers and special characters.
So the expected output should be as below:
$ank
ada124$%
!asd
122acs#
gmgd32
because these contain at least one alphabetic character.
I am using PySpark SQL in Databricks.
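For reference, a minimal sketch to reproduce the sample data as a single-column DataFrame (the column name comes from the question; the DataFrame name df and the use of the spark session available in a Databricks notebook are assumptions):

import pyspark.sql.functions as F

# Hypothetical sample DataFrame matching the example values above
df = spark.createDataFrame(
    [("12243",), ("#123-(23)",), ("$ank",), ("ada124$%",), ("()",),
     ("!asd",), ("122acs#",), ("gmgd32",)],
    ["name"],
)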
Answer
You can use rlike with a regex:
import pyspark.sql.functions as F

# keep only rows whose name contains at least one alphabetic character
df.filter(F.col("name").rlike(".*[a-zA-Z]+.*")).show()
#+--------+
#|    name|
#+--------+
#|    $ank|
#|ada124$%|
#|    !asd|
#| 122acs#|
#|  gmgd32|
#+--------+
Spark SQL equivalent query:
SELECT * FROM df WHERE name RLIKE '.*[a-zA-Z]+.*'
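If you want to run the SQL version from PySpark, one option (a sketch, assuming the DataFrame is named df) is to register it as a temporary view first:

# register the DataFrame so it can be queried with Spark SQL
df.createOrReplaceTempView("df")

# RLIKE matches anywhere in the string, so the shorter pattern '[a-zA-Z]'
# should work as well as '.*[a-zA-Z]+.*'
spark.sql("SELECT * FROM df WHERE name RLIKE '.*[a-zA-Z]+.*'").show()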