Skip to content
Advertisement

Select rows from a table which contains at-least one alphabet in the column

I have column called name under a table in Databricks.

I want to find a way to select only those rows from a table, which contains at-least one alphabet character in the name column.

Example values in the column:

12243
#123-(23)
$ank
ada124$%
()
!asd
122acs#
gmgd32

Expected: I need to pick only those values which contains at least one alphabet in it. Or in other words, I need a way to exclude all the rows which contains only numbers and special characters.

So the Expected output should be as below :

$ank
ada124$%
!asd
122acs#
gmgd32

because these contains at least one alphabet in them.

I am using pyspark-sql in data bricks.

Advertisement

Answer

You can use rlike with regex:

import pyspark.sql.functions as F

df.filter(F.col("name").rlike(".*[a-zA-Z]+.*")).show()

#+--------+
#|    name|
#+--------+
#|    $ank|
#|ada124$%|
#|    !asd|
#| 122acs#|
#|  gmgd32|
#+--------+

Spark SQL equivalent query:

SELECT * 
FROM   df
WHERE  name RLIKE '.*[a-zA-Z]+.*'
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement