Skip to content
Advertisement

Spliting Name in Google BigQuery

Split FullName into firstName and lastName.

I want to split the full name based on “,” into firstName and LastName. anything before “,” should be selected as firstName and anything “,” should be the lastName. I used split(fullName) function and used offset to grab the firstName and the lastName but ran into an error when there were 2 “,” in the fullName. I used this code :

WITH getfirstName AS 
(
  SELECT ID,Name[offset(1)] AS firstName FROM
    (SELECT * FROM 
    (SELECT ID,Name,ARRAY_LENGTH(Name) as length  FROM
    (
    SELECT ID, SPLIT(OPR_FULL) as Name FROM `project.Dataset.Name` )
    )
    WHERE length >=2)
)


, getLastName AS 
(
  SELECT  A.ID,  NAME[OFFSET(0)] AS lastName
  FROM(SELECT SPLIT(OPR_FULL) as Name,ID FROM `project.Dataset.Name`) A
  LEFT JOIN getfirstName B
  ON A.ID=B.ID
)

IS THERE A BETTER WAY TO DO IT? HOW DO I RESOLVE THE DOUBLE “,” ISSUE? if i use OFFSET(2) I get an OffBound error.

Advertisement

Answer

Below is for BigQuery Standard SQL

#standardSQL
SELECT fullName,
  REGEXP_EXTRACT(fullName, r'(.*?),') lastName,
  REGEXP_EXTRACT(fullName, r',(.*)') firstName
FROM `project.dataset.table`  

You can test, play with above using dummy data as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Geogre,Bailey' fullName UNION ALL
  SELECT 'Kat,JR,Cruz'
)
SELECT fullName,
  REGEXP_EXTRACT(fullName, r'(.*?),') lastName,
  REGEXP_EXTRACT(fullName, r',(.*)') firstName
FROM `project.dataset.table`  

with output

Row fullName        lastName    firstName    
1   Geogre,Bailey   Geogre      Bailey   
2   Kat,JR,Cruz     Kat         JR,Cruz    

In case if comma is missing at all – you can use below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Geogre,Bailey' fullName UNION ALL
  SELECT 'Kat,JR,Cruz' union all
  SELECT 'Monica Calderon'
)
SELECT fullName,
  REGEXP_EXTRACT(fullName, r'(.*?)(?:,|$)') lastName,
  REGEXP_EXTRACT(fullName, r',(.*)') firstName
FROM `project.dataset.table`

with result

Row fullName            lastName            firstName    
1   Geogre,Bailey       Geogre              Bailey   
2   Kat,JR,Cruz         Kat                 JR,Cruz  
3   Monica Calderon     Monica Calderon     null     
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement