My client sent me name data as a Name string which includes the last, first, and middle names in a single entry. I need them split into LastName, FirstName, and MiddleName. I have found some scripts online, but they don’t serve my purposes because they either (1) use a different format, or (2) don’t handle edge cases very well. See the examples below:
- Nightingale, Florence -> Florence Nightingale
- Bond, James Bond -> James Bond Bond
- Abbott, Edwin A. -> Edwin A. Abbott
Can someone help me write a SQL Server script that splits a string into the various pieces I’m looking for?
Advertisement
Answer
Please note the following:
- Always request normalized data to ensure the highest data quality. I tried to enumerate all possible cases for last, first, and middle name combinations but I’m sure I did not get all of them.
- My script requires the format: LastName@DELIMITER1@DELIMITER2FirstName@DELIMITER2MiddleName, but can be easily altered for other formats.
- This script does not separate tiles like Dr., or handle suffixes.
- Credit to MemKills for the idea of the test data set, which I expanded.
>
x
DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int
SET @DELIMITER1 = ','
SET @DELIMITER2 = ' '
SET @MAX_LENGTH = 50
SELECT [Name],
SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName, -- Less one char for @DELIMITER1
SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) AS FirstAndMiddle, -- Plus two for @DELIMITER1 and @DELIMITER2
CASE
-- Middle name follows two-name first names like Mary Ann
WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)
ELSE NULL
END AS MiddleName,
CASE
-- Count the number of @DELIMITER2. Choose the string between the @DELIMITER1 and the final @DELIMITER2.
WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
Then SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 2,
(LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH))
- LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH))))
ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)
END AS FirstName
FROM
(
SELECT [Name] = 'Zzz, A' UNION ALL
SELECT 'de Zzz, Aaa' UNION ALL
SELECT 'Zzz, Aaaa' UNION ALL
SELECT 'Zzz, A B' UNION ALL
SELECT 'Zzz, Aaaa Bbbb' UNION ALL
SELECT 'de Zzz, Aaaa' UNION ALL
SELECT 'de Zzz, Aaaa B' UNION ALL
SELECT 'van Zzz, Aaaa B' UNION ALL
SELECT 'Yyy-Zzz, Aaaa B' UNION ALL
SELECT 'd''Zzz, Aaaa B' UNION ALL
SELECT 'Zzz, Aaaa Bbbb C' UNION ALL
SELECT 'Zzz, Aaaa Bbbb Cccc'
) AS X