My client sent me name data as a Name string which includes the last, first, and middle names in a single entry. I need them split into LastName, FirstName, and MiddleName. I have found some scripts online, but they don’t serve my purposes because they either (1) use a different format, or (2) don’t handle edge cases very well. See the examples below:
- Nightingale, Florence -> Florence Nightingale
- Bond, James Bond -> James Bond Bond
- Abbott, Edwin A. -> Edwin A. Abbott
Can someone help me write a SQL Server script that splits a string into the various pieces I’m looking for?
Advertisement
Answer
Please note the following:
- Always request normalized data to ensure the highest data quality. I tried to enumerate all possible cases for last, first, and middle name combinations but I’m sure I did not get all of them.
- My script requires the format: LastName@DELIMITER1@DELIMITER2FirstName@DELIMITER2MiddleName, but can be easily altered for other formats.
- This script does not separate tiles like Dr., or handle suffixes.
- Credit to MemKills for the idea of the test data set, which I expanded.
>
DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int SET @DELIMITER1 = ',' SET @DELIMITER2 = ' ' SET @MAX_LENGTH = 50 SELECT [Name], SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName, -- Less one char for @DELIMITER1 SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) AS FirstAndMiddle, -- Plus two for @DELIMITER1 and @DELIMITER2 CASE -- Middle name follows two-name first names like Mary Ann WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0 THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH) ELSE NULL END AS MiddleName, CASE -- Count the number of @DELIMITER2. Choose the string between the @DELIMITER1 and the final @DELIMITER2. WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0 Then SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 2, (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)))) ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) END AS FirstName FROM ( SELECT [Name] = 'Zzz, A' UNION ALL SELECT 'de Zzz, Aaa' UNION ALL SELECT 'Zzz, Aaaa' UNION ALL SELECT 'Zzz, A B' UNION ALL SELECT 'Zzz, Aaaa Bbbb' UNION ALL SELECT 'de Zzz, Aaaa' UNION ALL SELECT 'de Zzz, Aaaa B' UNION ALL SELECT 'van Zzz, Aaaa B' UNION ALL SELECT 'Yyy-Zzz, Aaaa B' UNION ALL SELECT 'd''Zzz, Aaaa B' UNION ALL SELECT 'Zzz, Aaaa Bbbb C' UNION ALL SELECT 'Zzz, Aaaa Bbbb Cccc' ) AS X