I have a full name field that has data in the format of
Last Name First Name Middle Name
some of the data doesn’t have a Middle Name where as it will be just
Last Name First Name
also some of the data has 2 spaces between the Last Name and First Name where as it comes to be
Last Name First Name
all of this data is in the same field, how would I go about breaking this data up in to 3 seperate fields using an SSIS Expression?
Advertisement
Answer
You’re going to have a bad experience here because of names like “De los Santos Edwin Paul” or any of a host of multi surname conditions.
I advocate using multiple derived columns in cases like this to help you identify where things have gone wrong for future debugging.
A assume the name data column is called NameFull
.
DER SubDoubleSpaces
The first thing I would do is simplify your life. Instead of writing two conditionals for one space and two space between Last name and First name, I’d make everything consistent with one space.
Create a new column via a Derived Column Component called Name
and we’re going to replace all the double spaces with a single one.
REPLACE([NameFull], " ", " ")
DER GetFirstSpace
Add a new derived column component to the Data flow. Add a new column, called FirstSpace
. The expression we want to use is FINDSTRING
FINDSTRING([Name], " ", 1)
DER GetLastName
Add another derived column component into the mix. This will use the value of our new column FirstSpace
to slice out the first element and I’ll stuff it into a column called LastName
SUBSTRING([Name], 1, [FirstSpace])
DER GetSecondSpace
Add a new derived column component to the Data flow. Add a new column, called SecondSpace
. The expression we want to use is similar to the first but the third parameter will be the position of the last space we found (plus 1 to skip it)
FINDSTRING([Name], " ", [FirstSpace])
DER GetSecondName
Similar to the LastName except now we’ll use the First and Second spaces
SUBSTRING([Name], [FirstSpace]+1, [SecondSpace])
DER MiddleName
At this point, you can see the pattern but now we have to deal with your actual data, which I don’t have. What if they don’t have a middle name listed? The lazy/safe approach for middle name is probably to add a derived column and just take the RIGHTmost N characters. If they have a middle name, this will pick it up, otherwise, it should just blank it out.
RIGHT([Name], LEN([Name]) - [SecondSpace] +1)
Answers with similar approach