Skip to content
Advertisement

SSIS – Inserting records only if they don’t already exist based on matching at least 1 of multiple columns

I have an SSIS package that is supposed to insert data from a flat-file into a database table. For the sake of this example, let’s say I am wanting to insert User records. The data records come from other existing databases, so they already include a previously generated primary ID, which we would like to preserve and continue using. The records also include an email field which should also be unique in the destination table; this is enforced by the schema. A given batch could include records that have previously been “migrated” as well as a user might be in more than one of the original systems with the same email address. In addition to avoiding errors, I would also like to track any possible duplicates (on either the UserID OR the email fields) by writing those to a file.

Because matches can be made on either of the 2 fields, do I need to chain 2 Lookup Transformations? Or is there a way to specify OR operation instead of AND when using multiple columns? Or is there a better-suited transform that I should be using?

Thank you in advance.

Advertisement

Answer

Well, let’s split your question.

Can I do a Lookup with OR condition on two fields?

Yes, you can.
Suppose you are lookup through User table. On the Lookup transformation General section – specify Partial cache or No cache as Cache mode. Then design your query in Connection section. Important — map your data flow fields to query columns in Columns section. So far preparation is done.
Go to Advanced section and tick Modify the SQL statement flag. Modify the SQL statement below with something like

    select * from (SELECT [ColA], [ColB], ...
          FROM [User]) [refTable]
    where [refTable].[ColA] = ? OR [refTable].[ColB] = ?  

Then – hit Parameters button and specify data flow columns which should be mapped to the first ?Parameter0, and so on.
As you see, it is possible but not easy.

Should you use two lookups or single complex lookup?

I would go for two lookups, as it allows you finer control and error reporting – with OR statement you can only report that something among unique fields matched. Doing specific lookups allow you to be more specific and design special flow steps if needed.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement