Skip to content
Advertisement

Check if rows exists in SQL Server using R and then Insert or Update rows

I have a table with the below structure.

ID Entity UserName UserRole UserStatus UpdatedDate

Clustered key is defined on columns Entity, UserName, UserRole.

UserStatus for an Entity-UserName-UserRole could be either Active or Inactive.

I could implement the INSERT and UPDATE statements for a dataframe df as follows:

Insert:

values <- paste("('",df$Entity,"','", df$UserName,"','",
                    ,df$UserRole,"','" ,df$UserStatus,"','" ,
                    df$UpdatedDate"')", sep="", collapse=",")
   
sqlStr_Insert<- sqlQuery(con, sprintf(paste("Insert into TBL
                                     (Entity, UserName,UserRole,UserStatus,UpdatedDate) values ",values)))

Update:

sql_string <- "UPDATE TBL SET UserStatus = '%s', UpdatedDate = '%s' WHERE Entity='%s' AND UserName ='%s' AND UserRole = '%s';"
  
  sqlStr_update <-sqlQuery(con, paste(sprintf(sql_string, UserStatus, UpdatedDate,df$Entity,df$UserName,df$UserRole), collapse=""))

The code works fine but I would like it to check if for every row in the df for a combination of Entity-UserName-UserRole already exists in the TBL. If found, run the UPDATE statement else run the INSERT statement.

For example – SQL Server table data:

ID Entity UserName UserRole UserStatus UpdatedDate
---------------------------------------------------
1  A      Ford    Analyst  Active     1/1/2020
2  B      Acura   Analyst  Active     1/5/2020
3  A      Tesla   Engineer Inactive   1/6/2020
4  A      Kia     Analyst  Active     1/1/2020

df to be inserted or updated:

 Entity UserName UserRole UserStatus UpdatedDate
---------------------------------------------------
 A      Accord   Analyst  Active     1/10/2020
 B      Acura    Analyst  Active     1/10/2020
 C      BMW      Engineer Active     1/10/2020

What should happen:

Only rows 1st and 3rd should get ‘INSERTED‘ to the database TBL. UPDATE should only happen if the ‘UserStatus‘ changes for an existing Entity-UserName-UserRole Combination.

I guess I could run a ‘FOR’ loop for every row of the dataframe df but I think that would be slow and not an optimum solution.

Advertisement

Answer

You could create a temporary table (use # prefix in SQL Server) and send a MERGE query as suggested by @CharlieFace:

data.new <- read.table(text='
Entity UserName UserRole UserStatus UpdatedDate
A      Accord   Analyst  Active     1/10/2020
B      Acura    Analyst  Active     1/10/2020
C      BMW      Engineer Active     1/10/2020', header = T)



library(DBI)
conn <- dbConnect(...)

# Create temporary table
dbWriteTable(conn,'#NEWDATA',data.new)

dbExecute(conn,"
MERGE INTO TBL WITH (HOLDLOCK) AS target
USING #NEWDATA AS source
    ON target.Entity = source.Entity
    AND target.UserName = source.UserName
    AND target.UserRole = source.UserRole
WHEN MATCHED THEN 
    UPDATE SET target.UserStatus = source.UserStatus
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Entity,UserName,UserRole,UserStatus,UpdatedDate)
    VALUES (Entity,UserName,UserRole,UserStatus,UpdatedDate);
          " )

# Remove temporary table
dbExecute(conn,"DROP TABLE #NEWDATA" )

# Check results
result <- dbGetQuery(conn,"SELECT * from TBL")
result

  ID Entity UserName UserRole UserStatus UpdatedDate
1  1      A     Ford  Analyst     Active    1/1/2020
2  2      B    Acura  Analyst     Active    1/5/2020
3  3      A    Tesla Engineer   Inactive    1/6/2020
4  4      A      Kia  Analyst     Active    1/1/2020
5 NA      A   Accord  Analyst     Active   1/10/2020
6 NA      C      BMW Engineer     Active   1/10/2020

Not sure how ID is managed on the server : autonumber?

7 People found this is helpful
Advertisement