Editorials

Roll Your Own Merge

When you merge two sets together you are executing three different queries. Yes, I know some engines support a MERGE command. There are still times you need to roll your own implementation, and it is instructive to know what commands to do, and the order in which to execute them.

In the merge process you provide a new set of data that will be compared with an existing set. You will perform Insert, Update and Delete queries between the two sets to bring the data in the old set into alignment with the new set. You don’t always have to do a complete merge. For example, you could skip the delete process. For this example let’s consider doing all three steps.

I like to do the delete step first. In this step, you delete any record from the old set that does not exist in the new set. A left join is an efficient way to do this query.

DELETE OLD
FROM Oldtable OLD
LEFT JOIN NewTable NEW ON OLD.Key = NEW.Key
WHERE New.Key IS NULL

I like to do the delete step first because you are looking at all the records in the old table. If you do the delete before the insert there are less records in the old table to process.

The second step I like to do is the UPDATE. I do this step next because if you perform the inserts first, then you will
update the records you just inserted.

UPDATE OLD SET
…update column list

FROM Oldtable OLD
JOIN NewTable NEW ON OLD.Key = NEW.Key

You may find some optimization if you add a where clause for the update statement, and only update where the updated column data from old table is different than the new table.

All that is left now is the insert step. This is again done using an outer join.

INSERT INTO OLD
FROM NewTable NEW
LEFT JOIN OldTable OLD ON OLD.Key = NEW.Key
WHERE OLD.Key IS NULL

Your merge is now complete. This implementation works with any SQL engine supporting outer joins and data modification.

Cheers,

Ben