Search
Close this search box.

Do NOT use only CHECKSUM or BINARY_CHECKSUM as sole comparison in SQL, use HASHBYTES instead

We had a process ot using BizTalk to take in a reference file, then send data in via a table_type parameter in a stored procedure.  The stored procedure uses a MERGE statement to insert and update; the update part of which does a CHECKSUM first to determine whether an update is even necessary.

Then it happened, we had a reference row that should have been updated by the latest file, but it wasn’t done.  We checked the incoming and outgoing BizTalk Message in and out of the pipeline; everything looked normal.  Then we decided to take out the CHECKSUM part and the update succeeded.

In reading over the Microsoft documentation, I was absolutely shocked when I read this in the remarks section:

However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change

http://msdn.microsoft.com/en-us/library/ms189788%28v=SQL.100%29.aspx

Then I checked the page for BINARY_CHECKSUM, a similar remark exists.  http://msdn.microsoft.com/en-us/library/ms173784%28v=SQL.100%29.aspx

According to these 2 articles, we should be using HASHBYTES().

http://msdn.microsoft.com/en-us/library/ms174415%28v=SQL.100%29.aspx

Personally, I was really shocked to find a function in any database that “most of the time” does its job.  I really do not know any system that tolerates “occasional” mistakes, or maybe it’s that I haven’t worked in enough industries.  I equate this to getting a notice on your bank statement saying “we might not have all your transaction listed, and we don’t know for sure whether we have them all”.

I love quantum physics, but not in my computer systems.

[Update 2012-10-19]

I updated the title of this post to give more clarity to what I was trying to say.  You can read the comments below to follow the discussion.  I’m really fortunate that so many smart people decided to participate in this discussion to make this post better.

Here is an article that articulates the original idea of this post:  http://www.bidn.com/blogs/TomLannen/bidn-blog/2265/using-hashbytes-to-compare-columns 

This article is part of the GWB Archives. Original Author: My Digital Life in Data Flow

Related Posts