GetHashCode() method, HashCode and HashTable behavior (Deep level)
About the uniqueness of hash codes, Microsoft says (http://msdn2.microsoft.com/en-us/library/system.object.gethashcode.aspx):
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
I tried to find out why Microsoft says this:
Making hash codes unique for all different objects or strings is impossible in general. One reason is the size of the hash code: it is an int, which is only 32 bits, so there are only 2^32 (about 4.3 billion) possible values. The contents of a string or of an object of any size must be squeezed into those bits, which means a lot of information is lost. There are far more possible strings than possible hash codes, so some different strings must share a hash code, and that is why MSDN gives no guarantee of uniqueness.
Even so, a Hashtable is still able to tell different keys apart. This is because of its internal structure and behavior.
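To see the 32-bit limit in action, here is a minimal sketch that searches for two different strings with the same hash code. The class name CollisionSearch and the candidate strings "s0", "s1", ... are arbitrary choices for this illustration. Which pair it finds, and how long the search takes, depends on the runtime, so no particular output is claimed; with well-mixed 32-bit hashes a collision typically turns up after some tens of thousands of candidates.

```csharp
using System;
using System.Collections.Generic;

static class CollisionSearch
{
    // Returns two different strings whose GetHashCode() values are equal.
    // The candidates "s0", "s1", ... are an arbitrary choice for this sketch.
    public static (string First, string Second) FindCollision()
    {
        // Maps each hash code seen so far to the first string that produced it.
        var seen = new Dictionary<int, string>();

        for (long i = 0; ; i++)
        {
            string candidate = "s" + i;
            int hash = candidate.GetHashCode();

            string earlier;
            if (seen.TryGetValue(hash, out earlier))
                return (earlier, candidate);   // same hash code, different text

            seen[hash] = candidate;
        }
    }

    static void Main()
    {
        var pair = FindCollision();
        Console.WriteLine("\"" + pair.First + "\" and \"" + pair.Second +
                          "\" share hash code " + pair.First.GetHashCode());
    }
}
```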
Hashtable behavior: When we pass an object as a key to a Hashtable, the Hashtable calls the object's GetHashCode() method internally and stores the resulting hash code. Because two different objects or strings can generate the same hash code, the Hashtable also stores the keys themselves. Whenever we ask for the value of a particular key, or check whether a key exists, the Hashtable uses the hash code to narrow down the possible matching keys, and then compares each of those candidates with the key we are looking for, using the object's Equals() method. So the Hashtable's internal structure is reliable and keeps distinct keys distinct, even when their hash codes collide.
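The lookup described above can be sketched with a tiny table of my own. TinyHashTable is a hypothetical teaching class, not the real System.Collections.Hashtable, whose internal layout is different; it only shows the two-step idea: use the hash code to pick the candidates, then let Equals() decide. Null keys are not handled in this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class; NOT the real System.Collections.Hashtable.
sealed class TinyHashTable<TKey, TValue>
{
    private sealed class Entry
    {
        public int Hash;      // cached result of key.GetHashCode()
        public TKey Key;      // the key itself, needed for the Equals() check
        public TValue Value;
    }

    private readonly List<Entry>[] buckets;

    public TinyHashTable(int bucketCount)
    {
        buckets = new List<Entry>[bucketCount];
        for (int i = 0; i < buckets.Length; i++)
            buckets[i] = new List<Entry>();
    }

    // Step 1: the hash code only selects which bucket to search.
    private List<Entry> BucketFor(int hash)
    {
        return buckets[(hash & int.MaxValue) % buckets.Length];
    }

    public void Add(TKey key, TValue value)
    {
        int hash = key.GetHashCode();
        List<Entry> bucket = BucketFor(hash);
        foreach (Entry e in bucket)
            if (e.Hash == hash && e.Key.Equals(key))
                throw new ArgumentException("Duplicate key");
        bucket.Add(new Entry { Hash = hash, Key = key, Value = value });
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int hash = key.GetHashCode();
        foreach (Entry e in BucketFor(hash))
        {
            // Step 2: Equals() has the final word; an equal hash code alone is not enough.
            if (e.Hash == hash && e.Key.Equals(key))
            {
                value = e.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}

static class Demo
{
    static void Main()
    {
        // A single bucket forces every key to share it.
        var table = new TinyHashTable<string, int>(1);
        table.Add("blair", 1);
        table.Add("brainlessness", 2);

        int found;
        table.TryGetValue("brainlessness", out found);
        Console.WriteLine(found); // 2
    }
}
```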
I have seen some developers try to create a unique value for an object and then pass that value to the Hashtable as the key. Often they use the object's default GetHashCode() method to create it, assuming that the generated hash code is unique. It is not. If you really need your own GetHashCode(), override it together with Equals(), so that equal objects return equal hash codes and different objects are spread out well; keep in mind that even a custom implementation cannot be unique in general, because the result is still a 32-bit int. My recommendation is: if you are using a Hashtable, pass the object itself as the key. The Hashtable handles it correctly as long as the two objects are really different, that is, as long as Equals() returns false for them.
For example, suppose you have two strings, ‘blair’ and ‘brainlessness’, and the hash code of both is ‘175803953’. If you call GetHashCode() yourself and add the generated hash codes to the Hashtable as keys, then adding the second string's hash code fails with a ‘Duplicate key’ error. But if you do not call GetHashCode() and instead add the strings themselves (‘blair’ and ‘brainlessness’) as keys, the Hashtable treats them as different keys, because the strings really are different. Here the Hashtable does not care that the two strings generate the same hash code internally.
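As a sketch of what overriding looks like, here is a small key type that overrides Equals() and GetHashCode() together. CustomerKey and its fields are hypothetical names invented for this example, and the 17/31 mixing is just one common convention, not the only valid choice. The goal is that equal keys return equal hash codes; the result is still only 32 bits, so it is not a unique identifier.

```csharp
using System;
using System.Collections;

// Hypothetical key type, used only for illustration.
sealed class CustomerKey
{
    public string Region { get; }
    public int Number { get; }

    public CustomerKey(string region, int number)
    {
        Region = region;
        Number = number;
    }

    // Two keys are equal when all of their fields are equal.
    public override bool Equals(object obj)
    {
        var other = obj as CustomerKey;
        return other != null && Region == other.Region && Number == other.Number;
    }

    // Built from the same fields as Equals(), so equal keys give equal hash codes.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + (Region == null ? 0 : Region.GetHashCode());
            hash = hash * 31 + Number;
            return hash;
        }
    }
}

static class Demo
{
    static void Main()
    {
        var table = new Hashtable();
        table.Add(new CustomerKey("EU", 42), "first customer");

        // A separately constructed but equal key finds the same entry.
        Console.WriteLine(table[new CustomerKey("EU", 42)]);             // first customer
        Console.WriteLine(table.ContainsKey(new CustomerKey("EU", 43))); // False
    }
}
```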
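The difference between the two approaches can be reproduced with a small sketch. String hash codes vary between runtimes, so instead of relying on the value quoted above, this example uses a hypothetical wrapper type, Word, whose hash code is deliberately weak (only the first letter). That guarantees that ‘blair’ and ‘brainlessness’ collide wherever the code runs.

```csharp
using System;
using System.Collections;

// Hypothetical key type with a deliberately weak hash code: the first letter only.
sealed class Word
{
    public string Text { get; }

    public Word(string text) { Text = text; }

    public override bool Equals(object obj)
    {
        var other = obj as Word;
        return other != null && Text == other.Text;
    }

    public override int GetHashCode()
    {
        return Text.Length == 0 ? 0 : Text[0]; // 'b' for both sample words
    }
}

static class Demo
{
    // Returns true if storing both words under their hash codes fails
    // because the second hash code is already present as a key.
    public static bool HashCodeKeysClash(Word first, Word second)
    {
        var byHashCode = new Hashtable();
        byHashCode.Add(first.GetHashCode(), first.Text);
        try
        {
            byHashCode.Add(second.GetHashCode(), second.Text);
            return false;
        }
        catch (ArgumentException) // Hashtable.Add rejects a duplicate key
        {
            return true;
        }
    }

    static void Main()
    {
        var blair = new Word("blair");
        var brainlessness = new Word("brainlessness");

        // Hash codes as keys: the second Add fails.
        Console.WriteLine(HashCodeKeysClash(blair, brainlessness)); // True

        // Objects as keys: both entries are stored despite the equal hash codes.
        var byObject = new Hashtable();
        byObject.Add(blair, "first");
        byObject.Add(brainlessness, "second");
        Console.WriteLine(byObject.Count);          // 2
        Console.WriteLine(byObject[brainlessness]); // second
    }
}
```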


Hello,
Not sure that this is true :), but thanks for the post.
Elcorin
Hi,
Thanks for the article. I always enjoy reading your posts.
Did you test this and write from personal experience, or did you find the information online?
I have tested this and written from my personal experience. Also, just to be sure, I made changes in my program based on these findings, and those changes are now running in production without any problem.
I have also given the MSDN link, where Microsoft confirms this.
Regards,
Ashish Khandelwal
http://ashishkhandelwal.arkutil.com
Very short, simple and easy to understand, but some more comments from your side would be great.