.NET - How do I calculate a good hash code for a list of strings?


Background:

  • I have a short list of strings.
  • The number of strings is not always the same, but is of the order of a “handful”.
  • In our database we store these strings in a 2nd normalised table.
  • These strings are never changed once they are written to the database.

We wish to be able to match on these strings in a query without the performance hit of doing lots of joins.

So I am thinking of storing a hash code of all these strings in the main table and including it in our index, so that the joins are only processed by the database when the hash code matches.

So how do I get a good hash code? I could:

  • XOR the hash codes of the strings together
  • XOR and multiply the result after each string (say by 31)
  • Concatenate the strings, then take the hash code of the result
  • Some other way
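
To make the difference between the first two options concrete, here is a minimal Java sketch. The class and method names are made up for illustration, and it uses Java's `String.hashCode()` as a stand-in for whatever per-string hash is chosen. A plain XOR loses the order of the strings and lets duplicates cancel out, while multiplying the running result by 31 before each XOR keeps the position of each string in the mix.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: names are hypothetical, not from the original question.
final class CombineHashes {
    // Option 1: XOR the per-string hash codes together.
    static int xorCombine(List<String> strings) {
        int result = 0;
        for (String s : strings) {
            result ^= s.hashCode();
        }
        return result;
    }

    // Option 2: multiply the running result by 31, then XOR in the next hash.
    static int xorMultiplyCombine(List<String> strings) {
        int result = 0;
        for (String s : strings) {
            result = (result * 31) ^ s.hashCode();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ab = Arrays.asList("a", "b");
        List<String> ba = Arrays.asList("b", "a");

        // Plain XOR is symmetric, so reordering the list gives the same value.
        System.out.println(xorCombine(ab) == xorCombine(ba));                 // true
        // A string XORed with itself cancels out.
        System.out.println(xorCombine(Arrays.asList("a", "a")));              // 0
        // With the multiply step, order changes the result.
        System.out.println(xorMultiplyCombine(ab) == xorMultiplyCombine(ba)); // false
    }
}
```

If the order of the strings is not meant to matter for matching, the order sensitivity of the second option can be kept under control by sorting the list before hashing.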

So what do people think?


In the end I just concatenate the strings and compute the hash code of the concatenation, as it is simple and worked well enough.
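
A rough Java sketch of the concatenation approach follows. One thing it shows is that plain concatenation cannot tell `["ab", "c"]` from `["a", "bc"]`, since both become `"abc"`. Appending a separator after each string avoids that. The separator character and all the names here are my own assumptions for illustration, and the separator only helps if it never occurs inside the stored strings.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: names and the separator choice are assumptions.
final class ConcatHash {
    // Hypothetical separator, assumed never to appear in the stored strings.
    static final char SEPARATOR = '\u0000';

    // Concatenate with no separator, then hash the result.
    static int plainConcatHash(List<String> strings) {
        return String.join("", strings).hashCode();
    }

    // Append a separator after every string so the boundaries are hashed too.
    static int separatedConcatHash(List<String> strings) {
        StringBuilder sb = new StringBuilder();
        for (String s : strings) {
            sb.append(s).append(SEPARATOR);
        }
        return sb.toString().hashCode();
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("ab", "c");
        List<String> second = Arrays.asList("a", "bc");

        // Both lists concatenate to "abc", so the plain version always collides.
        System.out.println(plainConcatHash(first) == plainConcatHash(second));         // true
        // With boundaries included, these two lists hash differently.
        System.out.println(separatedConcatHash(first) == separatedConcatHash(second)); // false
    }
}
```

Since the hash is only a pre-filter in front of the join, a collision like this costs some extra work rather than a wrong result, which may be why the simple version worked well enough.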

(If you care, we are using .NET and SQL Server.)


Bug! Bug!

Quoting from “Guidelines and rules for GetHashCode” by Eric Lippert:

The documentation for System.String.GetHashCode notes that two identical strings can have different hash codes in different versions of the CLR, and in fact they do. Don't store string hashes in databases and expect them to be the same forever, because they won't be.

So String.GetHashCode() should not be used for this.
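
One way around this is to store a hash whose algorithm you control, computed over a fixed byte encoding of the strings, so that the value does not depend on the runtime version. The sketch below uses 32-bit FNV-1a over UTF-8 bytes. That choice is mine, not something the original post suggests, and the class and method names are hypothetical. Any fully specified hash, such as a truncated MD5 or SHA digest, would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative only: FNV-1a is an assumed choice, not the original author's method.
final class StableListHash {
    private static final int FNV_OFFSET_BASIS = 0x811C9DC5; // standard 32-bit FNV offset basis
    private static final int FNV_PRIME = 0x01000193;        // standard 32-bit FNV prime

    static int stableHash(List<String> strings) {
        int hash = FNV_OFFSET_BASIS;
        for (String s : strings) {
            // A fixed encoding keeps the input bytes identical on every platform.
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                hash ^= (b & 0xFF);
                hash *= FNV_PRIME;
            }
            // Mark the end of each string; the byte 0xFF never occurs in UTF-8 output.
            hash ^= 0xFF;
            hash *= FNV_PRIME;
        }
        return hash;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("red", "green", "blue");
        System.out.println(Integer.toHexString(stableHash(strings)));
    }
}
```

Because every step is plain integer arithmetic on UTF-8 bytes, the same loop can be ported to C# line by line and should give the same stored value, provided the multiplication is allowed to wrap around (for example inside an `unchecked` block).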

Standard Java practice is to write:

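    final int prime = 31;
    int result = 1;
    for (String s : strings) {
        // Multiply the running value by the prime, then add the next string's hash.
        result = result * prime + s.hashCode();
    }
    // result is the hash code of the whole list.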
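
For reference, here is that loop wrapped into a small runnable class; the class and method names are only for illustration. For a list with no null elements it computes the same value as the `hashCode()` that `java.util.List` already specifies, so in Java `strings.hashCode()` gives the same result directly.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: wraps the loop above so it can be run and checked.
final class JavaStyleListHash {
    static int hashOf(List<String> strings) {
        final int prime = 31;
        int result = 1;
        for (String s : strings) {
            result = result * prime + s.hashCode();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("a", "b");
        System.out.println(hashOf(strings));                       // 4066
        // List.hashCode() is specified with the same formula.
        System.out.println(hashOf(strings) == strings.hashCode()); // true
    }
}
```

This is stable in Java because the Javadoc specifies the `String.hashCode()` formula. The same loop in .NET would still depend on `String.GetHashCode()`, so there the per-string hash would have to be replaced with one that is fixed across versions.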
