String hash identifiers
Posted by jgiors on 2006-05-24
In the games industry, it is common practice to use hashes of strings as identifiers for objects. String hash identifiers are a useful compromise between the differences in human and computer "comprehension" — humans understand and remember string identifiers better than arbitrary numeric quantities, while computers operate most effectively with numbers.
CRC is one of the most common hashes, but there are other options, such as FNV. I prefer FNV because it is even simpler than CRC and doesn't require a lookup table as the typical CRC implementation does.
The advantages of using hashes are:
- Hash values are always the same data size no matter how long a string is.
- A primitive data type (typically a 32-bit unsigned int) can hold the hash.
- Equality/inequality comparison is fast.
- Less memory space is required than for strings.
- They make good indices into hash tables.
Like all techniques, string hash identifiers have disadvantages. Because a hash is just an arbitrary number with no intrinsic meaning, the only way to find the string a hash originated from is to look it up in a hash-to-string table. And that means you have to create a hash-to-string table if you want "reversibility". If the string was hashed without storing it in a lookup table, it may not be possible to determine the original string (though putting a conditional breakpoint in the code often works).
It is also necessary to be on the lookout for hash collisions (two different strings which hash to the same value). They are reasonably rare, but the data sets of many games are getting large enough that a few collisions will occur during the course of a project. Collisions can lead to obscure bugs which are hard to track down. Often, this can be caught in debug builds by checking for duplicate strings when inserting hashes into lookup tables. When a problematic collision is found, it can be fixed by renaming one of the items.
Even so, the advantages of string hash identifiers typically outweigh the disadvantages. I have worked on several projects that use this technique, and the consensus is that it has been worthwhile.
I'll be writing more on this topic soon…
ionous » String Hash said
[...] the passing ship sense, not the flame-baiting, nor monster under the bridge sense ) and came across John Gior’s post on FNV in games. Having never heard of FNV, I thought I’d take a look. Recognizing that [...]