-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I think David's primary argument here, that hashing may not be the best design choice, is well taken. However, on the subject of hash table sizes, it is not difficult to allow the hash table to dynamically increase in size. This does not impose significant overhead; you can simply count the length of the hash buckets as you traverse them, and trigger a dynamic resize when some threshhold is exceeded. There would be some issues in doing so, such as memory fragmentation, but it can be done. Here, of course, you are trading off average case performance (which tends to be better with hashing) against worst case performance (which would clearly be dismal at the point where the has is dynamically reconstructed). So, it depends somewhat on whether the average case or the worst case is more important; an obvious example is that you wouldn't do this sort of thing if you are trying to get near real time functionality. On Tuesday 27 November 2001 02:21, you wrote: > > There is a potential issue however; FNV uses multiply (*) and xor (^), > > whereas a lot of the existing hashes on strings and buffers just use > > addition (+). > > Using multiply will (probably) stuff the performance of sytems that > don't have fast hardware multiply (eg sparc32). > > > 1) vmstat -h nfsnodehash: > > total used util num average maximum > > hash table buckets buckets % items chain chain > > ---------------------------------------------------------------------- > > > > with existing nfs_hash(): > > nfsnodehash 65536 1000 1.53 34688 34 105 > > The 1000 used buckets looks too much like a non-random number! > Anyone know whether this is true? > > > using fnv_32_buf() > > nfsnodehash 65536 27054 41.28 34867 1 6 > > One problem, I have with these kernel hash tables, is that you end up > allocating a massive table - and require a good (computationally expensive) > hash function - to get good performance on large systems, wasting the space > on smaller systems. Or generate a kernel configuration constant that is > never tuned for the (current) workload of the system. > > It often isn't viable to make the hash table (and function) such that the > load factor is < 1 - so there are almost always chains. Leaving your > lookup O(n) for large data sets. > > Surely it would be better to find an ordering relation and put the data > into a balanced binary tree. For large data sets this is O(log n). > > Of course it is probably worth having a cache of recently acessed items > - but this can be much smaller and only needs a cheap hash function. > > David - - -- Seth Kurtzberg Machine Independent Software Office: (480) 661-1849 Fax: (480) 614-8909 email: seth@cql.com pager: 888-605-9296 or email 6059296@skytel.com - -- Seth Kurtzberg Machine Independent Software Office: (480) 661-1849 Fax: (480) 614-8909 email: seth@cql.com pager: 888-605-9296 or email 6059296@skytel.com -----BEGIN PGP SIGNATURE----- Version: PGP 6.5.8 iQA/AwUBPAMtb3hkmRgYZUCaEQK4RQCbBw2P6szdPnEKoXAA3+TNOEvXXwYAoLlD MpaU6/dKfkfOTzxwZL1+O3jz =MUBm -----END PGP SIGNATURE-----