tech-kern: Re: hashinit.9 [Re: 1. uiopeek? 2. hashinit/hashdone?]

Subject: Re: hashinit.9 [Re: 1. uiopeek? 2. hashinit/hashdone?]
To: Chapman Flack <nblists@anastigmatix.net>
From: None <brian@surge.insomnia.org>
List: tech-kern
Date: 06/04/2006 23:48:57
On Sun, 4 Jun 2006, Chapman Flack wrote:

> brian@surge.insomnia.org wrote:
>> This seems to me to be a commitment to allocate based on power of two. 
>> Sure, the current implementation does this (going so far as getting into
>
> Ah, but this is not a variable implementation choice, given the
> interface. That you get back a hashmask that you can & with any
> arbitrary hash to get a valid index is part of the defined interface,
> and /that/ commits the implementation to choosing a power of two;
> an interface change would be required to change it. The question left
> unanswered is /which/ power of two, well worth answering not only
> because it's important (and anything that would make a factor of two
> difference in the size of a kernel structure is probably worth thinking
> about) but also because it's changed a couple of times and differs among
> bsds.

Without the man page, you have to infer the contract. This inference 
basically comes down to this:

/*
  * General routine to allocate a hash table.
  * Allocate enough memory to hold at least `elements' list-head pointers.
  * Return a pointer to the allocated space and set *hashmask to a pattern
  * suitable for masking a value to use as an index into the returned
  * array.
  */
void *
hashinit(u_int elements, enum hashtype htype, struct malloc_type *mtype,
     int mflags, u_long *hashmask)


That is, you return a pointer that can hold at least `elements` head 
pointers, and you have a hashmask which can be used as a mask to get an 
index value.

Note the things this does not say:

1: It does not say the implementation is powers of two (it is)

2: It does not say that hashmask can be used to represent the full range 
of indexes.

I'm not saying there's a valid reason to change the implementation to do 
something else, but you are clearly saying more than was said before, for 
better or worse.

Unfortunately, in the kernel, there are few good alternatives to the 
bitwise and operator; modulus is a very poor choice, as it tends to be 
much more expensive, even when it's a single instruction (like on i386).

> it hasn't gone out of the way to say it can't.  I don't know of any
> plans or requests to add any more queue.3 types--there's a type I
> would use if it were even in queue.3, but that's another story--and
> a programmer using the interface now needs to know what it supports
> now. If it supports other htypes a year from now, the documentation
> a year from now should say that.

Indeed it should, perhaps it even will.

>> 3: The documentation does not mention any limits on sizes of the chains. 
>> This should be added, since it would be nice if the contract explicitly 
>> forbidded hashsize * esize being > the value a u_long can store. I 
>
> Hmm, maybe I need to somehow clarify that the interface doesn't
> constrain the size of the chains. When you look at esize you're looking

I was using the word chains to keep the names consistant with your 
documentation, specifically:

.Fo hashinit
.Fa "u_int chains"
.Fa "enum hashtype htype"
.Fa "struct malloc_type *mtype"
.Fa "int mflags"
.Fa "u_long *hashmask"
.Fc

IE: the variable the source code calls elements. I personally prefer the 
source's naming than the man pages; is there a reason they differ?

> at the size of a chain /head/, which is fixed and beyond your control
> (except that you get to choose whether it's a LIST head or a TAILQ head,
> which have sizes of one or two pointers, respectively). The array only
> contains the heads; you allocate and manage the stuff on the chains,
> and nothing but available memory constrains you in that.  Allocating the
> array would be a problem if you really want MAXULONG/sizeof(two pointers)
> slots, but I'm not sure if the likelihood of that warrants the verbiage

Saying 'dont do this' is pretty inexpensive.