tech-kern: Re: 1. uiopeek? 2. hashinit/hashdone?

Subject: Re: 1. uiopeek? 2. hashinit/hashdone?
To: None <tech-kern@NetBSD.org>
From: Chapman Flack <nblists@anastigmatix.net>
List: tech-kern
Date: 06/04/2006 14:22:55
Elad Efrat wrote:
> How about waiting a little longer before commiting something?

This particular commit is incapable of breaking anything, and changes
are just as easy to commit as the original.  :)

>>wrote one to fit. Any objections?
> Lots. The purpose of a man-page is to be, first and foremost, helpful.

We agree completely on that point. Where we seem to disagree is on what
information it is helpful to provide in a reference document.

>>.Nd kernel hash table construction and destruction
> "construction and destruction" -> "functions".

I have no strong position on this change. I was documenting two functions,
one of which allocates and initializes an object, one of which frees it.
If there's a consensus that more vagueness would be helpful, I'll go along.

>>.Fa "u_int chains"
> That's a lie. The prototype clearly says "u_int elements". Elements can
> be either a LIST or a TAILQ.

This was a tough call. The name 'elements' is used in the function
definition (there is no name at all in the prototype, per KNF),
and has been unchanged since the 4.4-alpha incarnation, but it became
potentially misleading with Thor's change in 1.35. The name 'elements'
in a hash table constructor typically suggests that you pass the
expected number of objects you will want to enter in the table, and
rely on the constructor to apply a load factor when choosing the number
of slots/chains. That /is/ how hashinit behaved through 1.34, but as of
1.35 you are actually passing the number of chains you want, and had
better account for the load factor yourself (unless you have RAM to burn).

Because the 4.4BSD behavior was undocumented, and it got its load factor
rather cleverly from an uncommented step in its sizing algorithm, it's
actually possible 1.35 was committed without thinking of it as a
fundamental semantic change, but in any case it was, and it would be
misleading IMO to document the current behavior by instructing the user
to pass a number of 'elements.'

I appreciate your objection that the name 'chains' in the manual does
not match the name 'elements' in the code. There are at least 3
possible resolutions:

1. Change the name in the code to reflect the actual behavior.
2. Leave the code alone and tolerate that the names differ.
3. Change the manual to match the misleading name in the code.

I have no strong preference between 1 and 2, but would oppose 3. 

>>The number of slots will be the least power of two not smaller than
>>.Fa chains ,
>>where
>>.Fa chains
>>should be the product of the intended load factor and the expected number
>>of elements to be stored.
> 
> This must change. Nobody cares. How about "The number of slots will be
> 'elements', or the closest power of two."?

Here I believe we're using different values of 'nobody'. :)  Notice that
"the closest power of two" would not accurately describe any of the three
different sizing rules the function has historically used: neither the
least-above nor the greatest-below is always the closest. Depending on
which power of two we're talking about, the difference can be large, and
to leave the man page deliberately vague on it would simply be to drive
anyone who cares to reading the code to find out the truth. That, IMO, is
a failure of the duty of a man page. The information pertains directly to
capacity and load factor, the fundamental parameters of any hash table.

> Now you need a paragraph separation (.Pp?), and then:

I have few strong feelings about paragraph layout; I think we need to
work on our disagreements about what information is necessary and
accurate first.

> Too much. "'htype' indicates the type of each element in the hash table.
> Valid types include 'HASH_LIST' to specify a LIST and 'HASH_TAILQ' to
> specify a TAILQ. See queue(3)."

Your suggested change would not be accurate. The /element/ type of the
hash table--the type of the things you will be keeping in it--is the
structure type you have declared in your own code with LIST_ENTRY or
TAILQ_ENTRY, about which hashinit neither knows nor cares. What hashinit
does need to know is whether you are using LISTs or TAILQs for the
collision chains, and it is /that/ type that 'htype' indicates.

>>.Fa mtype
>>and
>>.Fa mflags
>>arguments are passed to
>>.Xr malloc 9
>>to control the allocation.
> 
> Kill the last line.

No strong preference - fine, if that's consensus.

>>A value will be stored into
>>.Fa *hashmask
>>suitable for masking any computed hash, to obtain the index of a chain
>>head in the allocated table.
> 
> Again, this is confusing and non-clear wording. People who never used a
> hash table (or hashinit(9)) will need to see an example, which makes
> this sentence useless.
> 
> Change it to something like "The hashmask argument is used to pass back
> the mask for use with the allocated hashing table" -- as it appears in
> the OpenBSD man-page.

Here you and I starkly disagree on which wording is less clear. I have
assumed exactly this audience background:
a. knows that hashing involves computing a hash and reducing it to
   a table index
b. knows that the 'reduce to index' step is done differently for different
   hash schemes, and needs to know what to do for this one
c. knows that 'masking' refers to the & operation.

Nothing wrong with proposing different audience assumptions if you think
mine are off base. If you think it can't be explained to the target aud
without an example, then the page needs an example. If you think leaving
the explanation vague is better because it's hopeless anyway, we disagree.

> How about "The hashdone(9) function frees a hash table 'hashtbl' created
> by hashinit(9). 'mtype' is the malloc(9) type passed to free(9)."?
> 
> Then a paragraph break, add something like "Note:"..
> ...
>>.Em does not
>>walk the table and deallocate anything in it, or check that the caller has
>>done so.
> 
> Change this to something simpler. "hashinit(9) frees the hash table
> itself and not the elements stored in it."

Too simple, IMO. We have (1) an array, (2) collision chains, and
(3) elements. In common use, a hash table is an ADT that comprises
1 and 2 and lets you store 3. That would mean "frees the hash table
itself" could misleadingly suggest it does more than just blindly free
the array, which is what it does. My own wording there could be improved
for the same reason--but I think the improvement would have to be something
other than what you propose.

>>.Sh RETURN VALUES
>>The value returned by
>>.Fn hashinit
>>should be cast as pointer to an array of
>>.Dv LIST_HEAD
>>or 
>>.Dv TAILQ_HEAD
>>as appropriate.
>>It can be
>>.Dv NULL
>>only if the specified
>>.Fa mflags
>>allow it.
> 
> Should say "hashinit(9) returns a pointer to the hash table, and
> acts with the same semantics as a malloc(9) call with respect to the
> passed flags."?

hashinit returns a pointer to void. Are you proposing that I omit from
the man page the specific type the silly thing actually points to?
Moreover, the real point of mentioning mflags again here is not simply
to state they determine allocation semantics; they do, but the earlier
mention covered that. The /point/ of the reference in the RETURN VALUES
section is to address directly the user's obvious question, "do I have
to worry about this thing being NULL?"  Why beat around the bush?

> The entire "HISTORY" section should go away. If people want to be anal,
> just stick a "first appeared in 4.4BSD".

I'm not convinced that to be interested in a history section makes one
anal. The actual guideline in mdoc(7) is:

  Any command which does not adhere to any specific standards should
  be outlined historically in this section.

That I have done. To truncate the outline at 4.4BSD would be especially
unhelpful in this particular case, because it is since 4.4BSD that the
function has seen not only obvious changes in its prototype, but also
semantic changes that seem to indicate varying interpretations of just
how its "hash table" should be understood--something that seems in fact
to be going on in our present conversation, and makes the historical
outline more interesting, not less. Someone porting code between nb, fb,
and ob may be helped by an outline of the semantic divergence between
them.

>>.Sh BUGS
>>The only part of the work of implementing a hash table that these functions
>>relieve is the part that isn't much work.
> 
> Remove that section too.

There is precedent among the Unix man pages--especially among the oldest
and best ones--for frankly including validation- as well as verification-
level reflections. I've never thought it hurt them in the least.

The BUGS sections that bug me are the ones that say "this man page is
very incomplete" - which usually sees me shouting at the monitor "yeah,
so what made y'think it was ready to commit?"  The monitor just sits
there, though. :)

-Chap