Subject: Re: 1. uiopeek? 2. hashinit/hashdone?
To: Chapman Flack <nblists@anastigmatix.net>
From: Elad Efrat <elad@NetBSD.org>
List: tech-kern
Date: 06/04/2006 22:24:10
Chapman Flack wrote:

> This was a tough call. The name 'elements' is used in the function
> definition (there is no name at all in the prototype, per KNF),
> and has been unchanged since the 4.4-alpha incarnation, but it became
> potentially misleading with Thor's change in 1.35. The name 'elements'
> in a hash table constructor typically suggests that you pass the
> expected number of objects you will want to enter in the table, and
> rely on the constructor to apply a load factor when choosing the number
> of slots/chains. That /is/ how hashinit behaved through 1.34, but as of
> 1.35 you are actually passing the number of chains you want, and had
> better account for the load factor yourself (unless you have RAM to burn).
> 
> Because the 4.4BSD behavior was undocumented, and it got its load factor
> rather cleverly from an uncommented step in its sizing algorithm, it's
> actually possible 1.35 was committed without thinking of it as a
> fundamental semantic change, but in any case it was, and it would be
> misleading IMO to document the current behavior by instructing the user
> to pass a number of 'elements.'
> 
> I appreciate your objection that the name 'chains' in the manual does
> not match the name 'elements' in the code. There are at least 3
> possible resolutions:
> 
> 1. Change the name in the code to reflect the actual behavior.
> 2. Leave the code alone and tolerate that the names differ.
> 3. Change the manual to match the misleading name in the code.
> 
> I have no strong preference between 1 and 2, but would oppose 3.
> 
>>> The number of slots will be the least power of two not smaller than
>>> .Fa chains ,
>>> where
>>> .Fa chains
>>> should be the product of the intended load factor and the expected
>>> number
>>> of elements to be stored.
>>
>>
>> This must change. Nobody cares. How about "The number of slots will be
>> 'elements', or the closest power of two."?
> 
> 
> Here I believe we're using different values of 'nobody'. :)  Notice that
> "the closest power of two" would not accurately describe any of the three
> different sizing rules the function has historically used: neither the
> least-above nor the greatest-below is always the closest. Depending on
> which power of two we're talking about, the difference can be large, and
> to leave the man page deliberately vague on it would simply be to drive
> anyone who cares to reading the code to find out the truth. That, IMO, is
> a failure of the duty of a man page. The information pertains directly to
> capacity and load factor, the fundamental parameters of any hash table.
> 
>> Now you need a paragraph separation (.Pp?), and then:
> 
> 
> I have few strong feelings about paragraph layout; I think we need to
> work on our disagreements about what information is necessary and
> accurate first.
> 
>> Too much. "'htype' indicates the type of each element in the hash table.
>> Valid types include 'HASH_LIST' to specify a LIST and 'HASH_TAILQ' to
>> specify a TAILQ. See queue(3)."
> 
> 
> Your suggested change would not be accurate. The /element/ type of the
> hash table--the type of the things you will be keeping in it--is the
> structure type you have declared in your own code with LIST_ENTRY or
> TAILQ_ENTRY, about which hashinit neither knows nor cares. What hashinit
> does need to know is whether you are using LISTs or TAILQs for the
> collision chains, and it is /that/ type that 'htype' indicates.
> 
>>> .Fa mtype
>>> and
>>> .Fa mflags
>>> arguments are passed to
>>> .Xr malloc 9
>>> to control the allocation.
>>
>>
>> Kill the last line.
> 
> 
> No strong preference - fine, if that's consensus.
> 
>>> A value will be stored into
>>> .Fa *hashmask
>>> suitable for masking any computed hash, to obtain the index of a chain
>>> head in the allocated table.
>>
>>
>> Again, this is confusing and non-clear wording. People who never used a
>> hash table (or hashinit(9)) will need to see an example, which makes
>> this sentence useless.
>>
>> Change it to something like "The hashmask argument is used to pass back
>> the mask for use with the allocated hashing table" -- as it appears in
>> the OpenBSD man-page.
> 
> 
> Here you and I starkly disagree on which wording is less clear. I have
> assumed exactly this audience background:
> a. knows that hashing involves computing a hash and reducing it to
>   a table index
> b. knows that the 'reduce to index' step is done differently for different
>   hash schemes, and needs to know what to do for this one
> c. knows that 'masking' refers to the & operation.
> 
> Nothing wrong with proposing different audience assumptions if you think
> mine are off base. If you think it can't be explained to the target aud
> without an example, then the page needs an example. If you think leaving
> the explanation vague is better because it's hopeless anyway, we disagree.
> 
>> How about "The hashdone(9) function frees a hash table 'hashtbl' created
>> by hashinit(9). 'mtype' is the malloc(9) type passed to free(9)."?
>>
>> Then a paragraph break, add something like "Note:"..
>> ...
>>
>>> .Em does not
>>> walk the table and deallocate anything in it, or check that the
>>> caller has
>>> done so.
>>
>>
>> Change this to something simpler. "hashinit(9) frees the hash table
>> itself and not the elements stored in it."
> 
> 
> Too simple, IMO. We have (1) an array, (2) collision chains, and
> (3) elements. In common use, a hash table is an ADT that comprises
> 1 and 2 and lets you store 3. That would mean "frees the hash table
> itself" could misleadingly suggest it does more than just blindly free
> the array, which is what it does. My own wording there could be improved
> for the same reason--but I think the improvement would have to be something
> other than what you propose.
> 
>>> .Sh RETURN VALUES
>>> The value returned by
>>> .Fn hashinit
>>> should be cast as pointer to an array of
>>> .Dv LIST_HEAD
>>> or .Dv TAILQ_HEAD
>>> as appropriate.
>>> It can be
>>> .Dv NULL
>>> only if the specified
>>> .Fa mflags
>>> allow it.
>>
>>
>> Should say "hashinit(9) returns a pointer to the hash table, and
>> acts with the same semantics as a malloc(9) call with respect to the
>> passed flags."?
> 
> 
> hashinit returns a pointer to void. Are you proposing that I omit from
> the man page the specific type the silly thing actually points to?
> Moreover, the real point of mentioning mflags again here is not simply
> to state they determine allocation semantics; they do, but the earlier
> mention covered that. The /point/ of the reference in the RETURN VALUES
> section is to address directly the user's obvious question, "do I have
> to worry about this thing being NULL?"  Why beat around the bush?
> 
>> The entire "HISTORY" section should go away. If people want to be anal,
>> just stick a "first appeared in 4.4BSD".
> 
> 
> I'm not convinced that to be interested in a history section makes one
> anal. The actual guideline in mdoc(7) is:
> 
>  Any command which does not adhere to any specific standards should
>  be outlined historically in this section.
> 
> That I have done. To truncate the outline at 4.4BSD would be especially
> unhelpful in this particular case, because it is since 4.4BSD that the
> function has seen not only obvious changes in its prototype, but also
> semantic changes that seem to indicate varying interpretations of just
> how its "hash table" should be understood--something that seems in fact
> to be going on in our present conversation, and makes the historical
> outline more interesting, not less. Someone porting code between nb, fb,
> and ob may be helped by an outline of the semantic divergence between
> them.
> 
>>> .Sh BUGS
>>> The only part of the work of implementing a hash table that these
>>> functions
>>> relieve is the part that isn't much work.
>>
>>
>> Remove that section too.
> 
> 
> There is precedent among the Unix man pages--especially among the oldest
> and best ones--for frankly including validation- as well as verification-
> level reflections. I've never thought it hurt them in the least.
> 
> The BUGS sections that bug me are the ones that say "this man page is
> very incomplete" - which usually sees me shouting at the monitor "yeah,
> so what made y'think it was ready to commit?"  The monitor just sits
> there, though. :)

I'm sorry, but I find your points very weak. Because we will not reach
an understanding of what should be used for wording in a man-page, I'll
take a step back and let you do whatever you want.

For the record, I'll say that the man-page you wrote is, IMHO, bloated
of useless information and is hard to understand for people who
(a) never seen the code that implements the API and (b) want to start
coding without caring about and reading through the gory details.

This is simply based on *my* experience with these routines a few years
ago, which is the only reason why I even bothered taking part of this
thread.

-e.

-- 
Elad Efrat