Re: Can someone please explain the rules for __NetBSD_Version__ ??

To: christos%astron.com@localhost (Christos Zoulas)
Subject: Re: Can someone please explain the rules for __NetBSD_Version__ ??
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Tue, 10 Jan 2012 13:41:10 +0700

    Date:        Tue, 10 Jan 2012 03:48:31 +0000 (UTC)
    From:        christos%astron.com@localhost (Christos Zoulas)
    Message-ID:  <jegcef$auo$1%dough.gmane.org@localhost>

Quoting Soren Jacobsen  <snj%pobox.com@localhost> |>
  | >My thinking was that we didn't care much if the version number got
  | >bumped unless there was an API/ABI change.

It wasn't just when it should be bumped that I was asking, but the
whole way the various sub-fields are intended to be used.

Param.h says ...

 *      #define __NetBSD_Version__ MMmmrrpp00
 * 
 *      M = major version
 *      m = minor version; a minor number of 99 indicates current. 
 *      r = 0 (*)
 *      p = patchlevel

which is all OK as it goes, except that ...

#define __NetBSD_Version__      500000003       /* NetBSD 5.0 */

which has "03" in a part that param.h says should be 00.

  | It was designed to be bumped with every release. Now if that is necessary
  | or not, is a different story.

I think it should, personally, but before it is possible to define when
it should be bumped, we should have a clear idea what it actually means,
and how it is intended to be used.

On the latter, first, I assume (simply from its being in param.h and
not somewhere else) that this value is really intended for internal
kernel use only (not for userland at all).    I know some user programs
abuse param.h (fortunately less these days than there used to be) but
aside from the ones that want to consider themselves part of the kernel
but running in userspace (for good, or bad, reasons) they really shouldn't
be including param.h at all, or using __NetBSD_Version__ if they do.

Second, it is fairly clear that we want a new version whenever an internal
ABI changes (which is what causes all the bumps in the patchlevel in
current) - that part is fairly obvious.

The question is do we need a new version when there is no ABI change
(such as between 5.1 and 5.1.1 or 5.0 and 5.0.2 or whatever - and
potentially even between 5.0 and 5.1).

To answer that (to provide what is IMO the answer) I need to go back
to the purpose of the definition again and examine that more closely.

First, I think we can (almost) rule out most of what is in /usr/src/sys
as candidates for using __NetBSD_Version__, I think it is reaosnable
to assume the entire kernel source tree is one consistent version,
and not a mixture of various different versions, and if that's true,
the entire kernel source knows (implicitly) which version it is being
compiled for, and never needs to test.

That said, I know there are two (other than what is below) uses of this
in the kernel sources currently, one is in dev/video.c where it
(ab)uses NetBSD_Version to create a video driver version value to
return as part of a query for that value (which is basically harmless
and irrelevant), and in netisdn, which is full of ifdef's testing
__NetBSD_Version__ - I assume it is (attempting to be) maintained as
a single source shared between NetBSD & other *BSD's (and more?) and
so is almost better treated as if it were as below.   (If that's no longer
true I'd submit that all the ifdef's should be ripped out of it.)

What that leaves, are externally provided modules (or LKM's in
earlier systems) - which includes those user space programs that
properly should be considered kernel extensions - that probably
includes rump/fuse (whichever is the part that matters, or both),
and perhaps stuff like lsof.

For an external module what is needed to know are what kernel
header files the module is compiled against (that's what <sys/param.h>
provides) and what version the kernel is that the module is loaded
into.   Currently, from what I can see, that's only available by
looking up the sysctl node kern.osrevision (I'd think this is important
enough that it could be stored in a named global variable, which the
sysctl node references, but at least it is available,a nd of course, the
sysctl makes it available to userland).

For gross incompatibilities, the module loader already checks that
the kernel version is "correct" (no idea if LKM's also did that or
not, but they no longer really matter), so modules don't really
need to do a test themselves any more, but this only works for
gross incompatabilities (and I think that's correct).   Check
out module_compatible() in kern/kern_module.c

Now the kind of incompatibility I'm concerned about is more subtle.
Say I'm writing a module that just happens to use an internal
kernel interface in a (correct) way that's never been tested before,
and in so doing, I expose a locking bug (that everybody admits is
a bug, and immediately gets fixed in the sources) that had never been
encountered before, because nothing else in the kernel happens to do the
sequence of operations that causes it.   Of perhaps my module is a
driver, and one of its internal interfaces returns NULL (according
to the documented return values) in a place that no other driver has
ever returned NULL before, and some other part of the kernel just assumed
that no test for NULL was needed,a nd so crashes with a dereference of
NULL when my driver (quite properly) returns NULL at this point - again,
everyone agrees it is a kernel bug, adds the (if res == NULL) return ENOMEM;
(or whatever) test was missing, and the bug gets fixed.

Either way, these fixes are the kind of thing that quite properly go
in the next minor release of the system (from 5.1 to 5.1.1 or whatever)
and the kernel module loader allows (the same) modules to be loaded
into both versions - fixing neither bug makes what is rationally called
a kernel ABI change.

But now I'm stuck, I have a module that panics a kernel when it is loaded
and used, where the kernel is totally stable when my module is not used.
Clearly (to any sane user) my module is the problem, and so no-one should
ever use it (and once you gain a reputation like that, it is very very hard
to overcome).

So, my module tests __NetBSD_Version__ and simply disallows the functionality
that causes the locking error in older kernels, or fakes some nonsense
return value (and so looks to have succeeded, when it actually failed)
instead of returning NULL for older kernels (and in newer ones where the
bug is fixed, allows things to work as designed).

But I can do this only if __NetBSD_Version__ gets changed whenever there
is a new release with any bug fixes at all, as there's no way that you can
know just what bugs my module has internal workarounds for.

It also requires that I know how to test __NetBSD_Version__ (which bits to
test, and which are irrelevant, if any) - which means the thing needs to be
correctly documented somewhere (and ideally, the value could be more
easily obtained than digging it out of sysctl() - note it is the
runtime kernel version we need for this, other than being used to
set that, the one in param.h is useless for this purpose).

Finally, once all of this is actually sorted out, we can perhaps do
something with pkgsrc/pkgtools/osabi and make it a little more rational
than it currently is.

kre

Follow-Ups:
- Re: Can someone please explain the rules for __NetBSD_Version__ ??
  - From: Martin Husemann

References:
- Re: Can someone please explain the rules for __NetBSD_Version__ ??
  - From: Christos Zoulas
- Can someone please explain the rules for __NetBSD_Version__ ??
  - From: Robert Elz
- Re: Can someone please explain the rules for __NetBSD_Version__ ??
  - From: Soren Jacobsen

Prev by Date: Re: Equivalent of FreeBSD kernel semaphore?
Next by Date: Re: Equivalent of FreeBSD kernel semaphore?
Previous by Thread: Re: Can someone please explain the rules for __NetBSD_Version__ ??
Next by Thread: Re: Can someone please explain the rules for __NetBSD_Version__ ??
Indexes:

Home | Main Index | Thread Index | Old Index