tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Proposal: kmem_valloc [was: Re: raspberry pi panic 7.0_BETA after install fs resize]

    Date:        Sun, 09 Nov 2014 09:06:40 +0100
    From:        Maxime Villard <>
    Message-ID:  <>

  | My point is:
  |   - KMEM_SIZE is only enabled on DIAGNOSTIC.

Which makes sense if we're assuming the point of DIAGNOSTIC is to help
track kernel programming errors, which is what I believe.  Tracking
allocation sizes allows the kernel to verify that the correct space is
being freed.   If we assume there are no bugs, then there's no need to
check that.

  |   - We need a similar way of holding the size by default - and no size
  |     argument to _free.

I disagree with that.

I believe there are 3 cases to consider.   The first two are easy.

First, allocation of a sized sized blob (kmem_alloc(sizeof(whatever))).
For that one, tracking the allocation size serves no purpose but to
avoid typing ", sizeof(whatever)" on the kmem_free call (and could lead
to vastly overused memory resources, eg: if "whatever" is "char" then
probably 9 times as much memory, on modern systems, as really required.).

Obviously, the plan would not be to use the new interface for this case.

Second, allocations that vary in size over time.   (Tables that grow
bigger as more data arrives, or whatever).   For this, since no malloc()
type function set has a how_much_is_allocated_at(void *p) the code needs
to track how much it had previously allocated, so it can know whether it
needs to change that allocation (up or down) as the data volume varies.

There's no point (other than DIAGNOSTIC) keeping the same information
twice, so there's no point having the allocation functions track the
size in this case either - this is another case where just using the
existing interface is the best way to go.

Third, allocations that are of variable size when allocated, but which
don't change later.   This is the case where it seems that having the
malloc() type interface would win, as it becomes possible to do
	p = malloc(strlen(s)); .....; free(p);
without having to remember strlen(s) value through the .... section (which
can be a long time, and through function returns, etc).

But think about it a bit more - how much of that kind of allocation does
the kernel really do?   How much variable sized data is really saved in the
kernel, and then never returned to the user?   The latter qualification
counts, as if the data is ever to be copied out, the kernel must know its
size first (for bounds checking on user space buffers).  If the code does
not know how much is stored there, it needs to recalculate it - every time.
That's wasteful - it is better to simply remember the size, and further, that
tends to be more efficient as well, as the remembered size can be in a
variable, or field, that's appropriately sized, not just a size_t which is
what a malloc() type copy of the size would need to use.

What is being proposed here is to fix something so rare, and to do it in
such an inefficient way, that it makes no sense at all.

Further, even if you don't buy that argument, and want to go ahead anyway,
then this ...

  |   - Problem: if we do it by default in kmem_alloc - which, if I
  |     understand correctly, is your suggestion - we lose the memory
  |     optimisation.
  |   - Solution: let's use malloc!
  |   - Problem: malloc is deprecated, and dead.

makes absolutely no sense.   The reason malloc() is deprecated, in the
kernel, is because of an argument just like the one above - not because
the name is somehow evil, and must not be used.   If one was to reinstate
malloc()/free() functionality in the kernel, it would not make any sense at
all to use different names, that would just confuse everybody, and make it
even harder than it is now to share code between kernel and userland.

Don't do this - and especially not when the motivation was the resize_ffs
issue, where the actual bug was something else entirely (whether you believe
it is a user error, in attempting to unmount a filesystem that has been
resized - same as users should not attempt to unmount a mounted filesystem
that has been corrected using fsck, or whether it was a coding bug, not
correctly updating other fields when the filesystem size changed - either
way, making it so the free() worked, would not have helped at all.)

Further the 0 allocation size case isn't worth dealing with either, or not
this way.

  | Verily I would agree with you if I hadn't seen many bugs like:
  | 	int count = SCARG(uap, misc);
  | ...
  | 	/* Make sure userland cannot exhaust kernel memory */
  | 	if ((size_t)count > (size_t)uvmexp.nswapdev)
  | 		count = uvmexp.nswapdev;
  | ...
  | 	ksep_len = sizeof(*ksep) * count;
  | 	ksep = kmem_alloc(ksep_len, KM_SLEEP);
  | In this example, count could be zero, and it just crashed. But ksep was
  | used depending on count, it was not touched when count=0.

Sure, but the code could just as easily test count != 0 and skip the
entire segment in that case.   Or we could just redefine the kmem_alloc(),
have it test for 0, and if it gets that, define it to return NULL, and
not crash - (and not fail when NULL is passed to kmem_free() of course).
Since you're already assuming that the code will be correctly written to
not indirect through the pointer returned if size was 0, this would be
perfectly safe.   For kernel use, there's no need to keep the old malloc()
promise that no two returned pointers will ever be the same.

  | And finally, remember that malloc(0) does not panic precisely because
  | a header is allocated.

Believe it or not, that is (somewhat) new - malloc(0) used to fail (I
forget now whether it just returned NULL - which says "I fail", or whether
it just did stupid things) - even userland code was required to check
whether or not the size was 0 before requesting memory (which was sometimes
done, when appropriate, by just "+ 1" on the allocation size...)

But there were too many lazy, and stupid, user level programmers, and
so it was decided to "fix" malloc() to deal with the 0 allocation size.

Hopefully, we don't have lazy stupid programmers coding in the NetBSD
kernel, so provided we document that 0 sized allocations don't work (or
document that they just return NULL if that change was to be made) then
there should not be a problem.  A whole new interface, just to solve that
problem is not warranted.

  | If you think kmem_valloc(0) is a bad idea, then malloc(0) is a bad idea
  | too, and we would have to fix them all.

No, the usage patterns are different.  Personally I never really believed
that we needed malloc(0) to work, or that realloc(NULL,...) was ever really
necessary either, dealing with those cases is not that hard.   But
user programmers (and the programs they write) and the kernel (and its
programmers) are just different.   The kernel must be able to run for years,
mem allocations need to be very tightly controlled - most user programs
last a few seconds, if they don't deal correctly with malloc(), no-one really
cares, even long running programs can have leaks, and it is rarely a
very serious problem.   The kernel allows none of that, all memory allocations
need very close attention, to make sure that everything is handled properly.
If the code has that must attention, then dealing with 0, and remembering
how much was allocated, are trivial to deal with.


Home | Main Index | Thread Index | Old Index