tech-kern archive


Re: kernel memory allocation failures



On Fri, Dec 11, 2015 at 02:02:08PM -0500, Greg Troxel wrote:
> 
> Chuck Silvers <chuq%chuq.com@localhost> writes:
> 
> > On Fri, Dec 11, 2015 at 09:44:07AM -0500, Greg Troxel wrote:
> >> 
> >> Chuck Silvers <chuq%chuq.com@localhost> writes:
> >> 
> >> > how about instead we fix the kmem_alloc() implementation to match the man page?
> >> > that seems much more practical to me.  adding failure checks and recovery code
> >> > to the thousands of *alloc() calls in the kernel would be a vast amount of work
> >> > for very little benefit.  an attempt to allocate an amount of memory large
> >> > enough that it can never succeed sounds like a bug to me, and it seems better
> >> > to fix any such bugs rather than add a vast amount of mostly useless
> >> > error handling code in hopes of papering over them.
> >> 
> >> That sounds sensible, but it would seem to require defining some 'small
> >> enough that it cannot fail' size, either statically or via some
> >> getconf-like interface, so that code is only relieved of the obligation
> >> to check if the size is below the limit.   Then kernels that can't
> >> enforce that limit have to panic at boot.   Did you mean all that, or
> >> something else?
> >
> > I just meant that kmem_alloc() with KM_SLEEP should never return NULL,
> > so that the caller does not need to check.  if the caller is erroneously
> > requesting an amount of memory so large that the allocation can never succeed
> > (eg. more than the RAM in the machine) then kmem_alloc() would need to either
> > retry forever or panic.
> 
> That's fine, but to be "erroneous" there needs to be a specification.
> Perhaps that can be something not all that large (1 MB?), with the
> notion that it's always buggy to want more than that, even on big
> machines, and you should have smarter data structures and use pools.
> But it seems tricky because the amount that is reasonable is hard to
> intuit when we can have a main memory range from small to large machines
> of probably 2^14 at present.

exactly... it depends so much on context that it's hard to give a useful specification.
the patch I just sent simply retries forever, so there is no specific threshold
where something different will happen.  this seems like an improvement over what happens now,
so I'd like to get this in for the time being.  if people want to improve things further,
I will have no objection as long as whatever scheme we come up with does not require
every caller to add error-handling code.
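
to illustrate the retry-forever behavior, here is a minimal userland sketch in C.
it is not the patch itself and not the real kmem_alloc() code: try_alloc(),
wait_for_memory(), alloc_sleep(), fail_budget and wait_count are hypothetical names
made up for this example, and malloc() stands in for the underlying allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical test knob: number of upcoming attempts forced to fail. */
static int fail_budget = 0;

/* Counts how many times the sleeping path had to wait. */
static unsigned long wait_count = 0;

/*
 * Hypothetical stand-in for the low-level allocator, which may fail.
 * Here it is malloc() plus an artificial failure budget.
 */
static void *
try_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Hypothetical stand-in for "sleep until memory may be available".
 * A kernel would block here; this sketch only counts the waits.
 */
static void
wait_for_memory(void)
{
	wait_count++;
}

/*
 * Sleeping allocation: never returns NULL.  There is no size threshold;
 * a request that can never be satisfied simply loops forever.
 */
static void *
alloc_sleep(size_t size)
{
	void *p;

	while ((p = try_alloc(size)) == NULL)
		wait_for_memory();
	return p;
}
```

a caller can then write "p = alloc_sleep(len);" and use p without a NULL check,
which is the property that matters here.  the cost is that a request that can never
succeed hangs instead of failing.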

-Chuck

