Current-Users archive


Re: Another UVM wire page count underflow panic

Careful, Robert, you're going to end up the designated maintainer of UVM at this rate :-)

> On Mar 15, 2019, at 12:11 PM, Robert Elz <kre%munnari.OZ.AU@localhost> wrote:
> However, the relationship between the shm() functions and the uvm
> (m*() functions) looks to be something of a mess - we really cannot keep
> a state where by doing bizarre combinations of calls from those two
> sets, which internally seem to not know much about each other, can
> either make a mess of the kernel data structures, or crash the kernel.


So, there are two competing, incompatible API contracts here:

1==> SVID shared memory's "locking" extension, which requires balanced wire/unwire calls.

2==> The POSIX realtime extensions' mlock(2) / munlock(2), which do not require balanced wire/unwire calls.

SVID's semantics map nicely onto the "wire_count" field in the vm_map_entry_t.  POSIX's do not, but we sort of coerce it by letting wire_count increment wantonly and then forcing it to 0 when munlock(2) comes in (I'm pretty sure this is my fault, actually; I will find a sword to fall on).

POSIX's semantics could just as well be represented with a bit in a flags word, and handily there is one in vm_map_entry_t (why it's a uint8_t when just about every ABI is going to round that to at least an int, in terms of allocated space, *shrug*).

I would suggest that the right way to fix this would be as follows:

1==> Use a bit in the entry flags to hold the POSIX "mlock" wiring state.  This includes entries wired as a result of VM_MAP_WIREFUTURE (which is a proxy for mlockall(2)'s MCL_FUTURE).

2==> Change VM_MAPENT_ISWIRED() to consult this flags bit in addition to the wire_count.  Audit the code to make sure everything uses this macro rather than just looking at entry->wire_count.  (This audit could be done first, independently.)

3==> Add a new flag to uvm_map_pageable() that says you're performing an mlock-semantics wiring change.  If that flag is set, you fiddle the bit; otherwise you fiddle the count.  A new flag to indicate "no, REALLY unwire it" -- clearing the bit and forcing the wire_count to 0 -- might also be necessary to handle certain cases (like removing mappings, process exit / exec, etc. -- I think uvm_map_pageable() is used in those cases before the map entry is removed, but I'm not 100% sure).

4==> A convenience function to determine if a map entry's wiring state has changed that considers the mlock bit as well as the wire_count, maybe something like this?

	/* Name is illustrative -- pick whatever fits the uvm_map.c style. */
	static bool
	uvm_mapent_wiring_changed(
	    const vm_map_entry_t * const entry,
	    const u_int new_mlock_flag,
	    const int wire_count_delta)
	{
		const bool mlock_flag_changed =
		    ((entry->flags & UVM_MAP_MLOCKED) ^ new_mlock_flag) != 0;
		const int new_wire_count = entry->wire_count + wire_count_delta;
		const bool wire_count_changed = entry->wire_count ?
		    new_wire_count == 0 : new_wire_count != 0;

		return mlock_flag_changed || wire_count_changed;
	}

Use of this function would replace checking if the wiring count is 0 (and about to become 1) or 1 (and about to become 0) in various places.

It seems like doing this would allow both APIs to be used together without mangling the internal kernel state.  Of course munlock() would not actually unwire SVID shared memory regions, but that seems OK ... the two APIs are independent, and it seems like this is a reasonable "undefined" behavior resulting from mixing them together.

I don't think this would actually be a ton of work, and I'd be happy to guide you along.

-- thorpej
