Current-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: ATF t_mlock() babylon5 kernel panics



    Date:        Tue, 12 Mar 2019 23:21:59 -0700
    From:        Jason Thorpe <thorpej%me.com@localhost>
    Message-ID:  <AB3DD6E8-9ED7-4929-BF24-C03EC593E72D%me.com@localhost>

Thanks for the reply.   I have dropped tech-kern and tech-userlevel
from this reply though.

  | The test employs a bogus understanding of how malloc() is specified.

Yes, that is kind of obvious, but perhaps understandable given the man
page.

  | I've also seen the term "fundamental object" used.

I think that might be pushing understanding a bit far.   That could
refer to a quark, or qason or one of those truly fundamental objects,
or perhaps a proton/neutron/electron, or even atom or molecule...

I am not going to change the man page, as that should be done by someone
who knows what they're actually talking about - what NetBSD libc malloc()
is actually willing to promise, but perhaps something like

	suitable for any C primitive or constructed data type

?

  | One has to remember that malloc() is specified by the C standard,
  | and C has no notion of "pages" or any other such silliness that
  | we Unix people assume are fundamental :-)

Yes, of course, but as long as we meet its requirements, there's no
reason that we can't provide "more" alignment than is required.  It
would not perhaps be completely unreasonable for a malloc of any power
of two size (perhaps up to the page size, or perhaps even more,
depending upon what works best for the architecture) to align the
result to that same size (or the minimum required alignment, whichever
is greater).

An application should not assume that, but it might make some operations
magically work more efficiently if things just happened to be aligned
that way - eg: a read for a (one or more) full page into a page aligned
buffer could simply do page flipping (sharing with kernel on a CoW basis,
or if the kernel doesn't need to keep the page, simply a page swap) rather
than a byte by byte copy.   No-one need even know that such a thing was
happening...


  | POSIX specifically states that mlock() //may// require that the
  | address is page-aligned ... Our implementation does not require this:

Yes, I know - the test is specifically testing that the implementation
does not require it (run on a different system that test might fail).

  | And there are no files there?

No, there were no files, and unless init has taken to holding open
large unlinked files in /tmp (a /tmp that is not yet mounted when
init does most of is work) then there were no remaining processes
to be keeping any unlinked files.

  | Even an open-unliked file should disappear when the offending process exits.

yes.   I don't think it was that, I think it is some anon mmap()'d object
that is no longer referenced by anyone but which is still hanging around.

  | Well, note that tmpfs also uses anonymous memory.

Yes, I'm aware.

  | Is it that "df" on the tmpfs is really showing a bunch of space
  | allocated to the tmpfs?

netbsd# df /tmp
Filesystem    1K-blocks       Used      Avail %Cap Mounted on
tmpfs                 4          4          0 100% /tmp

That's what it showed (it was still in my xterm scrollback buffer from
the window I use as the DomU console).    So, no, df isn't showing space
allocated, just nothing available.   Something had "stolen" all the available
ram, and wasn't letting go.

It did not look as if there was no free memory however:

netbsd# vmstat m
 procs    memory      page                       disk faults      cpu
 r b      avm    fre  flt  re  pi   po   fr   sr x0   in   sy  cs us sy id
 0 0    34868 945956  264   0   0    0    0    0 15  428  426 119  0  0 100

but something as simple as attempting to read a man page (to check
what other options I could try):

netbsd# man vmstat
man: Formatting manual page...
mandoc: stdout: No space left on device

(attempting to write a temp file on /tmp I presume).

netbsd# ps ax
  PID TTY     STAT    TIME COMMAND
    0 ?       OKl  0:27.38 [system]
    1 ?       Ss   0:00.02 init 
 1966 xencons O+   0:00.00 ps -ax 
 7241 xencons Ss   0:00.01 login 
23376 xencons S    0:00.00 -sh 
netbsd# fstat
[...]

The fstat showed nothing interesting, in partucular, init had
no open files at all (not surprising).   The ps was done by
the time the fstat started of course, fstat login and sh had
files open (also no surprise - except why login is there at
all (it is the parent of the sh) - when did we start keeping
login around rather than simply having it exec sh, and why?

In any case I made that login and sh go away, by logging out
and in again, and confirming new pids...


A "normal" df of /tmp shows something more like ...

netbsd# df /tmp
Filesystem    1K-blocks       Used      Avail %Cap Mounted on
tmpfs            524664          4     524660   0% /tmp

  | We use basic 4K page size.

Yes, I saw that when I added diagnostics into the test (I printed
that value).    I also saw when I ran it on my laptop, where I do
most basic development work initially, that whatever malloc it is
using (userland from -current from just over a year ago) was in fact
returning page aligned buffers...

tc-se:t_mlock: pagesize 4096
tc-se:mlock_clip: buf = 0x7f7ff7b11000 (page=4096)


  | In a nutshell, that printf says that UVM thought the page was wired,
  | but the pmap layer did not.

OK, and thanks for the longer explanation as well.   If it turns out
that no-one better qualified than I looks into this, that will help a
lot.


  | > This should indicate that there is no more available anon memory.
  |
  | **OR** it indicates that the limit on wired memory has been exceeded.

Yes, more or less the same thing in this case I suspect.

  | My guess is that jemalloc is simply returning non-page-aligned regions
  | (I have not read the code).

Nor have I, and that was my assumption too, after all, nothing else
seemed to have changed from when the tests were working until now
when they are not.   But ...

t_mlock: pagesize 4096
tp-start: 1552463816.444975, t_mlock, 5
tc-start: 1552463816.444983, mlock_clip
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ed6000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ed7000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7eda000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ee1000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ee4000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ee9000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7eec000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7eed000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7eee000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7eef000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef0000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef1000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef2000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef4000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef7000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef8000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7ef9000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7efa000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7efb000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7efc000 did not change!
[ 450.9201054] pmap_unwire: wiring for pmap 0xffffd180021e0960 va 0x7f7ff7efd000 did not change!
tc-se:t_mlock: pagesize 4096
tc-se:mlock_clip: buf = 0x7f7ff7ed6000 (page=4096)
tc-end: 1552463816.486817, mlock_clip, passed

it appears not, that buffer is page aligned.    One other possible
difference - it could be that the older malloc (whatever it was)
touched the page, so it was faulted in, perhaps the new one simply
mmap()'s some space, and leaves it untouched ??

I will test that by touching the buffer before starting any mlock()
tests in the next version (but will not commit that change if it makes
a difference so it does not influence the b5 tests.)

The same buffer is returned for the next test:

tc-se:mlock_limits: buf = 0x7f7ff7ed6000 (page=4096)

during which the kernel sent this message:

[ 450.9301058] pid 468 (t_mlock): system write of 64@0xffffd1804da77ad0 at 0 failed: 28

which must mean something to someone, but I am not that one!
I do not recall seeing that previously, but the tests are doing
more output now (all my new disgnostics).

It appeared in the middle of the (older) diagnostic message:

tc-se:trying to lock 2596 bytes with 2595 byte limit

(just before the space after "bytes").   All this ouuput was being
sent directly to xencons, and is stderr, so I presume is probably
unbuffered, so 1 byte at a time writes, though ATF will be buffering
it via some file in /tmp I guess.

By the time this test is finishing we get...

tc-se:FATAL ERROR: Failed to write results file; result passed, reason null: No space left on device

so this is where /tmp befomes full (this is after a rebuild and
reboot, and is back using mounted tmpfs).

After that:

tc-se:mlock_mmap: mmap(NULL, 4096, 0x3, 0x1802, -1, 0): MAP_FAILED [12] Cannot allocate memory

and soon after another "system write failed" kernel message like the one
above.

And finally:

tc-se:t_mlock: pagesize 4096
tc-se:mlock_nested: buf = 0x7f7ff7ed6000 (page=4096)
tc-se:mlock_nested: i=0 (of 100) mlock(0x7f7ff7ed6000, 4096): -1 [35] Resource temporarily unavailable

Still the same page aligned buffer.   I think by this stage the system
is just so messed up that this is meaningless.

Another system write failed kernel printf, and more ATF errors about
being unable to save the routput because of the lack of space, and the
test eventually aborts (as in abort() or SIGABRT).


  | Was our previous allocator an older jemalloc, or was it phkmalloc?

I suspect perhaps the former, but I really have no idea.   Christos?

  | I would suggest adding:
  |
  | 	KASSERT(pg->wire_count != 0);
  |
  | Immediately before:
  |
  | 	pg->wire_count--;

I will do that, but that will be more for the b5 tests, as this panic
does not happen for me.    That means it will be a day or two (probably)
before we get any results.

Thanks,

kre



Home | Main Index | Thread Index | Old Index