tech-kern archive


Re: something really screwed up with mmap+ffs on 5.0_STABLE

On Tue Aug 17 2010 at 19:06:38 +0300, Antti Kantee wrote:
> Usually after the bus has seen one generation (i.e. the pages have been
> faulted in to all processes) there are no further problems.  However,
> causing (read) faults from a 3rd party process not involved with the
> test may trigger the problem.

I ran with that idea and added code to prefault all the backend
pages at driver attachment.  That makes the corruption go away, so it
works around the bug.  I'm still not sure where exactly the bug is.
Also, I'm not sure how well the workaround works if a lot of interfaces
attach at different times, i.e. cause the initial faults when the bus is
already in use.  Furthermore, I'm not sure how this would affect archs
such as sparc64 which use traps to catch page modification.  Anyway,
i386 for the most part works now.
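
For readers unfamiliar with the technique, prefaulting a mapped region
could look roughly like the sketch below.  This is not the code from
if_shmif.c: the name prefault_rw and its return value are hypothetical,
and the region is assumed to be a page-aligned mmap()ed area that no
other process is modifying while it runs (the read-then-write is not
atomic).

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from if_shmif.c: touch one byte in
 * every page of [base, base + len) so the page is faulted in for both
 * reading and writing.  Returns the number of pages touched.
 */
static size_t
prefault_rw(void *base, size_t len)
{
	volatile uint8_t *p = base;	/* volatile: keep the accesses */
	long ps = sysconf(_SC_PAGESIZE);
	size_t pagesize = ps > 0 ? (size_t)ps : 4096;	/* fallback is an assumption */
	size_t off, n = 0;

	for (off = 0; off < len; off += pagesize) {
		uint8_t v = p[off];	/* triggers a read fault */
		p[off] = v;		/* triggers a write fault, contents unchanged */
		n++;
	}
	return n;
}
```

Writing back the value just read leaves the contents of the region
unchanged while still forcing the write fault, which is why the sketch
only makes sense before anyone else is writing to the region.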

In ~2 hours of test runs I did once come across a situation where a
process hung in uvn_fp2 (unkillable, of course, and others attempting to
access the file afterwards block ("turnstiles"), most likely on the vnode lock).
The stacktrace was pretty much the expected uvm_fault -> VOP_GETPAGES ->
genfs_getpages -> uvn_findpages.  This was vanilla ffs without wapbl
or softdeps.

It would be great if someone could confirm or debunk this on -current
and for archs beyond i386.  Just get the latest sources, go to
sys/rump/net/lib/libshmif, comment out line 61 (the one with PREFAULT_RW)
from if_shmif.c, "make && make install", and run tests/net/icmp/t_ping
floodping in a loop.  You should see a coredump within a few thousand
iterations (a few minutes) if the problem is there.

I'll file a PR if someone can repeat the problem.  Otherwise I'll just
be happy with the workaround and bury the issue.
