Current-Users archive


Re: Filesystem tests crashing host



On Sat, Apr 16, 2011 at 11:54:26AM -0700, Paul Goyette wrote:
> (Resending, this time with subject line and cc's)
> 
> >> I still say revert rmind's changes of 2011.04.11.22.31.43, because
> >> that's when the failures started.  My logs show six test runs between
> >> christos' change to kern_descrip.c (at 2011.04.10.15.45.33) and rmind's
> >> changes, and none of those test runs panicked; after rmind's changes,
> >> every single test run has panicked.
> >
> >Problem is not diagnosed.  It cannot be reproduced on real hardware,
> >and I do not see how f_ops can become invalid when using semaphore.
> >Even if we assume that it can - the semaphore code should actually be
> >*used* in the first place.  However, it seems that neither failing
> >ATF tests, nor ATF itself are using semaphores.  Can somebody prove
> >me wrong on this?
> >
> >Perhaps a simple printf("f_type = %d\n", fp->f_type) would hint what
> >type of descriptor is actually failing.  Also, a wild guess - can one
> >reproduce the problem with the following changes reverted:
> 
> I'm working on building kernels with each commit backed out - it will take 
> a while.
> 
> However, I have been able to dump the file structure:
> 
>       f_offset        0000 0000 0000 0000
>       f_cred          ffff 8000 098e db40
>       f_ops           ffff ffff 80ef fac0
>       f_data          ffff 8000 09de 28c0
>       f_list.next     ffff 8000 0ab7 f4c0
>       f_list.prev     ffff ffff 80cb 2728
>       f_lock          0000 0000 0000 0000
>       f_flag          0000 0003
>       f_marker        0000 0000
>       f_type          0000 0008
>       f_advice        0000 0000
>       f_count         0000 0000
>       f_msgcount      0000 0000
>       f_unpcount      0000 0000
>       f_unplist.next  0000 0000 0000 0000
> 
> Note that both the f_ops and f_list.prev pointers seem to be corrupt, and 
> that the type of this structure is semaphore = 8

I added some instrumentation too. With a printf from ksem_sysinit(),
ksem_sysfini() and do_ksem_init()/do_ksem_open() I get:

        fs/nfs/t_mountd (91/400): 1 test cases
            mountdhup:
        ksem_sysinit ops 0xcb0d4bc0
        fp 0xcaf91780 ops 0xcb0d4bc0
        ksem_sysfini ops 0xcb0d4bc0
        uvm_fault(0xc0b0f380, 0xcb0d4000, 1) -> 0xe
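
The relation between the last two addresses in this log can be checked with a
little arithmetic. The sketch below assumes a 4 KiB page size (an assumption;
the log does not state it), and the helper names page_base() and same_page()
are made up for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the start of its page; page_size must be a power of two. */
uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Nonzero if fault_addr lies in the same page as ptr. */
int same_page(uint64_t fault_addr, uint64_t ptr, uint64_t page_size)
{
    return page_base(fault_addr, page_size) == page_base(ptr, page_size);
}
```

With the values from the log, page_base(0xcb0d4bc0, 4096) gives 0xcb0d4000,
which is the address reported by uvm_fault().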

Backtrace:

        closef(caf91780, ...) <== the file with the ksem above!
        fd_free()
        exit1()
        sigexit()
        postsig()
        lwp_userret()
        syscall()

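The fault address is the page containing the ops vector of the now-unloaded
ksem module: ksem_sysfini() ran while fp 0xcaf91780 still pointed at those
ops, so closef() at process exit touches an unmapped page -- page fault -- boom.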
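
The ordering in the log (init, open, fini, then close at exit) can be modelled
in userspace. The following is only a sketch of one possible guard, with all
model_* names hypothetical: the "module" counts the files that still point at
its ops vector and refuses to unload while any remain. It is not the ksem code
and not necessarily how the problem should be fixed in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model, not NetBSD kernel code. */
struct model_fileops {
    int (*fo_close)(void *data);
};

struct model_module {
    const struct model_fileops *ops;  /* NULL once "unloaded" */
    unsigned int open_files;          /* files still pointing at ops */
};

struct model_file {
    struct model_module *mod;
    const struct model_fileops *f_ops;
    void *f_data;
};

static int model_close_noop(void *data)
{
    (void)data;
    return 0;
}

static const struct model_fileops model_ops = { model_close_noop };

/* Counterpart of the init hook: publish the ops vector. */
void model_init(struct model_module *m)
{
    m->ops = &model_ops;
    m->open_files = 0;
}

/* Attach a file to the module's ops; fails once the module is gone. */
int model_open(struct model_module *m, struct model_file *fp)
{
    if (m->ops == NULL)
        return ENXIO;
    fp->mod = m;
    fp->f_ops = m->ops;
    fp->f_data = NULL;
    m->open_files++;
    return 0;
}

/* Close goes through f_ops, so the ops must still be valid here. */
int model_close(struct model_file *fp)
{
    assert(fp->f_ops != NULL);
    int error = fp->f_ops->fo_close(fp->f_data);
    fp->mod->open_files--;
    fp->f_ops = NULL;
    return error;
}

/* Counterpart of the fini hook: refuse to unload while files remain. */
int model_fini(struct model_module *m)
{
    if (m->open_files != 0)
        return EBUSY;
    m->ops = NULL;
    return 0;
}
```

In this model, calling model_fini() between model_open() and model_close()
returns EBUSY instead of leaving the file with a dangling f_ops.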

-- 
Juergen Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig 
(Germany)

