Current-Users archive


Re: Test-bed time-out



On Fri, May 11, 2012 at 05:22:30PM +0300, Andreas Gustafsson wrote:
> Chuck Silvers wrote:
> > thanks for the info.  I tried to reproduce this myself but didn't have any
> > luck.  I ran that one test program in a loop for 1/2 hour (several thousand
> > cycles) but it never got stuck.  that was on a real system, not qemu.  also,
> > amd64.
> 
> Please try to reproduce it under qemu.  It's just four lines of
> typing, assuming you have misc/py-anita installed:
> 
>   $ anita interact http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201205092050Z/i386/
>   ...
>   login: root
>   ...
>   # cd /usr/tests/rump/rumpvfs
>   # while true; do atf-run t_etfs; done
> 
> > that stack looks like the unmount thread wasn't sleeping, can you confirm
> > that the thread is continuing to run even though the unmount doesn't
> > complete?  if that's the case, could you try to narrow down what it's doing?
> 
> It's looping in ffs_sync(), between ffs_vfsops.c line 1649:
> 
>   1649            for (vp = TAILQ_FIRST(&mp->mnt_vnodelist); vp; vp = nvp) {
> 
> and line 1742:
> 
>   1742                            goto loop;
> 
> and ffs_sync() never returns.

I found the problem; it was indeed in the change you identified.
the hang could only happen on single-CPU systems, and only if the MFS was
unmounted at a time when all the dirty buffers in that file system were
already in the process of being written to disk.  in that case,
vflushbuf() would loop without sleeping, so the mount_mfs process
would never get to run to actually process those "disk" writes.
it's a bit surprising that the unmount process continued to run forever
without eventually being preempted by mount_mfs.
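
The scenario can be illustrated with a toy user-space model.  This is only
a sketch and not the NetBSD code: toy_fs, toy_server_run() and toy_flush()
are hypothetical names, and "being scheduled" is reduced to a plain function
call.  It assumes a single CPU where the server process runs only when the
flushing thread sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not NetBSD code: every name here is hypothetical. */
struct toy_fs {
	int writes_in_flight;	/* buffers already handed to the "disk" process */
};

/*
 * Stand-in for the mount_mfs process: it completes one pending write
 * each time it gets the CPU.
 */
static void
toy_server_run(struct toy_fs *fs)
{
	assert(fs->writes_in_flight >= 0);
	if (fs->writes_in_flight > 0)
		fs->writes_in_flight--;
}

/*
 * Stand-in for the flush loop on a single CPU.  Returns the number of
 * passes made before no writes were left in flight, or -1 if max_passes
 * ran out first (the model's version of "never returns").
 */
static int
toy_flush(struct toy_fs *fs, bool sleep_when_busy, int max_passes)
{
	for (int pass = 0; pass < max_passes; pass++) {
		if (fs->writes_in_flight == 0)
			return pass;
		if (sleep_when_busy)
			toy_server_run(fs);	/* sleeping gives up the CPU */
		/* otherwise retry at once; the server is never scheduled */
	}
	return -1;
}
```

In this model, with three writes in flight, toy_flush(&fs, false, 1000)
gives up with -1 and leaves all three writes pending, while
toy_flush(&fs, true, 1000) returns after three passes.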

-Chuck

