NetBSD-Bugs archive


kern/40466: endless looping in ffs_sync() on WAPBL mounts...



>Number:         40466
>Category:       kern
>Synopsis:       endless looping in ffs_sync() on WAPBL mounts...
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 24 17:35:00 +0000 2009
>Originator:     Greg Oster
>Release:        NetBSD 5.99.7
>Organization:
>Environment:
System: NetBSD thog 5.99.7 NetBSD 5.99.7 (MONOLITHIC) #1: Sat Jan 24 11:06:25 CST 2009 oster@quad:/u1/devel/current2/src/sys/arch/i386/compile/MONOLITHIC i386
Architecture: i386
Machine: i386
>Description:

With a MONOLITHIC -current kernel (with COMPAT_50) and a 5.0_BETA
userland, mounting a freshly newfs'ed partition with the '-o log'
option makes the machine stop responding.  Instrumentation added to
ffs_sync() shows that it is looping endlessly around the "loop:"
label.
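
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of how a "restart the walk at a label" loop can spin forever.
It is not the ffs_sync() source, and this report does not identify the
actual cause.  The names struct node, sticky, sync_all() and
max_restarts are all hypothetical; the restart counter stands in for
the kind of instrumentation mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an entry on a mount's list of objects to sync. */
struct node {
	bool dirty;	/* entry still needs to be synced */
	bool sticky;	/* assumption: models an entry whose dirty state never clears */
};

/*
 * Walk the list and restart from the top after every sync attempt,
 * as a loop that cannot trust its position after sleeping would do.
 * Returns the number of restarts, or -1 once max_restarts is exceeded
 * (the instrumentation that turns an endless loop into a visible one).
 */
static int
sync_all(struct node *nodes, int n, int max_restarts)
{
	int restarts = 0;
	int i;

	assert(max_restarts >= 0);
loop:
	for (i = 0; i < n; i++) {
		if (!nodes[i].dirty)
			continue;
		if (!nodes[i].sticky)
			nodes[i].dirty = false;	/* sync succeeded */
		/* The list may have changed meanwhile: start over. */
		if (++restarts > max_restarts)
			return -1;
		goto loop;
	}
	return restarts;
}
```

With only ordinary entries the walk ends after one restart per dirty
entry.  A single entry that stays dirty sends every pass back to the
label, which is the shape of the behaviour seen here.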

>How-To-Repeat:


thog# newfs /dev/rwd1f
/dev/rwd1f: 9765.6MB (20000000 sectors) block size 16384, fragment size 2048
        using 53 cylinder groups of 184.27MB, 11793 blks, 23296 inodes.
super-block backups (for fsck_ffs -b #) at:
32, 377408, 754784, 1132160, 1509536, 1886912, 2264288, 2641664, 3019040,
...............................................................................
thog# fsck -f /dev/rwd1f    
** /dev/rwd1f
** File system is already clean
** Last Mounted on 
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1 files, 1 used, 4921974 free (14 frags, 615245 blocks, 0.0% fragmentation)
thog# mount -o log /dev/wd1f /u2 
Read from remote host thog: Connection reset by peer

At this point I break into ddb and find that the mount_ffs process
appears to be at various places in ffs_sync(): after telling ddb to
continue and breaking again, it is often in a different function.
After adding instrumentation to ffs_sync(), I confirmed that it is
indeed looping hard through the "loop:" label.

In the above case, / was also mounted as a logging filesystem.

Surprisingly, however, I can no longer trigger this bug with this kernel:
 http://gnats.netbsd.org/40361
That is, having / mounted as non-log and /u2 mounted as log works
again, so at least now I can hack on the logging code without having
to worry as much about wrecking / :-/

>Fix:
        Please.  


