Subject: kern/20653: lfs_segwrite panic
To: gnats-bugs@gnats.netbsd.org
From: scotte@warped.com
List: netbsd-bugs
Date: 03/11/2003 02:57:07
>Number:         20653
>Category:       kern
>Synopsis:       LFS filesystem on -current causes panic as soon as lfs_cleanerd tries to clean it
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 10 18:58:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Scott Ellis
>Release:        NetBSD 1.6P
>Organization:
   //////////////////////////////////////////////////////////////////////
  //    Scott Ellis     //             scotte@warped.com              //
 //////////////////////////////////////////////////////////////////////
// WARNING: This signature warps  time and space in its vicinity    //
>Environment:
	
	
System: NetBSD intrepid 1.6P NetBSD 1.6P (INTREPID.APM.DDB) #0: Mon Mar 10 17:45:21 PST 2003 scotte@intrepid:/usr/src/sys/arch/i386/compile/INTREPID.APM.DDB i386
Architecture: i386
Machine: i386

Userland and kernel are from the same day.

>Description:
	
Using the /dev/wd1f partition in the following table:

intrepid# df -i
Filesystem  1K-blocks     Used     Avail Capacity iused   ifree  %iused  Mounted on
/dev/wd0a     3060024  2640320    266696    90%   55398   40856    57%   /
mfs:37          97863        4     92965     0%       6   24760     0%   /tmp
/dev/wd0e      992812    68280    874888     7%    7050  241780     2%   /var
/dev/wd0f   112381288 89393216  17369000    83%  224627 3295371     6%   /misc
/dev/wd1f    99693798 83534000  11175110    88%  187092 130520428     0%   /mounts/tempmisc

Seemingly as soon as lfs_cleanerd starts running, the system panics with:

lfs_segwrite: possibly invalid checkpoint!
lfs_segwrite: ifile still has dirty blocks?!
bp=0xca29b6e0, lbn 51, flags 0x24080
bp=0xca2afe10, lbn 35, flags 0x24080
panic: dirty blocks
syncing disks...

This is repeatable:

lfs_segwrite: possibly invalid checkpoint!
lfs_segwrite: ifile still has dirty blocks?!
bp=0xca2de1d0, lbn 44, flags 0x24080
bp=0xca2dda50, lbn 51, flags 0x24080
panic: dirty blocks
Stopped in pid 492.1 (sync) at  cpu_Debugger+0x4:       leave
db> bt
cpu_Debugger(0,e38d5000,0,c017bae6,c027446e) at cpu_Debugger+0x4
panic(c027448c,e3c20000,e38d5000,212,c0f73000) at panic+0xb8
lfs_segwrite(c10f1600,5,c0406224,0,0) at lfs_segwrite+0x54b
lfs_sync(c10f1600,2,c1116000,e3be16a0,e3cd3488) at lfs_sync+0x74
sys_sync(e3cd3488,e3f79f80,e3f79f78,c021bf64,8049c60) at sys_sync+0x66
syscall_plain(1f,1f,1f,1f,bfbff6dc) at syscall_plain+0xab
db>

lfs_segwrite: possibly invalid checkpoint!
lfs_segwrite: ifile still has dirty blocks?!
bp=0xca2e0570, lbn 51, flags 0x24080
bp=0xca2e1470, lbn 35, flags 0x24080
panic: dirty blocks
Stopped in pid 11.1 (ioflush) at        cpu_Debugger+0x4:       leave
db> bt
cpu_Debugger(0,e3bf8000,0,c017bae6,c027446e) at cpu_Debugger+0x4
panic(c027448c,e3bfa000,e3bf8000,212,c10ef800) at panic+0xb8
lfs_segwrite(c1095a00,5,e38c7f68,c01e0e1d,0) at lfs_segwrite+0x54b
lfs_sync(c1095a00,3,c0e69f00,e38c31a8,e3f4c8f4) at lfs_sync+0x74
sync_fsync(e38c7f68,12,0,0,e389c500) at sync_fsync+0x5c
sched_sync(e389c500,e38b5500,0,0,c010030c) at sched_sync+0x11e
db>

fsck_lfs shows no problems with the partition.  Note that the partition got
to its current state via a simple "pax -rw -pe" from /misc to /mounts/tempmisc,
immediately after newfs_lfs'ing and mounting it.

The system hangs after "syncing disks...", and never generates a dump file,
so I don't have an image for a thorough post-mortem.

>How-To-Repeat:
	
Presumably, just make an LFS partition with newfs_lfs, mount it, and start
copying stuff to it (I used "pax -rw -pe"); the panic hits once lfs_cleanerd
starts cleaning. ;-)
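
A rough sketch of those steps, pieced together from the description above.
It only prints the commands (a dry run) and executes nothing. The device
and mount point are the ones from this report. The raw device name
/dev/rwd1f, the "mount -t lfs" form, the exact pax arguments, and the
helper name lfs_repro_steps are assumptions, not something recorded at the
time. The final sync is there only because one of the backtraces shows the
panic coming in via sys_sync.

```shell
#!/bin/sh
# Dry-run sketch: prints the presumed reproduction steps, runs none of them.
# newfs_lfs destroys whatever is on the target partition, so check every
# value below before running any of the printed commands by hand.

RAW_DEV=${RAW_DEV:-/dev/rwd1f}       # assumption: raw device for wd1f
BLK_DEV=${BLK_DEV:-/dev/wd1f}        # partition named in the report
SRC_DIR=${SRC_DIR:-/misc}            # tree that was copied from
MNT_DIR=${MNT_DIR:-/mounts/tempmisc} # mount point named in the report

# Hypothetical helper: emit one command per line, in the order described.
lfs_repro_steps() {
    echo "newfs_lfs $RAW_DEV"
    echo "mount -t lfs $BLK_DEV $MNT_DIR"
    echo "cd $SRC_DIR && pax -rw -pe . $MNT_DIR"
    echo "sync"
}

lfs_repro_steps
```

Override RAW_DEV, BLK_DEV, SRC_DIR, or MNT_DIR in the environment to print
the steps for a different scratch partition.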

>Fix:
	
I wish I knew. ;-) I'd really like to use LFS, but this seems to be
an unrecoverable problem here (since fsck_lfs finds nothing to fix, and the
panic happens as soon as there's activity).

If there's any further info that is needed to debug this, let me know.  It's
repeatable within seconds. ;-)
>Release-Note:
>Audit-Trail:
>Unformatted: