Subject: Re: Softdep with ffs, how broken is it?
To: None <tech-kern@netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 07/01/2001 01:20:13
[ On Sunday, July 1, 2001 at 03:21:12 (+0300), Petri Koistinen wrote: ]
> Subject: Re: Softdep with ffs, how broken is it?
>
> Heavy load could be for example:
> (cd /usr/src/ ; cvs update) & (cd /usr/pkgsrc/ ; cvs update)
> I am running on 1.5W (build 30th June) now.

I'm running only slightly older code (2001/06/24) now, but I ran
2001/06/19 code for a few days, and over that whole period I did
things like the following, all at the same time:

	make build
	rsh sparc make build
	cvs update (multiples in various directories)
	rsync rsync.netbsd.org/anoncvs /cvs/NetBSD

Plus running emacs, netscape(s), tkined, netsaint, apcupsd,
mpg123|sox|auplay, etc., etc., etc.

My load average varies between 2 and 10, averaging about 5 at busy
times, and the machine stays highly responsive to interactive use.  Only
occasionally, during heavy NFS client activity, do things freeze for a
few seconds.

Until this afternoon I only had softdep enabled on my local CVS
repository filesystem and on my main "work" filesystem, where the local
source is checked out.  Now I've got it on RELEASEDIR and BSDOBJDIR too.

> I must admit that -current seems to be stable now; maybe something has
> changed, because I can't cause a kernel panic anymore.
> 
> Before, softdep_update_inodeblock() was causing a panic.  After the crash
> I built a kernel with the DIAGNOSTIC option turned on and didn't manage
> to get a kernel panic.

So far I've only had crashes in the early hours of the morning,
presumably during some phase of /etc/daily.  Those were with the 06/19
code, though.  We'll see if the 06/24 code gets through the night tonight!  :-)

Here's a trace of what the last two looked like:

panic: lockmgr: release of unlocked lock!
Stopped in pid 4517 (cron) at   cpu_Debugger+0x4:       leave
db> trace 
cpu_Debugger(d0ba48d8,6,0,d0d5ce04,c019c8ce) at cpu_Debugger+0x4
panic(c0384380,8085000,1,3000,d0a12420) at panic+0x8e
lockmgr(d0ba48d8,6,0) at lockmgr+0x88e
uvm_loan(d0ba48d4,8085000,3000,c0cf7890,2) at uvm_loan+0x243
pipe_write(d0ddc734,d0ddc750,d0d5cf04,c08a3f00,1) at pipe_write+0x38a
dofilewrite(d0e5e91c,5,d0ddc734,8084000,32fb) at dofilewrite+0x94
sys_write(d0e5e91c,d0d5cf80,d0d5cf78) at sys_write+0x63
syscall_plain(2b,805002b,bfbf001f,807001f,8084000) at syscall_plain+0x98
db> 


> Still, once under heavy load I got a strange message:
> 
> Data modified on freelist: word 10 of object 0x307b00 size 84 previous
> type UVM amap (0x2f1958 != 0xdeadbeef)
> 
> Data modified on freelist: word 6 of object 0x2f1940 size 28 previous type
> diradd (0x307600 != 0xdeadbeef)

I get those occasionally too, but only on i386, not yet on the sparc
(though the sparc is a lot less busy; it's really only doing "make build"):

Jun 30 19:55:41 proven /netbsd: Data modified on freelist: word 3 of object 0xc0ae1000 size 308 previous type key mgmt (0x2 != 0xdeadbeef)

In my case the "type" has always been "key mgmt".
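For anyone unfamiliar with these messages: as I understand it, with
DIAGNOSTIC the kernel malloc fills freed objects with 0xdeadbeef and
checks that pattern again when the memory is handed out.  A word that no
longer matches means something wrote to the object after it was freed.
The "previous type" is the malloc type of the object's last owner, which
is not necessarily the code that did the stray write.  Below is a rough
user-space sketch of that idea, NOT the actual kern_malloc.c code; the
function names poison_on_free() and check_on_alloc() are made up for
illustration, and only the message wording is imitated:

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

#define POISON 0xdeadbeefU  /* fill pattern seen in the messages above */

/* Hypothetical helper: fill a freed object with the poison pattern,
 * one 32-bit word at a time. */
void poison_on_free(uint32_t *obj, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        obj[i] = POISON;
}

/* Hypothetical helper: before reusing the object, report the first word
 * that no longer holds the pattern (i.e. was written after the free).
 * Returns that word's index, or -1 if the object is untouched. */
long check_on_alloc(const uint32_t *obj, size_t nwords, const char *prevtype)
{
    for (size_t i = 0; i < nwords; i++) {
        if (obj[i] != POISON) {
            printf("Data modified on freelist: word %zu of object %p "
                   "size %zu previous type %s (0x%x != 0x%x)\n",
                   i, (const void *)obj, nwords * sizeof(uint32_t),
                   prevtype, (unsigned)obj[i], (unsigned)POISON);
            return (long)i;
        }
    }
    return -1;
}
```

With a 308-byte object (77 words) poisoned and then word 3 overwritten
with 0x2, check_on_alloc() reports word 3, just like the "key mgmt"
message above.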
 
-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods@acm.org>     <woods@robohack.ca>
Planix, Inc. <woods@planix.com>;   Secrets of the Weird <woods@weird.com>