Subject: Re: softdep crash - more info
To: Brian Gregor <bgregor@BUPHY.bu.edu>
From: enami tsugutomo <enami@sm.sony.co.jp>
List: port-i386
Date: 10/04/2001 14:42:29
> The PC has the sources for 1.5.2 installed too.  I mounted
> them over NFS on a Sparc Classic running 1.5.2 kernel,
> 1.5.1 world for a 'make build'.  Here's the debugger message
> and stack trace:
> 
> panic: softdep_pageiodone: resid < 0, vp 0xd7f97944 lbn 0x0 pcbp 0xd812e000
> Stopped in pid 129 (nfsd) at cpu_Debugger+0x1:     ret
> db> t
> cpu_Debugger(c0290420,d7f97944,0,d812e000,c003ef08) at cpu_Debugger+0x1
> softdep_pageiodone(c083ef08,c08ef08,1,d7be6a50,c056bb3c) at softdep_pageiodone+0x159

The story is:

1. Some data is appended to a file, and as a result EOF moves past the
   fragment boundary.

2. The ffs code extends the fragment and tells the softdep code the new
   size of the fragment.

3. The ffs code flushes the data up to the old end of the fragment.
   Upon I/O completion, the softdep code accounts for that transfer.

4. The ffs code copies in the new data (i.e., modifies the page) and
   flushes it.  Upon I/O completion, the softdep code accounts for that
   transfer as well.

5. Now the softdep code panics, since ``the size given at step 2'' !=
   ``the data actually transferred at steps 3 and 4''.  The two usually
   differ because the transfers at steps 3 and 4 overlap around the old
   end of the fragment.
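A minimal sketch of the accounting in steps 2 to 5, under assumptions the message does not state: a 4096-byte page, softdep keeping a single byte count (resid) that starts at the size given in step 2 and is reduced by each completed transfer, and step 4 writing from the page boundary at or below the old end of the fragment up to the new end.  PAGE_SIZE, round_down_to_page and simulate_fragment_extension are hypothetical names; this models the arithmetic only and is not the NetBSD softdep code.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def round_down_to_page(off: int) -> int:
    """Hypothetical helper: round a byte offset down to its page boundary."""
    return off - (off % PAGE_SIZE)


def simulate_fragment_extension(old_size: int, new_size: int) -> int:
    """Return the final resid under a simplified model of steps 2-5.

    Assumed model, not the real softdep logic: resid starts at the new
    fragment size and each completed I/O subtracts its byte count.
    """
    assert 0 < old_size < new_size
    resid = new_size  # step 2: softdep is told the new fragment size
    transfers = [
        (0, old_size),                          # step 3: flush up to old end
        (round_down_to_page(old_size), new_size),  # step 4: flush modified page
    ]
    for start, end in transfers:
        resid -= end - start  # I/O completion: account for transferred bytes
    return resid


if __name__ == "__main__":
    resid = simulate_fragment_extension(2048, 3072)
    print("resid =", resid)
    if resid < 0:
        print("would panic: resid < 0")
```

With a 2048-byte fragment extended to 3072 bytes, the two transfers total 2048 + 3072 = 5120 bytes against a starting resid of 3072, leaving -2048.  Under this model the final resid is always minus the offset of the old end within its page, so it reaches zero only when the old end is page-aligned, which matches "usually different" above.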

I'm not sure why this began to occur only recently, even though the
code for step 3 has existed for a while.

enami.