Subject: Re: panic: dequeued wrong buf in -current
To: Andreas Wrede <email@example.com>
From: Manuel Bouyer <firstname.lastname@example.org>
Date: 09/07/2004 00:12:42
[ follow up to tech-kern ]
On Mon, Sep 06, 2004 at 12:05:12PM -0400, Andreas Wrede wrote:
> After upgrading from Aug 6 -current sources to today's I get a "panic:
> sdstart(): dequeued wrong buf" very early in the boot up sequence.
> There is another "panic: biodone already done" in the syncing disks...
> step and while the kernel produces a core dump, savecore does not
> recognize it.
Maybe reboot(0x104) would work?
> Note that the root fs is on a RAID-1 set.
> Below, you'll find the traceback and the boot messages:
> panic: sdstart(): dequeued wrong buf
> Begin traceback...
> sdstart(c1aa9f00,c1a73080,0,4,0) at netbsd:sdstart+0x2ea
> sdstrategy(c1a73080,0,80,0,0) at netbsd:sdstrategy+0x1db
> spec_strategy(cc927874,cc8771f8,100000,404,c05229a0) at
> VOP_STRATEGY(cc8771f8,c1a73080,cc92791c,293,72) at
> rf_DispatchKernelIO(c1a20000,c1ae5074,1,0,3ddcbf) at
Juergen Hannken-Illjes has reported in private mail a similar problem,
on sparc64 without raidframe or ccd involved.
He started looking at this, and it appears that sdstart() is called twice,
one of the calls being interrupted. I followed the call graph and I don't
know where it could happen.
Both you and Juergen use the esiop driver, and this driver can call
scsipi_done() from esiop_scsipi_request(). This can likely cause sdstart() to
call itself. Other HBA drivers may do this as well.
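
To make the suspected recursion concrete, here is a toy user-space model. It
is only a sketch: every name in it is hypothetical and none of this is the
real sd(4) or scsipi code. It assumes sdstart() peeks at the head of the
queue, issues the request, and only dequeues afterwards, which is what the
panic message suggests. The toy "HBA" completes the first request inline,
and the completion path calls the start routine again.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: none of these names are real NetBSD kernel interfaces. */
#define TOY_QLEN 8
static int toy_queue[TOY_QLEN];
static size_t toy_head, toy_tail;
static int toy_requests;      /* requests handed to the toy "HBA" */
static int toy_wrong_buf;     /* set where the real code would panic */
static int toy_inline_done;   /* nonzero: next request completes inline */

static void toy_start(void);

/* Returns the head buf id and removes it, or -1 if the queue is empty. */
static int toy_get(void)
{
    return toy_head == toy_tail ? -1 : toy_queue[toy_head++];
}

/* Models an HBA request routine that may complete synchronously. */
static void toy_hba_request(int id)
{
    (void)id;
    toy_requests++;
    if (toy_inline_done) {
        toy_inline_done = 0;
        toy_start();          /* completion restarts the queue: recursion */
    }
}

static void toy_start(void)
{
    while (toy_head != toy_tail) {
        int id = toy_queue[toy_head];   /* peek only; buf stays queued */
        toy_hba_request(id);
        if (toy_get() != id)            /* dequeue after the request */
            toy_wrong_buf = 1;
    }
}

/* Queue bufs 1..3, let the first request complete inline, run once. */
static int toy_run(void)
{
    toy_head = toy_tail = 0;
    toy_requests = toy_wrong_buf = 0;
    for (int id = 1; id <= 3; id++)
        toy_queue[toy_tail++] = id;
    assert(toy_tail <= TOY_QLEN);
    toy_inline_done = 1;
    toy_start();
    return toy_wrong_buf;
}
```

In this model the nested call drains the queue, including the buf the outer
call had only peeked at, so that buf is issued twice and the outer call's
dequeue no longer returns the buf it expected.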
A workaround would be to add a lock in sdstart() to avoid such recursion,
but this will have an impact on performance, as we lose opportunities to
keep the disk busy.
A better way would be to allow sdstart() to be reentrant.
Basically we need to dequeue the buf before calling the HBA's adapter request.
1) add a struct scsipi_xfer * argument to scsipi_command(): if this pointer is
not null it would use this xfer, otherwise it would try to allocate one
as it does now.
2) make scsipi_command() dequeue the buf itself. We can't do this for every
command with a buf, so this needs a new flag, or something
3) always dequeue the buf, and use a local FIFO queue when we're out of
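
As a rough illustration of 1), here is a toy user-space sketch of the
dequeue-before-dispatch order with a caller-supplied xfer. All names are
hypothetical stand-ins, not the real scsipi structures or the real
scsipi_command() signature, and the xfer lifetime is simplified (the toy
frees it immediately instead of at completion time).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real scsipi structures or signatures. */
struct sk_xfer { int buf_id; };

#define SK_QLEN 8
static int sk_queue[SK_QLEN];
static size_t sk_head, sk_tail;
static int sk_issued[SK_QLEN];  /* how many times each buf id was issued */
static int sk_inline_done;      /* nonzero: next request completes inline */

static void sk_start(void);

/*
 * Option 1) in miniature: use the caller's xfer when one is passed,
 * otherwise allocate one here as before. Returns -1 on shortage.
 */
static int sk_command(struct sk_xfer *xs, int buf_id)
{
    if (xs == NULL && (xs = malloc(sizeof(*xs))) == NULL)
        return -1;
    xs->buf_id = buf_id;
    sk_issued[xs->buf_id]++;
    free(xs);                   /* toy: the xfer is consumed immediately */
    if (sk_inline_done) {       /* model an HBA completing synchronously */
        sk_inline_done = 0;
        sk_start();             /* completion path restarts the queue */
    }
    return 0;
}

static void sk_start(void)
{
    while (sk_head != sk_tail) {
        /* Get the xfer first, so a shortage leaves the buf queued. */
        struct sk_xfer *xs = malloc(sizeof(*xs));
        if (xs == NULL)
            return;
        /* Dequeue before dispatch: a nested call sees the next buf. */
        int id = sk_queue[sk_head++];
        (void)sk_command(xs, id);   /* no allocation needed in here */
    }
}

/* Queue bufs 1..3, let the first complete inline, return max issue count. */
static int sk_run(void)
{
    int max = 0;
    sk_head = sk_tail = 0;
    for (int i = 0; i < SK_QLEN; i++)
        sk_issued[i] = 0;
    for (int id = 1; id <= 3; id++)
        sk_queue[sk_tail++] = id;
    sk_inline_done = 1;
    sk_start();
    for (int i = 1; i <= 3; i++)
        if (sk_issued[i] > max)
            max = sk_issued[i];
    return max;
}
```

Because the xfer is obtained before the buf leaves the queue, the only
failure that could force us to put the buf back happens before the dequeue,
and a recursive call from the completion path simply picks up the next buf.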
I prefer 1) myself, as it can allow a more flexible error recovery procedure
on resource shortage in other cases too. However, it's quite intrusive as
all scsipi_command() calls need to be touched (which means almost all
files in sys/dev/scsipi). As we want to get this pulled up to 2.0, 2) may be
Manuel Bouyer <email@example.com>
NetBSD: 26 years of experience will always make the difference