Subject: Re: Crash in sbp2_abort()?
To: Chris Tribo <ctribo@hopi.dtcc.edu>
From: Peter Seebach <seebs@plethora.net>
List: current-users
Date: 05/18/2003 21:24:07
In message <Pine.LNX.4.44.0305182137450.12194-100000@hopi.dtcc.edu>, Chris Trib
o writes:
>On Sun, 18 May 2003, Peter Seebach wrote:
>
>> Anyone out there using ieee1394 drives successfully in -current?  I have
>> a drive which used to work fine, but today, I get page fault panics in
>> sbp2_abort().

>	I have an open PR on this, but thus far I have not been able to 
>debug, GDB will not give me a stack trace of the kernel core.

I have debugged it.  The problem is that, in sbp2.c, we have code which
cannot possibly have been correct, in the following ways:

1.  sbp2_free() is freeing orbs, but so is sbp2_abort().
2.  sbp2_free() is doing a loop equivalent to
	while (CIRCLEQ_EMPTY(list)) {
		orb = CIRCLEQ_FIRST(list);
		sbp2_abort(orb);
		sbp2_free_orb(orb);
	}
	orb = CIRCLEQ_FIRST(list);
	sbp2_abort(orb);
	sbp2_free_orb(orb);

This is "obviously wrong", to say nothing of the questionable decision to
hand-duplicate CIRCLEQ_EMPTY rather than using the macro.

1.  sbp2_abort() frees orbs in some cases - so we shouldn't call sbp2_free_orb
again.
2.  In any event, we do not leave the list until it is empty.  At this point,
CIRCLEQ_FIRST(list) is no longer an orb object, but the list itself - thus the
strangely "corrupt" orb.
3.  The comment above this loop says that sbp2_abort won't free the last item,
so we have to do it by hand, but the list makes no provision for escaping the
loop until it's empty.

Simply removing the attempt to delete the "last" item got my machine up and
running, but this code is sufficiently incoherent that I'm pretty sure that
it's half-overhauled, and I don't know what it *should* look like.

-s