Port-sparc64 archive


Re: 10_BETA panic on Ultra 1



> From: Björn Johannesson <rherdware%yahoo.com@localhost>
> Date: Wed, 23 Aug 2023 20:34:02 +0000
> 
> I decided to upgrade my Sun Ultra 1 from 9 to 10_BETA. However, booting the 10_BETA kernel
> panics. No serial console on this machine, but I took a picture of the crash.
> https://twitter.com/herdware/status/1694444851733971305
> 
> Just recently upgraded my Ultra 60 with the same build from a few days ago.
> Is this an Ultrasparc I regression?

Curious.  It seems that cpu_intr_p() is returning true from within a
soft interrupt.  The stack trace is incomplete, but I suspect the
softint dispatch happened before a hard interrupt handler (which
called rnd_add_data) completed.  The hard interrupt handler is likely
in a disk driver, probably trying to look up /sbin/init on disk, and
calling dk_done when the disk signalled xfer completion.

Is there something funny about Sun Ultra 1 interrupt dispatch that
doesn't apply to Ultra 60?  Are interrupt priorities broken, so that
(low-priority) soft interrupts can interrupt (high-priority) hard
interrupts?

Another possible explanation, instead of soft interrupts interrupting
hard interrupts, is that the sparc64 soft interrupt dispatch mechanism
always raises ci_idepth and therefore causes cpu_intr_p() to return
true in soft interrupt context.  But that seems implausible, because
there are lots of !cpu_intr_p() assertions in softint context, and we
haven't seen this in the qemu testbed, or in martin's netbsd-10 test
runs on a Sun v210, which are chugging along just fine:

https://releng.netbsd.org/b5reports/sparc64/commits-2023.08.html#end
https://www.netbsd.org/~martin/sparc64-atf-netbsd10/
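
To make the two explanations above concrete, here is a toy userland
model, not the sparc64 code.  Everything named model_* is hypothetical,
and the convention that a depth greater than zero means "in hard
interrupt context" is an assumption of the sketch rather than a
statement about how ci_idepth is actually initialized or counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state; only the depth counter is modelled. */
struct model_cpu {
	int idepth;	/* assumed: > 0 means a hard interrupt handler is active */
};

/* Toy analogue of cpu_intr_p(): true while the depth counter is raised. */
bool
model_cpu_intr_p(const struct model_cpu *ci)
{
	return ci->idepth > 0;
}

/*
 * Returns true if a "!cpu_intr_p()" style assertion would hold inside a
 * soft interrupt handler, given:
 *   hardintr_active       - the softint ran before a hard handler finished
 *   dispatch_raises_depth - softint dispatch itself bumps the counter
 */
bool
model_softint_assert_holds(bool hardintr_active, bool dispatch_raises_depth)
{
	struct model_cpu ci = { .idepth = 0 };

	if (hardintr_active)
		ci.idepth++;		/* hard interrupt entry, not yet exited */
	if (dispatch_raises_depth)
		ci.idepth++;		/* dispatch path raises the counter */

	assert(ci.idepth >= 0);
	return !model_cpu_intr_p(&ci);
}
```

In this model the assertion only holds when neither condition is true.
The second condition would break it on every softint, which is why the
clean qemu and v210 runs make that explanation look unlikely.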


The change tnn@ is referring to is this:

https://mail-index.netbsd.org/source-changes/2023/08/11/msg146892.html

I think you must already have this change because the stack trace
shows a path through soft interrupt dispatch.

It looks like we did not pull up the stop-gap measure christos had
committed to HEAD before that proper fix:

https://mail-index.netbsd.org/source-changes/2023/07/11/msg145948.html

I would be curious to know whether that makes a difference; the patch
is attached.

From bfe9c83389663fbfe56a4d74f3f6b8d2768a8d0d Mon Sep 17 00:00:00 2001
From: christos <christos%NetBSD.org@localhost>
Date: Tue, 11 Jul 2023 23:26:41 +0000
Subject: [PATCH] Move the rnd_add_uint32 outside the lock and get rid of
 dk_done1() suggested by riastradh@

cherry-picked from
https://mail-index.netbsd.org/source-changes/2023/07/11/msg145948.html
---
 sys/dev/dksubr.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/sys/dev/dksubr.c b/sys/dev/dksubr.c
index 2de7159a3c7f..2c544c726bef 100644
--- a/sys/dev/dksubr.c
+++ b/sys/dev/dksubr.c
@@ -77,7 +77,6 @@ static int dk_subr_modcmd(modcmd_t, void *);
 
 static void	dk_makedisklabel(struct dk_softc *);
 static int	dk_translate(struct dk_softc *, struct buf *);
-static void	dk_done1(struct dk_softc *, struct buf *, bool);
 
 void
 dk_init(struct dk_softc *dksc, device_t dev, int dtype)
@@ -442,7 +441,9 @@ dk_start(struct dk_softc *dksc, struct buf *bp)
 			if (error != 0) {
 				bp->b_error = error;
 				bp->b_resid = bp->b_bcount;
-				dk_done1(dksc, bp, false);
+				mutex_exit(&dksc->sc_iolock);
+				dk_done(dksc, bp);
+				mutex_enter(&dksc->sc_iolock);
 			}
 
 			bp = bufq_get(dksc->sc_bufq);
@@ -454,8 +455,8 @@ done:
 	mutex_exit(&dksc->sc_iolock);
 }
 
-static void
-dk_done1(struct dk_softc *dksc, struct buf *bp, bool lock)
+void
+dk_done(struct dk_softc *dksc, struct buf *bp)
 {
 	struct disk *dk = &dksc->sc_dkdev;
 
@@ -467,24 +468,16 @@ dk_done1(struct dk_softc *dksc, struct buf *bp, bool lock)
 		printf("\n");
 	}
 
-	if (lock)
-		mutex_enter(&dksc->sc_iolock);
+	mutex_enter(&dksc->sc_iolock);
 	disk_unbusy(dk, bp->b_bcount - bp->b_resid, (bp->b_flags & B_READ));
+	mutex_exit(&dksc->sc_iolock);
 
 	if ((dksc->sc_flags & DKF_NO_RND) == 0)
 		rnd_add_uint32(&dksc->sc_rnd_source, bp->b_rawblkno);
-	if (lock)
-		mutex_exit(&dksc->sc_iolock);
 
 	biodone(bp);
 }
 
-void
-dk_done(struct dk_softc *dksc, struct buf *bp)
-{
-	dk_done1(dksc, bp, true);
-}
-
 void
 dk_drain(struct dk_softc *dksc)
 {

