OK, so I booted both kernels and guess what: no change in behaviour.
Then, with netbsd.gdb booting, I interrupted it at two different points in time, before the line
[ 63.9893876] admtemp0: workqueue busy: updates stopped
and after it, because there is a long pause between the detection of the USB hub and the admtemp0 message (which might well be the indicated 60 sec).
Backtrace from the time before the admtemp0 timeout:
[ 1.1200107] uhub0 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db{0}> bt
intr_list_handler(101dbc880, 7, e00479b0, 0, 1044060, 2) at
netbsd:intr_list_handler+0x10
sparc_interrupt(101dbcc40, 1, e0047b90, 0, 1044020, e0048000) at
netbsd:sparc_interrupt+0x294
sparc_interrupt(1c70240, 70000000001, 2014000, 101db6ca0, 1, 1c72000) at
netbsd:sparc_interrupt+0x294
frag6_slowtimo(1c95730, 101db6ca0, 1c70000, 1cc5000, 1ca3650, 101db6ca4)
at netbsd:frag6_slowtimo+0x24
pfslowtimo(0, 101d88041, 0, 1cba478, 1c3c938, 18192b8) at
netbsd:pfslowtimo+0x40
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at
netbsd:callout_softclock+0xc8
softint_dispatch(2000, 1, 0, 101db6ca0, 1779500c0, 177950360) at
netbsd:softint_dispatch+0x80
softint_fastintr(101db6ca0, 1, e0047cf0, 0, 1044020, e0048000) at
netbsd:softint_fastintr+0x80
sparc_interrupt(f0056c1c, 1140d0, 1173b8, 0, fff57b48, 1) at
netbsd:sparc_interrupt+0x294
Backtrace from the time after the admtemp0 timeout:
[ 1.1200103] uhub0 at usb0: NetBSD (0000) OHCI root hub (0000), class 9/0, rev 1.00/1.00, addr 1
[ 63.5593913] admtemp0: workqueue busy: updates stopped
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db{0}> bt
intr_list_handler(101dbc880, 1, e0047b90, 101db6ca0, 1044060, e0048000)
at netbsd:intr_list_handler+0x10
sparc_interrupt(101d88040, 101d88041, 101db6ca0, 1cc5000, 1ca3400,
1cc7400) at netbsd:sparc_interrupt+0x294
callout_schedule_locked(101d88040, 101d88040, 1a80, 1cba478, 1cba478,
101d88040) at netbsd:callout_schedule_locked+0x94
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at
netbsd:callout_softclock+0x274
softint_dispatch(2000, 1, 0, 101db6ca0, 1779500c0, 177950360) at
netbsd:softint_dispatch+0x80
softint_fastintr(101db6ca0, 1, e0047cf0, 0, 1044020, e0048000) at
netbsd:softint_fastintr+0x80
sparc_interrupt(f0056c1c, 1140d0, 1173b8, 0, fff57b48, 1) at
netbsd:sparc_interrupt+0x294
db{0}>
Of course, the actual function that shows up in the bt at the time of
the interrupt (BREAK) seems arbitrary, as I got different results in
different runs. I am not sure how to read the backtrace: is the
topmost line the most recent return address on the stack, so that you
would go from top to bottom to learn the calling sequence, or is it
the other way round? And does the interrupt from the BREAK show up in
the backtrace?
However, one striking thing is that callout_softclock is always involved...

On 17.06.21 at 13:20, Julian Coleman wrote:
Hi,

hmm, strange. Nothing attached to the Firewire ports. The code looks like watchdog_clock never gets reset? Is there a way to disable the FW ports via a boot.conf or something?

Unfortunately, we need to remove it from the kernel.

I downloaded the source for the 9.2 kernel and there seems to be a mismatch between the version of firewire.c on the website you mentioned and the one I found in the kernel tree. The version in 9.2 seems to be 1.48 and the version on the website is 1.51. Actually, line 1323 in version 1.48 is in a different function... Which kernel is generated by the sources from the website?

nxr.netbsd.org has the current sources. The 9.x kernels are on a different branch, which is why you see the different versions. There is also the CVS history for the file at:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/ieee1394/firewire.c

Looking at that code though, it hasn't changed for a long time. So, I was wondering if some other change is causing that problem. It is tempting to try to increase the multiplier from 15 to something larger to see if we just need to wait longer [1], or just to try a kernel without FW to really check that it is the problem.

I've built a kernel from GENERIC without FW:

http://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/netbsd

and if you are able to test boot that, then it would be useful [2]. There is also the version with full debugging symbols [3]:

http://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/netbsd.gdb

and the kernel configuration that I used:

http://ftp.netbsd.org/pub/NetBSD/misc/jdc/sparc64/GENERIC-NOFW

Regards,

Julian

[1] Instead of waiting, we might just be able to check if we are running with interrupts after start, like the check here:
https://nxr.netbsd.org/xref/src/sys/arch/sparc/dev/ts102.c#1044
[2] My test netbsd-9 has a few local changes in some drivers, but nothing that should affect this.
[3] The .gdb file is useful because we can match a backtrace to a source line.
For example:

firewire_watchdog(101d8a040, 101d8a041, 0, 1cba038, 1cba038, 101d8a040) at
netbsd:firewire_watchdog+0x48

:; gdb netbsd.gdb
(gdb) list *(firewire_watchdog+0x48)
...
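
To apply the same lookup to every frame of a longer backtrace, the symbol and offset pairs can be pulled out of the pasted ddb output mechanically. The following is only a rough Python sketch, not part of any NetBSD tooling: the helper names (frames, gdb_commands) are made up for illustration, and it assumes every frame contains a "netbsd:symbol+0xNN" token as in the traces in this thread. It prints one gdb "list" command per frame, in the order ddb printed them.

```python
import re

# Matches the "netbsd:symbol+0xoffset" part of a ddb backtrace frame.
FRAME_RE = re.compile(r"netbsd:([A-Za-z_][A-Za-z0-9_]*)\+0x([0-9a-fA-F]+)")


def frames(backtrace_text):
    """Return (symbol, offset) pairs in the order ddb printed them."""
    # Mail clients wrap long frames, so collapse all whitespace first.
    joined = " ".join(backtrace_text.split())
    return [(sym, int(off, 16)) for sym, off in FRAME_RE.findall(joined)]


def gdb_commands(backtrace_text):
    """Build one gdb 'list' command per frame (hypothetical helper)."""
    return ["list *({}+{:#x})".format(sym, off)
            for sym, off in frames(backtrace_text)]


# Two frames copied from the second backtrace in this thread.
SAMPLE = """\
callout_schedule_locked(101d88040, 101d88040, 1a80, 1cba478, 1cba478,
101d88040) at netbsd:callout_schedule_locked+0x94
callout_softclock(1cba480, 1000000, 10000, 30c0, 20c0, 1cba520) at
netbsd:callout_softclock+0x274
"""

if __name__ == "__main__":
    for cmd in gdb_commands(SAMPLE):
        print(cmd)
```

The resulting lines can be pasted at the (gdb) prompt after starting gdb on netbsd.gdb, provided the backtrace was taken from that same kernel build; otherwise the offsets will not line up with the source.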