NetBSD-Users archive


Re: [10.0_STABLE] Hard lock



CC: mlelstv@

On 2024/10/05 19:17, BERTRAND Joël wrote:
	I have rebuilt a kernel (same tree) with all diagnostic options. It
panics in the iscsi routines when iscsictl tries to connect to the first
iscsi volume.

[    74.238270] panic: mutex_vector_enter,517: uninitialized lock
(lock=0xffff938021d86010, from=ffffffff80f71234)
[    74.238270] cpu1: Begin traceback...
[    74.238270] vpanic() at netbsd:vpanic+0x183
[    74.238270] panic() at netbsd:panic+0x3c
[    74.238270] lockdebug_wantlock() at netbsd:lockdebug_wantlock+0x180
[    74.248268] mutex_enter() at netbsd:mutex_enter+0x23f
[    74.248268] send_pdu() at netbsd:send_pdu+0x1b5
[    74.248268] send_logout() at netbsd:send_logout+0x1d4
[    74.248268] kill_connection() at netbsd:kill_connection+0x2fa
[    74.248268] kill_session() at netbsd:kill_session+0x134
[    74.248268] iscsiioctl() at netbsd:iscsiioctl+0x30f
[    74.248268] sys_ioctl() at netbsd:sys_ioctl+0x56d
[    74.248268] syscall() at netbsd:syscall+0x196
[    74.248268] --- syscall (number 54) ---
[    74.248268] netbsd:syscall+0x196:
[    74.248268] cpu1: End traceback...
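For readers unfamiliar with this panic: the traceback shows send_pdu() calling mutex_enter() on a lock that the lock debugging code does not recognise as initialized. The following is only a minimal userland sketch of that kind of check, not the kernel's implementation. All names (dbg_lock, dbg_lock_init, dbg_lock_enter, MAX_LOCKS) are hypothetical, and the assumption is that the debug code keeps a record of every lock passed to its init routine and rejects any lock it has no record of.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCKS 16	/* hypothetical registry size for this sketch */

/* Hypothetical stand-in for a kernel mutex. */
struct dbg_lock {
	bool held;
};

/* Addresses of every lock that went through dbg_lock_init(). */
static const struct dbg_lock *registry[MAX_LOCKS];
static size_t nregistered;

/* Return true if the lock address was registered at init time. */
static bool
dbg_lock_known(const struct dbg_lock *l)
{
	for (size_t i = 0; i < nregistered; i++) {
		if (registry[i] == l)
			return true;
	}
	return false;
}

/* Initialize a lock and record its address in the registry. */
static void
dbg_lock_init(struct dbg_lock *l)
{
	assert(nregistered < MAX_LOCKS);
	l->held = false;
	registry[nregistered++] = l;
}

/*
 * Acquire a lock. Returns 0 on success, or -1 if the lock was never
 * initialized (the point where a debug kernel would panic instead).
 */
static int
dbg_lock_enter(struct dbg_lock *l)
{
	if (!dbg_lock_known(l))
		return -1;
	l->held = true;
	return 0;
}
```

In this sketch, calling dbg_lock_enter() on a lock that skipped dbg_lock_init() returns -1. That corresponds to a code path reaching mutex_enter() on a structure whose mutex was never set up, or was already destroyed.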

You can download the faulty kernel (with and without the debug option) at
ftp://newton.systella.fr (files NETBSD.;1 and NETBSD.GDB;1).

Please note that this server runs OpenVMS, so use binary transfer.

You could try a current kernel.  If the problem is there in current,
it may be detected -- and reported in a more obvious way -- by the new
heartbeat(9) diagnostic, where each CPU's progress is periodically
checked on by some other CPU.
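The idea described above, one CPU periodically checking another CPU's progress, can be sketched in plain C as follows. This is a userland illustration under stated assumptions, not the heartbeat(9) code: the names (NCPU, beat, seen, cpu_tick, check_neighbour) are hypothetical, and the choice of "the next CPU in index order" as the one to check is made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for this sketch */

/* Per-CPU progress counter, bumped by its owner on every tick. */
static uint64_t beat[NCPU];
/* Counter value observed at the previous check of each CPU. */
static uint64_t seen[NCPU];

/* Called by CPU `cpu` to record that it is still making progress. */
static void
cpu_tick(size_t cpu)
{
	assert(cpu < NCPU);
	beat[cpu]++;
}

/*
 * Called by CPU `checker` to inspect the next CPU in index order.
 * Stores the checked CPU's index in *target (if non-NULL) and returns
 * true if that CPU's counter advanced since the previous check.
 */
static bool
check_neighbour(size_t checker, size_t *target)
{
	size_t t = (checker + 1) % NCPU;
	bool alive = (beat[t] != seen[t]);

	seen[t] = beat[t];
	if (target != NULL)
		*target = t;
	return alive;
}
```

A CPU stuck in a hard lock stops calling cpu_tick(), so the next check_neighbour() call aimed at it returns false. A real diagnostic would report the stall at that point rather than let the machine hang silently.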
	I will try.

	Please also note that I cannot reboot my server with shutdown -r now
if I haven't first killed altqd (with kill -9). For me, it's not a real
issue, as this server is two floors below my office, but it would be for
users whose server is far away...

Hmm, two commits for sys/dev/iscsi are missing for netbsd-10:

(1/2) https://mail-index.netbsd.org/source-changes/2023/12/28/msg149090.html
Use correct status value SCSI_BUSY (0x08) instead of XS_BUSY (7)...

(2/2) https://mail-index.netbsd.org/source-changes/2024/08/24/msg153012.html
Avoid race in timeout handling.
Don't try to wake up CCB without connection (which led to a NULL pointer deref).

Cherry-picking them may help. Otherwise, -current may also be broken for your case.

Thanks,
rin

