[10.0_STABLE] Hard lock

To: NetBSD User Maillist <netbsd-users%NetBSD.org@localhost>
Subject: [10.0_STABLE] Hard lock
From: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
Date: Fri, 4 Oct 2024 10:37:24 +0200

	Hello,

	I'm using NetBSD 10.0 (stable) as rolling release on my main server.
This server runs a customized kernel (amd64). Here is the diff between
GENERIC and my kernel:

legendre# diff -u GENERIC CUSTOM
...
 powernow0      at cpu0         # AMD PowerNow! and Cool'n'Quiet (non-ACPI)
-viac7temp*     at cpu?         # VIA C7, Nano and Zhaoxin temperature sensor
 vmt0           at cpu0         # VMware Tools
...
-#options       ALTQ            # Manipulate network interfaces' output queues
-#options       ALTQ_BLUE       # Stochastic Fair Blue
-#options       ALTQ_CBQ        # Class-Based Queueing
-#options       ALTQ_CDNR       # Diffserv Traffic Conditioner
-#options       ALTQ_FIFOQ      # First-In First-Out Queue
-#options       ALTQ_FLOWVALVE  # RED/flow-valve (red-penalty-box)
-#options       ALTQ_HFSC       # Hierarchical Fair Service Curve
-#options       ALTQ_LOCALQ     # Local queueing discipline
-#options       ALTQ_PRIQ       # Priority Queueing
-#options       ALTQ_RED        # Random Early Detection
-#options       ALTQ_RIO        # RED with IN/OUT
-#options       ALTQ_WFQ        # Weighted Fair Queueing
+options        ALTQ            # Manipulate network interfaces' output queues
+options        ALTQ_BLUE       # Stochastic Fair Blue
+options        ALTQ_CBQ        # Class-Based Queueing
+options        ALTQ_CDNR       # Diffserv Traffic Conditioner
+options        ALTQ_FIFOQ      # First-In First-Out Queue
+options        ALTQ_FLOWVALVE  # RED/flow-valve (red-penalty-box)
+options        ALTQ_HFSC       # Hierarchical Fair Service Curve
+options        ALTQ_LOCALQ     # Local queueing discipline
+options        ALTQ_PRIQ       # Priority Queueing
+options        ALTQ_RED        # Random Early Detection
+options        ALTQ_RIO        # RED with IN/OUT
+options        ALTQ_WFQ        # Weighted Fair Queueing

-tco*   at tcoichbus?           # TCO watch dog timer
+tco*   at ichlpcib?            # TCO watch dog timer
+pseudo-device  iscsi

	I upgraded my tree about 10 days ago. Before this upgrade, the system
was stable (uptime greater than 120 days).

	Since the last upgrade, the kernel has locked up twice: no panic, no
output on the serial line, no information in the log files or on screen,
and the physical keyboard doesn't send characters (so of course ctrl+esc
doesn't open the internal debugger). The kernel is simply hard locked.

	DIAGNOSTIC doesn't return any usable information, and DEBUG and
LOCKDEBUG cannot be used here because these options add a lot of
expensive tests that make the server unusable.
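
	For reference, a minimal sketch of how these three options appear in
an amd64 kernel configuration file. The option names and comments follow
GENERIC; this is an illustration, not an excerpt from the CUSTOM file
above:

```
options 	DIAGNOSTIC	# inexpensive kernel consistency checks
options 	DEBUG		# expensive debugging checks/support
options 	LOCKDEBUG	# expensive locking checks/support
```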

	I have tried to bisect, without success.

	Network configuration:
- bridge0 (wm0 + wm1) => connection to two NAS (iscsi kernel initiator)
- wm2                 => WAN
- lagg0 (wm3 + wm4)   => LAN
- re0                 => DMZ

	altqd runs on this server (I cannot stop altqd with /etc/rc.d/altqd
stop, I have to do a kill -9; I filed a PR about this a long time ago).

	Disks:
- wd0 + wd1           => ccd0 that contains swap devices for clients
                         (wedges are exported through istgt)
- wd2 + wd3           => raidframe (raid 1) for system and local swap
- wd4 + wd5 + wd6     => raidframe (raid 5) for home
- sd0                 => iscsi target (NAS 1)
- sd1                 => iscsi target (NAS 2)

	All partitions are formatted as FFSv2 with extended attributes and the
log option. When the system locks up, I have to reboot in single-user
mode and manually run fsck -fpP. Without this command, when the system
tries to mount the disk devices, the logs are replayed but the
partitions remain in a dirty state (and the kernel will panic sooner or
later).

	I haven't put any unusual configuration in /etc. The only notable
setting is that nfsd runs with 512 servers (/usr/sbin/nfsd -n 512), but
the system hangs in the middle of the night, when the clients and the
NAS are idle.

	I've just rebuilt a new kernel. I don't know whether anyone uses a
system with a similar configuration (I suspect a bad interaction between
ccd and iscsi). How can I find more information to debug this?

	Best regards,

	JB

Attachment: signature.asc
Description: OpenPGP digital signature

Follow-Ups:
- Re: [10.0_STABLE] Hard lock
  - From: Taylor R Campbell
- Re: [10.0_STABLE] Hard lock
  - From: Rin Okuyama
