Deadlock (maybe related to PR kern/56925)

To: NetBSD User Maillist <netbsd-users%NetBSD.org@localhost>, tech-kern%netbsd.org@localhost
Subject: Deadlock (maybe related to PR kern/56925)
From: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
Date: Fri, 18 Nov 2022 08:49:53 +0100

	Hello,

	For a very long time (I don't remember whether it started with 9.0 or
9.1), my main server has randomly panicked or deadlocked when it tries to
access an iSCSI NAS. These panics are not caused by a hardware failure,
as I have replaced the motherboard, disks and memory with spare components,
with exactly the same result.

	Last night, it deadlocked. Twice! I had built a new kernel yesterday
from the source tree:

legendre# uname -a
NetBSD legendre.systella.fr 9.3_STABLE NetBSD 9.3_STABLE (CUSTOM) #17:
Thu Nov 17 23:16:11 CET 2022
root%legendre.systella.fr@localhost:/usr/src/netbsd-9/obj/sys/arch/amd64/compile/CUSTOM
amd64

	My kernel is a customized kernel (CUSTOM), as I have added ALTQ support.
This server is my main professional server, and it has to prioritize
VoIP packets and some other IP traffic.

	Some time ago, I tried to build a kernel with the LOCKDEBUG option,
but that kernel never reached init. You will find the dmesg of the
current kernel attached.

	I see panics, always the same one, when the system tries to access
a Qnap NAS over iSCSI. I have tried to bisect: I believe that if I stop the
NAS, the system is stable. The panic is always the one I copied into
PR kern/56925.

	Network configuration:
- bridge0 (wm0 and wm1): connected to two Qnap NAS (iSCSI, MTU 9000)
- wm2: WAN (MTU 1500), IPv4 default route
- agr0 (wm3 and wm4): LAN (MTU 1500)
- tap0: WAN, IPv6 default route (MTU 1500), I have to use an IPv6 broker.

	System configuration:
- Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
- 16 GB
- motherboard ASUSTeK Z97-E
- 7 internal SATA disks (ccd0 [wd0, wd1], raid0 [wd2, wd3], raid1 [wd4,
wd5, wd6])

legendre# df
Filesystem    1K-blocks       Used      Avail %Cap Mounted on
/dev/raid0a    32529068   13521382   17381234  43% /
/dev/raid0e    65058298   30805934   30999450  49% /usr
/dev/raid0f    32529068   26574944    4327672  85% /var
/dev/dk5       16515182    3715070   11974354  23% /var/squid/cache
/dev/raid0g   264277976   97612068  153452012  38% /usr/src
/dev/raid0h   548684628  303773356  217477044  58% /srv
/dev/dk0     3876580176 1408841464 2273909704  38% /home
kernfs                1          1          0 100% /kern
ptyfs                 1          1          0 100% /dev/pts
procfs                4          4          0 100% /proc
tmpfs           4162812         48    4162764   0% /var/shm
/dev/dk6     11335898764 9743618440 1025485388  90% /opt/bacula
/dev/dk7     11343502476 2175889896 8600437460  20% /opt/video

legendre# raidctl -s raid0
Components:
           /dev/wd2a: optimal
           /dev/wd3a: optimal

legendre# raidctl -s raid1
Components:
           /dev/wd4e: optimal
           /dev/wd5e: optimal
           /dev/wd6e: optimal

legendre# ccdconfig -g
ccd0            32      0x0     2000408739840   /dev/wd0a /dev/wd1a

/opt/bacula and /opt/video are located on the iSCSI NAS devices.

ccd0 contains a lot of partitions: /var/squid/cache and all the swap
partitions exported (over iSCSI) to the diskless workstations on the LAN.
This system acts as an iSCSI initiator (for both NAS) and as an iSCSI
target (istgt) for the diskless workstations.

	I don't know how to investigate (the LOCKDEBUG kernel doesn't boot), and
I have to fix this bug as soon as possible, as I cannot continue with an
unstable main server.

	Best regards,

	JB

Attachment: dmesg.gz
Description: application/gzip
