Hello,
For a very long time (I don't remember whether it started with 9.0 or 9.1),
my main server has randomly panicked or deadlocked when it tries to
access an iSCSI NAS. These panics are not related to a hardware failure,
as I have replaced the motherboard, disks and memory with spare components,
with exactly the same result.
Last night, it deadlocked. Twice! I built a new kernel yesterday from
the source tree:
legendre# uname -a
NetBSD legendre.systella.fr 9.3_STABLE NetBSD 9.3_STABLE (CUSTOM) #17:
Thu Nov 17 23:16:11 CET 2022
root%legendre.systella.fr@localhost:/usr/src/netbsd-9/obj/sys/arch/amd64/compile/CUSTOM
amd64
My kernel is a custom kernel because I have added ALTQ support: this
server is my main professional server and it has to prioritize VoIP
packets and some other IP traffic.
Some time ago, I tried to build a kernel with the LOCKDEBUG option,
but that kernel never reached init. You will find the dmesg of the
current kernel attached.
I see panics, always the same one, when the system tries to access a
Qnap NAS over iSCSI. I have tried to bisect: I believe that if I stop the
NAS, the system is stable. The panic is always the one I copied into PR/56925.
Network configuration:
- bridge0 (wm0 and wm1) is connected to two Qnap NAS (iSCSI, MTU 9000)
- wm2 : WAN (MTU 1500), IPv4 default route
- agr0 (wm3 and wm4) : LAN (MTU 1500)
- tap0 : WAN, IPv6 default route (MTU 1500), I have to use an IPv6 broker.
System configuration:
- Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
- 16 GB RAM
- motherboard ASUSTeK Z97-E
- 7 internal SATA disks (ccd0 [wd0, wd1], raid0 [wd2, wd3], raid1 [wd4,
wd5, wd6])
legendre# df
Filesystem    1K-blocks       Used      Avail %Cap Mounted on
/dev/raid0a    32529068   13521382   17381234  43% /
/dev/raid0e    65058298   30805934   30999450  49% /usr
/dev/raid0f    32529068   26574944    4327672  85% /var
/dev/dk5       16515182    3715070   11974354  23% /var/squid/cache
/dev/raid0g   264277976   97612068  153452012  38% /usr/src
/dev/raid0h   548684628  303773356  217477044  58% /srv
/dev/dk0     3876580176 1408841464 2273909704  38% /home
kernfs                1          1          0 100% /kern
ptyfs                 1          1          0 100% /dev/pts
procfs                4          4          0 100% /proc
tmpfs           4162812         48    4162764   0% /var/shm
/dev/dk6    11335898764 9743618440 1025485388  90% /opt/bacula
/dev/dk7    11343502476 2175889896 8600437460  20% /opt/video
legendre# raidctl -s raid0
Components:
/dev/wd2a: optimal
/dev/wd3a: optimal
legendre# raidctl -s raid1
Components:
/dev/wd4e: optimal
/dev/wd5e: optimal
/dev/wd6e: optimal
legendre# ccdconfig -g
ccd0 32 0x0 2000408739840 /dev/wd0a /dev/wd1a
/opt/bacula and /opt/video are on the iSCSI NAS.
ccd0 contains a lot of partitions: /var/squid/cache and all the swap
partitions exported (over iSCSI) to the diskless workstations on the LAN.
This system acts as an iSCSI initiator (for both NAS) and as an iSCSI
target (istgt) for the diskless workstations.
I don't know how to investigate (the LOCKDEBUG kernel doesn't boot), and
I have to fix this bug as soon as possible, as I cannot continue with an
unstable main server.
Best regards,
JB
Attachment:
dmesg.gz
Description: application/gzip