Some news.
The system now runs a kernel with ALTQ disabled (ALTQ is compiled into the
kernel but not configured, since ALTQD=NO is set in rc.conf). And... it
crashed last night. Thus, ALTQ is not responsible.
But now I have a suspect. I don't know whether the attached file will get
through to the mailing list.
The kernel is in the pooldr state and only find is running (from /etc/daily).
The load average is very high (why? All systems on the LAN are idle and the
NFS server daemon is idle too).
legendre# df -h
Filesystem     Size   Used  Avail %Cap Mounted on
/dev/raid0a     31G    14G    16G  46% /
/dev/raid0e     62G    30G    29G  50% /usr
/dev/raid0f     31G    23G   6.7G  77% /var
/dev/raid0g    252G   121G   118G  50% /usr/src
/dev/raid0h    523G   337G   160G  67% /srv
/dev/dk0       3.6T   1.9T   1.5T  55% /home
kernfs         1.0K   1.0K     0B 100% /kern
ptyfs          1.0K   1.0K     0B 100% /dev/pts
procfs         4.0K   4.0K     0B 100% /proc
tmpfs          4.0G    16K   4.0G   0% /var/shm
/dev/dk5        11T   9.9T   121G  98% /opt/bacula
/dev/dk6        11T   3.2T   6.8T  31% /opt/video
legendre#
Since I assume NetBSD 10.x is well tested on standard configurations, I
suppose find crashes the system when it tries to access /dev/dk5 or /dev/dk6.
dk5 and dk6 are wedges on iSCSI target devices:
[ 2546.611910] sd0 at scsibus0 target 0 lun 0: <QNAP, iSCSI Storage, 4.0> disk fixed
[ 2546.631910] scsibus1 at iscsi0: 1 target, 16 luns per target
[ 2546.641918] sd0: fabricating a geometry
[ 2546.641918] sd0: 10980 GB, 11244416 cyl, 64 head, 32 sec, 512 bytes/sect x 23028563968 sectors
[ 2546.661910] sd0: fabricating a geometry
[ 2546.681910] sd0: GPT GUID: a5d27c7c-8eda-40e8-a29b-e85a539a5bc7
[ 2546.681910] dk5 at sd0: "bacula", 23028563901 blocks at 34, type: ffs
[ 2546.681910] sd0: async, 8-bit transfers, tagged queueing
[ 2546.681910] sd1 at scsibus1 target 0 lun 0: <QNAP, iSCSI Storage, 4.0> disk fixed
[ 2546.711910] sd1: fabricating a geometry
[ 2546.711910] sd1: 10988 GB, 11251968 cyl, 64 head, 32 sec, 512 bytes/sect x 23044030464 sectors
[ 2546.731910] sd1: fabricating a geometry
[ 2546.751909] sd1: GPT GUID: 799b4d25-970c-4a32-a388-a59470280de0
[ 2546.761910] dk6 at sd1: "video", 23044030397 blocks at 34, type: ffs
[ 2546.761910] sd1: async, 8-bit transfers, tagged queueing
Both NAS units are connected to the server through two dedicated wm
interfaces (direct connections).
legendre# ifconfig wm0
wm0: flags=0x8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 9000
capabilities=0x7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=0x7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=0x7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0x3ff00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
enabled=0x3ff00<UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx>
enabled=0x3ff00<UDP6CSUM_Rx,UDP6CSUM_Tx>
ec_capabilities=0x17<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,EEE>
ec_enabled=0x3<VLAN_MTU,VLAN_HWTAGGING>
address: b4:96:91:92:77:6e
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet6 fe80::b696:91ff:fe92:776e%wm0/64 flags 0 scopeid 0x1
inet 192.168.12.1/24 broadcast 192.168.12.255 flags 0
legendre# ifconfig wm1
wm1: flags=0x8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 9000
capabilities=0x7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=0x7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=0x7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0x3ff00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
enabled=0x3ff00<UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx>
enabled=0x3ff00<UDP6CSUM_Rx,UDP6CSUM_Tx>
ec_capabilities=0x17<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,EEE>
ec_enabled=0x3<VLAN_MTU,VLAN_HWTAGGING>
address: b4:96:91:92:77:6f
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet6 fe80::b696:91ff:fe92:776f%wm1/64 flags 0 scopeid 0x2
legendre#
wm0 and wm1 are bridged:
legendre# cat /etc/ifconfig.bridge0
create
mtu 9000
#inet6 2001:7a8:a8ed:1::2 prefixlen 64 alias
!brconfig $int add wm0
!brconfig $int add wm1
!brconfig $int up
!brconfig $int ipf
NetBSD 10
    |
    +------ wm0 ------ NAS0 (192.168.12.2) ------- /dev/dk5
    |  bridge0
    +------ wm1 ------ NAS1 (192.168.12.3) ------- /dev/dk6
The culprit seems to be the iSCSI initiator or the bridge. The system was
stable before the last update of my source tree, so the faulty code seems to
have been added during the last six months.
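One way to check the suspicion about find would be to walk each iSCSI-backed
filesystem by hand, one at a time, instead of waiting for /etc/daily. What
follows is only a minimal sketch under assumptions: the function name walk_fs
and the log path are hypothetical, and a plain "find -xdev -print" is assumed
to be close enough to what the nightly job does.

```sh
#!/bin/sh
# Sketch only: walk_fs and LOG are hypothetical names, not part of /etc/daily.
LOG=${LOG:-/var/tmp/find-walk.log}

# Traverse a single filesystem without crossing mount points (-xdev)
# and print how many entries were visited.
walk_fs() {
    find "$1" -xdev -print 2>/dev/null | wc -l | tr -d ' '
}

# Walk each mount point given on the command line. The start of every walk
# is logged and synced first, so that the log is more likely to show which
# filesystem was being traversed if the machine goes down.
for fs in "$@"; do
    echo "$(date) start $fs" >> "$LOG"
    sync
    n=$(walk_fs "$fs")
    echo "$(date) done $fs ($n entries)" >> "$LOG"
done
```

Saved under a name such as walk.sh (again hypothetical), it could be run as
"sh walk.sh /opt/bacula" and later "sh walk.sh /opt/video". If only one of the
two runs brings the system down, that would narrow things to one NAS path; if
neither does, find on the iSCSI wedges is probably not the trigger.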
Best regards,
JB
Attachment:
crash.jpg
Description: JPEG image