NetBSD-Bugs archive


Re: kern/51877: carp related panic during shutdown



The following reply was made to PR kern/51877; it has been noted by GNATS.

From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
To: Hauke Fath <hf%spg.tu-darmstadt.de@localhost>
Cc: "gnats-bugs%NetBSD.org@localhost" <gnats-bugs%netbsd.org@localhost>, kern-bug-people%netbsd.org@localhost, 
	gnats-admin%netbsd.org@localhost
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 13:41:01 +0900

 On Mon, Jan 16, 2017 at 9:14 PM, Hauke Fath <hf%spg.tu-darmstadt.de@localhost> wrote:
 > On 01/16/17 07:58, Ryota Ozaki wrote:
 >>
 >> Can you try with DEBUG && LOCKDEBUG if not enabled?
 >>
 >> And can you show me states of carp0 and routes just before shutdown?
 >> (ifconfig carp0 and netstat -nr -f inet)
 >
 >
 > Booting a 7.99.59 pf DEBUG/LOCKDEBUG/DIAGNOSTIC kernel from today's sources
 > on the carp(4) secondary machine, dmesg has:
 >
 > [...]
 > IPv6 mode: router
 > Configuring network interfaces: wm0 ixg0 wm4wm4: link state DOWN (was
 > UNKNOWN)
 >  vlan2 vlan3 vlan7 vlan8 vlan9 vlan10 vlan11 vlan12 carp0ifconfig:
 > SIOCAIFADDR_IN6: Can'tcarp2: state transition from: I
 >  assign requested address
 > carp3: state transition from: INIT -> to: BACKUP
 >  carp2 carp3 carp7carp7: state transition from: INIT -> to: BACKUP
 >  carp8carp8: state transition from: INIT -> to: BACKUP
 >  carp9carp9: state transition from: INIT -> to: BACKUP
 >  carp10carp10: state transition from: INIT -> to: BACKUP
 >  carp11carp11: state transition from: INIT -> to: BACKUP
 >  carp12carp12: state transition from: INIT -^@> to: BACKUP
 >  pfsync0.
 > [...]
 >
 > - note the mangled "Can't assign requested address" message - the -7 kernel
 > doesn't have that.
 >
 > # ifconfig  carp0
 > carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 >         capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
 >         capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
 >         capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
 >         enabled=0
 >         carp: MASTER carpdev wm0 vhid 1 advbase 1 advskew 192
 >         address: 00:00:5e:00:01:01
 >         inet 130.83.42.73 netmask 0xfffffff8 broadcast 130.83.42.79
 > # netstat -nr -f inet
 > Routing tables
 >
 > Internet:
 > Destination        Gateway            Flags    Refs      Use    Mtu Interface
 > default            130.83.42.78       UGS         -        -      -L wm0
 > 10.0.49/24         link#14            UC          -        -      -  vlan10
 > 10.0.49.252        link#14            UHL         -        -      -  lo0
 > 127/8              127.0.0.1          UGRS        -        -  33624  lo0
 > 127.0.0.1          lo0                UH          -        -  33624  lo0
 > 130.83.18.0/26     link#15            UC          -        -      -  vlan11
 > 130.83.18.60       link#15            UHL         -        -      -  lo0
 > 130.83.18.64/26    link#16            UC          -        -      -  vlan12
 > 130.83.18.124      link#16            UHL         -        -      -  lo0
 > 130.83.18.128/26   link#13            UC          -        -      -  vlan9
 > 130.83.18.188      link#13            UHL         -        -      -  lo0
 > 130.83.18.192/26   link#12            UC          -        -      -  vlan8
 > 130.83.18.252      link#12            UHL         -        -      -  lo0
 > 130.83.42.72/29    link#3             UC          -        -      -  wm0
 > 130.83.42.73       130.83.42.73       UH          -        -      -  carp0
 > 130.83.42.75       link#3             UHL         -        -      -  lo0
 > 130.83.197.0/28    link#10            UC          -        -      -  vlan3
 > 130.83.197.0/27    link#18            UC          -        -      -  carp2
 > 130.83.197.11      link#10            UHL         -        -      -  lo0
 > 130.83.197.16/28   link#9             UC          -        -      -  vlan2
 > 130.83.197.28      link#9             UHL         -        -      -  lo0
 > 130.83.228.0/26    link#11            UC          -        -      -  vlan7
 > 130.83.228.60      link#11            UHL         -        -      -  lo0
 > 192.168.27.0/28    link#7             UC          -        -      -  wm4
 > 192.168.27.12      link#7             UHL         -        -      -  lo0
 > # shutdown -r now
 > Shutdown  NOW!
 >
 > [...]
 >
 > Done running shutdown hooks.
 > Jan 16 12:55:32 Zinnenwand syslogd[433]: Exiting on signal 15
 > carp0: incorrect hash from 130.83.42.74
 > carp0: incorrect hash from 130.83.42.74
 > carp0: incorrect hash from 130.83.42.74
 > syncing disks... done
 >
 > [...]
 >
 > igphy3: detached
 > wm3: detached
 > igphy2: detached
 > wm2: detached
 > igphy1: detached
 > wm1: detached
 > igphy0: detached
 > carp0: state transition from: MASTER -> to: INIT
 > Mutex error: lockdebug_barrier: spin lock held
 >
 > lock address : 0xfffffe821e74f400 type     :               spin
 > initialized  : 0xffffffff80426c5a
 > shared holds :                  0 exclusive:                  1
 > shares wanted:                  0 exclusive:                  0
 > current cpu  :                  2 last held:                  2
 > current lwp  : 0xfffffe810fc42000 last held: 0xfffffe810fc42000
 > last locked* : 0xffffffff8044b97a unlocked : 0xffffffff8046cfc4
 > owner field  : 0x0000000000010700 wait/spin:                0/1
 >
 > Skipping crash dump on recursive panic
 > panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
 
 The mutex error happened because uvm_fault_internal tried to take
 a rwlock while holding a spin mutex. Can you identify the spin mutex
 by disassembling the kernel? The addresses above, such as "last locked",
 should help narrow it down.
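 
 As a rough sketch of that lookup, assuming a symbol listing produced
 by nm(1) on the kernel image that panicked: the script below maps each
 address from the LOCKDEBUG dump to the nearest preceding symbol plus
 an offset. The symbol names and start addresses in SAMPLE_NM are made
 up for illustration and are not taken from this kernel.

```python
import bisect

def parse_nm(text):
    """Parse nm-style lines ("<hex addr> <type> <name>") into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip undefined symbols and blank lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return (name, offset) for the nearest symbol at or below addr, or None."""
    starts = [start for start, _name in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base

# Hypothetical symbol table; replace with real `nm netbsd` output.
SAMPLE_NM = """\
ffffffff80426c00 T example_init
ffffffff8044b900 T example_lock_path
ffffffff8046cf00 T example_unlock_path
"""

symbols = parse_nm(SAMPLE_NM)

# Addresses copied from the LOCKDEBUG dump quoted above.
dump = [
    ("initialized", 0xffffffff80426c5a),
    ("last locked", 0xffffffff8044b97a),
    ("unlocked", 0xffffffff8046cfc4),
]
for label, addr in dump:
    hit = resolve(symbols, addr)
    if hit is None:
        print(f"{label}: {addr:#x} is below the first symbol")
    else:
        name, offset = hit
        print(f"{label}: {name}+{offset:#x}")
```

 With a real listing, the function reported for "initialized" is where
 the mutex was set up, which is usually enough to tell which lock it is;
 the function reported for "last locked" can then be disassembled around
 the given offset.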
 
 That said, I guess the spin mutex was taken after the fault below.
 (If a spin mutex had been held before the fault, the mutex_enter below
 should have failed with the same mutex error.)
 
 > cpu2: Begin traceback...
 > vpanic() at netbsd:vpanic+0x140
 > snprintf() at netbsd:snprintf
 > lockdebug_more() at netbsd:lockdebug_more
 > rw_enter() at netbsd:rw_enter+0x5fe
 > uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
 > trap() at netbsd:trap+0x30a
 > --- trap (number 6) ---
 > mutex_tryenter() at netbsd:mutex_tryenter+0x12
 > lwp_trylock() at netbsd:lwp_trylock+0x17
 > turnstile_block() at netbsd:turnstile_block+0x238
 > mutex_enter() at netbsd:mutex_enter+0x36c
 
 I don't know why a fault happens inside mutex_enter. The mutex is a
 global adaptive mutex that is never destroyed, so it should be stable.
 Also, the fault happened in a different place from the fault in the
 first report. Is something broken around the mutex...?
 
 Just in case, could you clean-build the tools and the kernel and try again?
 If that doesn't help, could you comment out rt_update_wait in _rt_free
 and try? Actually, rt_update_wait isn't needed if !NET_MPSAFE for now.
 
 Thanks,
   ozaki-r
 

