NetBSD-Bugs archive
Re: kern/51877: carp related panic during shutdown
The following reply was made to PR kern/51877; it has been noted by GNATS.
From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
To: Hauke Fath <hf%spg.tu-darmstadt.de@localhost>
Cc: "gnats-bugs%NetBSD.org@localhost" <gnats-bugs%netbsd.org@localhost>, kern-bug-people%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 13:41:01 +0900
On Mon, Jan 16, 2017 at 9:14 PM, Hauke Fath <hf%spg.tu-darmstadt.de@localhost> wrote:
> On 01/16/17 07:58, Ryota Ozaki wrote:
>>
>> Can you try with DEBUG && LOCKDEBUG if not enabled?
>>
>> And can you show me states of carp0 and routes just before shutdown?
>> (ifconfig carp0 and netstat -nr -f inet)
>
>
> Booting a 7.99.59 pf DEBUG/LOCKDEBUG/DIAGNOSTIC kernel from today's sources
> on the carp(4) secondary machine, dmesg has:
>
> [...]
> IPv6 mode: router
> Configuring network interfaces: wm0 ixg0 wm4wm4: link state DOWN (was
> UNKNOWN)
> vlan2 vlan3 vlan7 vlan8 vlan9 vlan10 vlan11 vlan12 carp0ifconfig:
> SIOCAIFADDR_IN6: Can'tcarp2: state transition from: I
> assign requested address
> carp3: state transition from: INIT -> to: BACKUP
> carp2 carp3 carp7carp7: state transition from: INIT -> to: BACKUP
> carp8carp8: state transition from: INIT -> to: BACKUP
> carp9carp9: state transition from: INIT -> to: BACKUP
> carp10carp10: state transition from: INIT -> to: BACKUP
> carp11carp11: state transition from: INIT -> to: BACKUP
> carp12carp12: state transition from: INIT -^@> to: BACKUP
> pfsync0.
> [...]
>
> - note the mangled "Can't assign requested address" message - the -7 kernel
> doesn't have that.
>
> # ifconfig carp0
> carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
> capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
> capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
> enabled=0
> carp: MASTER carpdev wm0 vhid 1 advbase 1 advskew 192
> address: 00:00:5e:00:01:01
> inet 130.83.42.73 netmask 0xfffffff8 broadcast 130.83.42.79
> # netstat -nr -f inet
> Routing tables
>
> Internet:
> Destination Gateway Flags Refs Use Mtu Interface
> default 130.83.42.78 UGS - - -L wm0
> 10.0.49/24 link#14 UC - - - vlan10
> 10.0.49.252 link#14 UHL - - - lo0
> 127/8 127.0.0.1 UGRS - - 33624 lo0
> 127.0.0.1 lo0 UH - - 33624 lo0
> 130.83.18.0/26 link#15 UC - - - vlan11
> 130.83.18.60 link#15 UHL - - - lo0
> 130.83.18.64/26 link#16 UC - - - vlan12
> 130.83.18.124 link#16 UHL - - - lo0
> 130.83.18.128/26 link#13 UC - - - vlan9
> 130.83.18.188 link#13 UHL - - - lo0
> 130.83.18.192/26 link#12 UC - - - vlan8
> 130.83.18.252 link#12 UHL - - - lo0
> 130.83.42.72/29 link#3 UC - - - wm0
> 130.83.42.73 130.83.42.73 UH - - - carp0
> 130.83.42.75 link#3 UHL - - - lo0
> 130.83.197.0/28 link#10 UC - - - vlan3
> 130.83.197.0/27 link#18 UC - - - carp2
> 130.83.197.11 link#10 UHL - - - lo0
> 130.83.197.16/28 link#9 UC - - - vlan2
> 130.83.197.28 link#9 UHL - - - lo0
> 130.83.228.0/26 link#11 UC - - - vlan7
> 130.83.228.60 link#11 UHL - - - lo0
> 192.168.27.0/28 link#7 UC - - - wm4
> 192.168.27.12 link#7 UHL - - - lo0
> # shutdown -r now
> Shutdown NOW!
>
> [...]
>
> Done running shutdown hooks.
> Jan 16 12:55:32 Zinnenwand syslogd[433]: Exiting on signal 15
> carp0: incorrect hash from 130.83.42.74
> carp0: incorrect hash from 130.83.42.74
> carp0: incorrect hash from 130.83.42.74
> syncing disks... done
>
> [...]
>
> igphy3: detached
> wm3: detached
> igphy2: detached
> wm2: detached
> igphy1: detached
> wm1: detached
> igphy0: detached
> carp0: state transition from: MASTER -> to: INIT
> Mutex error: lockdebug_barrier: spin lock held
>
> lock address : 0xfffffe821e74f400 type : spin
> initialized : 0xffffffff80426c5a
> shared holds : 0 exclusive: 1
> shares wanted: 0 exclusive: 0
> current cpu : 2 last held: 2
> current lwp : 0xfffffe810fc42000 last held: 0xfffffe810fc42000
> last locked* : 0xffffffff8044b97a unlocked : 0xffffffff8046cfc4
> owner field : 0x0000000000010700 wait/spin: 0/1
>
> Skipping crash dump on recursive panic
> panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
The mutex error happened because uvm_fault_internal tried to take
a rwlock while holding a spin mutex. Can you identify the spin mutex
by disassembling the kernel? The addresses above, such as "last locked",
should help track it down.
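One way to map an address such as "last locked" to a function is to look it up in a sorted symbol listing (for example, the output of nm -n on the kernel image) before disassembling around it. The following is only a minimal sketch of that lookup: the symbol names and addresses in SAMPLE_NM are hypothetical placeholders, not taken from the reporter's kernel, and parse_nm/resolve are illustrative helper names.

```python
import bisect

def parse_nm(text):
    """Parse `nm -n`-style lines ("<hex addr> <type> <name>") into a sorted list."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _type, name = parts
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms

def resolve(syms, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return name, addr - base

# Hypothetical symbol listing; NOT the real symbols of the kernel in this PR.
SAMPLE_NM = """\
ffffffff8044b900 T example_func_a
ffffffff8044ba00 T example_func_b
ffffffff8046cf00 T example_func_c
"""

syms = parse_nm(SAMPLE_NM)
# "last locked" address from the lockdebug dump above.
name, off = resolve(syms, 0xffffffff8044b97a)
print(f"{name}+{off:#x}")
```

With a real listing in place of SAMPLE_NM, the resulting name+offset tells where to start reading the disassembly.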
That said, I guess the spin mutex was taken after the fault below.
(If a spin mutex had been held before the fault, the mutex_enter below
should have failed with the same mutex error.)
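The reasoning in the parenthesis can be illustrated with a toy model. This is only a sketch under the assumption that both mutex_enter on an adaptive mutex and rw_enter run the same "no spin lock held" check; ToyCpu, LockDebugError and first_failure are hypothetical names and none of this is kernel code.

```python
class LockDebugError(Exception):
    pass

class ToyCpu:
    """Toy model of one CPU's lock bookkeeping (hypothetical, not kernel code)."""

    def __init__(self):
        self.spin_held = 0

    def spin_enter(self):
        self.spin_held += 1

    def barrier(self, what):
        # A lock that may sleep must not be taken while a spin lock is held.
        if self.spin_held:
            raise LockDebugError(f"{what}: spin lock held")

    def adaptive_mutex_enter(self):
        self.barrier("mutex_enter")

    def rw_enter(self):
        self.barrier("rw_enter")

def first_failure(spin_before):
    """Return the message of the first check that fires, or None."""
    cpu = ToyCpu()
    try:
        if spin_before:
            cpu.spin_enter()          # spin lock already held on entry
        cpu.adaptive_mutex_enter()
        if not spin_before:
            cpu.spin_enter()          # spin lock taken only afterwards
        cpu.rw_enter()                # stands in for the rw_enter in the fault path
    except LockDebugError as e:
        return str(e)
    return None

print(first_failure(True))
print(first_failure(False))
```

In the model, a spin lock held on entry trips the check at mutex_enter, while a spin lock taken later trips it only at rw_enter, which matches where the panic in the traceback was raised.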
> cpu2: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> lockdebug_more() at netbsd:lockdebug_more
> rw_enter() at netbsd:rw_enter+0x5fe
> uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
> trap() at netbsd:trap+0x30a
> --- trap (number 6) ---
> mutex_tryenter() at netbsd:mutex_tryenter+0x12
> lwp_trylock() at netbsd:lwp_trylock+0x17
> turnstile_block() at netbsd:turnstile_block+0x238
> mutex_enter() at netbsd:mutex_enter+0x36c
I don't know why a fault happens inside mutex_enter. The mutex is a global
adaptive mutex that is stable and never destroyed. Also, the fault
happened at a different place than the fault in the first report.
Is something broken around the mutex...?
Just in case, could you clean-build the tools and the kernel and try again?
If that doesn't help, could you comment out rt_update_wait in _rt_free
and try again? Actually, rt_update_wait isn't needed if !NET_MPSAFE for now.
Thanks,
ozaki-r