NetBSD-Bugs archive


Re: port-amd64/39283: Kernel crash on Dell Poweredge 2950



The following reply was made to PR port-amd64/39283; it has been noted by GNATS.

From: Mindaugas Rasiukevicius <rmind%netbsd.org@localhost>
To: Tobias Nygren <tnn%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost, fredrik%netbsd.se@localhost
Subject: Re: port-amd64/39283: Kernel crash on Dell Poweredge 2950
Date: Mon, 14 Dec 2009 20:54:02 +0000

 Hello,
 
 Tobias Nygren <tnn%NetBSD.org@localhost> wrote:
 >  It tripped over again. Backtrace is similar to before but not identical.
 >  Looks like lock recursion now (notice the bnx interrupt).
 >  Would it be possible (and safe?) to return immediately without doing any
 >  work if mutex_owned()?
 
 Now this is a locking bug.  Do you mean using mutex_owned() to make locking
 decisions?  In that case, no: it would be very wrong, and it would also not
 work on a spin mutex.
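 The spin-mutex problem can be shown with a small userspace model in C. This
 is not NetBSD kernel code: struct model_spin_mutex, model_owned() and
 reclaim_unless_owned() are hypothetical names. The model assumes the
 behaviour described in mutex(9), where mutex_owned() is meant for
 diagnostic assertions such as KASSERT(mutex_owned(&lock)), and for a spin
 mutex it can only report that the lock is held, not that the calling CPU
 holds it. Under that assumption, a "return early if owned" shortcut also
 skips the work whenever some other CPU happens to hold the lock.

```c
#include <stdbool.h>

/* Hypothetical model of a spin mutex; this is not NetBSD's kmutex_t. */
struct model_spin_mutex {
	bool held;      /* true while some CPU holds the lock */
	int owner_cpu;  /* recorded for illustration; model_owned() ignores it */
};

/*
 * Assumed semantics, after mutex(9) for spin mutexes: reports only that
 * the lock is held by someone, not that the calling CPU holds it.
 */
static bool
model_owned(const struct model_spin_mutex *m)
{
	return m->held;
}

/*
 * The proposed shortcut: return immediately if the lock "is owned".
 * Returns true if the (elided) reclaim work was performed.
 */
static bool
reclaim_unless_owned(struct model_spin_mutex *m, int cpu)
{
	if (model_owned(m))
		return false;           /* work silently skipped */
	m->held = true;
	m->owner_cpu = cpu;
	/* ... reclaim work would run here ... */
	m->held = false;
	m->owner_cpu = -1;
	return true;
}
```

 With CPU 1 holding the lock, a call from CPU 0, which is not recursing at
 all, still returns false: the check cannot tell recursion apart from
 ordinary contention, so it drops work instead of waiting for the lock.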
 
 >  panic: lock error
 >  cpu_Debugger() at netbsd:cpu_Debugger+0x9
 >  panic() at netbsd:panic+0x1f6
 >  lockdebug_abort() at netbsd:lockdebug_abort+0x8f
 >  mutex_abort() at netbsd:mutex_abort+0x29
 >  mutex_vector_enter() at netbsd:mutex_vector_enter+0x1c4
 >  pool_cache_invalidate() at netbsd:pool_cache_invalidate+0x23
 >  pool_reclaim() at netbsd:pool_reclaim+0x69
 >  pool_reclaim_callback() at netbsd:pool_reclaim_callback+0x41
 >  callback_run_roundrobin() at netbsd:callback_run_roundrobin+0x100
 >  ...
 
 From the backtrace, it seems that three paths are competing for the same
 thing: reclaiming the VA cache of kmem_map (since more layers are involved,
 such as the vmem quantum cache, it goes through the pool subsystem a couple
 of times).  The following interrupt (the third path) arrives while
 reclaiming is in progress; it tries to reclaim again from interrupt context
 and probably locks against itself ("lock error" would be meaningful with
 LOCKDEBUG in this case):
 
 > bnx_intr() at netbsd:bnx_intr+0xf1
 > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 > Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf4
 > --- interrupt ---
 > mutex_enter() at netbsd:mutex_enter+0x11
 > pool_reclaim() at netbsd:pool_reclaim+0x69
 > pool_reclaim_callback() at netbsd:pool_reclaim_callback+0x41
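 The suspected self-deadlock can be modelled in a few lines. The sketch
 below is a hypothetical userspace C model, not NetBSD's LOCKDEBUG code:
 struct dbg_mutex, dbg_mutex_enter(), dbg_mutex_exit() and the integer CPU
 ids are invented for illustration. A non-recursive lock that records its
 owner can tell "another CPU holds it, so wait" apart from "this CPU already
 holds it", and the second case is the one where an interrupt re-entering
 the reclaim path would spin on itself forever.

```c
#include <assert.h>
#include <stdbool.h>

enum dbg_result { DBG_ACQUIRED, DBG_WOULD_SPIN, DBG_LOCK_ERROR };

/* Hypothetical non-recursive mutex with a debug owner field. */
struct dbg_mutex {
	bool held;
	int owner_cpu;
};

/*
 * Try to acquire on behalf of `cpu`.  Instead of actually spinning, the
 * model reports what would happen, so it can run single-threaded.
 */
static enum dbg_result
dbg_mutex_enter(struct dbg_mutex *m, int cpu)
{
	if (!m->held) {
		m->held = true;
		m->owner_cpu = cpu;
		return DBG_ACQUIRED;
	}
	if (m->owner_cpu == cpu)
		return DBG_LOCK_ERROR;  /* recursion: would spin on itself forever */
	return DBG_WOULD_SPIN;          /* contention: the other CPU will release */
}

static void
dbg_mutex_exit(struct dbg_mutex *m)
{
	assert(m->held);                /* releasing an unheld lock is a bug */
	m->held = false;
	m->owner_cpu = -1;
}
```

 If thread context on CPU 0 holds the lock in pool_reclaim() and an
 interrupt on the same CPU calls into the reclaim path again, the model
 returns DBG_LOCK_ERROR; the same call from CPU 1 would merely wait.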
 
 This is a bit confusing.  Since kmem_map is VM_MAP_INTRSAFE, the pool should
 be interrupt-safe too, i.e. run at IPL_VM, and that mutex should be a spin
 mutex, blocking bnx_intr(), which runs at IPL_NET (== IPL_VM).
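 The priority argument can be checked with a simplified model. The sketch
 assumes hypothetical numeric IPL values (real values are port-specific)
 and models only two cases: a spin mutex initialised at IPL_VM, which
 raises the SPL to IPL_VM while held, and an adaptive mutex, modelled here
 as IPL_NONE, which leaves the SPL unchanged. intr_delivered() and
 spl_while_held() are invented names, not NetBSD functions.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified IPL numbering; real values are port-specific.
 * As stated above, IPL_NET == IPL_VM.
 */
enum {
	MODEL_IPL_NONE = 0,
	MODEL_IPL_VM = 6,
	MODEL_IPL_NET = MODEL_IPL_VM,
	MODEL_IPL_HIGH = 8,
};

/* An interrupt is delivered only if its level is above the current SPL. */
static bool
intr_delivered(int cur_spl, int intr_ipl)
{
	return intr_ipl > cur_spl;
}

/*
 * SPL in effect while holding a mutex initialised at `mutex_ipl`: a spin
 * mutex raises the SPL to at least its IPL; the adaptive case (modelled
 * as MODEL_IPL_NONE) leaves the current SPL as it is.
 */
static int
spl_while_held(int cur_spl, int mutex_ipl)
{
	return mutex_ipl > cur_spl ? mutex_ipl : cur_spl;
}
```

 In this model, holding a spin mutex at IPL_VM keeps an IPL_NET interrupt
 such as bnx_intr() from being delivered on that CPU, so it cannot re-enter
 the reclaim path; only if the pool's lock were effectively adaptive would
 the interrupt get through, which would be consistent with the backtrace.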
 
 Unfortunately, I have not had time yet to look into this further, but I can
 add some KASSERT()s if you are OK with crashing the machine a little more? :)
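 As a rough idea of one invariant such assertions could check, based on the
 reasoning above, here is a userspace sketch. struct model_pool,
 MODEL_VM_MAP_INTRSAFE, the field names, pool_ipl_ok() and pool_check_ipl()
 are hypothetical, and assert() stands in for the kernel's KASSERT(); this
 is not code from the NetBSD tree and not necessarily what would be added.
 The invariant: a pool backed by an interrupt-safe map must use IPL_VM or
 higher, so that its lock is a spin mutex.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IPL_NONE        0
#define MODEL_IPL_VM          6       /* hypothetical value */
#define MODEL_VM_MAP_INTRSAFE 0x1     /* hypothetical flag bit */

/* Hypothetical, stripped-down pool descriptor. */
struct model_pool {
	const char *name;
	unsigned backing_map_flags;   /* flags of the map the pool allocates from */
	int ipl;                      /* IPL the pool's lock was initialised at */
};

/* True if the pool's configuration satisfies the invariant. */
static bool
pool_ipl_ok(const struct model_pool *pp)
{
	return !(pp->backing_map_flags & MODEL_VM_MAP_INTRSAFE) ||
	    pp->ipl >= MODEL_IPL_VM;
}

/* KASSERT()-style check: aborts (unless NDEBUG) if the invariant fails. */
static void
pool_check_ipl(const struct model_pool *pp)
{
	assert(pool_ipl_ok(pp));
}
```

 In the kernel, a check like this would naturally sit where the pool's lock
 is initialised, or on the reclaim path just before the lock is entered.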
 
 -- 
 Mindaugas
 

