Port-ofppc archive


re: Current port status



> > 0xa94a3940: at vpanic+0x21c
> > 0xa94a3970: at panic+0x4c
> > 0xa94a39b0: at lockdebug_abort1+0xdc
> > 0xa94a39d0: at mutex_abort+0x50
> > 0xa94a39e0: at mutex_enter+0x26c                          [5]
> > 0xa94a3a20: at pool_get+0x7c                              [4]
> > 0xa94a3a60: at pool_cache_get_slow+0x214
> > 0xa94a3a90: at pool_cache_get_paddr+0x290
> > 0xa94a3ae0: at m_get+0x3c
> > 0xa94a3af0: at m_gethdr+0xc
> > 0xa94a3b00: at vr_rxeof+0x3c8
> > 0xa94a3b70: at vr_intr+0x314
> > 0xa94a3bb0: at intr_deliver+0x7c
> > 0xa94a3bf0: at pic_do_pending_int+0x12c
> > 0xa94a3c30: at splx+0x38                                  [3]
> > 0xa94a3c40: at lockdebug_unlocked+0x190                   [2]
> > 0xa94a3c70: at mutex_exit+0x194                           [1]
> > 0xa94a3c90: at pool_get+0x1f8
> > 0xa94a3cd0: at pool_cache_get_slow+0x214
> > [...]
> > splx() call to return to the previous IPL.  at this point in [3] we
> > check for other interrupts at higher levels, and we end up having a PCI
> > intr on the vr. eventually this goes down to [4], which is operating on
> > the same pool as the mutex in [1] belongs to but hasn't actually been
> > completely released yet,
> 
> This is something I don't understand. Obviously mutex_exit() had been called
> in [1] for this pool, so why hadn't it been "completely released" when
> entering again in [5]?
> 
> I don't know very much about the internals of kernel mutexes, but if
> mutex_exit() really calls splx() before having released everything,
> this sounds like a bug in mutex_exit(), which is hard to believe as it
> works in most other environments.

yeah, it seemed strange to me at first.  i think the problem is that
m_get() is being called at both IPL_SOFTNET (the first m_get() in
the stack trace, from softint) and IPL_VM (the second, from beyond
vr_intr()).  i.e., vr(4) shouldn't be allocating from the interrupt
handler.  but i'm not really a network driver person...

the reason the mutex isn't "completely released" yet is:

        - before releasing the mutex, we check it is properly held

        - this check, in lockdebug, runs under splhigh()

        - when this code returns and calls splx(), the intr is
          serviced

        - the very next line after the lockdebug check is the
          release of the underlying simple lock.
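
the ordering above can be sketched as a toy, single-CPU model in C.
this is not the NetBSD code: every name in it (toy_mutex_exit(),
toy_splx(), run_scenario(), the TOY_IPL_* levels) is made up for
illustration, and it assumes the lock holder runs at an IPL below the
device interrupt's level, so the interrupt gets through as soon as
splx() drops back down.

```c
#include <assert.h>
#include <stdbool.h>

/* toy single-CPU model; every name here is hypothetical, not a kernel API */
enum { TOY_IPL_NONE, TOY_IPL_VM, TOY_IPL_HIGH };

struct toy_mutex { bool held; };

static int cur_ipl = TOY_IPL_NONE;
static bool intr_pending;       /* device interrupt waiting to be delivered */
static bool intr_during_check;  /* scenario knob: intr arrives at splhigh */
static int reentry_detected;    /* the real kernel would panic instead */
static struct toy_mutex pool_lock;  /* stands in for the pool's mutex */

static int toy_splraise(int ipl)
{
    int old = cur_ipl;
    if (ipl > cur_ipl)
        cur_ipl = ipl;
    return old;
}

static bool toy_mutex_enter(struct toy_mutex *m)
{
    if (m->held) {              /* "locking against myself" */
        reentry_detected++;
        return false;
    }
    m->held = true;
    return true;
}

/* the driver's interrupt handler allocates from the same pool */
static void toy_intr_handler(void)
{
    if (toy_mutex_enter(&pool_lock))
        pool_lock.held = false; /* simplified release, no debug check */
}

/* lower the IPL and deliver anything that became pending meanwhile */
static void toy_splx(int ipl)
{
    cur_ipl = ipl;
    if (intr_pending && cur_ipl < TOY_IPL_VM) {
        intr_pending = false;
        int s = toy_splraise(TOY_IPL_VM);
        toy_intr_handler();
        cur_ipl = s;
    }
}

static void toy_mutex_exit(struct toy_mutex *m)
{
    int s = toy_splraise(TOY_IPL_HIGH); /* debug check runs at splhigh */
    assert(m->held);                    /* "is it properly held?" */
    if (intr_during_check)
        intr_pending = true;            /* intr arrives while blocked */
    toy_splx(s);                        /* ...and is serviced here */
    m->held = false;                    /* only now is the lock released */
}

/* returns how many times the handler found the lock still held */
static int run_scenario(bool interrupt_arrives)
{
    cur_ipl = TOY_IPL_NONE;
    intr_pending = false;
    reentry_detected = 0;
    pool_lock.held = false;
    intr_during_check = interrupt_arrives;

    toy_mutex_enter(&pool_lock);
    toy_mutex_exit(&pool_lock);
    return reentry_detected;
}
```

in this model run_scenario(true) returns 1, because the handler runs
inside toy_splx() while the lock is still marked held, and
run_scenario(false) returns 0.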

it's all ugly but i think correct, and it's a vr(4) bug, as we have
been suspecting.


.mrg.

