
Re: Panic on -current with MULTIPROCESSOR



> On Mar 30, 2021, at 2:49 PM, Jason Thorpe <thorpej%me.com@localhost> wrote:
> 
> So, something is a little fishy here.  I think there’s a chance that there is some slight brokenness with spin mutexes on Alpha that only shows up on MULTIPROCESSOR kernels.  I’ll take a deeper look a little later today.

So, I spent a bunch of time looking at the Alpha spin lock handling code today, and it looks OK to me.  I don’t think there’s a generic problem with those here.

So let’s take a look at what else might be going on…

pmap_tlb_shootnow() takes 2 locks:

- The tlb_lock (1st)
- The specific pmap’s “activation lock” (2nd).  These come from a global hash of locks; multiple pmaps may share a lock.

tlb_lock is a spin mutex at IPL_VM.

The pmap activation locks are spin mutexes at IPL_SCHED.

Hmm.

So, spin mutexes are kind of interesting.  For those not aware of what they are: in the NetBSD kernel they’re basically like the old “s = splbio(); … splx(s);” pair, but in addition to blocking interrupts on the current CPU, they also block other CPUs from accessing the data protected by that mutex.
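
To make that concrete, here’s a rough sketch (illustrative only; foo_lock and foo_data are made-up names):

	/* Old uniprocessor style: only blocks interrupts on this CPU. */
	int s = splvm();
	foo_data++;
	splx(s);

	/*
	 * Spin mutex style: blocks interrupts on this CPU *and* keeps
	 * other CPUs away from the data.  foo_lock is a kmutex_t that
	 * was initialized with mutex_init(&foo_lock, MUTEX_DEFAULT,
	 * IPL_VM), which makes it a spin mutex at IPL_VM.
	 */
	mutex_spin_enter(&foo_lock);
	foo_data++;
	mutex_spin_exit(&foo_lock);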

When you acquire a spin mutex, the IPL is raised to what the mutex requires, a counter is bumped (it’s actually decremented, but never mind that right now…), and if the new value of the count is 1, then the previous IPL is stashed away in a per-CPU variable.

When a spin mutex is released, the count is dropped, and if the new value is 0, it restores the saved IPL.
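
In rough pseudo-C, the enter/exit paths look like this (a simplified sketch of what kern_mutex.c does; the real code counts downward, and the field names here are approximate):

	/* mutex_spin_enter(), roughly: */
	s = splraise(mtx->mtx_ipl);          /* raise to the mutex's IPL */
	if (ci->ci_mtx_count++ == 0)         /* first spin mutex held... */
		ci->ci_mtx_oldspl = s;       /* ...so stash the previous IPL */
	/* ...then actually take the lock... */

	/* mutex_spin_exit(), roughly: */
	/* ...drop the lock itself, then... */
	if (--ci->ci_mtx_count == 0)         /* last spin mutex released... */
		splx(ci->ci_mtx_oldspl);     /* ...so restore the stashed IPL */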

So, if you acquire tlb_lock and then a pmap activation lock, and then release them (in either order, but in this case it’s the same order), the sequence is:

	s = splraise(IPL_VM) -> splraise(IPL_SCHED) -> /* nothing */ -> splx(s)
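
Spelling out the count transitions (illustrative; “act_lock” stands in for whichever activation lock this pmap hashes to):

	mutex_spin_enter(&tlb_lock);   /* count 0 -> 1: splraise(IPL_VM), old IPL saved */
	mutex_spin_enter(&act_lock);   /* count 1 -> 2: splraise(IPL_SCHED)             */
	mutex_spin_exit(&act_lock);    /* count 2 -> 1: nothing; IPL stays at IPL_SCHED */
	mutex_spin_exit(&tlb_lock);    /* count 1 -> 0: splx(s) restores the saved IPL  */

Note that from the second enter until the final exit, the CPU is running at IPL_SCHED.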

Now, near the top of pmap_tlb_shootnow(), we have the comment:

        /*
         * Acquire the shootdown mutex.  This will also block IPL_VM
         * interrupts and disable preemption.  It is critically important
         * that IPIs not be blocked in this routine.
         */

This is because while we are spinning, waiting for other CPUs to process the TLB invalidations we’ve signaled, we need to be able to respond to IPIs that they might send (they might be waiting for US to respond to theirs before they can process the one that we’ve sent…)

Being at IPL_SCHED would normally block processing of IPIs sent to this CPU.  It’s kind of unfortunate that I forgot about this little detail of how spin mutexes work when I overhauled the Alpha pmap recently.

I can fix this easily enough (with a call to alpha_pal_swpipl() after releasing the activation lock)… but it shouldn’t actually be necessary because of another thing that the low-level spin lock code does…
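
Something like this, roughly (a sketch of the idea, not a tested patch; “act_lock” is again a stand-in, and I’m assuming IPL_VM corresponds to ALPHA_PSL_IPL_IO):

	mutex_spin_exit(&act_lock);	/* release the pmap's activation lock */

	/*
	 * We still hold tlb_lock, so the mutex code has left us at
	 * IPL_SCHED; drop back down to IPL_VM explicitly so that IPIs
	 * can be taken while we spin waiting for the remote CPUs.
	 */
	alpha_pal_swpipl(ALPHA_PSL_IPL_IO);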

When spinning, the locking code is supposed to implement a backoff algorithm to reduce the expensive bus traffic that’s needed to implement the atomic memory operations underpinning the locks.  This is implemented by the SPINLOCK_BACKOFF() macro in <sys/lock.h>:

#define SPINLOCK_BACKOFF(count)                                 \
do {                                                            \
        int __i;                                                \
        for (__i = (count); __i != 0; __i--) {                  \
                SPINLOCK_BACKOFF_HOOK;                          \
        }                                                       \
        if ((count) < SPINLOCK_BACKOFF_MAX)                     \
                (count) += (count);                             \
} while (/* CONSTCOND */ 0);
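
For context, a contended spin loop uses it roughly like so (illustrative; the real consumers are in the mutex and rwlock code):

	u_int count = SPINLOCK_BACKOFF_MIN;

	while (!__cpu_simple_lock_try(&lock)) {
		/*
		 * Busy-wait for count iterations of the hook, then
		 * double count (capped at SPINLOCK_BACKOFF_MAX) so
		 * that contending CPUs back off exponentially and
		 * generate less bus traffic.
		 */
		SPINLOCK_BACKOFF(count);
	}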

Note the SPINLOCK_BACKOFF_HOOK in that loop… on the Alpha, the corresponding spin hook, SPINLOCK_SPIN_HOOK, is defined as:

#define SPINLOCK_SPIN_HOOK                                              \
do {                                                                    \
        struct cpu_info *__ci = curcpu();                               \
        int __s;                                                        \
                                                                        \
        if (__ci->ci_ipis != 0) {                                       \
                /* printf("CPU %lu has IPIs pending\n",                 \
                    __ci->ci_cpuid); */                                 \
                __s = splhigh();                                        \
                alpha_ipi_process(__ci, NULL);                          \
                splx(__s);                                              \
        }                                                               \
} while (0)


I.e. while spinning, we should be able to process IPIs… but I wonder if this hook is not working for some reason…

>> 
>> Thanks,
>> John
>> 
>> 
>> [ 74219.1614327] TLB LOCAL MASK  = 0x0000000000000001
>> [ 74219.1614327] TLB REMOTE MASK = 0x0000000000000002
>> [ 74219.1614327] TLB REMOTE PENDING = 0x0000000000000002
>> [ 74219.1614327] TLB CONTEXT = 0xfffffc001b83dd58
>> [ 74219.1614327] TLB LOCAL IPL = 6
>> [ 74219.1614327] panic: pmap_tlb_shootnow
>> [ 74219.1614327] cpu0: Begin traceback...
>> [ 74219.1614327] alpha trace requires known PC =eject=
>> [ 74219.1614327] cpu0: End traceback...
>> Stopped in pid 5054.5054 (echo) at      netbsd:cpu_Debugger+0x4: ret
>> zero,(ra)
>> db{0}>
>> db{0}> bt
>> cpu_Debugger() at netbsd:cpu_Debugger+0x4
>> db_panic() at netbsd:db_panic+0xc8
>> vpanic() at netbsd:vpanic+0x10c
>> panic() at netbsd:panic+0x58
>> pmap_tlb_shootnow.part.0() at netbsd:pmap_tlb_shootnow.part.0+0x234
>> pmap_remove_internal() at netbsd:pmap_remove_internal+0x328
>> pmap_remove() at netbsd:pmap_remove+0x2c
>> uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x158
>> sys_munmap() at netbsd:sys_munmap+0xb8
>> syscall() at netbsd:syscall+0x260
>> XentSys() at netbsd:XentSys+0x5c
>> --- syscall (73) ---
>> --- user mode ---
>> PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
>> 5054 >5054 7   0         0   fffffc02e8db0500               echo
>> 20918 20918 3   1       180   fffffc000ef9e480               make wait
>> 19235 19235 3   0       180   fffffc0019d9ee40               make pipe_rd
>> 28954 28954 3   1       180   fffffc001aa5c140               make pipe_rd
>> 2913  2913 3   0       180   fffffc001b093a80                top select
>> 7334  7334 3   1       180   fffffc0016176080         pbulk-scan pipe_rd
>> 17864 17864 3   1       180   fffffc000c14ea80                 sh wait
>> 28220 28220 3   0       180   fffffc0019d9e5c0                 sh wait
>> 26180 26180 3   1       180   fffffc001aa5c9c0               tcsh pause
>> 15702 15702 3   1       180   fffffc001b092100               tcsh pause
>> 5805  5805 3   1       180   fffffc001aa5d240               tcsh pause
>> 16556 16556 3   0       180   fffffc000c14eec0               tcsh pause
>> 18472 18472 3   1       180   fffffc000c14f300               tmux kqueue
>> 28224 28224 3   1       180   fffffc001aa5ce00               tcsh pause
>> 11357 11357 3   0       180   fffffc0019d9f280               sshd select
>> 14035 14035 3   0       180   fffffc0016176d40               sshd poll
>> 17970 17970 3   1       180   fffffc000c234680               tcsh ttyraw
>> 18858 18858 3   1       180   fffffc00161764c0               sshd select
>> 23130 23130 3   0       180   fffffc001b093200               sshd poll
>> 338    338 3   0       1c0   fffffc02fd68cdc0              getty ttyraw
>> 326    326 3   1       180   fffffc000ef9f9c0               cron nanoslp
>> 2204  2204 3   1       180   fffffc000ef9f580              inetd kqueue
>> 1695  1695 3   0       180   fffffc000ef9f140               sshd select
>> 206    206 3   0       180   fffffc000ef9e8c0                 cu poll
>> 205    205 3   1       180   fffffc000c234ac0                 cu ttyraw
>> 203    203 3   1       180   fffffc000c235bc0               ntpd pause
>> 2213  2213 3   0       180   fffffc000c235780               tcsh pause
>> 2043  2043 3   0       180   fffffc000c235340               tmux kqueue
>> 1240  1240 3   0       180   fffffc000c234f00            syslogd kqueue
>> 1        1 3   1       180   fffffc000086ee80               init wait
>> 0     1313 3   1       200   fffffc000c234240          acctwatch actwat
>> 0      228 3   0       200   fffffc00007d4140            physiod physiod
>> 0      126 3   1       200   fffffc000c14e200          pooldrain pooldrain
>> 0      125 3   1       200   fffffc000086fb40            ioflush syncer
>> 0      124 3   1       200   fffffc000086f700           pgdaemon pgdaemon
>> 0      121 3   1       200   fffffc02fd68c540            raidio0 raidiow
>> 0      120 3   0       200   fffffc02fd68d640              raid0 rfnodeq
>> 0      119 3   1       200   fffffc000086f2c0             npfgc0 npfgcw
>> 0      118 3   0       200   fffffc000086ea40            rt_free rt_free
>> 0      117 3   0       200   fffffc000086e600              unpgc unpgc
>> 0      116 3   1       200   fffffc000086e1c0    icmp6_wqinput/1 icmp6_wqinput
>> 0      115 3   0       200   fffffc0000787b00    icmp6_wqinput/0 icmp6_wqinput
>> 0      114 3   1       200   fffffc00007876c0            ip6flow ip6flow
>> 0      113 3   1       200   fffffc0000787280          nd6_timer nd6_timer
>> 0      112 3   1       200   fffffc0000786e40    carp6_wqinput/1 carp6_wqinput
>> 0      111 3   0       200   fffffc0000786a00    carp6_wqinput/0 carp6_wqinput
>> 0      110 3   1       200   fffffc00007865c0     carp_wqinput/1 carp_wqinput
>> 0      109 3   0       200   fffffc00007d4580     carp_wqinput/0 carp_wqinput
>> 0      108 3   1       200   fffffc00007d49c0     icmp_wqinput/1 icmp_wqinput
>> 0      107 3   0       200   fffffc00007d4e00     icmp_wqinput/0 icmp_wqinput
>> 0      106 3   1       200   fffffc00007d5240           rt_timer rt_timer
>> 0      105 3   0       200   fffffc00007d5680    ipflow_slowtimo ipflow_slowtimo
>> 0      104 3   1       200   fffffc00007d5ac0        vmem_rehash vmem_rehash
>> 0      103 3   1       200   fffffc0000786180          entbutler entropy
>> 0       29 3   0       200   fffffc02fd68da80               iic0 iicintr
>> 0       27 3   0       200   fffffc02fd68d200           scsibus2 sccomp
>> 0       25 3   0       200   fffffc02fd68c980           scsibus1 sccomp
>> 0       23 3   0       200   fffffc02fd68c100           scsibus0 sccomp
>> 0       22 3   0       200   fffffc02fd713a40            atabus1 atath
>> 0       21 3   0       200   fffffc02fd713600            atabus0 atath
>> 0       20 3   1       200   fffffc02fd7131c0            xcall/1 xcall
>> 0       19 1   1       200   fffffc02fd712d80          softser/1
>> 0       18 1   1       200   fffffc02fd712940          softclk/1
>> 0       17 1   1       200   fffffc02fd712500          softbio/1
>> 0       16 1   1       200   fffffc02fd7120c0          softnet/1
>> 0    >  15 1   1       201   fffffc02fef39a00             idle/1
>> 0       14 3   0       200   fffffc02fef395c0         pmfsuspend pmfsuspend
>> 0       13 3   0       200   fffffc02fef39180           pmfevent pmfevent
>> 0       12 3   0       200   fffffc02fef38d40         sopendfree sopendfr
>> 0       11 3   1       200   fffffc02fef38900            iflnkst iflnkst
>> 0       10 3   0       200   fffffc02fef384c0           nfssilly nfssilly
>> 0        9 3   0       240   fffffc02fef38080             vdrain vdrain
>> 0        8 3   0       200   fffffc02ff74f9c0          modunload mod_unld
>> 0        7 3   0       200   fffffc02ff74f580            xcall/0 xcall
>> 0        6 1   0       200   fffffc02ff74f140          softser/0
>> 0        5 1   0       200   fffffc02ff74ed00          softclk/0
>> 0        4 1   0       200   fffffc02ff74e8c0          softbio/0
>> 0        3 1   0       200   fffffc02ff74e480          softnet/0
>> 0        2 1   0       201   fffffc02ff74e040             idle/0
>> 0        0 3   1       200   fffffc00014d0f80            swapper uvm
> 

-- thorpej


