Port-alpha archive


Re: Panic on -current with MULTIPROCESSOR




> On Mar 30, 2021, at 1:15 PM, John Klos <john%ziaspace.com@localhost> wrote:
> 
> Hi,
> 
> I've been running pbulk builds on my DS25 with no problems at all when the kernel is compiled without MULTIPROCESSOR. We have a full pbulk build of 2020Q4 :)
> 
> I just tried with MULTIPROCESSOR with -current from 28-March-2021, but it panicked within a few hours of starting a pbulk scan.
> 
> Does anyone have a clue about what might be going on here?

This indicates that the pmap is having TLB invalidations processed on the local CPU (id 0) and a remote CPU (id 1).  The local CPU is running at splhigh() (IPL 6 == ALPHA_PSL_IPL_HIGH).  The local CPU has sent an IPI to the remote CPU to process the invalidation, and has timed out waiting for the remote CPU to process it.
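
To make the mask reading above concrete, here is a small illustrative sketch (not kernel code). It assumes that bit N of a TLB mask corresponds to CPU id N, which is how the values in the panic output are being read here; mask_to_cpus() is a hypothetical helper, not a pmap function.

```c
#include <stdint.h>

/*
 * Illustrative only: collect the CPU ids whose bits are set in a
 * shootdown mask.  ASSUMPTION: bit N of the mask stands for CPU id N.
 * Returns the number of ids stored in cpus[] (at most max).
 */
static inline int
mask_to_cpus(uint64_t mask, int *cpus, int max)
{
	int n = 0;

	for (int id = 0; id < 64 && n < max; id++) {
		if (mask & (UINT64_C(1) << id))
			cpus[n++] = id;
	}
	return n;
}
```

Fed the values from the panic output, 0x1 (TLB LOCAL MASK) decodes to CPU 0, and 0x2 (both TLB REMOTE MASK and TLB REMOTE PENDING) decodes to CPU 1, meaning CPU 1 was asked to invalidate and had not yet done so when the wait timed out.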

Couple of things… I’m actually pretty surprised to see IPL == 6 there.  At the top of pmap_tlb_shootnow() is the following:

        /*
         * Acquire the shootdown mutex.  This will also block IPL_VM
         * interrupts and disable preemption.  It is critically important
         * that IPIs not be blocked in this routine.
         */
        KASSERT((alpha_pal_rdps() & ALPHA_PSL_IPL_MASK) < ALPHA_PSL_IPL_CLOCK);
        mutex_spin_enter(&tlb_lock);

…this is because on the Alpha, IPIs come in at the same IPL as the clock interrupt, which is IPL == 5.

tlb_lock is initialized thus:

        mutex_init(&tlb_lock, MUTEX_SPIN, IPL_VM);

…and IPL_VM is ALPHA_PSL_IPL_IO_HI, which is IPL == 4.

So, something is raising our IPL to IPL_HIGH somewhere.  How rude!
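
The IPL arithmetic above can be written down as a toy model. This is only a sketch: the MODEL_* names and model_ipis_deliverable() are hypothetical, and the only facts taken from the discussion are the numeric levels (IPL_VM == 4, clock/IPI == 5, high == 6) and the condition the KASSERT checks.

```c
/*
 * Toy model of the Alpha IPL levels mentioned above.
 * The MODEL_* names are made up for this sketch.
 */
enum {
	MODEL_IPL_VM    = 4,	/* ALPHA_PSL_IPL_IO_HI; tlb_lock's IPL */
	MODEL_IPL_CLOCK = 5,	/* clock interrupt, and IPIs */
	MODEL_IPL_HIGH  = 6	/* splhigh() */
};

/*
 * Mirrors the KASSERT condition: IPIs can still be taken only
 * while the current IPL is below the clock/IPI level.
 */
static inline int
model_ipis_deliverable(int cur_ipl)
{
	return cur_ipl < MODEL_IPL_CLOCK;
}
```

In this model, holding tlb_lock at level 4 leaves IPIs deliverable, while the level 6 reported as TLB LOCAL IPL does not, which is why that value is unexpected.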

mutex_spin_enter() is just an alias for mutex_vector_enter() on Alpha, so looking there, we see that MUTEX_SPIN_SPLRAISE() does:

        s = splraiseipl(MUTEX_SPIN_IPL(mtx));

…and:

#define MUTEX_SPIN_IPL(mtx)             ((mtx)->mtx_ipl)
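
As a rough sketch of why this path should not produce IPL 6 by itself: splraiseipl() only ever raises the level, never lowers it. The model below is not the kernel implementation; model_raise() and model_ipl_after_spin_enter() are hypothetical names, and the IPL is passed around as a plain integer rather than read from the processor.

```c
/*
 * Raise-only semantics: the resulting level is the higher of the
 * current level and the requested one.  Hypothetical helper.
 */
static inline int
model_raise(int cur_ipl, int want_ipl)
{
	return (want_ipl > cur_ipl) ? want_ipl : cur_ipl;
}

/*
 * Level the CPU ends up at after entering a spin mutex whose
 * stored IPL is mtx_ipl, when called at entry_ipl.
 */
static inline int
model_ipl_after_spin_enter(int entry_ipl, int mtx_ipl)
{
	return model_raise(entry_ipl, mtx_ipl);
}
```

With mtx_ipl == 4, entering from IPL 0 gives 4. The result is 6 only if the caller was already at 6 on entry, or if something raises the level afterwards.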

So, something is a little fishy here.  I think there’s a chance that there is some slight brokenness with spin mutexes on Alpha that only shows up on MULTIPROCESSOR kernels.  I’ll take a deeper look a little later today.






> 
> Thanks,
> John
> 
> 
> [ 74219.1614327] TLB LOCAL MASK  = 0x0000000000000001
> [ 74219.1614327] TLB REMOTE MASK = 0x0000000000000002
> [ 74219.1614327] TLB REMOTE PENDING = 0x0000000000000002
> [ 74219.1614327] TLB CONTEXT = 0xfffffc001b83dd58
> [ 74219.1614327] TLB LOCAL IPL = 6
> [ 74219.1614327] panic: pmap_tlb_shootnow
> [ 74219.1614327] cpu0: Begin traceback...
> [ 74219.1614327] alpha trace requires known PC =eject=
> [ 74219.1614327] cpu0: End traceback...
> Stopped in pid 5054.5054 (echo) at      netbsd:cpu_Debugger+0x4: ret zero,(ra)
> db{0}>
> db{0}> bt
> cpu_Debugger() at netbsd:cpu_Debugger+0x4
> db_panic() at netbsd:db_panic+0xc8
> vpanic() at netbsd:vpanic+0x10c
> panic() at netbsd:panic+0x58
> pmap_tlb_shootnow.part.0() at netbsd:pmap_tlb_shootnow.part.0+0x234
> pmap_remove_internal() at netbsd:pmap_remove_internal+0x328
> pmap_remove() at netbsd:pmap_remove+0x2c
> uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x158
> sys_munmap() at netbsd:sys_munmap+0xb8
> syscall() at netbsd:syscall+0x260
> XentSys() at netbsd:XentSys+0x5c
> --- syscall (73) ---
> --- user mode ---
> PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
> 5054 >5054 7   0         0   fffffc02e8db0500               echo
> 20918 20918 3   1       180   fffffc000ef9e480               make wait
> 19235 19235 3   0       180   fffffc0019d9ee40               make pipe_rd
> 28954 28954 3   1       180   fffffc001aa5c140               make pipe_rd
> 2913  2913 3   0       180   fffffc001b093a80                top select
> 7334  7334 3   1       180   fffffc0016176080         pbulk-scan pipe_rd
> 17864 17864 3   1       180   fffffc000c14ea80                 sh wait
> 28220 28220 3   0       180   fffffc0019d9e5c0                 sh wait
> 26180 26180 3   1       180   fffffc001aa5c9c0               tcsh pause
> 15702 15702 3   1       180   fffffc001b092100               tcsh pause
> 5805  5805 3   1       180   fffffc001aa5d240               tcsh pause
> 16556 16556 3   0       180   fffffc000c14eec0               tcsh pause
> 18472 18472 3   1       180   fffffc000c14f300               tmux kqueue
> 28224 28224 3   1       180   fffffc001aa5ce00               tcsh pause
> 11357 11357 3   0       180   fffffc0019d9f280               sshd select
> 14035 14035 3   0       180   fffffc0016176d40               sshd poll
> 17970 17970 3   1       180   fffffc000c234680               tcsh ttyraw
> 18858 18858 3   1       180   fffffc00161764c0               sshd select
> 23130 23130 3   0       180   fffffc001b093200               sshd poll
> 338    338 3   0       1c0   fffffc02fd68cdc0              getty ttyraw
> 326    326 3   1       180   fffffc000ef9f9c0               cron nanoslp
> 2204  2204 3   1       180   fffffc000ef9f580              inetd kqueue
> 1695  1695 3   0       180   fffffc000ef9f140               sshd select
> 206    206 3   0       180   fffffc000ef9e8c0                 cu poll
> 205    205 3   1       180   fffffc000c234ac0                 cu ttyraw
> 203    203 3   1       180   fffffc000c235bc0               ntpd pause
> 2213  2213 3   0       180   fffffc000c235780               tcsh pause
> 2043  2043 3   0       180   fffffc000c235340               tmux kqueue
> 1240  1240 3   0       180   fffffc000c234f00            syslogd kqueue
> 1        1 3   1       180   fffffc000086ee80               init wait
> 0     1313 3   1       200   fffffc000c234240          acctwatch actwat
> 0      228 3   0       200   fffffc00007d4140            physiod physiod
> 0      126 3   1       200   fffffc000c14e200          pooldrain pooldrain
> 0      125 3   1       200   fffffc000086fb40            ioflush syncer
> 0      124 3   1       200   fffffc000086f700           pgdaemon pgdaemon
> 0      121 3   1       200   fffffc02fd68c540            raidio0 raidiow
> 0      120 3   0       200   fffffc02fd68d640              raid0 rfnodeq
> 0      119 3   1       200   fffffc000086f2c0             npfgc0 npfgcw
> 0      118 3   0       200   fffffc000086ea40            rt_free rt_free
> 0      117 3   0       200   fffffc000086e600              unpgc unpgc
> 0      116 3   1       200   fffffc000086e1c0    icmp6_wqinput/1 icmp6_wqinput
> 0      115 3   0       200   fffffc0000787b00    icmp6_wqinput/0 icmp6_wqinput
> 0      114 3   1       200   fffffc00007876c0            ip6flow ip6flow
> 0      113 3   1       200   fffffc0000787280          nd6_timer nd6_timer
> 0      112 3   1       200   fffffc0000786e40    carp6_wqinput/1 carp6_wqinput
> 0      111 3   0       200   fffffc0000786a00    carp6_wqinput/0 carp6_wqinput
> 0      110 3   1       200   fffffc00007865c0     carp_wqinput/1 carp_wqinput
> 0      109 3   0       200   fffffc00007d4580     carp_wqinput/0 carp_wqinput
> 0      108 3   1       200   fffffc00007d49c0     icmp_wqinput/1 icmp_wqinput
> 0      107 3   0       200   fffffc00007d4e00     icmp_wqinput/0 icmp_wqinput
> 0      106 3   1       200   fffffc00007d5240           rt_timer rt_timer
> 0      105 3   0       200   fffffc00007d5680    ipflow_slowtimo ipflow_slowtimo
> 
> 0      104 3   1       200   fffffc00007d5ac0        vmem_rehash vmem_rehash
> 0      103 3   1       200   fffffc0000786180          entbutler entropy
> 0       29 3   0       200   fffffc02fd68da80               iic0 iicintr
> 0       27 3   0       200   fffffc02fd68d200           scsibus2 sccomp
> 0       25 3   0       200   fffffc02fd68c980           scsibus1 sccomp
> 0       23 3   0       200   fffffc02fd68c100           scsibus0 sccomp
> 0       22 3   0       200   fffffc02fd713a40            atabus1 atath
> 0       21 3   0       200   fffffc02fd713600            atabus0 atath
> 0       20 3   1       200   fffffc02fd7131c0            xcall/1 xcall
> 0       19 1   1       200   fffffc02fd712d80          softser/1
> 0       18 1   1       200   fffffc02fd712940          softclk/1
> 0       17 1   1       200   fffffc02fd712500          softbio/1
> 0       16 1   1       200   fffffc02fd7120c0          softnet/1
> 0    >  15 1   1       201   fffffc02fef39a00             idle/1
> 0       14 3   0       200   fffffc02fef395c0         pmfsuspend pmfsuspend
> 0       13 3   0       200   fffffc02fef39180           pmfevent pmfevent
> 0       12 3   0       200   fffffc02fef38d40         sopendfree sopendfr
> 0       11 3   1       200   fffffc02fef38900            iflnkst iflnkst
> 0       10 3   0       200   fffffc02fef384c0           nfssilly nfssilly
> 0        9 3   0       240   fffffc02fef38080             vdrain vdrain
> 0        8 3   0       200   fffffc02ff74f9c0          modunload mod_unld
> 0        7 3   0       200   fffffc02ff74f580            xcall/0 xcall
> 0        6 1   0       200   fffffc02ff74f140          softser/0
> 0        5 1   0       200   fffffc02ff74ed00          softclk/0
> 0        4 1   0       200   fffffc02ff74e8c0          softbio/0
> 0        3 1   0       200   fffffc02ff74e480          softnet/0
> 0        2 1   0       201   fffffc02ff74e040             idle/0
> 0        0 3   1       200   fffffc00014d0f80            swapper uvm

-- thorpej


