Subject: Re: unusual panics on NetBSD/alpha 3.0_* and 4.0_BETA
To: <>
From: Eric Schnoebelen <eric@cirr.com>
List: port-alpha
Date: 10/07/2006 19:47:03

Simon Burge writes:
- Eric Schnoebelen wrote:
- > 	I'm running NetBSD/alpha on an assortment of alpha
- > hardware, but  mostly DS10L's.  One of them, running 3.0_STABLE
- > (circa 26 July 2006) is seeing the following panics on a
- > semi-regular basis: (dmesg in the first attachment)
- > 
- > 		[-- eric@localhost attached -- Tue Sep 26 19:09:14 2006]
- > 		db> bt
- > 		cpu_Debugger() at netbsd:cpu_Debugger+0x4
- > 		panic() at netbsd:panic+0x1f8
- > 		trap() at netbsd:trap+0x120
- > 		XentUna() at netbsd:XentUna+0x20
- > 		--- unaligned access fault (from ipl 1) ---
- > 		tcp_sack_option() at netbsd:tcp_sack_option+0x13c
- > 		tcp_dooptions() at netbsd:tcp_dooptions+0x278
- > 		tcp_input() at netbsd:tcp_input+0xa20
- > 		ip_input() at netbsd:ip_input+0xb4c
- > 		ipintr() at netbsd:ipintr+0xa0
- > 		netintr() at netbsd:netintr+0x158
- > 		softintr_dispatch() at netbsd:softintr_dispatch+0x160
- > 		exception_return() at netbsd:exception_return+0x7c
- > 		--- root of call graph ---
- 
- This looks like it happened in netinet/tcp_sack.c at:

	Thanks for looking into this.  I've included it in the
PR I've just sent.
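
	For illustration only, here is a small userland sketch of
the kind of access that could produce an unaligned access fault
like the one in the trace above.  It assumes the fault comes from
a 32-bit load of a SACK edge: the option is a kind byte, a length
byte, then pairs of 32-bit sequence numbers (RFC 2018), so the
edges are only 4-byte aligned if the sender padded the options.
This is not the code in netinet/tcp_sack.c; parse_sack_option,
load_be32 and struct sack_block are made-up names.  The sketch
reads the edges a byte at a time, which is safe at any alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the NetBSD sources. */
#define SACK_OPT_KIND	5	/* TCP option kind for SACK (RFC 2018) */
#define SACK_BLOCK_LEN	8	/* two 32-bit sequence numbers per block */

struct sack_block {
	uint32_t left;		/* left edge, host byte order */
	uint32_t right;		/* right edge, host byte order */
};

/* Read a big-endian 32-bit value from any address, aligned or not. */
static uint32_t
load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse one SACK option starting at opt (kind byte first), with
 * avail bytes readable.  Returns the number of blocks stored in
 * out (at most max_blocks), or -1 if the option is malformed.
 */
static int
parse_sack_option(const uint8_t *opt, size_t avail,
    struct sack_block *out, size_t max_blocks)
{
	size_t optlen, nblocks, i;

	assert(opt != NULL && out != NULL);
	if (avail < 2 || opt[0] != SACK_OPT_KIND)
		return -1;
	optlen = opt[1];
	if (optlen < 2 || optlen > avail ||
	    (optlen - 2) % SACK_BLOCK_LEN != 0)
		return -1;
	nblocks = (optlen - 2) / SACK_BLOCK_LEN;
	if (nblocks > max_blocks)
		nblocks = max_blocks;
	for (i = 0; i < nblocks; i++) {
		const uint8_t *p = opt + 2 + i * SACK_BLOCK_LEN;

		/* Byte-wise loads instead of *(uint32_t *)p. */
		out[i].left = load_be32(p);
		out[i].right = load_be32(p + 4);
	}
	return (int)nblocks;
}
```

	Casting the option pointer to a uint32_t pointer and
dereferencing it would be the shorter way to write the loop, but
it only works when the option happens to land on a 4-byte
boundary.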

- > unexpected machine check:
- > 
- >     mces    = 0x1
- >     vector  = 0x670
- >     param   = 0xfffffc0000006000
- >     pc      = 0xfffffc0000589174
- >     ra      = 0xfffffc0000589128
- >     code    = 0x100000000
- >     curlwp = 0xfffffc000fcfb800
- >         pid = 7.1, comm = ioflush
- 
- Machine checks are totally different.  Google finds:

	Looking at the console log, the machine check occurred
_after_ the panic, when I tried to "cont" the system to get a
crash dump.  The full sequence on the console looks like this:

db> 
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x1e8
machine_check() at netbsd:machine_check+0x304
interrupt() at netbsd:interrupt+0x2b8
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
ffs_sync() at netbsd:ffs_sync+0x294
sync_fsync() at netbsd:sync_fsync+0xf0
VOP_FSYNC() at netbsd:VOP_FSYNC+0x48
sched_sync() at netbsd:sched_sync+0x26c
exception_return() at netbsd:exception_return
--- root of call graph ---
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 3414           760      760         12 2  0x4108    1           pickup  select
 15323          738    15323          0 2  0x4002    1             tcsh   ttyin
 19417            1    19417          0 2       0    1              lpd    poll
 4138          4234     4138        100 2  0x4002    1             tcsh   ttyin
 4234             1     4347        100 2  0x4002    1            xterm  select
 738           1432      738        100 2  0x4002    1             tcsh   pause
 1432           744     1432        100 2  0x4000    1            xterm  select
 744           1589     1589        100 2   0x100    1             sshd  select
 1589           230     1589          0 2  0x4100    1             sshd   netio
 230              1      230          0 2       0    1             sshd  select
 1108             1     1108       1001 2   0x101    1             upsd  select
 164              1      164       1001 2   0x101    1           newapc  select
 591              1      591          0 2  0x4002    1            getty   ttyin
 587            853      853          0 2       0    1             nmbd  piperd
 853              1      853          0 2     0x1    1             nmbd  select
 816              1      816          0 2     0x1    1             smbd  select
 795            812      812       1001 2   0x100    1           upsmon nanosle
 812              1      812          0 2       0    1           upsmon  piperd
 790            760      760         12 2  0x4108    1             qmgr  select
 765              1      765          0 2       0    1             cron nanosle
 593              1      593          0 2       0    1            inetd  kqread
 760              1      760          0 2  0x4108    1           master  select
 441              1      441          0 2       0    1             ntpd   pause
 211              1      211          0 2       0    1          rpcbind    poll
 279              1      279          0 2       0    1          syslogd
 30               0        0          0 2 0x20200    1          physiod physiod
 8                0        0          0 2 0x20200    1         aiodoned aiodone
>7                0        0          0 2 0x20200    1          ioflush
 6                0        0          0 2 0x20200    1       pagedaemon pgdaemo
 5                0        0          0 2 0x20200    1        cryptoret crypto_
 4                0        0          0 2 0x20200    1         scsibus2  sccomp
 3                0        0          0 2 0x20200    1         scsibus1  sccomp
 2                0        0          0 2 0x20200    1         scsibus0  sccomp
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
db> cont
syncing disks... tlp0: receive ring overrun

unexpected machine check:

    mces    = 0x1
    vector  = 0x670
    param   = 0xfffffc0000006000
    pc      = 0xfffffc0000589174
    ra      = 0xfffffc0000589128
    code    = 0x100000000
    curlwp = 0xfffffc000fcfb800
        pid = 7.1, comm = ioflush

panic: machine check
Stopped in pid 7.1 (ioflush) at netbsd:cpu_Debugger+0x4:        ret     zero,(ra)
db> cont

dumping to dev 8,1 offset 523725
dump 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 isp0: DMA error for command on 0.0.0
isp0: BOTCHED cmd for 0.0.0 cmd 0x2a datalen 8192
i/o error


sd0(isp0:0:0:0): polling command not done
panic: scsipi_execute_xs
Stopped in pid 7.1 (ioflush) at netbsd:cpu_Debugger+0x4:        ret     zero,(ra)

--
Eric Schnoebelen		eric@cirr.com 		http://www.cirr.com
  Server (n.), 1. Large, extremely expensive machine that goes "Ping!".
  Measuring at least 25 cubic feet, heavy, bulky and giving off more heat
    than a nuclear power plant.  It's big, it's bad, it's beautiful and 
      makes it pretty clear what happened to this year's IT-budget.