Subject: Re: unusual panics on NetBSD/alpha 3.0_* and 4.0_BETA
To: <>
From: Eric Schnoebelen <eric@cirr.com>
List: port-alpha
Date: 10/07/2006 19:47:03
Simon Burge writes:
- Eric Schnoebelen wrote:
- > I'm running NetBSD/alpha on an assortment of alpha
- > hardware, but mostly DS10L's. One of them, running 3.0_STABLE
- > (circa 26 July 2006) is seeing the following panics on a
- > semi-regular basis: (dmesg in the first attachment)
- >
- > [-- eric@localhost attached -- Tue Sep 26 19:09:14 2006]
- > db> bt
- > cpu_Debugger() at netbsd:cpu_Debugger+0x4
- > panic() at netbsd:panic+0x1f8
- > trap() at netbsd:trap+0x120
- > XentUna() at netbsd:XentUna+0x20
- > --- unaligned access fault (from ipl 1) ---
- > tcp_sack_option() at netbsd:tcp_sack_option+0x13c
- > tcp_dooptions() at netbsd:tcp_dooptions+0x278
- > tcp_input() at netbsd:tcp_input+0xa20
- > ip_input() at netbsd:ip_input+0xb4c
- > ipintr() at netbsd:ipintr+0xa0
- > netintr() at netbsd:netintr+0x158
- > softintr_dispatch() at netbsd:softintr_dispatch+0x160
- > exception_return() at netbsd:exception_return+0x7c
- > --- root of call graph ---
-
- This looks like it happened in netinet/tcp_sack.c at:
Thanks for looking into this. I've included it in the
PR I've just sent.
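
For readers unfamiliar with this class of fault: the alpha traps when a
32-bit load is issued from an address that is not 4-byte aligned, and
TCP options carry no alignment guarantee inside the header. The sketch
below is NOT the code from netinet/tcp_sack.c; it is only a minimal
illustration, assuming a hypothetical helper named parse_sack_blocks()
and a hypothetical struct sack_block, of how SACK block edges can be
read through memcpy() so that no unaligned word load is ever issued.
The option layout (kind 5, length 8n+2) follows RFC 2018.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical container for one SACK block, in host byte order. */
struct sack_block {
	uint32_t left;
	uint32_t right;
};

/*
 * Hypothetical helper (not a NetBSD kernel function).
 * opt points at the option's kind byte; optlen is the number of
 * bytes available from there.  Returns the number of blocks stored
 * in out[] (at most max), or -1 if the option is malformed.
 */
static int
parse_sack_blocks(const uint8_t *opt, size_t optlen,
    struct sack_block *out, int max)
{
	size_t len;
	int i, n;
	const uint8_t *p;

	assert(opt != NULL && out != NULL);

	if (optlen < 2 || opt[0] != 5)	/* 5 = TCP SACK option kind */
		return -1;
	len = opt[1];
	if (len < 2 || len > optlen || (len - 2) % 8 != 0)
		return -1;

	n = (int)((len - 2) / 8);
	if (n > max)
		n = max;

	p = opt + 2;
	for (i = 0; i < n; i++, p += 8) {
		uint32_t l, r;

		/*
		 * Copy byte-wise into aligned locals instead of
		 * dereferencing (uint32_t *)p, which may be unaligned.
		 */
		memcpy(&l, p, sizeof(l));
		memcpy(&r, p + 4, sizeof(r));
		out[i].left = ntohl(l);
		out[i].right = ntohl(r);
	}
	return n;
}
```

Casting p to (uint32_t *) and dereferencing it would work on
architectures that tolerate unaligned loads, which is one reason this
kind of bug tends to surface only on strict-alignment ports.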
- > unexpected machine check:
- >
- > mces = 0x1
- > vector = 0x670
- > param = 0xfffffc0000006000
- > pc = 0xfffffc0000589174
- > ra = 0xfffffc0000589128
- > code = 0x100000000
- > curlwp = 0xfffffc000fcfb800
- > pid = 7.1, comm = ioflush
-
- Machine checks are totally different. Google finds:
In looking at the console log, the machine check
occurred _after_ the panic, when I tried to "cont" the system to
get a crash dump. The full sequence on the console looks like
this:
db>
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x1e8
machine_check() at netbsd:machine_check+0x304
interrupt() at netbsd:interrupt+0x2b8
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
ffs_sync() at netbsd:ffs_sync+0x294
sync_fsync() at netbsd:sync_fsync+0xf0
VOP_FSYNC() at netbsd:VOP_FSYNC+0x48
sched_sync() at netbsd:sched_sync+0x26c
exception_return() at netbsd:exception_return
--- root of call graph ---
db> ps
   PID  PPID  PGRP   UID S    FLAGS LWPS COMMAND    WAIT
  3414   760   760    12 2   0x4108    1 pickup     select
 15323   738 15323     0 2   0x4002    1 tcsh       ttyin
 19417     1 19417     0 2        0    1 lpd        poll
  4138  4234  4138   100 2   0x4002    1 tcsh       ttyin
  4234     1  4347   100 2   0x4002    1 xterm      select
   738  1432   738   100 2   0x4002    1 tcsh       pause
  1432   744  1432   100 2   0x4000    1 xterm      select
   744  1589  1589   100 2    0x100    1 sshd       select
  1589   230  1589     0 2   0x4100    1 sshd       netio
   230     1   230     0 2        0    1 sshd       select
  1108     1  1108  1001 2    0x101    1 upsd       select
   164     1   164  1001 2    0x101    1 newapc     select
   591     1   591     0 2   0x4002    1 getty      ttyin
   587   853   853     0 2        0    1 nmbd       piperd
   853     1   853     0 2      0x1    1 nmbd       select
   816     1   816     0 2      0x1    1 smbd       select
   795   812   812  1001 2    0x100    1 upsmon     nanosle
   812     1   812     0 2        0    1 upsmon     piperd
   790   760   760    12 2   0x4108    1 qmgr       select
   765     1   765     0 2        0    1 cron       nanosle
   593     1   593     0 2        0    1 inetd      kqread
   760     1   760     0 2   0x4108    1 master     select
   441     1   441     0 2        0    1 ntpd       pause
   211     1   211     0 2        0    1 rpcbind    poll
   279     1   279     0 2        0    1 syslogd
    30     0     0     0 2  0x20200    1 physiod    physiod
     8     0     0     0 2  0x20200    1 aiodoned   aiodone
>    7     0     0     0 2  0x20200    1 ioflush
     6     0     0     0 2  0x20200    1 pagedaemon pgdaemo
     5     0     0     0 2  0x20200    1 cryptoret  crypto_
     4     0     0     0 2  0x20200    1 scsibus2   sccomp
     3     0     0     0 2  0x20200    1 scsibus1   sccomp
     2     0     0     0 2  0x20200    1 scsibus0   sccomp
     1     0     1     0 2   0x4000    1 init       wait
     0    -1     0     0 2  0x20200    1 swapper    schedul
db> cont
syncing disks... tlp0: receive ring overrun
unexpected machine check:
mces = 0x1
vector = 0x670
param = 0xfffffc0000006000
pc = 0xfffffc0000589174
ra = 0xfffffc0000589128
code = 0x100000000
curlwp = 0xfffffc000fcfb800
pid = 7.1, comm = ioflush
panic: machine check
Stopped in pid 7.1 (ioflush) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
db> cont
dumping to dev 8,1 offset 523725
dump 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58
isp0: DMA error for command on 0.0.0
isp0: BOTCHED cmd for 0.0.0 cmd 0x2a datalen 8192
i/o error
sd0(isp0:0:0:0): polling command not done
panic: scsipi_execute_xs
Stopped in pid 7.1 (ioflush) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
--
Eric Schnoebelen eric@cirr.com http://www.cirr.com
Server (n.), 1. Large, extremely expensive machine that goes "Ping!".
Measuring at least 25 cubic feet, heavy, bulky and giving off more heat
than a nuclear power plant. It's big, it's bad, it's beautiful and
makes it pretty clear what happened to this year's IT-budget.