Subject: Re: 2 hangs in 24 hours on 1.5U system with raid5 disks...
To: Jeff Rizzo <riz@boogers.sf.ca.us>
From: David Maxwell <david@vex.net>
List: current-users
Date: 09/20/2001 21:47:23
On Thu, Sep 20, 2001 at 04:07:55PM -0700, Jeff Rizzo wrote:
> I have *no* idea if this has anything to do with the raid5 setup
> I've just moved the system to, but since it's the only thing
> that's changed in the last two weeks on an otherwise-stable system,
> I suspect that *something* here is related.
>
> Last night, and again this morning, my main NFS server/name server
> machine, a PentiumII-350 running 1.5U locked up. I was able to get into
> DDB, but couldn't do anything else on the console - it was hung.
> When I tried to "reboot" from the db> prompt, it hung solid after
> "syncing disks..." (I waited for several hours this morning) and had to
> be power cycled.
If you get this again, can you check whether you can switch to the other
wsconsoles (Ctrl-Alt-F2, etc.)? My 5-disk RAID machine (1.5U or so) has done
the same thing a couple of times. DDB works and wsconsole switching works,
but the machine is spinning in the idle loop. It won't reply on the network
or accept anything typed on the console.
I'll append my process list (command, then wait channel) for comparison.
perl flt_nor
perl flt_nor
sh wait
cron netio
tcsh
sshd
smbd uvn_lp1
smbd uvn_lp1
rtadv
afpd netio
afpd
atalkd select
nfsio nfsidl
nfsio nfsidl
nfsio nfsidl
nfsio nfsidl
mountd select
nmbd uvn_fp1
getty ttyin
getty ttyin
getty
cron km_getw
inetd select
sshd
nfsd nfsd
nfsd nfsd
nfsd nfsd
nfsd nfsd
nfsd select
rpcbind select
syslogd select
raid km_getw
aiodoned aiodone
ioflush km_getw
reaper reaper
pagedaemon pgdaemo
raid rfwcond
raid rfwcond
raid rfwcond
ahc1:0 sccomp
usb1 usbevt
usb0 usbevt
init wait
swapper schedpw
> following are the 'ps' list from the hang, and a dmesg of the box.
>
> db> ps
> PID PPID PGRP UID S FLAGS COMMAND WAIT
> 6064 6057 6055 6004 3 0x4004 therm flt_nor
> 6063 6062 6054 99 3 0x4004 perl flt_nor
> 6062 6056 6054 99 3 0x4084 sh wait
> 6057 6055 6055 6004 3 0x4084 sh netio
> 6056 6054 6054 99 3 0x4084 perl wait
> 6055 6052 6055 6004 3 0x4084 sh wait
> 6054 6051 6054 99 3 0x4084 sh wait
> 6052 266 266 0 3 0x84 cron netio
> 6051 266 266 0 3 0x84 cron netio
> 5807 221 221 32767 3 0x180 httpd netio
> 5805 221 221 32767 2 0x180 httpd
> 5778 221 221 32767 2 0x180 httpd
> 5777 221 221 32767 2 0x180 httpd
> 5776 221 221 32767 2 0x180 httpd
> 5775 221 221 32767 2 0x180 httpd
> 5774 221 221 32767 2 0x180 httpd
> 5594 5241 5594 6004 4 0x500b mutt
> 5241 5238 5241 6004 3 0x4082 tcsh ttyin
> 5238 253 253 0 3 0x180 sshd select
> 5233 5227 5220 90 3 0x4182 dumper netio
> 5232 5227 5220 90 3 0x4183 dumper netio
> 5231 5227 5220 90 3 0x4183 dumper netio
> 5230 5227 5220 90 3 0x4107 dumper uvn_fp1
> 5229 5228 5220 90 3 0x83 taper netio
> 5228 5227 5220 90 3 0x4083 taper netio
> 5227 5220 5220 90 3 0x4083 driver select
> 5220 5214 5220 90 3 0x4082 sh wait
> 5214 5211 5214 90 3 0x4082 csh pause
> 5211 5048 5211 0 3 0x4082 csh pause
> 5048 5047 5048 6004 3 0x4082 tcsh pause
> 5047 253 253 0 2 0x180 sshd
> 4421 4420 4421 6004 3 0x4082 tcsh ttyin
> 4420 253 253 0 2 0x180 sshd
> 484 1 484 0 2 0x4082 getty
> 434 1 434 0 3 0x4 ntpd flt_nor
> 347 333 347 0 3 0x4082 csh ttyin
> 333 332 333 6004 3 0x4082 tcsh pause
> 332 1 269 6004 3 0x4080 rxvt select
> 302 288 269 6004 3 0x4080 FvwmIconMan select
> 301 288 269 6004 3 0x4080 FvwmPager select
> 290 288 290 6004 3 0x4 ssh-agent flt_nor
> 288 283 269 6004 3 0x4080 fvwm2 select
> 287 283 287 6004 3 0x4004 xclock flt_pmf
> 283 1 269 6004 3 0x4080 csh pause
> 280 1 269 6004 3 0x4004 Xvnc flt_pmf
> 272 1 272 0 3 0x4082 getty ttyin
> 266 1 266 0 3 0x4 cron flt_nor
> 263 1 263 0 3 0x80 inetd select
> 256 1 256 0 3 0x4 sendmail uao_get
> 253 1 253 0 3 0x80 sshd select
> 251 1 251 1000 3 0x80 postgres select
> 242 1 242 0 3 0x4 named flt_nor
> 221 1 221 0 3 0x4 httpd biowait
> 217 1 217 0 3 0x5 nmbd flt_nor
> 215 1 215 0 3 0x81 smbd select
> 213 1 9 0 3 0x82 snmpd select
> 211 1 211 0 3 0x4 afpd anonget
> 209 1 209 0 3 0x80 papd select
> 198 1 198 0 3 0x1004 atalkd biowait
> 161 1 161 0 3 0x80 rpc.lockd select
> 159 154 154 0 3 0x84 nfsd nfsd
> 158 154 154 0 3 0x84 nfsd nfsd
> 157 154 154 0 3 0x84 nfsd nfsd
> 156 154 154 0 3 0x84 nfsd nfsd
> 154 1 154 0 3 0x80 nfsd select
> 145 1 145 0 3 0x80 mountd select
> 117 1 117 0 2 0x80 rpcbind
> 113 1 113 0 3 0x4 named anonget
> 103 1 103 0 3 0x4 syslogd flt_nor
> 8 0 0 0 3 0x20204 aiodoned aiodone
> 7 0 0 0 3 0x20204 ioflush drainvp
> 6 0 0 0 3 0x20204 reaper reaper
> 5 0 0 0 3 0x20204 pagedaemon pgdaemo
> 4 0 0 0 3 0x20204 raid km_getw
> 3 0 0 0 3 0x20204 apm0 apmev
> 2 0 0 0 3 0x20204 usb0 usbevt
> 1 0 1 0 3 0x4080 init wait
> 0 -1 0 0 3 0x20204 swapper schedpw
> db>
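To compare two listings like these without eyeballing them, the wait channels can be tallied per listing. The following is only a minimal sketch: the function names `tally_wait_channels` and `only_in_first` are made up for illustration, it assumes command names contain no spaces, and it labels rows with an empty WAIT column as "(none)".

```python
from collections import Counter


def tally_wait_channels(listing, ddb_format=False):
    """Count wait channels in a process listing (hypothetical helper).

    ddb_format=False: each line is "COMMAND [WAIT]".
    ddb_format=True:  each line is "PID PPID PGRP UID S FLAGS COMMAND [WAIT]",
                      optionally prefixed with "> " from mail quoting.
    """
    counts = Counter()
    for line in listing.splitlines():
        # Drop mail-quote markers and split on whitespace.
        fields = line.lstrip("> ").split()
        if ddb_format:
            # Keep only process rows: they start with a numeric PID.
            if len(fields) < 7 or not fields[0].isdigit():
                continue
            fields = fields[6:]  # drop PID PPID PGRP UID S FLAGS
        if not fields:
            continue
        # A row with no wait channel is recorded under "(none)".
        wait = fields[1] if len(fields) > 1 else "(none)"
        counts[wait] += 1
    return counts


def only_in_first(a, b):
    """Return the wait channels seen in tally a but not in tally b."""
    return sorted(set(a) - set(b))


if __name__ == "__main__":
    mine = "nfsio nfsidl\nnfsio nfsidl\ngetty\nraid rfwcond\n"
    theirs = (
        "> PID PPID PGRP UID S FLAGS COMMAND WAIT\n"
        "> 7 0 0 0 3 0x20204 ioflush drainvp\n"
        "> 4 0 0 0 3 0x20204 raid km_getw\n"
        "> 5805 221 221 32767 2 0x180 httpd\n"
        "> db>\n"
    )
    a = tally_wait_channels(mine)
    b = tally_wait_channels(theirs, ddb_format=True)
    print("only in mine:  ", only_in_first(a, b))
    print("only in theirs:", only_in_first(b, a))
```

Feeding it the two full listings from this message would show which wait channels appear on only one of the machines, which may be a quicker starting point than reading both lists line by line.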
--
David Maxwell, david@vex.net|david@maxwell.net -->
All this stuff in twice the space would only look half as bad!
- me