Subject: deadlock in getnewb? (Re: running bonnie on -current/VAX)
To: current-users <current-users@netbsd.org>
From: Kazushi (Jam) Marukawa <jam@pobox.com>
List: current-users
Date: 03/07/2001 04:13:51
Hi,
This time, I experienced some deadlock like situation while
I was writing large block data to the SCSI disk. So, I'm
reporting this. I'm using serial console and it was off
when I got this problem, so I don't have console messages.
My machine is made of Celeron 366, Asus P3V4X, and 512MB
memory. The version of kernel is:
NetBSD 1.5S (sou) #7: Sat Feb 24 05:21:30 CST 2001
First, I noticed that all programs I tried to execute hangs.
I was using screen, so I could switch among screens, but no
more screens nor new programs. For example, if I typed "ls"
nothing happened and ^C, ^\, and ^Z didn't work. Then,
screend also hangs since it called getnewb. However, still
I can ping and send packets through nat on this machine.
I waited about 30 minutes with a hope it would recover.
However, it didn't. So, I sent break through serial console
and got trace, ps, and uvmexp. I had more information, but
I don't know what those information means. So, this time, I
just put some of them at the bottom of this message. If you
have an interest on this and want to see all log or the
result of some specific commands, please ask me so. Or
should I send-pr about this? Thanks.
Oh, and one more thing. What I was doing is "dd
if=/dev/zero of=/dev/sd0 count=XXXX bs=8192" on a SCSI HDD
bigger than 40G. I though I can zero-fill a hard drive in
this way before I sell this...
db> trace
cpu_Debugger(c0f2aa80,e5053560,e5053560,e5092e5c,0) at cpu_Debugger+0x4
comintr(c0f42c00) at comintr+0xcd
Xintr4() at Xintr4+0x70
--- interrupt ---
idle(e5053560) at idle+0x1b
bpendtsleep(e505495c,128,c0486fec,0,0) at bpendtsleep
sigsuspend1(e5053560,e5092f50,e5092f80,0,0) at sigsuspend1+0xf0
sys___sigsuspend14(e5053560,e5092f88,e5092f80) at sys___sigsuspend14+0x36
syscall_plain(1f,bfbf001f,bfbfdc69,0,bfbfdc40) at syscall_plain+0x98
db> c
Stopped at cpu_Debugger+0x4: leave
db> trace
cpu_Debugger(c0f2aa80,e50a4abc,e50a4abc,e50c8da8,0) at cpu_Debugger+0x4
comintr(c0f42c00) at comintr+0xcd
Xintr4() at Xintr4+0x70
--- interrupt ---
idle(e50a4abc) at idle+0x1b
bpendtsleep(c05e710c,118,c048954d,65,0) at bpendtsleep
sys_select(e50a4abc,e50c8f88,e50c8f80) at sys_select+0x309
syscall_plain(1f,1f,0,8,bfbfdc60) at syscall_plain+0x98
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
23132 193 193 0 3 0x4 sshd getnewb
23131 9668 23130 0 3 0x4006 less getnewb
23130 9668 23130 0 3 0x4006 ps getnewb
23129 9560 9560 1000 3 0x4006 fetchmail getnewb
23128 21992 21992 1000 3 0x4006 perl getnewb
23105 277 23104 1000 3 0x4006 perl getnewb
23038 214 214 32767 3 0x184 httpd netcon
23036 214 214 32767 3 0x184 httpd netcon
23035 214 214 32767 3 0x184 httpd netcon
22994 214 214 32767 3 0x184 httpd netcon
22971 214 214 32767 3 0x104 httpd getnewb
22962 214 214 32767 3 0x184 htd netcon
22930 214 214 32767 3 0x184 httpd netcon
22859 9668 22859 0 4 0x5002 vi
21992 325 21992 1000 3 0x4086 sh wait
21106 266 21106 0 3 0x4006 dd getnewb
9668 270 9668 0 3 0x5086 zsh pause
9560 264 9560 1000 3 0x5086 mush wait
1163 1162 1161 0 4 0x4002 less
1162 1161 1161 0 4 0x4082 sh
1161 266 1161 0 4 0x5082 man
325 263 325 1000 3 0x4082 zsh pause
270 263 270 1000 3 0x4082 zsh pause
266 265 266 0 3 0x4082 zsh pause
265 263 265 1000 3 0x4082 zsh pause
263 1 263 1000 3 0x104 screen-3.9.8 getnewb
238 1 1 0 3 0x4004 getty vnlock
237 1 237 0 3 0x4006 getty getnewb
236 206 236 32767 3 0x4184 pinger select
231 1 231 0 3 0x4 inetd vnlock
224 1 224 0 3 0x4 sendmail vnlock
214 1 214 0 3 0x84 httpd select
211 206 211 32767 3 0x4080 unlinkd netio
206 202 7 0 3 0x4106 squid getnewb
202 1 7 0 3 0x4082 sh wait
193 1 193 0 3 0x4 sshd vnlock
12 0 0 0 3 0x20204 raid rfwcond
6 0 0 0 3 0x20204 aiodoned aiodone
5 0 0 0 3 0x20204 ioflush getnewb
4 0 0 0 3 0x20204 reaper reaper
3 0 0 0 3 0x20204 pagedaemon pgdaemo
2 0 0 0 3 0x20204 usb0 usbevt
1 0 1 0 3 0x4080 init wait
0 -1 0 0 3 0x20204 swapper schedul
db> kill CE <- squid in newcb
db> kill 48B <- less in normal status, but I cannot kill either
db> show uvmexp
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
127787 VM pages: 78530 active, 36887 inactive, 973 wired, 140 free
14843 anon, 101138 vnode, 0 vtext
freemin=64, free-target=85, inactive-target=38518, wired-max=42595
faults=6992210, traps=85475941, intrs=83588156, ctxswitch=59582042
softint=78595794, syscalls=292274228, swapins=1434, swapouts=1467
fault counts:
noram=3, noanon=0, pgwait=0, pgrele=0
ok relocks(total)=25163(25163), anget(retrys)=613856(3), amapcopy=241944
neighbor anon/obj pg=602787/3113099, gets(lock/unlock)=941513/25164
cases: anon=416621, anoncow=197235, obj=819333, prcopy=122176, przero=219795
9
daemon and swap counts:
woke=797, revs=797, scans=5326460, obscans=4571110, anscans=3436
busy=0, freed=4571111, reactivate=69923, deactivate=5554947
pageouts=222, pending=222, nswget=3
nswapdev=1, nanon=271923, nanonneeded=271923freeanon=257080
swpages=152451, swpginuse=879, swpgonly=0 paging=0
kernel pointers:
objs(kern/kmem/mb)=0xc05cdd40/0xc05cde50/0xc05cde68
db> sync
syncing disks...
(waited more than 5 minutes, no disk I/O, send break, send
break... no response. reset)
Regards,
-- Kazushi
On Jan 29, 7:01, Ken Wellsch wrote:
> Subject: Re: running bonnie on -current/VAX
> Tom Ivar Helbekkmo wrote:
> > Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:
> > > Ken Wellsch <kwellsch@tampabay.rr.com> writes:
> > > > Writing with putc()...^C^C^C^C^C^C^C^Z^Z^Z^Z^Z^Z^Z^\^\^\ (no affect)
> > > > load: 6.38 cmd: bonnie 314 [uvn_fp1] 15.94u 7.80s 0% 296k
> > > > load: 6.43 cmd: bonnie 314 [uvn_fp1] 15.93u 7.81s 0% 296k
> > > > [...]
> > > > with every new ^T I type, the load climbs...
> > >
> > > This might be related to a problem I see [...] just like the above,
> > > except that the state reported by ^T is "[getnewbuf]".
> >
> > It's still there in -current (at least on the Sparc), and it only
> > occurs during large scale writing to partitions that are mounted with
> > the softdep option.
>
> Good point - I've started enabling softdep on /usr to aid in disk performance
> on my slower systems and had it enabled on the VAX that produced the
> issue mentioned above.