Subject: Re: deadlock in getnewb? (Re: running bonnie on -current/VAX)
To: Kazushi Marukawa <jam@pobox.com>
From: Chuck Silvers <chuq@chuq.com>
List: current-users
Date: 03/07/2001 08:19:27
hi,
as a work-around, use /dev/rsd* instead of /dev/sd*.
please send a PR about the new problem if you haven't already.
(this new problem is unrelated to the previous "uvn_fp1" hang.)
-Chuck
On Wed, Mar 07, 2001 at 04:13:51AM -0600, Kazushi Marukawa wrote:
> Hi,
>
> This time, I experienced some deadlock like situation while
> I was writing large block data to the SCSI disk. So, I'm
> reporting this. I'm using serial console and it was off
> when I got this problem, so I don't have console messages.
>
> My machine is made of Celeron 366, Asus P3V4X, and 512MB
> memory. The version of kernel is:
>
> NetBSD 1.5S (sou) #7: Sat Feb 24 05:21:30 CST 2001
>
> First, I noticed that all programs I tried to execute hangs.
> I was using screen, so I could switch among screens, but no
> more screens nor new programs. For example, if I typed "ls"
> nothing happened and ^C, ^\, and ^Z didn't work. Then,
> screend also hangs since it called getnewb. However, still
> I can ping and send packets through nat on this machine.
>
> I waited about 30 minutes with a hope it would recover.
> However, it didn't. So, I sent break through serial console
> and got trace, ps, and uvmexp. I had more information, but
> I don't know what those information means. So, this time, I
> just put some of them at the bottom of this message. If you
> have an interest on this and want to see all log or the
> result of some specific commands, please ask me so. Or
> should I send-pr about this? Thanks.
>
> Oh, and one more thing. What I was doing is "dd
> if=/dev/zero of=/dev/sd0 count=XXXX bs=8192" on a SCSI HDD
> bigger than 40G. I though I can zero-fill a hard drive in
> this way before I sell this...
>
> db> trace
> cpu_Debugger(c0f2aa80,e5053560,e5053560,e5092e5c,0) at cpu_Debugger+0x4
> comintr(c0f42c00) at comintr+0xcd
> Xintr4() at Xintr4+0x70
> --- interrupt ---
> idle(e5053560) at idle+0x1b
> bpendtsleep(e505495c,128,c0486fec,0,0) at bpendtsleep
> sigsuspend1(e5053560,e5092f50,e5092f80,0,0) at sigsuspend1+0xf0
> sys___sigsuspend14(e5053560,e5092f88,e5092f80) at sys___sigsuspend14+0x36
> syscall_plain(1f,bfbf001f,bfbfdc69,0,bfbfdc40) at syscall_plain+0x98
> db> c
> Stopped at cpu_Debugger+0x4: leave
> db> trace
> cpu_Debugger(c0f2aa80,e50a4abc,e50a4abc,e50c8da8,0) at cpu_Debugger+0x4
> comintr(c0f42c00) at comintr+0xcd
> Xintr4() at Xintr4+0x70
> --- interrupt ---
> idle(e50a4abc) at idle+0x1b
> bpendtsleep(c05e710c,118,c048954d,65,0) at bpendtsleep
> sys_select(e50a4abc,e50c8f88,e50c8f80) at sys_select+0x309
> syscall_plain(1f,1f,0,8,bfbfdc60) at syscall_plain+0x98
> db> ps
> PID PPID PGRP UID S FLAGS COMMAND WAIT
> 23132 193 193 0 3 0x4 sshd getnewb
> 23131 9668 23130 0 3 0x4006 less getnewb
> 23130 9668 23130 0 3 0x4006 ps getnewb
> 23129 9560 9560 1000 3 0x4006 fetchmail getnewb
> 23128 21992 21992 1000 3 0x4006 perl getnewb
> 23105 277 23104 1000 3 0x4006 perl getnewb
> 23038 214 214 32767 3 0x184 httpd netcon
> 23036 214 214 32767 3 0x184 httpd netcon
> 23035 214 214 32767 3 0x184 httpd netcon
> 22994 214 214 32767 3 0x184 httpd netcon
> 22971 214 214 32767 3 0x104 httpd getnewb
> 22962 214 214 32767 3 0x184 htd netcon
> 22930 214 214 32767 3 0x184 httpd netcon
> 22859 9668 22859 0 4 0x5002 vi
> 21992 325 21992 1000 3 0x4086 sh wait
> 21106 266 21106 0 3 0x4006 dd getnewb
> 9668 270 9668 0 3 0x5086 zsh pause
> 9560 264 9560 1000 3 0x5086 mush wait
> 1163 1162 1161 0 4 0x4002 less
> 1162 1161 1161 0 4 0x4082 sh
> 1161 266 1161 0 4 0x5082 man
> 325 263 325 1000 3 0x4082 zsh pause
> 270 263 270 1000 3 0x4082 zsh pause
> 266 265 266 0 3 0x4082 zsh pause
> 265 263 265 1000 3 0x4082 zsh pause
> 263 1 263 1000 3 0x104 screen-3.9.8 getnewb
> 238 1 1 0 3 0x4004 getty vnlock
> 237 1 237 0 3 0x4006 getty getnewb
> 236 206 236 32767 3 0x4184 pinger select
> 231 1 231 0 3 0x4 inetd vnlock
> 224 1 224 0 3 0x4 sendmail vnlock
> 214 1 214 0 3 0x84 httpd select
> 211 206 211 32767 3 0x4080 unlinkd netio
> 206 202 7 0 3 0x4106 squid getnewb
> 202 1 7 0 3 0x4082 sh wait
> 193 1 193 0 3 0x4 sshd vnlock
> 12 0 0 0 3 0x20204 raid rfwcond
> 6 0 0 0 3 0x20204 aiodoned aiodone
> 5 0 0 0 3 0x20204 ioflush getnewb
> 4 0 0 0 3 0x20204 reaper reaper
> 3 0 0 0 3 0x20204 pagedaemon pgdaemo
> 2 0 0 0 3 0x20204 usb0 usbevt
> 1 0 1 0 3 0x4080 init wait
> 0 -1 0 0 3 0x20204 swapper schedul
> db> kill CE <- squid in newcb
> db> kill 48B <- less in normal status, but I cannot kill either
> db> show uvmexp
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> 127787 VM pages: 78530 active, 36887 inactive, 973 wired, 140 free
> 14843 anon, 101138 vnode, 0 vtext
> freemin=64, free-target=85, inactive-target=38518, wired-max=42595
> faults=6992210, traps=85475941, intrs=83588156, ctxswitch=59582042
> softint=78595794, syscalls=292274228, swapins=1434, swapouts=1467
> fault counts:
> noram=3, noanon=0, pgwait=0, pgrele=0
> ok relocks(total)=25163(25163), anget(retrys)=613856(3), amapcopy=241944
> neighbor anon/obj pg=602787/3113099, gets(lock/unlock)=941513/25164
> cases: anon=416621, anoncow=197235, obj=819333, prcopy=122176, przero=219795
> 9
> daemon and swap counts:
> woke=797, revs=797, scans=5326460, obscans=4571110, anscans=3436
> busy=0, freed=4571111, reactivate=69923, deactivate=5554947
> pageouts=222, pending=222, nswget=3
> nswapdev=1, nanon=271923, nanonneeded=271923freeanon=257080
> swpages=152451, swpginuse=879, swpgonly=0 paging=0
> kernel pointers:
> objs(kern/kmem/mb)=0xc05cdd40/0xc05cde50/0xc05cde68
> db> sync
> syncing disks...
> (waited more than 5 minutes, no disk I/O, send break, send
> break... no response. reset)
>
> Regards,
> -- Kazushi
>
> On Jan 29, 7:01, Ken Wellsch wrote:
> > Subject: Re: running bonnie on -current/VAX
> > Tom Ivar Helbekkmo wrote:
> > > Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:
> > > > Ken Wellsch <kwellsch@tampabay.rr.com> writes:
> > > > > Writing with putc()...^C^C^C^C^C^C^C^Z^Z^Z^Z^Z^Z^Z^\^\^\ (no affect)
> > > > > load: 6.38 cmd: bonnie 314 [uvn_fp1] 15.94u 7.80s 0% 296k
> > > > > load: 6.43 cmd: bonnie 314 [uvn_fp1] 15.93u 7.81s 0% 296k
> > > > > [...]
> > > > > with every new ^T I type, the load climbs...
> > > >
> > > > This might be related to a problem I see [...] just like the above,
> > > > except that the state reported by ^T is "[getnewbuf]".
> > >
> > > It's still there in -current (at least on the Sparc), and it only
> > > occurs during large scale writing to partitions that are mounted with
> > > the softdep option.
> >
> > Good point - I've started enabling softdep on /usr to aid in disk performance
> > on my slower systems and had it enabled on the VAX that produced the
> > issue mentioned above.