current-users: Re: deadlock in getnewb? (Re: running bonnie on -current/VAX)

Subject: Re: deadlock in getnewb? (Re: running bonnie on -current/VAX)
To: Kazushi Marukawa <jam@pobox.com>
From: Chuck Silvers <chuq@chuq.com>
List: current-users
Date: 03/07/2001 08:19:27
hi,

as a work-around, use /dev/rsd* instead of /dev/sd*.

please send a PR about the new problem if you haven't already.
(this new problem is unrelated to the previous "uvn_fp1" hang.)

-Chuck


On Wed, Mar 07, 2001 at 04:13:51AM -0600, Kazushi Marukawa wrote:
> Hi,
> 
> This time, I experienced some deadlock like situation while
> I was writing large block data to the SCSI disk.  So, I'm
> reporting this.  I'm using serial console and it was off
> when I got this problem, so I don't have console messages.
> 
> My machine is made of Celeron 366, Asus P3V4X, and 512MB
> memory.  The version of kernel is:
> 
> NetBSD 1.5S (sou) #7: Sat Feb 24 05:21:30 CST 2001
> 
> First, I noticed that all programs I tried to execute hangs.
> I was using screen, so I could switch among screens, but no
> more screens nor new programs.  For example, if I typed "ls"
> nothing happened and ^C, ^\, and ^Z didn't work.  Then,
> screend also hangs since it called getnewb.  However, still
> I can ping and send packets through nat on this machine.
> 
> I waited about 30 minutes with a hope it would recover.
> However, it didn't.  So, I sent break through serial console
> and got trace, ps, and uvmexp.  I had more information, but
> I don't know what those information means.  So, this time, I
> just put some of them at the bottom of this message.  If you
> have an interest on this and want to see all log or the
> result of some specific commands, please ask me so.  Or
> should I send-pr about this?  Thanks.
> 
> Oh, and one more thing.  What I was doing is "dd
> if=/dev/zero of=/dev/sd0 count=XXXX bs=8192" on a SCSI HDD
> bigger than 40G.  I though I can zero-fill a hard drive in
> this way before I sell this...
> 
> db> trace
> cpu_Debugger(c0f2aa80,e5053560,e5053560,e5092e5c,0) at cpu_Debugger+0x4
> comintr(c0f42c00) at comintr+0xcd
> Xintr4() at Xintr4+0x70
> --- interrupt ---
> idle(e5053560) at idle+0x1b
> bpendtsleep(e505495c,128,c0486fec,0,0) at bpendtsleep
> sigsuspend1(e5053560,e5092f50,e5092f80,0,0) at sigsuspend1+0xf0
> sys___sigsuspend14(e5053560,e5092f88,e5092f80) at sys___sigsuspend14+0x36
> syscall_plain(1f,bfbf001f,bfbfdc69,0,bfbfdc40) at syscall_plain+0x98
> db> c
> Stopped at      cpu_Debugger+0x4:       leave
> db> trace
> cpu_Debugger(c0f2aa80,e50a4abc,e50a4abc,e50c8da8,0) at cpu_Debugger+0x4
> comintr(c0f42c00) at comintr+0xcd
> Xintr4() at Xintr4+0x70
> --- interrupt ---
> idle(e50a4abc) at idle+0x1b
> bpendtsleep(c05e710c,118,c048954d,65,0) at bpendtsleep
> sys_select(e50a4abc,e50c8f88,e50c8f80) at sys_select+0x309
> syscall_plain(1f,1f,0,8,bfbfdc60) at syscall_plain+0x98
> db> ps
>  PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
>  23132            193        193          0 3     0x4             sshd getnewb
>  23131           9668      23130          0 3  0x4006             less getnewb
>  23130           9668      23130          0 3  0x4006               ps getnewb
>  23129           9560       9560       1000 3  0x4006        fetchmail getnewb
>  23128          21992      21992       1000 3  0x4006             perl getnewb
>  23105            277      23104       1000 3  0x4006             perl getnewb
>  23038            214        214      32767 3   0x184            httpd  netcon
>  23036            214        214      32767 3   0x184            httpd  netcon
>  23035            214        214      32767 3   0x184            httpd  netcon
>  22994            214        214      32767 3   0x184            httpd  netcon
>  22971            214        214      32767 3   0x104            httpd getnewb
>  22962            214        214      32767 3   0x184            htd  netcon
>  22930            214        214      32767 3   0x184            httpd  netcon
>  22859           9668      22859          0 4  0x5002               vi
>  21992            325      21992       1000 3  0x4086               sh    wait
>  21106            266      21106          0 3  0x4006               dd getnewb
>  9668             270       9668          0 3  0x5086              zsh   pause
>  9560             264       9560       1000 3  0x5086             mush    wait
>  1163            1162       1161          0 4  0x4002             less
>  1162            1161       1161          0 4  0x4082               sh
>  1161             266       1161          0 4  0x5082              man
>  325              263        325       1000 3  0x4082              zsh   pause
>  270              263        270       1000 3  0x4082              zsh   pause
>  266              265        266          0 3  0x4082              zsh   pause
>  265              263        265       1000 3  0x4082              zsh   pause
>  263                1        263       1000 3   0x104     screen-3.9.8 getnewb
>  238                1          1          0 3  0x4004            getty  vnlock
>  237                1        237          0 3  0x4006            getty getnewb
>  236              206        236      32767 3  0x4184           pinger  select
>  231                1        231          0 3     0x4            inetd  vnlock
>  224                1        224          0 3     0x4         sendmail  vnlock
>  214                1        214          0 3    0x84            httpd  select
>  211              206        211      32767 3  0x4080          unlinkd   netio
>  206              202          7          0 3  0x4106            squid getnewb
>  202                1          7          0 3  0x4082               sh    wait
>  193                1        193          0 3     0x4             sshd  vnlock
>  12                 0          0          0 3 0x20204             raid rfwcond
>  6                  0          0          0 3 0x20204         aiodoned aiodone
>  5                  0          0          0 3 0x20204          ioflush getnewb
>  4                  0          0          0 3 0x20204           reaper  reaper
>  3                  0          0          0 3 0x20204       pagedaemon pgdaemo
>  2                  0          0          0 3 0x20204             usb0  usbevt
>  1                  0          1          0 3  0x4080             init    wait
>  0                 -1          0          0 3 0x20204          swapper schedul
> db> kill CE    <- squid in newcb
> db> kill 48B   <- less in normal status, but I cannot kill either
> db> show uvmexp
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   127787 VM pages: 78530 active, 36887 inactive, 973 wired, 140 free
>     14843 anon, 101138 vnode, 0 vtext
>   freemin=64, free-target=85, inactive-target=38518, wired-max=42595
>   faults=6992210, traps=85475941, intrs=83588156, ctxswitch=59582042
>   softint=78595794, syscalls=292274228, swapins=1434, swapouts=1467
>   fault counts:
>     noram=3, noanon=0, pgwait=0, pgrele=0
>     ok relocks(total)=25163(25163), anget(retrys)=613856(3), amapcopy=241944
>     neighbor anon/obj pg=602787/3113099, gets(lock/unlock)=941513/25164
>     cases: anon=416621, anoncow=197235, obj=819333, prcopy=122176, przero=219795
> 9
>   daemon and swap counts:
>     woke=797, revs=797, scans=5326460, obscans=4571110, anscans=3436
>     busy=0, freed=4571111, reactivate=69923, deactivate=5554947
>     pageouts=222, pending=222, nswget=3
>     nswapdev=1, nanon=271923, nanonneeded=271923freeanon=257080
>     swpages=152451, swpginuse=879, swpgonly=0 paging=0
>   kernel pointers:
>     objs(kern/kmem/mb)=0xc05cdd40/0xc05cde50/0xc05cde68
> db> sync
> syncing disks... 
> (waited more than 5 minutes, no disk I/O, send break, send
>  break...  no response.  reset)
> 
> Regards,
> -- Kazushi
> 
>    On Jan 29,  7:01, Ken Wellsch wrote:
>    > Subject: Re: running bonnie on -current/VAX
>    > Tom Ivar Helbekkmo wrote:
>    > > Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:
>    > > > Ken Wellsch <kwellsch@tampabay.rr.com> writes:
>    > > > > Writing with putc()...^C^C^C^C^C^C^C^Z^Z^Z^Z^Z^Z^Z^\^\^\  (no affect)
>    > > > > load: 6.38  cmd: bonnie 314 [uvn_fp1] 15.94u 7.80s 0% 296k
>    > > > > load: 6.43  cmd: bonnie 314 [uvn_fp1] 15.93u 7.81s 0% 296k
>    > > > > [...]
>    > > > > with every new ^T I type, the load climbs...
>    > > > 
>    > > > This might be related to a problem I see [...] just like the above,
>    > > > except that the state reported by ^T is "[getnewbuf]".
>    > > 
>    > > It's still there in -current (at least on the Sparc), and it only
>    > > occurs during large scale writing to partitions that are mounted with
>    > > the softdep option.
>    > 
>    > Good point - I've started enabling softdep on /usr to aid in disk performance
>    > on my slower systems and had it enabled on the VAX that produced the
>    > issue mentioned above.