Subject: port-sparc64/19196: filesystem deadlock (lfs over nfs)
To: None <gnats-bugs@gnats.netbsd.org>
From: Lubomir Sedlacik <salo@Xtrmntr.org>
List: netbsd-bugs
Date: 11/28/2002 23:33:11
>Number:         19196
>Category:       port-sparc64
>Synopsis:       filesystem deadlock (lfs over nfs)
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Nov 28 14:34:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Lubomir Sedlacik
>Release:        NetBSD 1.6K 20021127
>Organization:
>Environment:
>Description:

the machine hung while updating a cvs tree over nfs; the underlying
filesystem is lfs.  nfs is unresponsive and the console is hung, but the
machine still answers ping (no other services are running).

the system was cross-built on i386, in case that points to a problem.  the
machine is still in ddb and i can leave it there for a few hours for further
investigation if someone responds quickly.  otherwise i'll build the system
natively on sparc64 and try again.

db> ps /n                                                                     
 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
 355                1        355          0 3  0x4006            getty getnewb
 328              327        327          0 3     0x4     lfs_cleanerd getnewb
 327                1        327          0 3    0x84     lfs_cleanerd    wait
 310                1        310          0 3    0x84           mountd  select
 147              140        140          0 3     0x4             nfsd  vnlock
 146              140        140          0 3     0x4             nfsd  vnlock
 145              140        140          0 3     0x4             nfsd  vnlock
 144              140        140          0 3     0x4             nfsd getnewb
 140                1        140          0 3    0x84             nfsd  select
 108                1        108          0 3    0x84          rpcbind  select
 95                 1         95          0 3     0x4          syslogd getnewb
 6                  0          0          0 3 0x20204         aiodoned aiodone
 5                  0          0          0 3 0x20204          ioflush lfs seg
 4                  0          0          0 3 0x20204           reaper  reaper
 3                  0          0          0 3 0x20204       pagedaemon pgdaemo
 2                  0          0          0 3 0x20204         scsibus0  sccomp
 1                  0          1          0 3  0x4084             init    wait
 0                 -1          0          0 3 0x20204          swapper schedul
db> show uvmexp                                  
Current UVM status:
  pagesize=8192 (0x2000), pagemask=0x1fff, pageshift=13
  10872 VM pages: 1036 active, 73 inactive, 44 wired, 7809 free
  min  10% (25) anon, 10% (25) file, 5% (12) exec              
  max  80% (204) anon, 50% (128) file, 30% (76) exec
  pages  1649 anon, 5 file, 165 exec                
  freemin=32, free-target=42, inactive-target=3115, wired-max=3624
  faults=3206494, traps=1542060, intrs=28890934, ctxswitch=2784039
  softint=0, syscalls=5224181, swapins=101, swapouts=101          
  fault counts:                                         
    noram=0, noanon=0, pgwait=0, pgrele=0
    ok relocks(total)=276530(276530), anget(retrys)=12144(0), amapcopy=3581
    neighbor anon/obj pg=5684/135923, gets(lock/unlock)=304406/276530      
    cases: anon=8782, anoncow=3362, obj=302007, prcopy=2399, przero=1211476
  daemon and swap counts:                                                  
    woke=4326, revs=4326, scans=801265, obscans=621417, anscans=0
    busy=357, freed=0, reactivate=145609, deactivate=629768      
    pageouts=0, pending=0, nswget=0                        
    nswapdev=1, nanon=75911, nanonneeded=75911 nfreeanon=74972
    swpages=65759, swpginuse=0, swpgonly=0 paging=0           
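
a rough way to read the ps listing above is to tally processes per wait
channel.  the sketch below does only that: the process lines are copied from
the ddb output with whitespace condensed, the wait channel names are as ddb
printed them, and the names PS_OUTPUT and group_by_wait are made up for this
illustration.  it shows who is sleeping where, not why.

```python
from collections import defaultdict

# process lines copied from the `ps /n` output above
# (header row omitted, column padding condensed)
PS_OUTPUT = """\
355 1 355 0 3 0x4006 getty getnewb
328 327 327 0 3 0x4 lfs_cleanerd getnewb
327 1 327 0 3 0x84 lfs_cleanerd wait
310 1 310 0 3 0x84 mountd select
147 140 140 0 3 0x4 nfsd vnlock
146 140 140 0 3 0x4 nfsd vnlock
145 140 140 0 3 0x4 nfsd vnlock
144 140 140 0 3 0x4 nfsd getnewb
140 1 140 0 3 0x84 nfsd select
108 1 108 0 3 0x84 rpcbind select
95 1 95 0 3 0x4 syslogd getnewb
6 0 0 0 3 0x20204 aiodoned aiodone
5 0 0 0 3 0x20204 ioflush lfs seg
4 0 0 0 3 0x20204 reaper reaper
3 0 0 0 3 0x20204 pagedaemon pgdaemo
2 0 0 0 3 0x20204 scsibus0 sccomp
1 0 1 0 3 0x4084 init wait
0 -1 0 0 3 0x20204 swapper schedul
"""


def group_by_wait(ps_text):
    """Map each wait channel to the list of (pid, command) sleeping on it."""
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # 8 columns; WAIT is the last one and may contain a space ("lfs seg"),
        # so split at most 7 times and keep the remainder whole.
        pid, _ppid, _pgrp, _uid, _stat, _flags, command, wait = line.split(None, 7)
        groups[wait].append((int(pid), command))
    return dict(groups)


groups = group_by_wait(PS_OUTPUT)
counts = {wait: len(procs) for wait, procs in groups.items()}

# print the busiest wait channels first
for wait, procs in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{wait:8} {len(procs):2}  {procs}")
```

for this listing the tally puts four processes (getty, one lfs_cleanerd, one
nfsd, syslogd) on getnewb and three nfsd on vnlock, while ioflush is the only
process on "lfs seg".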

disklabel:

type: unknown
disk: BSD
label:
flags:
bytes/sector: 512
sectors/track: 133
tracks/cylinder: 27
sectors/cylinder: 3591
cylinders: 4924
total sectors: 17682084
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

8 partitions:
#        size    offset     fstype  [fsize bsize cpg/sgs]
 a:   2100735         0     4.2BSD   1024  8192    16   # (Cyl.    0 - 584)
 b:   1052163   2100735       swap                      # (Cyl.  585 - 877)
 c:  17682084         0     unused      0     0         # (Cyl.    0 - 4923)
 d:   1048572   3152898     4.2BSD   1024  8192    16   # (Cyl.  878 - 1169)
 e:   4194288   4201470     4.4LFS   1024  8192     7   # (Cyl. 1170 - 2337)
 f:   9286326   8395758     4.2BSD   1024  8192    16   # (Cyl. 2338 - 4923)

/etc/fstab:

/dev/sd0a / ffs rw,softdep 1 2
/dev/sd0b none swap sw 0 0
/dev/sd0d /var ffs rw,softdep 1 1
/dev/sd0e /cvs lfs rw 1 1        
/dev/sd0f /pub ffs rw,softdep 1 1

dmesg:

Boot device: disk0  File and args: 
NetBSD IEEE 1275 Bootblock
..>> NetBSD/sparc64 OpenFirmware Boot, Revision 1.6
>How-To-Repeat:

try to stress lfs exported over nfs from a sparc64 machine (not yet confirmed).
>Fix:

none provided.
>Release-Note:
>Audit-Trail:
>Unformatted:
 >> (salo@otaku, Wed Nov 27 17:32:01 CET 2002)
 loadfile: reading header
 elf64_exec: Booting /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a/netbsd
 4454312@0x1000000+140472@0x1800000+4053832@0x18224b8 
 symbols @ 0xfef7e340 90+342696+182268 start=0x1000000
 chain: calling OF_chain(800000, e4b0, 1000000, fffb5a80, 18)
 [ using 525896 bytes of netbsd ELF symbol table ]
 console is /sbus@1f,0/zs@f,1100000:a             
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002
     The NetBSD Foundation, Inc.  All rights reserved. 
 Copyright (c) 1982, 1986, 1989, 1991, 1993           
     The Regents of the University of California.  All rights reserved.
                                                                       
 NetBSD 1.6K (GENERIC) #0: Wed Nov 27 19:14:56 CET 2002
     salo@otaku:/opt/src/obj/sys/arch/sparc64/compile/GENERIC
 total memory = 98304 KB                                     
 avail memory = 80840 KB
 using 627 buffers containing 5016 KB of memory
 bootpath: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0
 mainbus0 (root): SUNW,Ultra-1                             
 cpu0 at mainbus0: SUNW,UltraSPARC @ 166.989 MHz, version 0 FPU
 cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 512K external (64 b/l)
 timer0 at mainbus0 addr 0xfffc7c00 irq vectors 7f0 and 7f1               
 sbus0 at mainbus0 addr 0xfffcc000: clock = 25 MHz         
 DVMA map: ff800000 to ffffe000                   
 IOTSB: 776000 to 778000       
 audiocs0 at sbus0 slot 13 offset 0xc000000 vector 24 ipl 13: CS4231A
 audio0 at audiocs0: full duplex                                     
 auxio0 at sbus0 slot 15 offset 0x1900000
 flashprom at sbus0 slot 15 offset 0x0 not configured
 SUNW,fdtwo at sbus0 slot 15 offset 0x1400000 vector 29 ipl 11 not configured
 clock0 at sbus0 slot 15 offset 0x1200000: mk48t59: hostid 8086f6f0          
 zs0 at sbus0 slot 15 offset 0x1100000 vector 28 ipl 12 softpri 6  
 zstty0 at zs0 channel 0 (console i/o)                           
 zstty1 at zs0 channel 1              
 zs1 at sbus0 slot 15 offset 0x1000000 vector 28 ipl 12 softpri 6
 zstty2 at zs1 channel 0                                         
 kbd0 at zstty2         
 zstty3 at zs1 channel 1
 ms0 at zstty3          
 sc at sbus0 slot 15 offset 0x1300000 not configured
 SUNW,pll at sbus0 slot 15 offset 0x1304000 not configured
 dma0 at sbus0 slot 14 offset 0x8400000: dma rev 2        
 esp0 at dma0 slot 14 offset 0x8800000 vector 20 ipl 3: ESP200, 40MHz, SCSI ID 7
 scsibus0 at esp0: 8 targets, 8 luns per target                                 
 ledma0 at sbus0 slot 14 offset 0x8400010: dma rev 2
 le0 at ledma0 slot 14 offset 0x8c00000 vector 21 ipl 6: address 08:00:20:86:f6:f0
 le0: 8 receive buffers, 2 transmit buffers                                       
 bpp0 at sbus0 slot 14 offset 0xc800000 vector 22 ipl 2: dma rev 2
 cgsix0 at sbus0 slot 2 offset 0x0 vector 5 ipl 5: SUNW,501-2325, 1152 x 900, rev 11
 cgsix0: attached to /dev/fb                                                        
 pcons at mainbus0 not configured
 Kernelized RAIDframe activated  
 scsibus0: waiting 2 seconds for devices to settle...
 sd0 at scsibus0 target 0 lun 0: <IBM, DDRS39130SUN9.0G, S98E> disk fixed
 sd0: 8637 MB, 4926 cyl, 27 head, 133 sec, 512 bytes/sect x 17689267 sectors
 sd0: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers                
 sd1 at scsibus0 target 1 lun 0: <IBM, DCAS32160SUN2.1G, S65A> disk fixed
 sd1: 2063 MB, 8188 cyl, 3 head, 172 sec, 512 bytes/sect x 4226725 sectors
 sd1: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers              
 cd0 at scsibus0 target 6 lun 0: <TOSHIBA, XM5701TASUN12XCD, 0997> cdrom removable
 cd0: sync (100.0ns offset 8), 8-bit (10.000MB/s) transfers                       
 root on sd0a dumps on sd0b                                
 root file system type: ffs