Subject: 3.0_BETA I/O hang
To: None <tech-kern@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 04/14/2005 12:20:40
Hi,
I have a 3.0_BETA system which got into a strange state: every access to
one partition (read or write) hangs, with the process stuck in disk wait.
I have this setup:
mooney:/#mount
/dev/raid0a on / type ffs (local)
/dev/raid1e on /usr type ffs (local)
/dev/raid1f on /graveur type ffs (local)
/dev/raid2e on /domains type ffs (local)
mfs:828 on /tmp type mfs (synchronous, local)
/dev/wd1e on /distrib type ffs (soft dependencies, NFS exported, local)
kernfs on /kern type kernfs (local)
pid103@mooney:/auto on /auto type nfs (hidden)
hera-ip6:/home/hera1 on /amd/hera-ip6/home/hera1 type nfs
tibre:/home/tibre1 on /amd/tibre/home/tibre1 type nfs
hera-ip6:/comptes on /amd/hera-ip6/comptes type nfs

The mount point causing the problem is /domains. It contains only 2 large
files (one 6GB, one 16GB). I was writing to the 16GB one (create, not
overwrite) when this happened. The process creating the file is waiting on
uvn_fp2:
mooney:/#ps axl |grep 17063
  0 17063 12620   0 -18  0   80     4 uvn_fp2  DW+  ttyp2 3:34.89 /tmp/mkfile
Others are stuck on vnlock:
mooney:/#ps axl | grep vnlock
  0 21601 13819   0  -2  0   76     4 vnlock   DW+  ttyp3 0:00.01 ls -l 
  0 16715 15220   0  -2  0  936     4 vnlock   DW+  ttyp5 0:00.10 -csh (tcsh)
  0 21354 18277   0  -2  0   24     4 vnlock   DW   ttyp9 0:00.04 umount -f /do
  0 21987 18277   0  -2  0   56     4 vnlock   DW   ttyp9 0:00.01 df -k 

I can read from /dev/raid2d without problems,
so it's not the underlying device which is stuck. The box keeps running
fine, except for accesses to /domains.

Any idea what could cause this? Has anyone already tried to create a file
larger than 16GB? This filesystem uses 32k blocks/4k fragments.
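
For anyone who wants to try to reproduce this, here is a minimal sketch of a
sequential file creator. It is not the /tmp/mkfile used above (its source is
not shown here); the function name make_file, the 32k chunk size (chosen only
to match the filesystem block size) and the example path are assumptions for
illustration.

```python
import os


def make_file(path, size_bytes, chunk_size=32 * 1024):
    """Create a new file of size_bytes by writing zero-filled chunks in order.

    Hypothetical helper for reproducing a large sequential write; it is not
    the original mkfile program.
    """
    chunk = bytes(chunk_size)  # zero-filled buffer
    remaining = size_bytes
    # Mode "xb" fails if the file already exists, so this is always a
    # create, never an overwrite.
    with open(path, "xb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning
    return os.path.getsize(path)
```

Something like make_file("/domains/testfile", 17 * 1024**3) would write
slightly more than 16GB; the path is only an example.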

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference