Subject: Re: 3.0_BETA I/O hang
To: Bill Studenmund <wrstuden@NetBSD.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: tech-kern
Date: 04/15/2005 15:02:24
On Thu, Apr 14, 2005 at 09:36:25AM -0700, Bill Studenmund wrote:
> On Thu, Apr 14, 2005 at 12:20:40PM +0200, Manuel Bouyer wrote:
> > 
> > The mount point causing the problem is /domains. It contains only 2 large
> > files (one 6GB, one 16GB). I was writing to the 16GB one (create, not
> > overwrite) when this happened. The process creating the file is waiting on
> > uvn_fp2:
> 
> I think that's the main culprit. I think the other nodes are all piled up 
> on it in one way or another.

Yes, most probably.

> 
> > mooney:/#ps axl |grep 17063
> >   0 17063 12620   0 -18  0   80     4 uvn_fp2  DW+  ttyp2 3:34.89 /tmp/mkfile
> > Others are stuck on vnlock:
> > mooney:/#ps axl | grep vnlock
> >   0 21601 13819   0  -2  0   76     4 vnlock   DW+  ttyp3 0:00.01 ls -l 
> >   0 16715 15220   0  -2  0  936     4 vnlock   DW+  ttyp5 0:00.10 -csh (tcsh)
> >   0 21354 18277   0  -2  0   24     4 vnlock   DW   ttyp9 0:00.04 umount -f /do
> >   0 21987 18277   0  -2  0   56     4 vnlock   DW   ttyp9 0:00.01 df -k 
> > 
> > I can read from /dev/raid2d without problems,
> > so it's not the underlying device which is stuck. The box keeps running
> > fine, except for accesses to /domains.
> > 
> > Any idea what could cause this? Has anyone tried to create a file larger
> > than 16GB already? This filesystem uses 32k blocks/4k fragments.
> 
> No, but I've seen this on occasion. For me, it happens when I'm getting a
> crash dump of a multi-threaded app I'm working on. Sometimes crash dumps 
> tickle it, sometimes they don't. I have had days where every core works, 
> and days where every core does this.
> 
> The problem is that a page has been marked busy, and genfs_getpages is 
> waiting for it to unbusy. I expect that either we lost an unbusy, or we 
> somehow or another already have busied the pages and thus are deadlocked 
> on ourself.

My (very simple) application uses O_WRONLY | O_SYNC, if that matters. Does core
dumping also use synchronous I/O?
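
For illustration, a minimal sketch of a writer that opens its output with
O_WRONLY | O_SYNC (plus O_CREAT, since the file is being created, not
overwritten). This is not the actual /tmp/mkfile source: the helper name
write_sync_file, the chunk size, and the zero-filled buffer are assumptions
made only to show the open flags in context.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the real mkfile): create `path` and write
 * `total` bytes of zeros in `chunk`-sized pieces. O_SYNC makes each
 * write(2) return only after the data has reached stable storage.
 * Returns 0 on success, -1 on error.
 */
int
write_sync_file(const char *path, size_t chunk, uint64_t total)
{
	char *buf;
	int fd;
	uint64_t done = 0;

	if (chunk == 0)
		return -1;

	buf = calloc(1, chunk);		/* zero-filled write buffer */
	if (buf == NULL)
		return -1;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
	if (fd == -1) {
		free(buf);
		return -1;
	}

	while (done < total) {
		size_t want = chunk;
		ssize_t n;

		/* Shorten the last write so we stop exactly at `total`. */
		if (total - done < want)
			want = (size_t)(total - done);

		n = write(fd, buf, want);
		if (n <= 0) {		/* treat errors and zero-length writes as failure */
			close(fd);
			free(buf);
			return -1;
		}
		done += (uint64_t)n;
	}

	free(buf);
	return close(fd);
}
```

Calling write_sync_file("/domains/bigfile", 65536, total) with a total above
16GB would approximate the workload described above; the path and the 64k
chunk size are placeholders, not values taken from the original program.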

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference