Subject: Re: Something locks up in file system on RAID5
To: None <current-users@netbsd.org>
From: Kazushi (Jam) Marukawa <jam@pobox.com>
List: current-users
Date: 10/28/2001 17:53:04
And then I got the lock-up trouble again, on the Sep 21 kernel as well.

If I use applications which just open a file and write, they
work fine with the Sep 21 kernel even when they write over
the limit.

If I use other applications which do locking and other work
when writing, they lock up when they write over the limit.
This time I got the following ps status, and I cannot kill
this process.

 1000 284  282   2 -18   0  4556  1740 genput   DL+  p1  0:20.91 mush -N
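
For anyone who wants to try the two cases, here is a minimal
sketch of the two write patterns described above.  It is an
assumption-laden illustration, not the code of mush or of any
other program mentioned in this thread: the function name
write_until_error is hypothetical, and an exclusive flock(2)
is only a stand-in for whatever locking the real applications
do.

```python
import errno
import fcntl
import os


def write_until_error(path, use_lock, chunk_size=64 * 1024, max_bytes=None):
    """Append zero-filled chunks to path until a write fails.

    Stops early once max_bytes have been written (if given).
    Returns (bytes_written, errno_or_None).
    """
    chunk = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        if use_lock:
            # Assumption: an exclusive flock(2) stands in for the
            # "lock and other jobs" case; real applications may differ.
            fcntl.flock(fd, fcntl.LOCK_EX)
        while max_bytes is None or written < max_bytes:
            try:
                written += os.write(fd, chunk)
            except OSError as e:
                # On a full file system this would normally be ENOSPC.
                return written, e.errno
        return written, None
    finally:
        # Closing the descriptor also releases the flock.
        os.close(fd)
```

Calling write_until_error("/mnt2/testfile", use_lock=False) and
then again with use_lock=True (the file name is hypothetical)
would be expected to return with errno.ENOSPC on a kernel that
behaves correctly; on an affected kernel the second call may
never return.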

I'm pretty sure that an older kernel worked fine in both
cases, but I have no older kernel here.

If anybody remembers changes that might cause this problem,
please check the code.  Thanks.

Regards,
-- Kazushi

   On Oct 28, 16:41, Kazushi (Jam) Marukawa wrote:
   > Subject: Re: Something locks up in file system on RAID5
   > Hi,
   > 
   > I tested old kernels.  I got the same lock-up problem with
   > the Oct 6 kernel also.  The kernel that worked well here was
   > the Sep 21 kernel.  Both are 1.5Y, but some changes made to
   > the kernel between around Sep 21 and Oct 6 are causing this
   > problem.
   > 
   > Please check this out if somebody remembers something.
   > 
   > The problem is that all processes that try to write over the
   > file system limit lock up.  This limit is a software
   > (user-level) limit.  After that, any processes accessing the
   > same file system area (directory) lock up.  After 8 hours,
   > the entire file system became untouchable.  I'm using a
   > regular file system without the softdep option on RAID5.
   > More details are below.
   > Thanks in advance.
   > 
   > Regards,
   > -- Kazushi
   > 
   >    On Oct 28, 10:14, Kazushi (Jam) Marukawa wrote:
   >    > Subject: Re: Something locks up in file system on RAID5
   >    > The trouble is in the latest current also.  I created a new
   >    > kernel on Oct 27, and got the same lock-up problem when I
   >    > wrote data to the RAID5 disk although it was full.  It went
   >    > something like this.  I received the "write failed, file
   >    > system is full" message 5 times, and then received the same
   >    > message once more.  Then the kernel was gone.
   >    > 
   >    >   Oct 28 09:25:19 sou /netbsd: uid 1000 comm perl on /mnt2: file system full
   >    >   Oct 28 09:25:20 sou last message repeated 5 times
   >    >   Oct 28 09:25:24 sou /netbsd: uid 1000 comm wget on /mnt2: file system full
   >    > 
   >    > I remember that the Oct 6 kernel, or at least the Sep 22
   >    > kernel, didn't lock up in the same situation.  Now I'm
   >    > fscking the file system.  I'll try older kernels after the
   >    > fsck.
   >    > 
   >    > Ps shows a different status this time.  Last time, it was
   >    > vnlock.
   >    > 
   >    >   1000  7783 20305   1  -5   0  2860  3188 biowait  DL+  p5 1:24.86 /usr/bin/perl ...
   >    > 
   >    > FYI, my /mnt2 is regular ufs without any option like softdep
   >    > on RAID5.
   >    > 
   >    >    On Oct 26, 23:39, Kazushi (Jam) Marukawa wrote:
   >    >    > Subject: Something locks up in file system on RAID5
   >    >    > I'm using the Oct 24 current version of NetBSD.  The file
   >    >    > system on RAID5 just locks up somehow.  Other processes that
   >    >    > access that area lock up also.  I don't know whether this is
   >    >    > a RAID5 problem or a general problem.  I have only
   >    >    > experienced this problem on the RAID5 file system.
   >    >    > 
   >    >    > Oh yes, I forgot to mention this.  I experienced this
   >    >    > lock-up after having a file system full problem.  All the
   >    >    > processes that locked up were writing some huge data and
   >    >    > stopped while the file system full warning was being shown.
   >    >    > 
   >    >    > 
   >    >    > BTW, some of the processes that locked up show the
   >    >    > following different information.
   >    >    > 
   >    >    > 1000 24172 24122   0  -2   4   324     4 vnlock   DWN+  p7 0:00.00 ls -F
   >    >    > 1000 24169   290   7 -14   0   476     4 vgone    DW+   p2 0:00.00 du -sk ...
   >    >    > 1000 23129     1   0 -14   0  2376     4 vget     DW    p7- 0:00.00 /usr/bin/perl ...
   >    > 
   >    > Regards,
   >    > -- Kazushi
   >    >-- End of excerpt from Kazushi (Jam) Marukawa
   >-- End of excerpt from Kazushi (Jam) Marukawa