Subject: Re: Something locks up in file system on RAID5
To: None <current-users@netbsd.org>
From: Kazushi (Jam) Marukawa <jam@pobox.com>
List: current-users
Date: 10/28/2001 17:53:04
I then hit the lock-up again, this time with the Sep 21
kernel as well.
If I use applications that just open a file and write, they
work fine with the Sep 21 kernel even when they write past
the limit.
If I use other applications that take locks and do other
work while writing, they lock up when they write past the
limit.
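To make the two cases concrete, here is a minimal sketch of
the two kinds of writers. It is an illustration only, not
the actual applications involved. The function name and its
parameters are hypothetical, and it assumes the locking
application takes a flock(2)-style advisory lock, which I
have not confirmed.

```python
import errno
import fcntl
import os


def write_until_error(path, use_lock, chunk_size=64 * 1024, max_bytes=None):
    """Append zero-filled chunks to `path` until a write fails.

    Returns (bytes_written, errno_or_None). `max_bytes` is a safety
    cap so the sketch can be tried without actually filling a disk.
    """
    chunk = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        if use_lock:
            # Assumption: the application holds an exclusive advisory
            # lock on the file for the whole time it is writing.
            fcntl.flock(fd, fcntl.LOCK_EX)
        while max_bytes is None or written < max_bytes:
            try:
                written += os.write(fd, chunk)
            except OSError as e:
                # ENOSPC is the error normally returned by write(2)
                # when the file system is full.
                return written, e.errno
        return written, None
    finally:
        if use_lock:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Calling write_until_error("/mnt2/testfile", use_lock=False)
would correspond to the first case and use_lock=True to the
second. On a healthy kernel both calls should return with
errno.ENOSPC once the file system fills up, rather than
hanging.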
This time I got the following ps status, and I cannot kill
this process.
1000 284 282 2 -18 0 4556 1740 genput DL+ p1 0:20.91 mush -N
I'm pretty sure that an older kernel worked fine in both
cases, but I have no older kernel here.
If anybody remembers changes that might cause this problem,
please check the code. Thanks.
Regards,
-- Kazushi
On Oct 28, 16:41, Kazushi (Jam) Marukawa wrote:
> Subject: Re: Something locks up in file system on RAID5
> Hi,
>
> I tested old kernels. I got the same lock-up problem with
> the Oct 6 kernel as well. The kernel that worked well here
> was the Sep 21 kernel. Both are 1.5Y, so some change made
> to the kernel between around Sep 21 and Oct 6 is causing
> this problem.
>
> Please check this out if somebody remembers something.
>
> The problem is that all processes that try to write past
> the file system limit lock up. This limit is a software
> (user-level) limit. After that, any process accessing the
> same file system area (directory) locks up. After 8 hours,
> the entire file system became untouchable. I'm using a
> regular file system without the softdep option on RAID5.
> More details are below. Thanks in advance.
>
> Regards,
> -- Kazushi
>
> On Oct 28, 10:14, Kazushi (Jam) Marukawa wrote:
> > Subject: Re: Something locks up in file system on RAID5
> > The trouble is in the latest current also. I built a new
> > kernel on Oct 27, and got the same lock-up problem when I
> > wrote data to the RAID5 disk even though it was full. It
> > goes something like this. I received the "write failed,
> > file system is full" message 5 times, and then received
> > the same message once more. After that, the kernel was
> > gone.
> >
> > Oct 28 09:25:19 sou /netbsd: uid 1000 comm perl on /mnt2: file system full
> > Oct 28 09:25:20 sou last message repeated 5 times
> > Oct 28 09:25:24 sou /netbsd: uid 1000 comm wget on /mnt2: file system full
> >
> > I remember that the Oct 6 kernel, or at least the Sep 22
> > kernel, did not lock up in the same situation. Now I'm
> > running fsck on the file system. I'll try older kernels
> > after fsck.
> >
> > Ps shows a different status this time. Last time, it was
> > vnlock.
> >
> > 1000 7783 20305 1 -5 0 2860 3188 biowait DL+ p5 1:24.86 /usr/bin/perl ...
> >
> > FYI, my /mnt2 is a regular ufs on RAID5, without any
> > option like softdep.
> >
> > On Oct 26, 23:39, Kazushi (Jam) Marukawa wrote:
> > > Subject: Something locks up in file system on RAID5
> > > I'm using the Oct 24 current version of NetBSD. The file
> > > system on RAID5 just locks up somehow. Other processes
> > > that access that area lock up as well. I don't know
> > > whether this is a RAID5 problem or a general problem. I
> > > have only experienced it on a RAID5 file system.
> > >
> > > Oh yes, I forgot to mention this. I experienced this
> > > lock-up after a file-system-full problem. All of the
> > > locked-up processes were writing large amounts of data
> > > and stopped while the file system full warning was being
> > > shown.
> > >
> > >
> > > BTW, some of the locked-up processes show the following
> > > different information.
> > >
> > > 1000 24172 24122 0 -2 4 324 4 vnlock DWN+ p7 0:00.00 ls -F
> > > 1000 24169 290 7 -14 0 476 4 vgone DW+ p2 0:00.00 du -sk ...
> > > 1000 23129 1 0 -14 0 2376 4 vget DW p7- 0:00.00 /usr/bin/perl ...
> >
> > Regards,
> > -- Kazushi
> >-- End of excerpt from Kazushi (Jam) Marukawa
>-- End of excerpt from Kazushi (Jam) Marukawa