Subject: Re: Why my life is sucking.
To: Herb Peyerl <hpeyerl@beer.org>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: current-users
Date: 01/15/2001 22:35:07
On Mon, Jan 15, 2001 at 12:53:42PM -0700, Herb Peyerl wrote:
> So, I've been experiencing some "problems" in my life as owner of netbsd
> systems.  First things first, 'lager', my life-on-disk machine is a Sparc
> running 1.4C.  It's had impressive uptimes and hasn't had many problems
> except for dying disks.  Since I can't afford to be buying replacement 
> SCSI disks, I decided to buy a PC with cheap IDE disks.
> 
> I bought an ABIT with 800Mhz Athlon and 2 45G IBM IDE disks that I 
> intended to raid1 together.  In conjunction with my DLT drive on 
> an Adaptec controller for backups.  Coupled with a PC Weasel, it was
> supposed to improve my quality of life.
> 
> This whole situation has turned bad on me and I have no idea how to 
> proceed and I'm losing confidence in NetBSD as a system.
> 
> The first problem came when I raid1'd the two partitions together.  Every-
> thing performs admirably except when it comes to extracting something 
> like pkgsrc.tar.gz.  mkdir(2) calls consume 3500 I/O's and take 17 seconds
> to complete, most of the time:
> 
> 	nlager# time mkdir foo
> 	0.0u 0.6s 0:17.72 3.8% 0+0k 3687+7io 0pf+0w

And is the machine hung during this time?
I see something similar on an NFS server, but not in normal operation.
This machine has 20 9G SCSI disks in raid1. All is fine for normal ops,
but rebuilding a failed disk first hangs the machine for several minutes,
with lots of disk activity on the mirror of the disk being reconstructed.
Then reconstruction begins and all is back to normal.

> 
> I've ruled out 'bad disk' because the work is all being done on either one
> or the other of the two disks.  ie: it makes no difference.
> 
> The kernel in question is a GENERIC 1.5 with the raid stuff linked in. The
> userland is also generic 1.5 from ftp.netbsd.org.
> 
> I've discussed this with Greg and he has no further ideas.  He's sanity
> checked my configuration however.
> 
> The second problem came when I installed a 10G disk and tried to duplicate
> the OS and /home onto the disk using "dump | restore".  I actually used
> "restore -i" because I wanted to exclude my massive mp3 library.
> 
> Here's a transcript.  Note: /mnt is a freshly newfs'd 10G partition.
> 
> 
> dump 0f - /dev/rraid1a | ( cd /mnt ; restore -if - )
> [...]
> I've duplicated this 3 times and each time the same files don't get copied. I
> illustrate with /sbin as an example but the lossage is everywhere.  My 1G
> /home partition is different by about 100MB between /home and /mnt/home.

Is the filesystem you are copying mounted? Dumping a mounted partition
that has some activity could produce problems like that. With softdep it
could be even worse.
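
To see exactly which files went missing, rather than only the size
difference, a small script could compare the two trees. This is a rough
sketch, not something I have run on your setup; the function name
missing_in_copy is made up for illustration, and it only checks that each
name exists in the copy, not that the contents match.

```python
import os


def missing_in_copy(src_root, dst_root):
    """Return sorted relative paths that exist under src_root but not under dst_root.

    Hypothetical helper for illustration: it compares names only, not file
    contents, sizes, or permissions.
    """
    missing = []
    # os.walk does not follow symlinked directories by default, but it does
    # descend into other filesystems mounted below src_root.
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in dirnames + filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            # lexists() also counts dangling symlinks as present.
            if not os.path.lexists(os.path.join(dst_root, rel)):
                missing.append(rel)
    return sorted(missing)
```

For example, print("\n".join(missing_in_copy("/sbin", "/mnt/sbin"))) would
list what is absent from the copy of /sbin. Comparing / against /mnt directly
is best avoided, since the walk would descend into /mnt itself. If a whole
directory is missing, the directory and everything below it are listed.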

> 
> Also, from dmesg, everytime I mount a filesystem:
> 
> 	Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered
> 
> 
> Not sure what that's about.

Seems to be from net/radix.c. Is your network config sane?
An IPv6 problem, maybe?

--
Manuel Bouyer <bouyer@antioche.eu.org>
--