current-users: Re: Why my life is sucking.

Subject: Re: Why my life is sucking.
To: Herb Peyerl <hpeyerl@beer.org>
From: Greywolf <greywolf@starwolf.com>
List: current-users
Date: 01/15/2001 13:29:10
On Mon, 15 Jan 2001, Herb Peyerl wrote:

# So, I've been experiencing some "problems" in my life as owner of netbsd
# systems.  First things first, 'lager', my life-on-disk machine is a Sparc
# running 1.4C.  It's had impressive uptimes and hasn't had many problems
# except for dying disks.  Since I can't afford to be buying replacement 
# SCSI disks, I decided to buy a PC with cheap IDE disks.
# 
# I bought an ABIT board with an 800MHz Athlon and two 45G IBM IDE disks
# that I intended to raid1 together, plus my DLT drive on an Adaptec
# controller for backups.  Coupled with a PC Weasel, it was supposed to
# improve my quality of life.
# 
# This whole situation has turned bad on me and I have no idea how to 
# proceed and I'm losing confidence in NetBSD as a system.

When one of NetBSD's long-time users says that, it's a bad sign.

This means that either more and more fringe cases are coming into the
mainstream, or stuff is getting broken and not thoroughly tested.
Either way, it's not favourable.

# The first problem came when I raid1'd the two partitions together.
# Everything performs admirably except when it comes to extracting something
# like pkgsrc.tar.gz.  mkdir(2) calls consume 3500 I/Os and, most of the
# time, take 17 seconds to complete:
# 
# 	nlager# time mkdir foo
# 	0.0u 0.6s 0:17.72 3.8% 0+0k 3687+7io 0pf+0w

Gah!

Have you, perchance, tried running with the ccd instead, or do we intend
to phase out ccd in favour of raid?

[I thought ccd required less overhead than raid did, but I guess you can't
 boot off a ccd...]


# I've ruled out a bad disk: the work is done on one disk or the other,
# and which one makes no difference.

# The kernel in question is a GENERIC 1.5 with the raid stuff linked in. The
# userland is also generic 1.5 from ftp.netbsd.org.
# 
# I've discussed this with Greg and he has no further ideas.  He has,
# however, sanity-checked my configuration.
# 
# The second problem came when I installed a 10G disk and tried to duplicate
# the OS and /home onto the disk using "dump | restore".  I actually used
# "restore -i" because I wanted to exclude my massive mp3 library.
# 
# Here's a transcript.  Note: /mnt is a freshly newfs'd 10G partition.

I'll assume this has already been send-pr'd...

[snip]
# Changing volumes on pipe input?
# abort? [yn] n
# Changing volumes on pipe input?
# abort? [yn] n
# Changing volumes on pipe input?
# abort? [yn] n
# Changing volumes on pipe input?
# abort? [yn] 
# ^Crestore > setmodes

I can't tell if the bug is in dump or restore or in the ffs code in general.
I'm absolutely baffled as to why a mkdir would require 3500 I/O calls!

[snip]
# I've reproduced this three times, and each time the same files fail to
# get copied.  I use /sbin as an example, but the lossage is everywhere.
# My 1G /home partition differs by about 100MB between /home and /mnt/home.
# 
# Also, from dmesg, every time I mount a filesystem:
# 
# 	Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered

That looks almost as though, for some reason, you're trying to do a loopback
mount on the fs, which I'm sure isn't what you had in mind.

# Not sure what that's about.

Lots of shots in the dark; I'm hoping one of them strikes a familiarity
bit.

I've got my own sun4m woes, but I'm about to sanity-check my kernel config
first.

In any case, I'd like to see this stuff fixed and for NetBSD to continue.
It'll be a sad day when I have to relegate my SS5 to Linux.

				--*greywolf;