Subject: Re: Why my life is sucking.
To: None <current-users@netbsd.org>
From: Herb Peyerl <hpeyerl@beer.org>
List: current-users
Date: 01/15/2001 15:18:15
Greywolf <greywolf@starwolf.com>  wrote:
 > # This whole situation has turned bad on me and I have no idea how to 
 > # proceed and I'm losing confidence in NetBSD as a system.
 > 
 > Coming from one of the long-time users of NetBSD, I see this as a bad 
 > thing.

It's entirely possible that it's bozosity on my part. I'm certainly not
ruling it out.

 > # 	nlager# time mkdir foo
 > # 	0.0u 0.6s 0:17.72 3.8% 0+0k 3687+7io 0pf+0w
 > 
 > Have you, perchance, tried running with the ccd instead, or do we intend
 > to phase out ccd in favour of raid?
 > 
 > [I thought ccd required less overhead than raid did, but I guess you can't
 >  boot off a ccd...]

No. I haven't considered 'ccd'.  using 'raidframe' seemed like "the right
answer".

 > # Here's a transcript.  Note: /mnt is a freshly newfs'd 10G partition.
 > 
 > I'll assume this has already been send-pr'd...

no, it has not. I wanted to see if anyone had any brilliant ideas or could
point to an error in my ways before send-pr'ing.

 > # Changing volumes on pipe input?
 > # abort? [yn] 
 > # ^Crestore > setmodes
 > 
 > I can't tell if the bug is in dump or restore or in the ffs code in general.
 > I'm absolutely baffled as to why a mkdir would require 3500 I/O calls!

the 'mkdir' problem is not related to the dump/restore problem.

 > # Also, from dmesg, everytime I mount a filesystem:
 > # 
 > # 	Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered
 > 
 > That looks almost as though, for some reason, you're trying to do a loopback
 > mount on the fs, which I'm sure isn't what you had in mind.

I'm not.  straight ffs. 

 > Idea, forwarded by jmcneill:  Have you checked out the potential problems
 > with the UBC code?

No, I've not.

 > # The second problem came when I installed a 10G disk and tried to duplicate
 > # the OS and /home onto the disk using "dump | restore".  I actually used
 > # "restore -i" because I wanted to exclude my massive mp3 library.
 > 
 > [snip.  Many files missing, much lossage...]
 > 
 > # Here's a transcript.  Note: /mnt is a freshly newfs'd 10G partition.
 > 
 > How is /mnt mounted (what opts)?
 > How is / mounted?

/dev/raid1a on / type ffs (NFS exported, local)
/dev/wd1a on /mnt type ffs (local)

 > Manuel Bouyer <bouyer@antioche.lip6.fr> wrote:
 > > 	nlager# time mkdir foo
 > > 	0.0u 0.6s 0:17.72 3.8% 0+0k 3687+7io 0pf+0w
 > 
 > And is the machine hung during this time ?

other disk accesses are blocked, yes.  I've confirmed that the parity is
clean on the raid partition and that everything is otherwise idle.  I've
also tried it with one of the components failed.  No change.  I built a
regular filesystem on one of the components and untarred pkgsrc.tar.gz
(my benchmark for reproducing this problem) and it literally screamed
along and worked flawlessly.

 > > dump 0f - /dev/rraid1a | ( cd /mnt ; restore -if - )
 > > [...]
 > > I've duplicated this 3 times and each time the same files don't get copied. I
 > > illustrate with /sbin as an example but the lossage is everywhere.  My 1G
 > > /home partition is different by about 100MB between /home and /mnt/home.
 > 
 > Is the filesystem you copy mounted ? Maybe a mounted partition, with some
 > activity, could produce problems like that. With softdep it could even be
 > worse.

/dev/rraid1a is my root filesystem.  The system is otherwise quiescent. In
fact, I can reproduce the problem in single-user mode.  I don't have 
softdep enabled.
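
For anyone wanting to quantify the lossage rather than eyeball it, here is a
minimal sketch (assuming Python is available on the box) that lists paths
present in a source tree but absent from its restored copy.  The function
name missing_in_copy and the /sbin vs. /mnt/sbin pair are illustrative
only; nothing below comes from dump or restore themselves.

```python
import os
import sys


def missing_in_copy(src, dst):
    """Return sorted relative paths that exist under src but not under dst."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        for name in dirnames + filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            # lexists() so that a dangling symlink in the copy still counts
            # as "present".
            if not os.path.lexists(os.path.join(dst, rel)):
                missing.append(rel)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical usage: python missing.py /sbin /mnt/sbin
    for rel in missing_in_copy(sys.argv[1], sys.argv[2]):
        print(rel)
```

Compare per-directory (e.g. /sbin against /mnt/sbin) rather than / against
/mnt, since os.walk would otherwise descend into /mnt itself.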

 > > 	Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered<3>Non-unique normal route, mask not entered
 > > 
 > Seems to be from net/radix.c. Is your network config sane ?
 > An ipv6 problem, maybe ?

My network config is sane as far as I'm aware:

	fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
		address: 00:d0:b7:26:ab:f8
		media: Ethernet 100baseTX
		status: active
		inet 199.166.37.36 netmask 0xffffff00 broadcast 199.166.37.255
		inet6 fe80::2d0:b7ff:fe26:abf8%fxp0 prefixlen 64 scopeid 0x1


Greg Oster suggested that my problem with dump/restore leaving out files
might be related to the fact that the missing files are all ones that have
hard links, e.g. mount_ufs, newfs, swapctl, swapon, etc.
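
One way to check the hard-link theory is to list every regular file in a
directory whose link count is greater than one and compare that list with
the files restore dropped.  The following is only a sketch, assuming Python
is available; hard_linked_files is a made-up helper name, and the grouping
by (device, inode) is my own choice of presentation.

```python
import os
import stat
import sys
from collections import defaultdict


def hard_linked_files(root):
    """Map (st_dev, st_ino) -> list of paths for regular files with nlink > 1."""
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical usage: python hardlinks.py /sbin
    for (dev, ino), paths in sorted(hard_linked_files(sys.argv[1]).items()):
        print(ino, " ".join(sorted(paths)))
```

If the names printed for /sbin line up with the files missing from
/mnt/sbin, that would support the suggestion; it would not by itself say
whether dump or restore is at fault.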