Subject: Core2 Duo 1.8 NetBSD 4BETA SLOWER than Celeron M 1.3 NetBSD3 - Help!
To: None <netbsd-help@netbsd.org>
From: =?ISO-8859-1?Q?Lasse_Hiller=F8e_Petersen?= <lhp@toft-hp.dk>
List: netbsd-help
Date: 10/18/2006 22:41:58
Help!

This is really beyond me.

I have posted a few times about this "great fast" new Core2 Duo machine 
I bought a while ago.
After a problem with a defective 512 MB RAM module, which I swapped for 
two 1 GB modules of a better-known brand, I thought my problems were 
solved. The machine runs NetBSD 4.0BETA, from dmesg (which I have posted 
before, so here only some relevant excerpts):
NetBSD 4.0_BETA (GENERIC.MPACPI) #0: Fri Sep 15 03:25:05 UTC 2006
        builds@b3.netbsd.org:/home/builds/ab/netbsd-4/i386/200609140000Z-obj/home/builds/ab/netbsd-4/src/sys/arch/i386/compile/GENERIC.MPACPI
total memory = 2039 MB
avail memory = 1994 MB

The machine is equipped with a Samsung 80 GB SATA II disk, and I added 
an older Maxtor because I need to clean up a lot of old "garbage".

atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 1: <LITE-ON DVD SOHD-16P9S, , FS09> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd0 at atabus0 drive 0: <Maxtor 6Y080L0>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 78167 MB, 158816 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
cd0(piixide0:0:1): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
wd1 at atabus1 drive 0: <SAMSUNG HD080HJ>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 7
wd1(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
boot device: wd1
root on wd1a dumps on wd1b
root file system type: ffs

I also have a slightly old, mediocre ThinkPad R50e, with a 1.3 GHz 
Celeron M CPU and 1270 MB RAM. I have upgraded its disk to a 120 GB WD 
Scorpio. It is this machine that has accumulated the "garbage" (Music 
CD images, ripped music from my CD collection, huge Maildir mail 
archives with thousands of small mail files, disk dumps, source code, 
etc etc.) 43722 MB in total.

Thinking that the new machine will be better suited for sorting all this 
out, and because I want to clean the ThinkPad and reinstall it 
differently, I have moved all this data to the new machine. As I intend 
to buy bigger SATA disks when economy allows, and as the two disks are 
dissimilar, I have not configured RAID-1, so to be safe, I copied the 
data twice.

Because I really am an insane, paranoid nut, I use this little script 
(md5dir) to verify data integrity:
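#! /bin/sh
# Print "<md5>##<path>" for each regular file under $1 (default: $HOME).
cd "${1:-$HOME}" || exit 1
find . -type f | while IFS= read -r f ; do
    printf '%s##%s\n' "$(md5 <"$f")" "$f"
done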

It's perhaps not as fast/efficient/smart as mtree, but it does precisely 
what I need: it generates a simple list that can easily be manipulated with 
sed, cut, uniq, sort, perl, diff etc. It's also great for identifying 
duplicate files and so on.
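
As a rough sketch of the kind of post-processing meant here: the two 
functions below work on lists in the "hash##path" format the script 
prints. The names compare_lists and list_dups are made up for this 
example, and it assumes that both lists were generated from the same 
relative starting directory and that no filename contains a newline.

```shell
#!/bin/sh
# compare_lists LIST_A LIST_B
# Print lines found in only one list: "< " marks lines only in LIST_A,
# "> " marks lines only in LIST_B. Prints nothing if the lists match.
compare_lists() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || return 1
    LC_ALL=C sort "$1" > "$sorted_a"
    LC_ALL=C sort "$2" > "$sorted_b"
    LC_ALL=C comm -23 "$sorted_a" "$sorted_b" | sed 's/^/< /'
    LC_ALL=C comm -13 "$sorted_a" "$sorted_b" | sed 's/^/> /'
    rm -f "$sorted_a" "$sorted_b"
}

# list_dups LIST
# Print every line whose hash (the text before the first "##") occurs
# more than once. Lines with an empty hash (unreadable files) are skipped.
list_dups() {
    LC_ALL=C sort "$1" | awk '
        {
            i = index($0, "##")
            if (i <= 1) next                    # no hash on this line
            h = substr($0, 1, i - 1)
            if (h == prev) {
                if (!shown) print prevline      # first member of the group
                print
                shown = 1
            } else {
                shown = 0
            }
            prev = h
            prevline = $0
        }'
}
```

For example, "compare_lists able.md5sums dog.md5sums" should print 
nothing when a copy is intact, and "list_dups able.md5sums" groups 
files with identical content next to each other.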

Here is the time from the ThinkPad (which is called "able". I use the 
old phonetic alphabet to name my machines, but as you will see later, 
this has turned ironic on me):

able:/home $ time sudo md5dir >/tmp/lhp.md5sums
/home/lhp/bin/md5dir: cannot open ./MUSIK/Modest_Musorgskij/Nina 
Kavtaradze/Piano Music (Disc 2)/2-12 Limoges. Le Marché (La Grande 
Nouvelle).m4a: no such file
/home/lhp/bin/md5dir: cannot open ./MUSIK/Modest_Musorgskij/Nina 
Kavtaradze/Piano Music (Disc 2)/2-17 Hopak De Jeunes Ukrainiens (De 
L'opéra) _La Foire De Sorotchintsy_.m4a: no such file
/home/lhp/bin/md5dir: cannot open ./MUSIK/Modest_Musorgskij/Nina 
Kavtaradze/Piano Music (Disc 2)/2-18 Scéne De Foire (Fragment De 
L'opéra) _La Foire De Sorotchintsy_.m4a: no such file
/home/lhp/bin/md5dir: cannot open ./MAIL_NEWS/News/Archive/Re  Tintins 
"far" Hergé i  Horn: no such file
 4264.78s real  1336.45s user  1363.77s system

As you see, it took 71 minutes to completely hash everything. /bin/sh 
has problems with some filenames, but that's unimportant.

Of course I ran the same on the two copies on the Core2Duo machine. Now, 
I *have* had problems with /bin/sh giving segmentation faults now and 
then, even after replacing the RAM, but no more memory faults. As a 
temporary fix, I did "mv /bin/sh /bin/osh ; ln /bin/ksh /bin/sh", which 
helped a bit when I built stuff from pkgsrc.

This had the added bonus of not giving errors with 8bit characters in 
filenames as seen above. When I ran my script on the copy on the Maxtor 
disk, it ran OK. I let it run overnight, so I don't know the time it 
took. (I just reran on the Maxtor with /bin/osh, and it crashed after 20 
minutes. I then timed the Maxtor disk with ksh, and this time it ran - 
again without any fault:
dog:/disk2/usr/ablehome $ time sudo md5dirKSH lhp >ksh.md5sums
 3909.83s real  1189.11s user  1362.91s system

It is worth noticing that this was barely faster than "able". Presumably 
this just implies that I/O is the limiting bottleneck in this operation.)
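
To put the two timings side by side, here is a small sketch that only 
redoes the arithmetic on the numbers printed by time above. The helper 
name summarize is made up for this example.

```shell
#!/bin/sh
# summarize REAL USER SYSTEM   (all in seconds, as printed by time)
# Print the wall-clock time in minutes and the share of that time
# accounted for by user + system CPU time.
summarize() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN {
        printf "%.1f min wall, %.1f%% of it on the CPU\n", r / 60, 100 * (u + s) / r
    }'
}

summarize 4264.78 1336.45 1363.77   # able, ThinkPad disk
summarize 3909.83 1189.11 1362.91   # dog, Maxtor disk
```

By this measure the run on dog took about 8% less wall-clock time than 
the run on able, and in both runs user plus system time comes to 
roughly 63% to 65% of the wall-clock time, so disk I/O may not be the 
only limit.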

But when I tried to do the same on the Samsung SATA disk, I got *memory 
fault* errors after processing about 250000 of the 1.4 million files. 
Sometimes sooner, sometimes later. I tried to switch to /bin/osh, and to 
/rescue/sh and /rescue/ksh, but still I would get a memory fault after 
some (fairly long) time. Also, it would take noticeably longer. After 
hacking up a way to do a shorter list of files at a time, and then cat 
together the complete list, I remembered that I had bash installed in 
/usr/pkg/bin. I have now been running the script for more than 6 hours - 
but at least bash didn't crash! (It just finished right now.)

So, to sum things up:

I have a supposedly "wicked fast" machine, which turns out to live up to 
the name I happened to bestow upon it: dog.
I get occasional segfaults with /bin/sh, whereas /bin/ksh works slightly 
better, but in some situations, it also segfaults - at least when doing 
stuff with the SATA disk. Bash seems to work better, but is slow as hell.

The whole mess seems to be related to the system it's running: 
i386-MPACPI, 4.0BETA build 200609140000Z, the size of its memory (?), 
and the type of task I try to perform: a shell script going through 
1,405,214 files of varying size, doing an MD5 sum on each. This I 
suppose implies large pipes, lots of memory mapped file I/O, etc.

However I don't really have the knowledge to even find out where to 
begin debugging this mess. I can barely come up with a few questions, 
which I hope some knowledgeable persons may have answers for:
* Am I correct in assuming that the RAM is not necessarily to blame 
here, IOW, can memory faults occur due to other reasons than bad RAM?
* Is there a more suitable system/kernel than i386-GENERIC.MPACPI I 
could use? Switch to amd64 perhaps? Others have been talking about XEN 
in connection with Core2Duo machines?
* Why is there such a difference between the SATA disk and the PATA 
disk? Running the same script on the same data on the PATA is fine, on 
the other I eventually get memory faults. Consistently.
* Am I doing myself a disservice by running 4.0BETA rather than 3.x? I 
had hoped I would gain support for the Realtek 8168B ethernet device on 
the motherboard (ASRock ConRoe 945G-DVI), but I haven't had any luck 
there either.
* What I am most concerned about is whether there is still a hardware 
fault, which only shows up under heavy load. But after having replaced 
the RAM, I feel this is rather unlikely? Am I being too optimistic?

Any suggestions as to what I should do with this machine (well, 
obviously excluding suggestions to donate it, trash it etc) would be 
most welcome! And if I have accidentally stumbled upon some rare - 
maybe even subtle - bug that only shows up under special circumstances 
and loads, I sure would like to help get this fixed. I would file a PR - 
if I weren't so unsure as to what to write in it! If someone could 
suggest some tests to run, I would be delighted to do so!

My plan was for this machine to replace the Pentium II 233 MHz with its 
whining 40GB drive, which is my current home server. (This machine was 
set up quickly to stand in for a 350 MHz machine, which didn't come up 
after a power outage.) The high-pitched howling of "fox" is getting on my 
nerves, but before I put the "dog" on its watch, I want to be sure it 
can handle the job!

-Lasse