Re: writing cdhdtape to CD

To: port-alpha%NetBSD.org@localhost
Subject: Re: writing cdhdtape to CD
From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
Date: Wed, 10 Jun 2009 10:45:37 -0600 (MDT)

On Sun, 7 Jun 2009, Manuel Bouyer wrote:

> On Sat, Jun 06, 2009 at 10:58:08AM +0200, Anders Hogrelius wrote:
>
>> This problem has been there a long time, at least since 3.0. It does not
>> only affect the CS20/DS20L but also the DS20 and 264DP. It is also the
>> reason why I gave up on trying to use NetBSD on my production boxes. I
>> suspect it might be related in some way to the problem Michael described
>> too, as I ran into that problem when I tried to boot my boxes with a
>> non-MP kernel. I didn't dig deeper into the cause; however, I can say
>> that it is not driver specific: the same problem occurs regardless of
>> whether your disk is hooked up to the internal SCSI chain or to a card in
>> the PCI slots. It doesn't seem to be SCSI specific either, as for me it
>> trashed the filesystem on disks hooked up to the ATA controller too when
>> I tried that.


> Can you give more details on the issue? I have NetBSD 5.0 running on a
> DS20 and several XP1000, and don't have issues with it. All are UP, though.

Both of my CS20 systems have 2GB of memory and the Symbios Logic 53c1010 SCSI controller.

Both ran quite well with 3.x and 4.0 with the SCSI drives I had been using for quite some time. One CS20 had a 72GB Hitachi drive (later replaced with a 72GB Compaq drive) and the other had an 18GB Seagate Cheetah. Once the problems I found with SMP in 3.x had been fixed, I was able to run MP on both for the duration of 3.x and 4.0.

The disk problems began showing up after I upgraded the drives to 140GB Fujitsu drives (MBA3147NC). Because I didn't need to reboot very often, I didn't concern myself with the disk I/O problems.

After the work Andy did on 4.99.x for locking, I tried an MP kernel a few times (it may have been on another MP alpha; I can't remember for sure). I found a problem with the TLB shootdown corrupting the pool_cache, which resulted in one of the CPUs looping. I tried using a different way of dealing with the TLB shootdown, and was able to get a kernel that would run for a while, but it eventually panicked (and I don't recall where it panicked). I kind of gave up pursuing that at the time, and sometime later noticed that someone had addressed some kind of corruption problem in the pool cache code (I can't remember the details of that, either; age is getting to me). I attempted an MP kernel again, and I think I got panics similar to the previously mentioned ones. The problem there appears to be related to the TLB shootdown code again, but I haven't had the time to delve into that yet (some day I'll get there, if no one else figures it out before then). These things occurred prior to the netbsd-5 branch, so the netbsd-5 kernels (and -current) are not capable of running MP at this time. [The GENERIC.MP kernel does appear to run fine with only 1 CPU enabled.]

Back to the disk I/O problems: while running a 5.0 kernel, I had one of my CS20s crash for some reason (I can't remember what it was now, since I got sidetracked with the recovery). When rebooting, I ran into the disk I/O problem, and along with that found that something appeared to have scrambled one of the inode blocks on the root partition. After tracking down exactly which disk blocks contained those inodes, I was able to determine that the data was not close to what it should be. [Note to the person whose fsck clobbered the disk when it had problems reading: when my fsck fails during the preen on bootup, I am usually very careful about running an fsck that modifies the disk until I'm sure what it's complaining about and what it's going to do to fix it. That saved me from clobbering the disk more than the one block of inodes did. And that block contained files in the /.sysinst directory, so I didn't lose anything at that time.]

I continued running the 5.0 kernel after that, and again got a panic (this one was somewhere in the UDP checksum code, if I remember correctly). Again, I experienced the disk I/O problems, and once the disk I/O was working correctly, I found that another block of inodes had been overwritten with data similar to what happened previously. I don't know if the bad data is due to a caching issue with the disks, a problem with the NetBSD kernel, or a problem caused by the disk I/O problems.

One thing I need to try sometime is to see what data corruption is occurring when the disk I/O problems are occurring (but when that happens, I have problems running some programs, so it may be hard to capture any specific data to determine exactly what is getting corrupted).


--
Michael L. Hitch                        mhitch%montana.edu@localhost
Computer Consultant
Information Technology Center
Montana State University        Bozeman, MT     USA

References:
- Re: writing cdhdtape to CD
  - From: Michael L. Hitch
- Re: writing cdhdtape to CD
  - From: Dustin Marquess
- Re: writing cdhdtape to CD
  - From: Anders Hogrelius
- Re: writing cdhdtape to CD
  - From: Manuel Bouyer