Subject: Re: RAID1 bootblocks for 2.0 (and more ramblings)
To: None <port-sparc64@netbsd.org>
From: Jonathan Perkin <jonathan@perkin.org.uk>
List: port-sparc64
Date: 12/12/2004 13:35:25
* On 2004-12-12 at 12:48 GMT, Martin Husemann wrote:

> On Sat, Dec 11, 2004 at 05:07:40PM +0000, Jonathan Perkin wrote:
>
> >   Boot device: /pci@1f,0/ide@d/disk@0,0  File and args: netbsd
> >   NetBSD IEEE 1275 Bootblock
> >   .Inode not directory
> 
> Did you have a non-raid partition there previously?

On wd0, yes (the initial installation).  On wd1, no - I completely
trashed the first few sectors before starting the install process.
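(For the record, the wipe was just a dd of zeros over the start of the
disk.  Sketched here against a scratch file rather than the real raw
device - on the actual disk the target would be something like
/dev/rwd1c, and it is of course destructive:)

```shell
# Zero the first 16 x 8k = 131072 bytes, enough to cover the old
# bootblock/superblock area.  /tmp/fake-disk stands in for the raw
# device here so this is safe to run as a demo.
dd if=/dev/zero of=/tmp/fake-disk bs=8k count=16 2>/dev/null
ls -l /tmp/fake-disk    # a 131072-byte file of zeros
```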

> Maybe the bootblk finds the old superblock magic and accepts it
> (yes, this is a bug). The disklabel on wd0 says "RAID" as partition
> type?

Yep:

  # disklabel wd0
  [..]
  3 partitions:
  #        size    offset     fstype [fsize bsize cpg/sgs]
   a: 234441648         0       RAID                     # (Cyl. 0 - 232580)
   c: 234441648         0     unused      0     0        # (Cyl. 0 - 232580)

  # disklabel wd1
  [..]
  3 partitions:
  #        size    offset     fstype [fsize bsize cpg/sgs]
   a: 234441648         0       RAID                     # (Cyl. 0 - 232580)
   c: 234441648         0     unused      0     0        # (Cyl. 0 - 232580)

Some more musings:

No matter how cleanly I try to shut down the machine, when booting I
always get

  /dev/rraid0c: Parity status: DIRTY
  /dev/rraid0c: Initiating re-write of parity

and another 3-hour wait for it to sync.  I've even tried shutting down
to single user, syncing the partitions, unmounting them, fscking them
and then rebooting, but I still get dirty parity.  For good measure,
the RAID disklabel is:

  # disklabel raid0
  [..]
  8 partitions:
  #        size    offset     fstype [fsize bsize cpg/sgs]
   a:   1048576         0     4.2BSD   1024  8192 43696  # (Cyl.  0 -   1023)
   b:   4194304   1048576       swap                     # (Cyl.  1024 -   5119)
   c: 234441472         0     unused      0     0        # (Cyl.  0 - 228946*)
   d:   4194304   5242880     4.2BSD   2048 16384 21848  # (Cyl.  5120 -   9215)
   e:  33554432   9437184     4.2BSD   2048 16384 28720  # (Cyl.  9216 -  41983)
   f: 191449856  42991616     4.2BSD   2048 16384 28872  # (Cyl.  41984 - 228946*)

  (/ swap /var /usr /home)
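(As an aside, the label arithmetic does check out - the a/b/d/e/f
partitions tile the set exactly, so the dirty parity isn't down to a
gap or overlap in the layout.  A quick check, sizes in 512-byte
sectors as shown above:)

```shell
# Sum the raid0 partitions a, b, d, e and f; they should cover the
# whole set (partition c) with no gaps or overlaps.
total=$(( 1048576 + 4194304 + 4194304 + 33554432 + 191449856 ))
echo "a+b+d+e+f = $total sectors"   # matches c: 234441472
```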

Also, I'm finding a significant number of

  Dec 11 17:32:46 icthus /netbsd: wd0: transfer error, downgrading to Ultra-DMA mode 2
  Dec 11 17:32:46 icthus /netbsd: wd0a: DMA error reading fsbn 229568 of 229568-229695 (wd0 bn 229568; cn 227 tn 11 sn 59), retrying
  Dec 11 17:32:46 icthus /netbsd: wd0: soft error (corrected)

on both disks.  Archive posts suggest this is due to sub-optimal
cabling, which I'd hope is the explanation given that the disks are
brand new!
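(To get a feel for how often each disk is affected, something like the
following awk one-liner will tally the error lines per drive.  Shown
here against a sample of the messages above; on the real box the input
would be the syslog file, e.g. /var/log/messages - path is an
assumption, adjust for your syslog.conf:)

```shell
# Feed a sample of the kernel messages through a per-drive counter.
cat > /tmp/wd-errors.log <<'EOF'
Dec 11 17:32:46 icthus /netbsd: wd0: transfer error, downgrading to Ultra-DMA mode 2
Dec 11 17:32:46 icthus /netbsd: wd0a: DMA error reading fsbn 229568 of 229568-229695 (wd0 bn 229568; cn 227 tn 11 sn 59), retrying
Dec 11 17:32:46 icthus /netbsd: wd0: soft error (corrected)
EOF

# Match the three error variants and count them per wd0/wd1.
awk '/DMA error|transfer error|soft error/ {
  if (match($0, /wd[01]/)) c[substr($0, RSTART, RLENGTH)]++
} END { for (d in c) print d, c[d] }' /tmp/wd-errors.log
# -> wd0 3
```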

  wd0 at atabus0 drive 0: <ST3120026A>
  wd0: drive supports 16-sector PIO transfers, LBA48 addressing
  wd0: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
  wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
  wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)

  wd1 at atabus1 drive 0: <ST3120026A>
  wd1: drive supports 16-sector PIO transfers, LBA48 addressing
  wd1: 111 GB, 232581 cyl, 16 head, 63 sec, 512 bytes/sect x 234441648 sectors
  wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
  wd1(aceride0:1:0): using PIO mode 4, Ultra-DMA mode 4 (Ultra/66) (using DMA data transfers)

The fact that the errors occur on somewhat random blocks, and on both
disks, suggests this is the case.  Have any other Netra X1 owners
seen this with the supplied cables?  I'm pondering getting a couple of
80-conductor Ultra-133 rounded ones instead, so any experience with
cables that have proved reliable in remote servers would be
appreciated.  The ones supplied with the X1s certainly don't look too
high-end...

Thanks,

-- 
Jonathan Perkin                                     The NetBSD Project
http://www.perkin.org.uk/                       http://www.netbsd.org/