Current-Users archive

RAIDframe lockup (sparc64, MP kernel, multiple controllers)



Hi,

I get a complete machine lockup when reconstructing RAIDframe RAID 1
mirrors.  After some experimenting, I can reproduce the lockup only with
an MP kernel and with the mirror's disks on different controllers.
Relevant parts of the dmesg:

  NetBSD 4.99.69 (ULTRA-PCI.MP) #4: Sat Jul  5 21:41:49 BST 2008
          jdc%sirion.coris.org.uk@localhost:/tmp/kernels/sparc64/usr/src/sys/arch/sparc64/compile/ULTRA-PCI.MP
  total memory = 4096 MB
  avail memory = 4011 MB
  mainbus0 (root): SUNW,Ultra-80 (Sun Enterprise 420R): hostid 80e8d665
  cpu0 at mainbus0: SUNW,UltraSPARC-II @ 450.027 MHz, UPA id 0
  cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
  cpu1 at mainbus0: SUNW,UltraSPARC-II @ 450.027 MHz, UPA id 1
  cpu1: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
  cpu2 at mainbus0: SUNW,UltraSPARC-II @ 450.027 MHz, UPA id 2
  cpu2: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
  cpu3 at mainbus0: SUNW,UltraSPARC-II @ 450.027 MHz, UPA id 3
  cpu3: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
    ...
  esiop0 at pci0 dev 3 function 0: Symbios Logic 53c875 (ultra-wide scsi)
  esiop0: using on-board RAM
  esiop0: interrupting at ivec 1820
  scsibus0 at esiop0: 16 targets, 8 luns per target
  esiop1 at pci0 dev 3 function 1: Symbios Logic 53c875 (ultra-wide scsi)
  esiop1: using on-board RAM
  esiop1: interrupting at ivec 1826
  scsibus1 at esiop1: 16 targets, 8 luns per target
    ...
  esiop2 at pci0 dev 4 function 0: Symbios Logic 53c875 (ultra-wide scsi)
  esiop2: using on-board RAM
  esiop2: interrupting at ivec 1818
  scsibus2 at esiop2: 16 targets, 8 luns per target
  esiop3 at pci0 dev 4 function 1: Symbios Logic 53c875 (ultra-wide scsi)
  esiop3: using on-board RAM
  esiop3: interrupting at ivec 1819
  scsibus3 at esiop3: 16 targets, 8 luns per target
    ...
  sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST318203LSUN18G, 034A> disk fixed
  sd0: 17274 MB, 7508 cyl, 19 head, 248 sec, 512 bytes/sect x 35378533 sectors
  sd0: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
  sd1 at scsibus0 target 1 lun 0: <SEAGATE, ST318203LSUN18G, 034A> disk fixed
  sd1: 17274 MB, 7508 cyl, 19 head, 248 sec, 512 bytes/sect x 35378533 sectors
  sd1: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
    ...
  sd2 at scsibus1 target 0 lun 0: <SEAGATE, ST373405LC, 2207> disk fixed
  sd2: 70007 MB, 29550 cyl, 8 head, 606 sec, 512 bytes/sect x 143374650 sectors
  sd2: sync (50.00ns offset 16), 16-bit (40.000MB/s) transfers, tagged queueing
    ...
  sd8 at scsibus2 target 0 lun 0: <SEAGATE, ST373405LC, 2207> disk fixed
  sd8: 70007 MB, 29550 cyl, 8 head, 606 sec, 512 bytes/sect x 143374650 sectors
  sd8: sync (50.00ns offset 16), 16-bit (40.000MB/s) transfers, tagged queueing
    ...
  raid0: RAID Level 1
  raid0: Components: /dev/sd0a /dev/sd1a
  raid0: Total Sectors: 35368192 (17269 MB)
  raid1: RAID Level 1
  raid1: Components: /dev/sd2a /dev/sd8a
  raid1: Total Sectors: 143374464 (70007 MB)
    ...

If I run a UP kernel, all is fine.  If I run an MP kernel and reconstruct on
raid0 (`raidctl -R /dev/sd1a raid0`), whose disks are both on scsibus0, all
is fine.  If I run an MP kernel and reconstruct on raid1
(`raidctl -R /dev/sd8a raid1`), or on other RAID sets that span controllers,
the machine locks hard, sometimes almost straight away and sometimes after
a few minutes.
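
For anyone wanting to try the same thing, the two cases can be sketched as
a small script.  This is only an illustration: the `run` and `reconstruct`
helpers are hypothetical names, and the `raidctl -s` / `raidctl -S` status
calls are an addition for convenience, not part of the test described
above.  By default it only prints the commands; setting RUN=1 would
execute them (as root, on a machine with these RAID sets configured).

```shell
#!/bin/sh
# Sketch of the reconstruction test.  Dry-run by default: commands are
# printed, not executed, unless RUN=1 is set in the environment.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# reconstruct <raid set> <component>  (hypothetical helper)
reconstruct() {
    set_name=$1
    component=$2
    run raidctl -s "$set_name"                # show component status
    run raidctl -R "$component" "$set_name"   # fail if needed, rebuild in place
    run raidctl -S "$set_name"                # report reconstruction progress
}

reconstruct raid0 /dev/sd1a   # both disks on scsibus0: completes on MP
reconstruct raid1 /dev/sd8a   # disks on scsibus1 and scsibus2: locks up on MP
```

Running the two cases one at a time makes it clear which reconstruction
was in progress when the machine stops responding.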

Any ideas?

Thanks,

J

PS.  By lockup I mean that I can only power-cycle; I can't get into ddb.

-- 
  My other computer also runs NetBSD    /        Sailing at Newbiggin
        http://www.netbsd.org/        /   http://www.newbigginsailingclub.org/
