Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,tron%zhadum.org.uk@localhost
Subject: Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown
From: Greg Oster <oster%cs.usask.ca@localhost>
Date: Mon, 9 Feb 2009 23:05:03 +0000 (UTC)

The following reply was made to PR kern/40569; it has been noted by GNATS.

From: Greg Oster <oster%cs.usask.ca@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown
Date: Mon, 09 Feb 2009 17:01:45 -0600

 Matthias Scheler writes:
 > The following reply was made to PR kern/40569; it has been noted by GNATS.
 > 
 > From: Matthias Scheler <tron%zhadum.org.uk@localhost>
 > To: NetBSD GNATS <gnats-bugs%NetBSD.org@localhost>
 > Cc: Greg Oster <oster%cs.usask.ca@localhost>
 > Subject: Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown
 > Date: Mon, 9 Feb 2009 21:07:13 +0000
 > 
 >  On Mon, Feb 09, 2009 at 07:50:03PM +0000, Matthias Scheler wrote:
 >  >  This time it actually seems to fail because of the LBA48 bug. I now
 >  >  remember that re-construct was done in the opposite direction in
 >  >  the past. I'll try to add the harddisk to the quick table.
 >  
 >  1.) The system wedged again during shutdown:
 >  
 >  ahcisata0 port 2: device present, speed: 1.5Gb/s
 >  Feb  9 19:56:43 colwyn su: tron to root on /dev/ttyp1
 >  Feb  9 19:57:56 colwyn shutdown: reboot by tron: Kernel bug fix 
 >  Feb  9 19:58:11 colwyn syslogd: Exiting on signal 15
 >  syncing disks... 5 done
 >  unmounting file systems...raid1: Waiting for reconstruction to stop...

 So it is very likely sleeping in the reconstruct code and waiting for 
 a write that is never going to happen...

 AHHHH... I think I see the bug: there is at least one missing

  num_writes++;

 in rf_reconstruct:rf_ContinueReconstructFailedDisk().  (It might be 
 that two are missing... I need to do more analysis.)  Basically, 
 writes that complete with errors are still writes that need to be 
 accounted for, and that's not happening properly.  I'll see about 
 getting this fixed for 5.0.  (Testing may prove to be a pain... I may 
 have to resurrect an old testing box so that I have some disks with 
 real write errors... and I'm not sure those will even be sufficient 
 to replicate this :-/ )
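
 To illustrate the kind of accounting problem described above, here is a
 minimal user-space sketch.  It is NOT the RAIDframe code: only the name
 num_writes comes from this mail, and the struct, the enum, the helper
 functions and the assumption that num_writes counts completed writes are
 all hypothetical.  It only shows why a waiter can sleep forever when
 failed writes are not counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical completion status for one reconstruction write. */
enum write_status { WRITE_OK, WRITE_ERROR };

/* Hypothetical bookkeeping; only the name num_writes is from the mail. */
struct recon_state {
	int writes_issued;	/* writes handed to the disk */
	int num_writes;		/* write completions accounted for so far */
};

/* Buggy handler: a write that completes with an error is never counted. */
static void
complete_write_buggy(struct recon_state *rs, enum write_status st)
{
	if (st == WRITE_OK)
		rs->num_writes++;
	/* WRITE_ERROR: counter untouched, so the write looks outstanding. */
}

/* Fixed handler: every completion is counted, successful or not. */
static void
complete_write_fixed(struct recon_state *rs, enum write_status st)
{
	(void)st;		/* status is handled elsewhere in this sketch */
	rs->num_writes++;
}

/* True if a thread waiting for all issued writes would keep sleeping. */
static bool
waiter_would_block(const struct recon_state *rs)
{
	assert(rs->num_writes <= rs->writes_issued);
	return rs->num_writes < rs->writes_issued;
}

/*
 * Issue n writes, all of which complete; the one at bad_index (if any)
 * completes with an error.  Returns whether the waiter is still blocked
 * after every write has completed.
 */
static bool
simulate(int n, int bad_index, bool fixed)
{
	struct recon_state rs = { n, 0 };

	for (int i = 0; i < n; i++) {
		enum write_status st =
		    (i == bad_index) ? WRITE_ERROR : WRITE_OK;
		if (fixed)
			complete_write_fixed(&rs, st);
		else
			complete_write_buggy(&rs, st);
	}
	return waiter_would_block(&rs);
}
```

 With the buggy handler, simulate(4, 2, false) reports that the waiter
 stays blocked even though all four writes have completed, which is the
 same shape as the "Waiting for reconstruction to stop..." hang.  With
 the fixed handler the waiter is released.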

 >  2.) The kernel with the two hard disks in the quick table has managed
 >      to re-construct the RAID 1.
 >  
 >  So it seems this is not a RAIDframe bug after all but rather a problem
 >  with the drives (and possibly error handling in ahcisata(4)).

 There's a RAIDframe bug in there too... 

 Later...

 Greg Oster
