NetBSD-Bugs archive


Re: bin/53956: raidframe fails to create raid set



The following reply was made to PR bin/53956; it has been noted by GNATS.

From: Havard Eidnes <he%NetBSD.org@localhost>
To: mrg%eterna.com.au@localhost
Cc: christos%zoulas.com@localhost, gnats-bugs%NetBSD.org@localhost, netbsd-bugs%netbsd.org@localhost,
 oster%netbsd.org@localhost
Subject: Re: bin/53956: raidframe fails to create raid set
Date: Fri, 08 Feb 2019 10:00:48 +0100 (CET)

 >> | > # raidctl -C /tmp/raid0.conf raid0
 >> | > Hosed component: /dev/wd1e
 >> | > Hosed component: /dev/wd0e
 >> | > raid0: Ignoring /dev/wd0e
 >> | [snip]
 >> | > raid0: Components: /dev/wd0e[**FAILED**] /dev/wd1e[**FAILED**]
 >>
 >> Can you dd some zeros in wd0e and wd1e and see if that changes the
 >> situation?
 >
 > can you save the current contents first, so we can debug
 > the problem if this turns out to be the cause?
 
 Hmm...  The raid has now been created using the 7.2_STABLE kernel and
 tools, and is presently undergoing initialization ("raidctl -i").
 
 I suspect therefore that the initial content has been overwritten.
 
 However, the point of the "-C" option to raidctl is that no matter the
 contents on the drives (zeroed or "random"), any errors related to the
 contents on the drives must be ignored, and the raid should be
 created, as long as the components can be physically read or written
 (I proved they could be read, and 7.2_STABLE certainly can write them,
 so I don't see why 8.0 should fail to do so).
 
 As you saw from the 7.2_STABLE transcript, "raidctl -C" spewed
 basically the same sort of errors there as on 8.0 ("hosed component",
 "row out of alignment", "column out of alignment" etc.).  A few
 messages appeared only on 8.0 ("ignoring <component>", "failed to
 create a dag. Too many component failures.").  The difference is that
 7.2_STABLE ignored the errors it reported, while 8.0 did not.
 Therefore, I lean towards Greg's explanation, that the "ignore
 errors" flag has somehow become unset in 8.0 even though "-C" is
 specified.
 
 Hmm, this function was introduced in netbsd-8:
 
 static void
 rf_handle_hosed(RF_Raid_t *raidPtr, RF_Config_t *cfgPtr, int hosed_column,
     int again)
 {
         printf("Hosed component: %s\n", &cfgPtr->devnames[0][hosed_column][0]);
         if (!cfgPtr->force)
                 return;
 
         /* we'll fail this component, as if there are
            other major errors, we aren't forcing things
            and we'll abort the config anyways */
         if (again && raidPtr->Disks[hosed_column].status == rf_ds_failed)
                 return;
 
         raidPtr->Disks[hosed_column].status = rf_ds_failed;
         raidPtr->numFailures++;
         raidPtr->status = rf_rs_degraded;
 }
 
 Compare this to the open-coded version in netbsd-7:
 
                         printf("Hosed component: %s\n",
                                &cfgPtr->devnames[0][hosed_column][0]);
                         if (!force) {
                                 /* we'll fail this component, as if there are
                                    other major errors, we arn't forcing things
                                    and we'll abort the config anyways */
                                 if (raidPtr->Disks[hosed_column].status != rf_ds_failed) {
                                         raidPtr->Disks[hosed_column].status
                                                 = rf_ds_failed;
                                         raidPtr->numFailures++;
                                         raidPtr->status = rf_rs_degraded;
                                 }
                         }
 
 Am I reading this wrong, or shouldn't the test in rf_handle_hosed()
 instead be
 
         if (cfgPtr->force)
                 return;
 
 i.e. the sense of the test is wrong?
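 
 To make the comparison concrete, here is a minimal stand-alone sketch
 of the three variants of the test.  It is a simplified model, not the
 kernel code: the struct, the enum and all function names are made up
 for illustration, and the "again" argument of rf_handle_hosed() is
 left out.  Only the sense of the "force" test is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define NCOLS 2                 /* two components, as in the PR */

/* Hypothetical stand-ins for the RAIDframe structures. */
enum disk_status { DS_OPTIMAL, DS_FAILED };

struct model_raid {
        enum disk_status disk[NCOLS];
        int num_failures;
        bool degraded;
};

/* Mark one column failed, counting each column at most once. */
static void
fail_column(struct model_raid *r, int col)
{
        assert(col >= 0 && col < NCOLS);
        if (r->disk[col] == DS_FAILED)
                return;
        r->disk[col] = DS_FAILED;
        r->num_failures++;
        r->degraded = true;
}

/* netbsd-7 open-coded logic: fail the component only when NOT forcing. */
static void
hosed_netbsd7(struct model_raid *r, bool force, int col)
{
        if (!force)
                fail_column(r, col);
}

/* netbsd-8 as quoted: returns early when NOT forcing, so forcing fails it. */
static void
hosed_netbsd8_quoted(struct model_raid *r, bool force, int col)
{
        if (!force)
                return;
        fail_column(r, col);
}

/* netbsd-8 with the proposed test: returns early when forcing. */
static void
hosed_netbsd8_proposed(struct model_raid *r, bool force, int col)
{
        if (force)
                return;
        fail_column(r, col);
}

/* Report every column as hosed and return how many ended up failed. */
static int
count_failures(void (*handler)(struct model_raid *, bool, int), bool force)
{
        struct model_raid r = { { DS_OPTIMAL, DS_OPTIMAL }, 0, false };

        for (int col = 0; col < NCOLS; col++)
                handler(&r, force, col);
        return r.num_failures;
}
```

 In this model, count_failures() with force set gives 0 failed
 components for the netbsd-7 logic and for the proposed test, but 2
 for the netbsd-8 test as quoted, which would match both components
 showing up as FAILED under "raidctl -C".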
 
 Regards,
 
 - Havard
 

