NetBSD-Bugs archive


Re: bin/53956: raidframe fails to create raid set



>> | > # raidctl -C /tmp/raid0.conf raid0
>> | > Hosed component: /dev/wd1e
>> | > Hosed component: /dev/wd0e
>> | > raid0: Ignoring /dev/wd0e
>> | [snip]
>> | > raid0: Components: /dev/wd0e[**FAILED**] /dev/wd1e[**FAILED**]
>>
>> Can you dd some zeros in wd0e and wd1e and see if that changes the
>> situation?
>
> can you save the current contents first, so we can debug
> the problem if this turns out to be the cause?

Hmm...  The raid has now been created using the 7.2_STABLE kernel and
tools, and is presently undergoing initialization ("raidctl -i").

I suspect therefore that the initial content has been overwritten.

However, the point of the "-C" option to raidctl is that no matter the
contents on the drives (zeroed or "random"), any errors related to the
contents on the drives must be ignored, and the raid should be
created, as long as the components can be physically read or written
(I proved they could be read, and 7.2_STABLE certainly can write them,
so I don't see why 8.0 should fail to do so).

As you saw from the 7.2_STABLE transcript, "raidctl -C" spewed
basically the same sort of errors there as on 8.0 ("hosed component",
"row out of alignment", "column out of alignment" etc.).  Some of the
8.0 messages did not appear on 7.2_STABLE ("ignoring <component>",
"failed to create a dag. Too many component failures.").  The
difference is that 7.2_STABLE ignored the errors it did report, while
8.0 did not.  Therefore, I lean towards Greg's explanation, that the
"ignore errors" flag has somehow become unset in 8.0 even though "-C"
is specified.

Hmm, this function was introduced in netbsd-8:

static void
rf_handle_hosed(RF_Raid_t *raidPtr, RF_Config_t *cfgPtr, int hosed_column,
    int again)
{
        printf("Hosed component: %s\n", &cfgPtr->devnames[0][hosed_column][0]);
        if (!cfgPtr->force)
                return;

        /* we'll fail this component, as if there are
           other major errors, we aren't forcing things
           and we'll abort the config anyways */
        if (again && raidPtr->Disks[hosed_column].status == rf_ds_failed)
                return;

        raidPtr->Disks[hosed_column].status = rf_ds_failed;
        raidPtr->numFailures++;
        raidPtr->status = rf_rs_degraded;
}

Compare this to the open-coded version in netbsd-7:

                        printf("Hosed component: %s\n",
                               &cfgPtr->devnames[0][hosed_column][0]);
                        if (!force) {
                                /* we'll fail this component, as if there are
                                   other major errors, we arn't forcing things
                                   and we'll abort the config anyways */
                                if (raidPtr->Disks[hosed_column].status != rf_ds_failed) {
                                        raidPtr->Disks[hosed_column].status
                                                = rf_ds_failed;
                                        raidPtr->numFailures++;
                                        raidPtr->status = rf_rs_degraded;
                                }
                        }

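In netbsd-7 the component is only marked failed when force is not
set; in netbsd-8 the early return means it is only marked failed when
force is set.  Am I reading this wrong, or shouldn't the test in
rf_handle_hosed() instead be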

        if (cfgPtr->force)
                return;

i.e. the sense of the test is wrong?
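
To make the difference concrete, here is a minimal stand-alone sketch
that models the two code paths quoted above plus the proposed change.
It is not the RAIDframe code: "struct model_raid", "mark_failed()",
the "hosed_*()" functions and "failures_after()" are hypothetical
names invented for illustration, only the fields touched by the quoted
code are modelled, and the "again" argument of rf_handle_hosed() is
left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RAIDframe structures (illustration only). */
enum disk_status { DS_OPTIMAL, DS_FAILED };

struct model_raid {
	enum disk_status disk[2];	/* models raidPtr->Disks[col].status */
	int num_failures;		/* models raidPtr->numFailures */
	bool degraded;			/* models raidPtr->status == rf_rs_degraded */
};

/* Mark one column failed, counting each column at most once. */
static void
mark_failed(struct model_raid *r, int col)
{
	if (r->disk[col] == DS_FAILED)
		return;
	r->disk[col] = DS_FAILED;
	r->num_failures++;
	r->degraded = true;
}

/* netbsd-7 as quoted: the component is failed only when NOT forcing. */
void
hosed_v7(struct model_raid *r, bool force, int col)
{
	if (!force)
		mark_failed(r, col);
}

/* netbsd-8 as quoted: early return when NOT forcing, so it fails when forcing. */
void
hosed_v8(struct model_raid *r, bool force, int col)
{
	if (!force)
		return;
	mark_failed(r, col);
}

/* Proposed change: early return when forcing, matching the netbsd-7 sense. */
void
hosed_fixed(struct model_raid *r, bool force, int col)
{
	if (force)
		return;
	mark_failed(r, col);
}

/* Report both columns as hosed and return the resulting failure count. */
int
failures_after(void (*handler)(struct model_raid *, bool, int), bool force)
{
	struct model_raid r = { { DS_OPTIMAL, DS_OPTIMAL }, 0, false };

	handler(&r, force, 0);
	handler(&r, force, 1);
	return r.num_failures;
}
```

In this model, with force set (the "-C" case) and both components
reported as hosed, hosed_v8() leaves two failed components while
hosed_v7() and hosed_fixed() leave none.  That would at least be
consistent with the "Too many component failures" message on 8.0, if
the real code behaves the way the model assumes.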

Regards,

- Havard

