NetBSD-Users archive


wd problem: atactl standby, SMART and errors



Hello,

Node: NetBSD 9.99.7/evbarm (earmv7hf, a20-olinuxino-lime2)

disk:
wd0: <Hitachi HDS721010CLA330>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6
	(Ultra/133), WRITE DMA FUA, NCQ (32 tags) w/PRIO
wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6
 	(Ultra/133) (using DMA), NCQ (31 tags) w/PRIO

(In the dmesg above, why does the "drive supports" line report
"NCQ (32 tags)" while the "using" line reports "NCQ (31 tags)"?)

Note that this is an old disk I'm using for tests, so hardware problems
cannot be excluded.

Description: I was compiling packages on the node, the bulk of the work
being done in memory (pkgsrc workdir allocated via tmpfs), with only the
binary packages being written to the wd disk (pkgsrc itself is also on
the disk, read-only).

Since the disk was rarely accessed, I set a standby timer:

# atactl wd0 setstandby 120

The ARM SoC has a SATA III connector, so the disk is attached with an
eSATA <-> SATA cable. The disk itself sits in an external enclosure
offering both USB and eSATA connectivity; the eSATA port is the one in
use.

Since neither the SoC nor the enclosure has a fan, I launched a bulk
build overnight, when it would disturb nobody. (This is to say that I am
only seeing the whole set of messages today: I did not get to watch them
appear one by one, so I cannot tell whether a first error was caused by
the disk sleeping and needing time to wake up, or whether the errors
repeated over a long enough period to rule out a sleeping-disk
problem.)

Reported problems are like this:

[ 51498.493926] wd0c: error reading fsbn 8740546 of 8740546-8740577 (wd0
bn 8740546; cn 8671 tn 2 sn 52), xfer 12f0, retry 0
[ 51498.503930] wd0: (uncorrectable data error)
[ 51502.065095] wd0c: error reading fsbn 8740546 of 8740546-8740577 (wd0
bn 8740546; cn 8671 tn 2 sn 52), xfer 12f0, retry 1
[ 51502.075102] wd0: (uncorrectable data error)

(same until retry 4) then

[ 51516.577931] wd0c: error reading fsbn 8740546 of 8740546-8740577 (wd0
bn 8740546; cn 8671 tn 2 sn 52)
[ 51516.587933] wd0: (uncorrectable data error)
[ 51516.587933] wd0c: error reading fsbn 8740546 of 8740546-8740577 (wd0
bn 8740546; cn 8671 tn 2 sn 52)
[ 51519.718148] wd0c: error reading fsbn 8740546 of 8740546-8740577 (wd0
bn 8740546; cn 8671 tn 2 sn 52), xfer 12f0, retry 0
[ 51519.728148] wd0: (uncorrectable data error)
[ 51522.598338] wd0: soft error (corrected) xfer 12f0
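
As a cross-check of the numbers in these messages, here is a small
sketch of the arithmetic. It assumes a fictitious geometry of 16 heads
and 63 sectors per track and a 512-byte sector size; none of these
values appear in the dmesg above, and the function names are mine. With
those assumptions, "cn 8671 tn 2 sn 52" maps back to block 8740546, and
the failing range 8740546-8740577 covers 32 sectors:

```sh
#!/bin/sh
# chs_to_lba CYL HEAD SECT [HEADS] [SPT]
# Convert cylinder/head/sector to a linear block number.
# HEADS=16 and SPT=63 are assumed defaults, not values taken from the log.
# SECT is treated as 0-based, which is what makes the log numbers agree.
chs_to_lba() {
	heads=${4:-16}
	spt=${5:-63}
	echo $(( ($1 * $heads + $2) * $spt + $3 ))
}

# span_bytes FIRST LAST [SECSIZE]
# Size in bytes of the inclusive block range FIRST..LAST.
# SECSIZE=512 is an assumed default.
span_bytes() {
	secsize=${3:-512}
	echo $(( ($2 - $1 + 1) * $secsize ))
}

chs_to_lba 8671 2 52        # prints 8740546
span_bytes 8740546 8740577  # prints 16384
```

So every retry above concerns the same 16 KiB read starting at the same
block.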

Question: can this be linked to the power state (disk sleeping, driver
retrying until the command is served)?

I also noticed that when I query the SMART status, the first command
fails with "time out"; a second command can fail with "SMART not
enabled", while a third (sometimes the second, depending on the time
between the commands) succeeds. I attribute this to the standby setting
too.
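
To make that test repeatable, a retry wrapper along these lines could be
used. This is only a sketch: retry_cmd is a hypothetical helper, the try
count and delay are arbitrary, and it assumes atactl exits non-zero when
the query fails, which I have not verified.

```sh
#!/bin/sh
# retry_cmd TRIES DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every try failed.
retry_cmd() {
	tries=$1
	delay=$2
	shift 2
	n=1
	while [ "$n" -le "$tries" ]; do
		if "$@"; then
			return 0
		fi
		echo "attempt $n of $tries failed: $*" >&2
		n=$((n + 1))
		# Only wait if another attempt is coming.
		if [ "$n" -le "$tries" ]; then
			sleep "$delay"
		fi
	done
	return 1
}

# Intended use on the node, as root (not executed by this script):
#   atactl wd0 checkpower                    # report the current power mode
#   retry_cmd 3 10 atactl wd0 smart status   # relies on atactl's exit status
```

Running checkpower first would show whether the disk was in standby when
the first query timed out.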

So what are the interactions between the power states (idle, standby,
sleep; the last not selected, since the man page says to use it with
caution) and reading/writing? Can the errors be ignored as mere soft
errors caused by the delay in powering the disk back up, and are the
power states reliable data-wise?

TIO,
-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                     http://www.kergis.com/
                       http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

