Subject: Re: NetBSD iSCSI HOWTOs
To: None <current-users@netbsd.org>
From: Miles Nordin <carton@Ivy.NET>
List: current-users
Date: 02/24/2006 19:27:10

>>>>> "ajg" == Aaron J Grier <agrier@poofygoof.com> writes:

   ajg> I'm thinking a RAID of iSCSI targets would be interesting.

sorry to be a wet blanket, but I think this could be dangerous,
because although RAID is ``redundant,'' people seem to expect it to
route around failures the way TCP routes around a flaky network.
It's not like the bad drive can just do whatever it wants and RAID
will make up for any problems.  The target needs to either complete
an entire request unambiguously, or return a clear error code so that
RAID knows it has failed.  For example, RAID can do nothing about the
common behavior of IDE disks silently corrupting data.  I would also
expect a target underneath RAID (or even softdep) to need some kind
of write barrier support.
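
To make that concrete, here's a toy userland sketch of why a mirror
only helps when the component actually *reports* its failure.  It has
nothing to do with RAIDframe's real code paths; the function and its
arguments are made up for illustration:

#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* read one block from a toy two-way mirror */
ssize_t
mirror_read(int fd_a, int fd_b, void *buf, size_t len, off_t off)
{
        ssize_t n = pread(fd_a, buf, len, off);

        if (n == (ssize_t)len)
                return n;       /* component A claims success, so we
                                 * believe it -- silently corrupted
                                 * data sails straight through here */

        /* only an explicit error or short read sends us to the mirror */
        fprintf(stderr, "component A failed: %s\n",
            n < 0 ? strerror(errno) : "short read");
        return pread(fd_b, buf, len, off);
}

The same thing happens one level down: if an iSCSI target swallows
the drive's error and returns garbage with a success status, the RAID
on the initiator never takes the branch to the good copy.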

I doubt the current target, since it's operating entirely in
userspace, will ever be able to pass through and convert the hard-
and soft-error reports from disks, or any of that convoluted TCQ and
write-barrier stuff.  Like, I read on some web page (sdparm maybe?)
that FCAL-to-SATA bridges will rebatch requests and redo TCQ
themselves, because Fibre Channel's TCQ API handles multiple
initiators while SATA's NCQ does not.  I dunno everything that's
going on here, but am just pointing out it's way more complicated
than implementing a few logon state machines and flipping buffers
between network and disk.  so, just based on web pages that I've so
far only poorly understood:

feature                |  needed for
-----------------------+----------------------------------------
TCQ/NCQ translation    |  performance
-----------------------+----------------------------------------
write barriers         |  correctness: avoiding pathological
                       |  filesystem corruption
-----------------------+----------------------------------------
SYNC command           |  correctness: SMTP MTAs and databases
                       |  (see the sketch below this table)
-----------------------+----------------------------------------
pass-through of sense  |  correctness: needed for safe RAID
data (``mode sense''   |
or whatever you call   |
those read-error       |
messages)              |
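
To put the SYNC row in concrete terms, here's roughly the sequence an
SMTP MTA or a database relies on (the queue-file path is made up).
The guarantee only means something if every layer underneath fsync()
-- filesystem, RAID, iSCSI target, drive cache -- passes the flush
all the way down instead of ACKing out of a userspace buffer:

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
        const char msg[] = "message the peer MTA now believes is safely stored\n";
        int fd = open("/var/spool/example-queue-file",
            O_WRONLY|O_CREAT|O_APPEND, 0600);

        if (fd == -1)
                err(1, "open");
        if (write(fd, msg, sizeof(msg) - 1) == -1)
                err(1, "write");

        /*
         * Only after this returns may the MTA send its ``250 ok''.  A
         * target that reports success while the data still sits in its
         * own RAM turns this into an empty promise.
         */
        if (fsync(fd) == -1)
                err(1, "fsync");

        close(fd);
        return 0;
}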


If you don't do this right, can't you go from an un-RAIDed filesystem
with softdep or a journal that you can relatively safely pull the
plug on at any time, to a RAIDed filesystem spread over many boxes
with iSCSI, where a power loss or maybe even just some transitory
network outage that exceeds some arcane timeout value could cause
massive metadata corruption, breaking softdep guarantees and losing
ext2fs-ly colossal amounts of data?  AFAICT, using RAID can actually
make your data
_significantly_ less safe, and this idea seems to agree with the
anecdotes many people tell about massive, mysterious array meltdowns
with cheap RAID controllers and inexperienced sysadmins.  I think
there are a lot of RAID users out there who aren't familiar with
buzzwords like ``The RAID5 Write Hole'' and think RAID will just make
all their data perfectly safe through ``N+1 Redundancy'' or some other
bulleted feechur marketing hogcrap.

but yeah, something like an iSCSI target is a huge missing tool for
anyone who wants to make a single gigantic filesystem that's too large
to fit in a single PeeCee case, and do it cheaply.  All the SAN,
fibrechannel, SATAraid, whatever stuff I've stumbled into so far ends
up costing more per GB than the disks themselves, and it does suck
loading up PeeCee towers with eight drives apiece and using plain old
NFS.
