Subject: Re: NetBSD iSCSI HOWTOs
To: Miles Nordin <carton@Ivy.NET>
From: Bill Studenmund <wrstuden@netbsd.org>
List: current-users
Date: 02/27/2006 20:32:30

On Fri, Feb 24, 2006 at 07:27:10PM -0500, Miles Nordin wrote:
> >>>>> "ajg" == Aaron J Grier <agrier@poofygoof.com> writes:
>
>    ajg> I'm thinking a RAID of iSCSI targets would be interesting.
>
> sorry to be a wet blanket, but I think this could be dangerous because
> although RAID is ``redundant,'' people seem to confuse its ability to
> route around failures with something like TCP.  It's not like the bad
> drive can just do whatever it wants, and RAID will make up for any
> problems.  The target needs to unambiguously either complete an entire
> request, or return a clear error code to RAID indicating that it has
> failed.  For example, RAID can do nothing to help the common behavior
> of IDE disks just silently corrupting data.  I would expect a target
> underneath RAID (or even softdep) needs to support some kind of write
> barrier as well.
>
> I doubt the current target, since it's operating entirely in
> userspace, will ever be able to pass through and convert the hard- and
> soft-error reports from disks, nor any of that convoluted TCQ and
> write barrier stuff.  Like, I read on some web page (sdparm maybe?)

I believe you would be incorrect.

If the file system reports an error to the target, it certainly could pass
that on to the initiator.
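
To make the error pass-through concrete, here is a minimal sketch of how a userspace target might turn a failed backing-store I/O into a SCSI CHECK CONDITION with fixed-format sense data. The function name and the choice to map only EIO are assumptions made for illustration; this is not how the NetBSD target is actually written, and a real target would cover more error cases.

```python
import errno

CHECK_CONDITION = 0x02  # SCSI status byte returned to the initiator
MEDIUM_ERROR = 0x03     # sense key

# (ASC, ASCQ) pairs from the SCSI additional sense code table
UNRECOVERED_READ_ERROR = (0x11, 0x00)
WRITE_ERROR = (0x0C, 0x00)


def sense_for_io_error(err, is_write):
    """Build 18-byte fixed-format sense data for a failed backing-store I/O.

    Hypothetical helper: the errno-to-sense mapping is an illustrative
    assumption, not taken from any existing target.
    """
    if err != errno.EIO:
        raise ValueError("this sketch only maps EIO")
    asc, ascq = WRITE_ERROR if is_write else UNRECOVERED_READ_ERROR
    sense = bytearray(18)
    sense[0] = 0x70          # response code: current error, fixed format
    sense[2] = MEDIUM_ERROR  # sense key
    sense[7] = 10            # additional sense length (18 - 8)
    sense[12] = asc          # additional sense code
    sense[13] = ascq         # additional sense code qualifier
    return CHECK_CONDITION, bytes(sense)
```

The point is only that the initiator (and any RAID layered on it) gets an unambiguous failure for the request instead of silence.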

TCQ is a pain to implement correctly, but the main thing it's needed for
(in disks) is to permit multiple i/os to be active at the disks at once.
Performance will suck otherwise.

The trick with write barriers is that you just can't lie to the initiator
(or you shouldn't). As long as the mode pages don't lie, you are fine.
You also might want to read the man page for fsync_range() in detail; the
pieces are there (or should be, a driver or two may need to learn to
support DIOCCACHESYNC).
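
As an illustration of mode pages that don't lie, the sketch below builds a SCSI caching mode page (page code 0x08) whose WCE (write cache enable) bit reflects whether the target really acknowledges writes before they are stable. The function name and the `write_back` parameter are hypothetical; how a target learns the true caching behaviour of its backing store is outside this sketch.

```python
CACHING_PAGE_CODE = 0x08
CACHING_PAGE_LENGTH = 0x12  # number of bytes that follow the length byte
WCE = 0x04                  # write cache enable, bit 2 of byte 2


def build_caching_mode_page(write_back):
    """Return a 20-byte caching mode page.

    write_back (assumed input): True if the target may report a write as
    complete before the data has reached stable storage.
    """
    page = bytearray(2 + CACHING_PAGE_LENGTH)
    page[0] = CACHING_PAGE_CODE
    page[1] = CACHING_PAGE_LENGTH
    if write_back:
        page[2] |= WCE  # tell the initiator a volatile write cache is in play
    return bytes(page)
```

An initiator that sees WCE set knows it has to issue SYNCHRONIZE CACHE when it needs data on stable storage; one that sees it clear can treat a completed write as durable.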

If an o/s wants semantics that the mode pages don't advertise and doesn't
adapt, it's not the target's problem.

> that FCAL-to-SATA bridges will rebatch requests and redo TCQ
> themselves, because Fibre Channel's TCQ API handles multiple
> initiators while SATA's NCQ does not.  I dunno everything that's going
> on here, but am just pointing out it's way more complicated than
> implementing a few logon state machines and just flipping buffers
> between network and disk.  so, just based on web pages that I've
> understood only poorly so far:

Please keep learning. :-)

> feature              |  needed for
> ---------------------+-----------------------------
> TCQ/NCQ translation  | performance

SANs need TCQ. You get CRAPPY performance on a SAN if you only have one
i/o outstanding. SANs have much more latency than do local disks, and if
you serialize ops, you feel that delay. If you have multiple commands in
flight at once, you get great performance.
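
The effect of queue depth can be put in rough numbers with Little's law: the best-case command rate is the number of commands in flight divided by the per-command latency. The sketch below assumes a made-up 1 ms round trip and ignores every other bottleneck (link bandwidth, disk mechanics, target CPU), so it gives an upper bound, not a prediction.

```python
def max_iops(queue_depth, latency_us):
    """Upper bound on I/Os per second from Little's law.

    queue_depth: commands kept in flight at once
    latency_us:  per-command round-trip latency in microseconds
                 (the values used below are illustrative assumptions)
    """
    return queue_depth * 1_000_000 / latency_us


serialized = max_iops(queue_depth=1, latency_us=1000)   # one i/o at a time
queued = max_iops(queue_depth=32, latency_us=1000)      # 32 tagged commands
print(serialized, queued)
```

With one command outstanding the initiator waits out the full round trip for every request; with 32 in flight the same latency is overlapped 32 ways.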

> ---------------------+-----------------------------
> write barriers       | correctness: avoiding pathological filesystem
>                      | corruption
> ---------------------+-----------------------------
> SYNC command         | correctness: SMTP MTAs and databases
> ---------------------+-----------------------------
> pass through of      | correctness: needed for safe RAID
> ``mode sense'' or    |
> whatever you call    |
> those read-error     |
> messages             |

Take care,

Bill
