Subject: Re: Anyone working on ATA over Ethernet?
To: Jason Thorpe <thorpej@shagadelic.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 02/15/2005 15:44:24

On Mon, Feb 14, 2005 at 07:16:45PM -0800, Jason Thorpe wrote:
>
> On Feb 14, 2005, at 5:18 PM, Daniel Carosone wrote:
>
> I personally see iSCSI's niche as being "really cheap, moderately
> performing, expandable bulk storage".  Data archiving, etc.

Sure, except the 'really cheap' aspect has yet to be realised, for
reasons we've both outlined; even 'cheap' is a stretch right now.  This
lightweight protocol seems like it might allow really cheap, big, dumb
disk as part of the larger solution.

> >The RAID controllers are getting the smarts, and are learning to speak
> >iSCSI for such purposes as cross-site replication.  Cluster
> >filesystems on hosts, slowly, too.
>
> Cluster file systems on hosts don't need to speak iSCSI.  To any
> clustered file system, iSCSI looks *exactly* like FC, except for the
> use of IQNs rather than WWNs for LU naming.

Agreed, from the point of view of implementing such a filesystem.  On
a vendor's sales quote, these technical layerings aren't so apparent.
If nothing else, vendors at least need to certify supported
combinations, and some vendors have their own interests to push here,
alas.

> As for cross-site replication, I don't see iSCSI as being
> particularly valuable for that. From my perspective, this is best
> done at the FILE layer, *not* the block layer.

And in many ways I agree, but "best" means many things to many people.
Storage vendors are selling (to big customers) rather a lot of 'remote
mirrored disk', and it is conceptually simple (thus easier to sell to
management) and has the added advantage of being platform and software
agnostic.  You can almost certainly get better results, by other
metrics, with smarter software - but you need the software.  Plenty of
stuff, including legacy stuff, can be made to do site failover and
crash recovery on what looks like dumb disk with magical mirroring.
Management get to divide up their IT department's responsibilities and
skills into clearer layers, too (for better or worse).

But then again, if you're doing it at the file layer, you're doing it
in something with a filesystem (host, or NAS-to-SAN type box), and
that host wants wide-area connectivity to the geographically separated
storage pools... which sounds like it suits iSCSI nicely.

> Anything-over-Ethernet has to account for the fact that frames can be
> delivered out-of-order; Ethernet provides no ordering guarantees.
> Eventually, you have to solve basically all the same problems that
> iSCSI had to solve (PDUs can be delivered out-of-order if you have
> multiple TCP connections in your iSCSI sessions).

Not necessarily... or at least, you don't necessarily have to solve
it *in the protocol*.

If I have a smart RAID controller that presents an FC or iSCSI host
interface (with known ordering), and an NV write cache, I don't
necessarily need to preserve any ordering to private ATA-over-Ethernet
disks behind the controller. Ordering only matters as far as the NV
cache, since that's what's visible to the outside. The spec includes a
tag with each request that is echoed in the reply; that really should
be all the controller needs to check off write-back requests out of
the NV cache. If the Ethernet and disks are private, the RAID
controllers can arbitrate with each other for reservations and other
things that complicate iSCSI or FC, and the disks don't need to care
which controller issued which command, other than to reply.  (All this
presumes that each ATA operation fits in one frame, of course.)
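
To make the tag idea concrete, here is a rough sketch of the
bookkeeping I have in mind on the controller side. It is only an
illustration: the class and method names (WriteBackTracker, issue,
complete) are made up, the 32-bit tag width is an assumption, and
nothing here builds or parses real frames. The point is just that
replies can come back in any order and the controller still knows
which NV cache entries it may release.

```python
class WriteBackTracker:
    """Hypothetical controller-side table of write-backs awaiting a reply."""

    TAG_MASK = 0xFFFFFFFF  # assumed 32-bit tag, wraps around

    def __init__(self):
        self._next_tag = 0
        self._pending = {}  # tag -> (lba, data) still held in the NV cache

    def issue(self, lba, data):
        """Record a write-back sent to a disk and return the tag it carries."""
        # Skip any tag that is still outstanding after a wrap-around.
        while self._next_tag in self._pending:
            self._next_tag = (self._next_tag + 1) & self.TAG_MASK
        tag = self._next_tag
        self._next_tag = (tag + 1) & self.TAG_MASK
        self._pending[tag] = (lba, data)
        return tag

    def complete(self, tag):
        """Handle a reply echoing `tag`; return the entry that can be freed.

        Returns None for an unknown or duplicate tag, so a stray reply
        is simply ignored.
        """
        return self._pending.pop(tag, None)

    def outstanding(self):
        """Tags whose data must stay in the NV cache (candidates for resend)."""
        return sorted(self._pending)


tracker = WriteBackTracker()
t1 = tracker.issue(100, b"first")
t2 = tracker.issue(200, b"second")
t3 = tracker.issue(300, b"third")

# Replies arrive out of order; each one is matched purely by its tag.
tracker.complete(t3)
tracker.complete(t1)
print(tracker.outstanding())  # only t2 is still waiting on its disk
```

Anything left in outstanding() after a timeout would just be resent
from the NV cache, which is why the disks themselves need no notion
of ordering.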

Having one (or two) implementations of these smarts in the RAID
controller(s) has the potential to offer significant cost savings
over having smarts (and protocol overhead) in every disk or shelf -
where they're not really needed if you already have the SAN RAID
controller in front anyway.

--
Dan.