tech-kern archive


Re: write alignment matters?



    Date:        Fri, 24 Jun 2011 20:26:19 -0400 (EDT)
    From:        der Mouse <mouse%Rodents-Montreal.ORG@localhost>
    Message-ID:  <201106250026.UAA12885%Sparkle.Rodents-Montreal.ORG@localhost>

  | This would mean that raw devices as interfaces to disks are essentially
  | useless.

Not at all, as history has proven: that's what the rule has always been.

  | It becomes impossible to write _anything_ that works on "raw
  | disks", because you don't know what restrictions might be demanded by
  | the next disk device to come along.

I suppose that's true in theory, just as we don't know that the next
such device might not have 739 byte sectors, or rotate backwards, or ...

However, in the real world, the manufacturers don't make products that
they can't sell, and people don't buy products that don't work (or not
very many of them), so anything that you're (likely to) buy is almost
certainly going to be reasonably compatible with what has gone before,
if perhaps not as efficient that way (which is why all the modern bigger
sector drives still pretend to have 512 byte sectors of course).

  | Indeed, it's entirely possible that two devices might make mutually
  | incompatible demands, making it impossible to support both of them
  | with the same code.

While that's entirely possible in theory (and extremely unlikely in
practice), the conclusion is still wrong.  It just means smarter code
would be needed if that ever really did happen - the code would need
to determine the requirements and adapt to the particular device it
was being used on.  More work certainly, but definitely not impossible.
(As an example, just consider programs that write to terminals, and
imagine your words applying if the manufacturers of terminals happened
to make them incompatible with each other in some way ... it would be
impossible to have a single editor (or other terminal accessing program)
that worked on all those different terminals, wouldn't it?)
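
A minimal sketch of what "determine the requirements and adapt" could
look like in a program, assuming the transfer unit of the device has
already been discovered somehow (how that is done is device and OS
specific, and is not shown).  The names struct span and covering_span
are made up for this illustration.  Given an arbitrary byte range, it
computes the smallest range of whole units that covers it, which is
what a caller would read, modify, and write back:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a byte range on the device. */
struct span {
	uint64_t start;		/* first byte, a multiple of the unit */
	uint64_t len;		/* length in bytes, a multiple of the unit */
};

/*
 * Smallest unit-aligned span covering [off, off + len).
 * len is assumed to be nonzero, and off + len is assumed not to
 * overflow; a real program would check both.
 */
struct span
covering_span(uint64_t off, uint64_t len, uint64_t unit)
{
	struct span s;
	uint64_t end, rem;

	assert(unit > 0);

	s.start = off - off % unit;	/* round the start down */
	end = off + len;
	rem = end % unit;
	if (rem != 0)
		end += unit - rem;	/* round the end up */
	s.len = end - s.start;
	return s;
}
```

Nothing here depends on the unit being 512, or even a power of two, so
the same code would cope with the 739 byte sector device as well.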

  | And, for the specific case that started this off, that's not what's
  | going on; it does write 64K of the 4M, so it clearly doesn't mind the
  | alignment.

That may be, but for raw access, it isn't just the alignment, it is the
size too.  The device (&/or driver) gets to place restrictions on that
as well - both on the absolute size permitted, and on the unit size (some
integer multiple of which is required in all accesses).
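
As an illustration only, the two kinds of restriction can be written
down as a check like the one below.  struct raw_limits and raw_xfer_ok
are hypothetical names, and the numbers a real device or driver imposes
are whatever that device or driver says they are; this is a sketch of
the rule, not of any particular driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what a raw device (or driver) demands. */
struct raw_limits {
	uint64_t unit;		/* every access must be a multiple of this */
	uint64_t max_xfer;	/* largest single transfer, in bytes */
};

/*
 * Returns true if a transfer of len bytes at byte offset off satisfies
 * the limits: nonzero, no bigger than the maximum, and with both the
 * offset and the length a whole number of units.
 */
bool
raw_xfer_ok(const struct raw_limits *lim, uint64_t off, uint64_t len)
{
	assert(lim != NULL && lim->unit > 0);

	if (len == 0 || len > lim->max_xfer)
		return false;
	return off % lim->unit == 0 && len % lim->unit == 0;
}
```

With an assumed unit of 512 and an assumed maximum of 64K, a 64K
request at offset 0 passes, and a single 4M request does not, even
though both are perfectly aligned.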

Whether your problem was caused by any of this, or whether it was
just a bug somewhere, I don't have the information to judge of course.

And while I'm here ...

mouse%Rodents-Montreal.ORG@localhost said:
  | > Yes, so it keeps being said.  It would truly be a pity to see that
  | > happen.

  | Why? 

David Holland answered most of that already - but for me, it is because
that's exactly what you want: access to a device in a way that makes
the device act like any other file.   That's the interface that block
devices present: access at random offsets, random lengths, ... all the
kinds of things that you could do to a file, but applied to the device
directly, with no filesystem and no remapping.   I really would not like
to lose that ability, and you seem to not want to lose it either (but you
seem to want to achieve that by forcing the raw access mechanism to make
it work.)

Thor Lancelot Simon <tls%panix.com@localhost> said:
  | At least for NetBSD, that's never been true.  The most glaring problem is
  | that there's no protection against causing the same underlying disk blocks
  | to be multiply cached by accessing the buffer cache with a different stride.
  | And no way to keep those multiple cachings coherent... 

While there's no enforcement, the basic rule has always been "one at a time".
The same problem applies to raw devices if you have multiple processes
accessing the things without coordination.   While this one is worse,
because of the caching delays and unpredictable write order, it all just
goes away if we prohibit multiple access, doesn't it?   That is, open,
i/o from the process that opens, and close (with accompanying sync and
buffer invalidation).   Next open it all starts again, and the new process
isn't bothered by the previous one.
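
For comparison, here is a userland approximation of "one at a time",
using advisory flock() locking between cooperating processes.  This is
only a sketch: open_exclusive and close_exclusive are made-up names,
the lock binds only programs that choose to take it, and the buffer
invalidation on close is something only the kernel can do, so it is
not attempted here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open path for exclusive use.  Returns a descriptor, or -1 if the
 * open fails or another cooperating opener already holds the lock.
 */
int
open_exclusive(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd == -1)
		return -1;
	if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
		close(fd);	/* someone else has it: one at a time */
		return -1;
	}
	return fd;
}

/* Flush pending writes, drop the lock, and close.  0 on success. */
int
close_exclusive(int fd)
{
	int rv;

	rv = fsync(fd);
	flock(fd, LOCK_UN);
	if (close(fd) == -1)
		rv = -1;
	return rv;
}
```

A second open_exclusive() on the same path fails until the first
holder has called close_exclusive().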

However, I don't expect that this is the only problem that exists.

tls%panix.com@localhost continued:
  | I don't know (I guess I could look) whether the original Unix code had that
  | problem but the replacement code in NetBSD had them from the very
  | beginnings.

The original unix buffer cache had fixed size 512 byte buffers, and nothing
else, so this was never an issue there.   That changed when the FFS appeared
in 4.2bsd (or 4.1c or whenever it first escaped), requiring handling of
filesystems with different block sizes.   The filesystem (as you say) picks
one block size for each filesystem, and sticks to that, so has no problem.
If we simply made block access to a device containing a filesystem adopt
the same block size, and allow nothing else (and install an unalterable
default for block devices without filesystems) then the problem you mention
would just vanish, wouldn't it?   We don't need to abandon block devices
to avoid it, or we shouldn't.

mmap() and UBC made things more complicated, one more time.

David Holland <dholland-tech%NetBSD.org@localhost> said:
  | There's also a problem that the buffer cache code wasn't ever designed to
  | cope with removable devices, so bad things happen if you try to use the
  | block device for something that isn't there (e.g. a floppy drive with no
  | media in it) or that you eject before writeback has completed.

Oh you young'uns ...  believe it or not, removable media was just about
all that existed when the buffer cache code was designed.   Non-removable
discs didn't appear until we started seeing winchester class drives, which
was long after unix was invented (sometime during the 80's...)   Before
then, the only non-removable mass storage that existed (that I recall anyway)
were drums, and while there's no reason unix wouldn't have been able to
use such things (and Steve Bellovin will probably tell us that at the Labs
they actually did), they were incredibly rare (and expensive, and low
capacity - like 128KB would have been typical).

Normal unix used rp06's, rk07's, rm03's, rm05's, etc - all removable.
What didn't exist at the time (rather than removable media) was the
expectation that it should be possible to remove the media without bothering
to inform the OS that was about to happen, and give the OS time to update the
media properly before releasing it.   It took the invention of that paragon of
OS's - DOS - to teach the populace that simply pulling the device/media was
an acceptable operating procedure.

The one difference we have today is that the controllers have moved
out to accompany the media, rather than remaining in the system, and we
haven't done a very good job (yet) of adapting to that change of
methodology - but that's just a SMOP (or not-so-S perhaps) not anything
that's a fundamental paradigm shift.

A device with no media isn't a problem, never was.  Accessing that is
simply an error, today as it always was.  You only really notice it with
floppies because that horrible interface meant the only way to detect the
absence of media was by actually trying I/O, failing repeatedly, and
then eventually concluding the disc must be absent.  On anything sane,
the controller just says "not there" and the error is immediate.

David Laight <david%l8s.co.uk@localhost> said:
  | Apparantly it was useful to put a bad block at the end of every tape
  | track so that otherwise sequential data wouldn't cross track boundaries.

I don't recall ever doing that - DECtape capacity was so small that
artificially reducing it doesn't seem like a good idea to me.

  | IIRC I have seen DECtape in action - on a pdp8. 

My fingers used to remember the key change sequence to boot a pdp8 from
DECtape (the optimal way to move the data entry keys to enter the
boot sequence with as little pain as possible) - fortunately, that's now
so long ago that I now remember no more than that I used to remember!

kre



