tech-kern archive

Re: write alignment matters?

On Jun 23, 2011, at 7:43 34PM, Johnny Billquist wrote:

> On 2011-06-23 23:05, Steven Bellovin wrote:
>> On Jun 23, 2011, at 4:36 25AM, Robert Elz wrote:
>>>    Date:        Wed, 22 Jun 2011 19:30:55 -0400 (EDT)
>>>    From:        der Mouse<mouse%Rodents-Montreal.ORG@localhost>
>>>    Message-ID:<201106222330.TAA28359%Sparkle.Rodents-Montreal.ORG@localhost>
>>>  | But the interface is much older than that, and, even if it's not
>>>  | codified, there's a lot of history behind the notion that userland
>>>  | alignment of write() buffers affects, at most, performance, to the
>>>  | point where I consider it part of the interface.
>>> Not on access to raw devices it isn't, and never was - what Erik Fair
>>> said 
>>> (Message-id:<>)
>>> was 100% correct - if you're using a raw device, it is up to the
>>> application to meet whatever the requirements of that particular device
>>> are, because one of the properties of raw devices is that they don't
>>> do any kind of rebuffering of data (and the driver must not - that is
>>> a part of the interface contract).
>>> What the rules are vary from device to device, if you don't like this,
>>> don't use raw devices.   If you want to function on a large subset
>>> (possibly all) raw devices you need to make your code extremely
>>> pessimistic about what it can do (align to 4K or so boundary, use
>>> sizes a multiple of 512, and no bigger than 64KB).
>> For fun, I looked at the (online) man pages from 6th Edition Unix,
>> which is circa 1976.  Without exception, the raw disk (hp, hs, rf, rk,
>> rp), and tape devices (tm only; raw I/O didn't work on ht) required
>> buffers to be on word boundaries; for the former, the count had to be
>> a multiple of 512 bytes, and for the tm tape driver the count had to
>> be even.  (See 
>> In other words, Erik is right, at least if we're talking historically.
>> Of course, at least there it's documented.  (I took a quick glance
>> at the code, too -- it did appear to check for erroneous parameters,
>> though I think it just truncated the count in some drivers.)
> That's because of a hardware limitation of many controllers on a PDP-11. They 
> can only start DMA to even addresses. And that in turn is partly because of 
> the whole design of the PDP-11 itself. It really is, in many ways, a word 
> addressable machine, and only partially 8-bit byte oriented.

Yup.  That even shows in the manuals.

> So I'm not sure how relevant that is for a general case. It would seem to be 
> very architecture specific.

The point is that when dealing with raw devices, you take what the hardware
gives you.  6th Edition could have detected this and copied the user data
into a properly-aligned buffer, with the corresponding performance hit.
Instead, it said "this is the way the hardware works; adapt".  Given that
der Mouse's problem may also be related to hardware limits, it's quite
directly relevant: he hasn't matched what the hardware limits are.  Of 
course (and as I noted), in 6th Edition the limitation was documented,
which is not the case here.  If we are indeed dealing with a hardware
issue (the jury still seems to be out on that question), then the PR
should be a documentation issue, not a kernel issue.
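
To make the "copy into a properly-aligned buffer" option concrete, here is
a rough userland sketch in C.  The limits are the pessimistic figures quoted
earlier in this thread (4K alignment, multiples of 512, at most 64KB per
transfer); they are assumptions, not documented requirements of any
particular driver.  The names raw_ok() and raw_write_bounced() are made up
for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed conservative limits (figures quoted in the thread, not
 * guaranteed for any given device). */
#define RAW_ALIGN   4096u          /* buffer address alignment */
#define RAW_SECTOR  512u           /* transfer size granularity */
#define RAW_MAXXFER (64u * 1024u)  /* largest single transfer */

/* Hypothetical helper: would (buf, len) satisfy the limits above? */
static int
raw_ok(const void *buf, size_t len)
{
	return (uintptr_t)buf % RAW_ALIGN == 0 &&
	    len != 0 && len % RAW_SECTOR == 0 && len <= RAW_MAXXFER;
}

/*
 * Hypothetical helper: write len bytes from an arbitrarily aligned
 * buffer by copying through an aligned bounce buffer.  len must be a
 * multiple of RAW_SECTOR; padding a short tail is left to the caller.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
raw_write_bounced(int fd, const void *src, size_t len)
{
	const unsigned char *p = src;
	void *bounce;
	int error, rv = 0;

	if (len % RAW_SECTOR != 0) {
		errno = EINVAL;
		return -1;
	}
	error = posix_memalign(&bounce, RAW_ALIGN, RAW_MAXXFER);
	if (error != 0) {
		errno = error;
		return -1;
	}
	while (len > 0) {
		size_t chunk = len < RAW_MAXXFER ? len : RAW_MAXXFER;
		ssize_t n;

		/* This copy is the performance hit being traded for safety. */
		memcpy(bounce, p, chunk);
		n = write(fd, bounce, chunk);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0 || (size_t)n % RAW_SECTOR != 0) {
			/* Give up on errors and on partial-sector writes. */
			if (n >= 0)
				errno = EIO;
			rv = -1;
			break;
		}
		p += n;
		len -= (size_t)n;
	}
	error = errno;		/* free() may clobber errno */
	free(bounce);
	errno = error;
	return rv;
}
```

A caller that cares about speed could test raw_ok() first and call write()
directly when the buffer already qualifies, falling back to the bounce path
only for misaligned data.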

                --Steve Bellovin,
