Subject: Re: Endianness conversion functions
To: None <tech-misc@NetBSD.org>
From: Christian Biere <christianbiere@gmx.de>
List: tech-misc
Date: 01/18/2007 23:47:47
Krister Walfridsson wrote:
> On Thu, 18 Jan 2007, Christian Biere wrote:
 
> >are there any objections against the patch below? These could then
> >completely replace the almost identical code in sys/fs/cd9660/iso.h
> >as well as sys/fs/msdosfs/bpb.h and -DUNALIGNED_ACCESS could be removed
> >from the relevant Makefiles.
 
> This breaks the C aliasing rules [*],

I know that's not C but since there was prior art and due to the
alignment check, I assumed it would be fine for this "implementation".
Also shouldn't the "void *" invalidate any such aliasing assumptions?

> so I would prefer to instead
> change sys/fs/cd9660/iso.h and sys/fs/msdosfs/bpb.h to use a correct
> implementation (using e.g. memcpy, which for sufficiently new gcc
> should generate as efficient code but without the potential lossage
> arising from aliasing issues.)

Yes, it should be either all or nothing.

> Or use the current sys/sys/endian.h
> implementation.  Does it really give a measurable slow down in real
> life?

Actually the classic shift/or solution gets slower on misaligned data whereas
the hack has a constant speed at least during my short rudimentary tests.
However, speed wasn't my concern but rather size.  At least he latter was the
reason for using this in msdos fs code, as far as I know, as the kernel became
to large for floppies.

Anyway, actually it looks that GCC really replaces memcpy() with a direct
assignment.

	uint32_t x;
	memcpy(&x, buf, 4);
	return le32toh(x);

Even with -O0 there's no call to memcpy() only to this function. With -O1 the
code is always inlined even if inline isn't specified. The applies to be32toh()
except for additional byteswap code. The shift/or version is always slower at
each optimization level. So it seems the memcpy() version should always be used
- unless this doesn't apply to other archs.

It's a bit sad that GCC doesn't recognize the shift/or construct though because
I think it's cleanest version - considering that memcpy() might cause a huge
penalty.

-- 
Christian