Subject: Re: fsync performance hit on 1.6.1
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Christoph Hellwig <email@example.com>
Date: 07/09/2003 17:43:30
On Wed, Jul 09, 2003 at 12:26:03PM -0400, Greg A. Woods wrote:
> > Umm, posix SHM _does_ use mmap. It just uses shm_open to get a suitable
> > fd, on Solaris and Linux that would be on tmpfs.
> Yes, that's my point. :-)
> If you know how IEEE standards committees work and you understand how
> much they (are supposed to) hate inventing new things, the fact that
> they invented shm_open and shm_unlink() suggests that some strong
> member(s) of the committee were just completely and totally unwilling to
> allow for mmap() to work on all normal files and that the only way they
> would be happy with mmap() becoming the true standard shared memory
> interface was if it was required that the file descriptors it used be
> allocated by some special new function.
Not trying to defend IEEE here, but there is at least some sense behind
shm_open. Given that for shm you really want an object that's not
backed by permanent storage (= a normal filesystem), you need to know
where to look for a tmpfs-lookalike or, in the case you mentioned above,
something outside the normal filesystem namespace (yuck!). As IEEE
isn't in the filesystem namespace business, shm_open is an okay wrapper
for leaving this to the implementation.
Why the heck they specified shm_unlink is completely unclear to me.
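For reference, a rough sketch of what the shm_open route looks like from
userland. The helper name shm_map and the object name in the usage note are
made up for illustration, and error handling is minimal. Where the object
actually lives (tmpfs or otherwise) is left to the implementation, which is
the whole point of the wrapper.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>          /* O_CREAT, O_RDWR */
#include <stddef.h>
#include <sys/mman.h>       /* shm_open, mmap */
#include <sys/stat.h>       /* mode bits */
#include <unistd.h>         /* ftruncate, close */

/*
 * Hypothetical helper: create (or open) the named POSIX shared memory
 * object, size it to `len` bytes and map it shared.
 * Returns the mapping, or NULL on failure.
 */
void *shm_map(const char *name, size_t len)
{
    assert(len > 0);

    /* shm_open only hands back a suitable fd; the rest is plain mmap. */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return NULL;

    /* A freshly created object has size zero, so give it a length. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would do something like shm_map("/demo_shm", 4096) and later
shm_unlink("/demo_shm") to drop the name. On some systems (older glibc, for
one) this needs -lrt at link time.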
> I don't know why POSIX doesn't include MAP_ANON either -- that would
> have made things ever so much simpler! The rationale in P1003.1-2001
> claims they decided to use the SysVr4 mmap() implementation as the basis
> of the POSIX API, and indeed SysVr4 lacks MAP_ANON, however MAP_ANON was
> very well known before mmap() was finalized since 4.3net2 was already
> widely disseminated (1003.4 was still in draft at the end of 1991).
Just because it was known doesn't mean it should be standardized.
And MAP_ANON really doesn't fit into the SunOS4/SVR4 VM, which wants a backing
vnode for each memory object, unlike the Mach VM. Thus the horrible
mmap() of /dev/zero hack, btw.
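To make the contrast concrete, here is a minimal sketch of the two ways to
get anonymous zero-filled memory. The function names are invented for this
example. MAP_ANON is not in POSIX.1-2001, so the second variant assumes a
BSD-derived or otherwise MAP_ANON-capable system.

```c
#define _DEFAULT_SOURCE     /* exposes MAP_ANON on glibc in strict modes */
#include <assert.h>
#include <fcntl.h>          /* open, O_RDWR */
#include <stddef.h>
#include <sys/mman.h>       /* mmap, MAP_ANON */
#include <unistd.h>         /* close */

/* SVR4 style: every mapping wants a backing vnode, so borrow /dev/zero. */
void *anon_via_dev_zero(size_t len)
{
    assert(len > 0);
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);              /* the mapping survives the close */
    return p == MAP_FAILED ? NULL : p;
}

/* BSD style: no file descriptor at all, fd is -1 by convention. */
void *anon_via_map_anon(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Both return private, zero-filled pages; the only difference the caller sees
is the extra open()/close() pair in the /dev/zero variant.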