tech-kern archive



On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
 > > I was recently talking to some people who'd been working with some
 > > (physicists, I think) doing data-intensive simulation of some kind,
 > > and that reminded me: for various reasons, many people who are doing
 > > serious data collection or simulation tend to encode vast amounts of
 > > metadata in the names of their data files. Arguably this is a bad way
 > > of doing things, but there are reasons for it and not so many clear
 > > alternatives... anyway, 256 character filenames often aren't enough in
 > > that context.
 > It's only my opinion, but they really should be using multiple files or
 > a database for the metadata with as necessary a "link" to an actual
 > file for data.

Perhaps, but telling people they should be working a different way
usually doesn't help. (Have you ever done any stuff like this? Even if
you have only a few settings and only a couple hundred output files,
there's still no decent way to arrange it but name the output files
after the settings.)
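To make the length problem concrete, here is a rough sketch of how quickly a settings-derived name grows. The struct, the function name, and the "key=value" joined with '_' layout are all made up for illustration; real projects use whatever convention they have settled on. The fallback of 255 for NAME_MAX is an assumption for systems whose <limits.h> does not define it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* assumed fallback; <limits.h> may leave it undefined */
#endif

/* Hypothetical simulation setting; keys and values are illustrative only. */
struct setting {
	const char *key;
	const char *value;
};

/*
 * Length (excluding the terminating NUL) of a file name built as
 * "key=value_key=value...<suffix>" from n settings.
 */
static size_t
name_length(const struct setting *s, size_t n, const char *suffix)
{
	size_t len = strlen(suffix);
	size_t i;

	assert(s != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* key, '=', value, plus a '_' separator before all but the first */
		len += strlen(s[i].key) + 1 + strlen(s[i].value);
		if (i > 0)
			len += 1;
	}
	return len;
}

/* Nonzero if a name of that length fits in one directory entry. */
static int
name_fits(size_t len)
{
	return len <= NAME_MAX;
}
```

With a dozen or so settings of moderate length, name_fits() still passes; it is the runs with many more parameters, long values, or timestamps and host names folded in that run out of room.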

 > > (This sort of usage also often involves things like 50,000 files in
 > > one directory, so the columnizing behavior of ls is far from the top
 > > of the list of relevant issues.)
 > This reminds me, does anyone know about the current state of
 > UFS_DIRHASH?  I remember reading about some issues with it and ending up
 > disabling it on my kernels, yet huge directories can occur in a number
 > of scenarios (probably a more pressing issue than extending file names,
 > actually)...

I don't know. At best it's not really a complete solution, anyway...

 > > Well... yes but there are other considerations. As you noted, going
 > > past one physical sector is problematic; going past one filesystem
 > > block very problematic. Plus, as long as MMU pages remain 4K,
 > > allocating contiguous kernel virtual space for path buffers (since if
 > > NAME_MAX were raised to 64K, PATH_MAX would have to be at least that
 > > large) could start to be a problem.
 > I agree, especially with all the software that allocates path/file name
 > buffers on the stack (but even on the heap it could be a general memory
 > waste with 64KB, other than the memory management performance issues).

Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the
stack regardless. Even at only 1K each, it's really easy to blow a 4k
kernel stack with them. (In practice you can generally get away with
one; but two, like you need for rename, link, symlink, etc. is too many.)

Or I guess you don't mean in the kernel, do you...
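For the kernel case, the arithmetic is simple enough to sketch. The two sizes are the figures mentioned above (1K buffers, 4k stack); the macro and function names are made up for illustration and are not taken from any kernel source.

```c
#include <assert.h>
#include <stddef.h>

#define KSTACK_SIZE	4096	/* the 4k kernel stack figure from above */
#define PATHBUF_SIZE	1024	/* the "only 1K each" figure from above */

/*
 * Bytes of stack left for everything else after placing nbufs path
 * buffers on it. Negative means the buffers alone overflow the stack.
 */
static long
stack_remaining(size_t nbufs)
{
	assert(nbufs <= 1024);	/* keep the multiplication well in range */
	return (long)KSTACK_SIZE - (long)(nbufs * PATHBUF_SIZE);
}
```

One buffer leaves 3072 bytes and two leave 2048, and that remainder has to cover every other frame in the call chain, which is why the two-pathname operations are the ones that hurt.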

David A. Holland
