tech-kern archive



On Mon, 14 Nov 2011 16:58:02 +0000
David Holland wrote:

> On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
>  > > I was recently talking to some people who'd been working with some
>  > > (physicists, I think) doing data-intensive simulation of some kind,
>  > > and that reminded me: for various reasons, many people who are doing
>  > > serious data collection or simulation tend to encode vast amounts of
>  > > metadata in the names of their data files. Arguably this is a bad way
>  > > of doing things, but there are reasons for it and not so many clear
>  > > alternatives... anyway, 256 character filenames often aren't enough in
>  > > that context.
>  > 
>  > It's only my opinion, but they really should be using multiple files or
>  > a database for the metadata with as necessary a "link" to an actual
>  > file for data.
> Perhaps, but telling people they should be working a different way
> usually doesn't help. (Have you ever done any stuff like this? Even if
> you have only a few settings and only a couple hundred output files,
> there's still no decent way to arrange it but name the output files
> after the settings.)

I agree that once they've started down the wrong path it's hard to
tell them to change their methods, but it was probably not ideal to
assume that file name length was an unlimited resource...

The situations where I had to deal with this were web sites, with media
stored as files and metadata in databases (with file names being either
hashes or serial numbers); another instance was camera security
software saving stills and archiving videos as files, with the
directory and file names based on a kind of time stamp.  Another
case is mmmail, where mail is stored in a custom format in files, backed
by a PostgreSQL database.
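
A minimal sketch of the hash-named variant, to show that the file name
stays a fixed length no matter how much metadata there is.  It uses
SQLite rather than postgresql only so that it runs self-contained; the
`media` table, the `open_db` and `store_media` names and the two-character
directory fan-out are assumptions made for illustration, not taken from
any of the systems mentioned above.

```python
import hashlib
import os
import sqlite3


def open_db():
    """Open an in-memory database with a hypothetical metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media (hash TEXT PRIMARY KEY, title TEXT, size INTEGER)"
    )
    return conn


def store_media(conn, root, data, title):
    """Write `data` under a name derived from its SHA-256 and record metadata.

    The metadata lives in the database; the file name is only the key that
    links the two, so it is always 64 hex characters long.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Fan out by the first two hex digits so no single directory grows huge.
    subdir = os.path.join(root, digest[:2])
    os.makedirs(subdir, exist_ok=True)
    with open(os.path.join(subdir, digest), "wb") as f:
        f.write(data)
    conn.execute(
        "INSERT OR REPLACE INTO media (hash, title, size) VALUES (?, ?, ?)",
        (digest, title, len(data)),
    )
    conn.commit()
    return digest
```

Note that the file is written before the row is committed, so a crash in
between leaves an orphaned file rather than a row pointing at nothing.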

It works well, but it can be tricky not to leak files.  In a web
application using PostgreSQL, for instance, delete trigger functions
can insert entries into a table of files to be deleted, with a
scheduled event or daemon cleaning those up.  The few instances
where I've seen leaked files were after abnormal crashes/reboots
though; some recovery/cleanup software is then useful.
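
The trigger-plus-cleanup arrangement can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code of any of
the applications above: SQLite stands in for postgresql so that the
example is self-contained, and the `media` and `pending_delete` tables,
the `media_delete` trigger and the `cleanup_pending` function are
hypothetical names.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: `media` maps rows to files on disk; `pending_delete`
# queues the paths of files whose rows have been deleted.
SCHEMA = """
CREATE TABLE media (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE TABLE pending_delete (path TEXT NOT NULL);
CREATE TRIGGER media_delete AFTER DELETE ON media
BEGIN
    INSERT INTO pending_delete (path) VALUES (OLD.path);
END;
"""


def cleanup_pending(conn):
    """Unlink every queued file, then drop its queue entry.

    Intended to be run from a scheduled job or daemon.  Returns the number
    of queue entries processed.
    """
    rows = conn.execute("SELECT rowid, path FROM pending_delete").fetchall()
    for rowid, path in rows:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # already gone, e.g. removed by an earlier interrupted run
        conn.execute("DELETE FROM pending_delete WHERE rowid = ?", (rowid,))
        conn.commit()
    return len(rows)


def demo():
    """Delete a row, then show the file survives until cleanup runs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    fd, path = tempfile.mkstemp()
    os.close(fd)
    conn.execute("INSERT INTO media (path) VALUES (?)", (path,))
    conn.commit()
    # Deleting the row only queues the file; nothing is unlinked yet.
    conn.execute("DELETE FROM media WHERE path = ?", (path,))
    conn.commit()
    still_there = os.path.exists(path)
    processed = cleanup_pending(conn)
    return still_there, processed, os.path.exists(path)
```

Because the queue entry is written in the same transaction as the row
deletion, a crash before the cleanup pass leaves the file listed in
`pending_delete` rather than silently leaked.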

I guess this also gives the answer you expected, however: that
it's more complex to DTRT, as user software must create the link
between two loosely coupled systems :)

> Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the
> stack regardless. Even at only 1K each, it's really easy to blow a 4k
> kernel stack with them. (In practice you can generally get away with
> one; but two, like you need for rename, link, symlink, etc. is too
> many.)
> Or I guess you don't mean in the kernel, do you...

Oh, yes I meant userland indeed; as kernel code should minimize stack
