Subject: Re: namei caching of newly created files?
To: None <tech-kern@NetBSD.org>
From: J Chapman Flack <flack@cs.purdue.edu>
List: tech-kern
Date: 01/20/2005 13:38:56
> For some values of "we", yes.
> 
> > Who would ever put 45,000 files into a single directory?  Who would
> > even put 100 files into a directory if getting a directory listing
> > takes two or more screens just to display?
> 
> The only place I've seen it is in dysfunctional corporate
> bureaucracies.  I previously had a job at Interwoven, which sells
> [...] with idiot IT managers who insisted on creating directories with tens
> or hundreds of thousands of files.  These files were (for the most
> part) being manipulated by programs [...]

When I worked in a dysfunctional government bureaucracy, we had a server
responding to requests for docket reports.  The reports were generated
nightly into separate files, and one would only be regenerated by the
server if a quick db query showed activity since the mtime on the file.
So you'd ask for <district>:<year>-<type>-<caseno> and the server (which
was all of about two dozen lines of script) would go stat a file like

  <year>/<type><caseno%10>/<district>/<caseno>.gz

query the db, compare against the mtime, fork a regeneration if needed,
and serve the file.

The <type><caseno%10> component was a cheap hash: it kept the final
directories from holding *thousands* of files, though they could
easily hold hundreds.  It was easy to write, easy to understand,
easy to maintain, and performed well (and replaced an older system that
tried to do something more complicated and had none of those attributes).
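
A rough sketch of that lookup, in Python rather than whatever the
original script was written in.  The function names, the root argument,
the integer case number, and the last_activity timestamp (standing in
for the result of the db query) are all assumptions made for
illustration; only the path layout and the mtime comparison come from
the description above.

```python
import os

def parse_request(request):
    """Split '<district>:<year>-<type>-<caseno>' into its four parts.
    Assumes the type contains no '-' and the case number is an integer."""
    district, rest = request.split(":", 1)
    year, case_type, caseno = rest.split("-", 2)
    return district, int(year), case_type, int(caseno)

def report_path(root, district, year, case_type, caseno):
    """Build <root>/<year>/<type><caseno%10>/<district>/<caseno>.gz."""
    bucket = f"{case_type}{caseno % 10}"   # the cheap hash: last digit of caseno
    return os.path.join(root, str(year), bucket, district, f"{caseno}.gz")

def needs_regeneration(path, last_activity):
    """True if the report is missing or older than the last db activity.
    last_activity is seconds since the epoch (hypothetical db result)."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return last_activity > mtime
```

A request like "nd:2004-cv-1237" would then map to
<root>/2004/cv7/nd/1237.gz, and the server would regenerate before
serving only when needs_regeneration() says so.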

File-system-as-database designs can be very simple and work very well.
There is flexibility in designing one (as with the %10 in the example)
to accommodate the scaling properties of the file system. We got our
design by looking empirically for the directory size at which the
operations we cared about started getting slow; it would be even better if
the docs for a file system simply stated something about its scaling
properties, and then a programmer could easily choose the right file
system and an appropriate directory layout for an fs-as-db design.
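
One way to do that kind of empirical look, sketched in Python.  This is
not the measurement we actually ran; the function name, the file naming,
and the probe count are made up for the example, and the numbers it
gives will depend heavily on the file system and on how warm the name
and buffer caches are.

```python
import os
import tempfile
import time

def mean_stat_time(n_files, n_probes=1000):
    """Create n_files empty files in a fresh directory and return the
    mean wall-clock seconds per os.stat() over n_probes lookups."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, f"{i}.gz"), "w").close()
        start = time.perf_counter()
        for i in range(n_probes):
            os.stat(os.path.join(d, f"{i % n_files}.gz"))
        elapsed = time.perf_counter() - start
    return elapsed / n_probes

if __name__ == "__main__":
    # Step through directory sizes and watch for where stat slows down.
    for n in (100, 1000, 10000):
        print(n, mean_stat_time(n))
```

The size at which the per-stat time starts climbing suggests how many
hash buckets the layout needs.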

-Chap