Subject: Re: sup hell
To: None <current-users@NetBSD.ORG>
From: Ty Sarna <tsarna@endicor.com>
List: current-users
Date: 11/02/1995 05:37:06
In article <199511012142.IAA01848@zen.void.oz.au>,
Simon J. Gerraty <sjg@zen.void.oz.au> wrote:
> Since most sup server sites seem to use supscan, would it be feasible
> to have supscan produce an MD5 checksum list, which is returned to the
> client and used in retrieval decisions.
> 
> You would probably want to still use timestamps as well, the MD5
> checksum is used only to veto an update where timestamp on server is
> newer but files are the same.
> 
> This model would (assuming the MD5 list is accurate) avoid needless
> downloads while maintaining the essential flavour and behavior of SUP.
> 
> If there is enough interest, I can look at making the necessary
> changes - assuming someone else is not already doing so.

If you're willing to do this much work, how about writing something
totally new? sup seems to have so many problems, and it shouldn't be too
hard to do something 1000% better (more reliable, faster, more portable)
to replace it. Some ideas to consider: Keep an MD5 hash for each
directory as well, essentially a hash of the concatenation of the hashes
for each file in that directory (including the hashes for
subdirectories) or something equivalent. That way whole subdirectory
trees can be checked, and transfers avoided, in one quick step. The
client would work like this: ask the server for the hash of the
requested directory. Same hash? We're done. Otherwise, check each file
in the directory and transfer any that have changed, then recurse into
each subdirectory. Maybe forget collections entirely and just allow
transfers of any file or directory within a whole tree.
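A minimal sketch of that per-directory hash idea (in Python, with made-up names; `tree_hash` and `file_hash` are not part of any existing tool). Each directory's hash covers the sorted names and hashes of its entries, with subdirectories contributing their own tree hash, so one comparison at the top can veto a whole subtree:

```python
import hashlib
import os

def file_hash(path):
    """MD5 of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_hash(root):
    """Return (hash, db) for a directory tree.

    The hash is an MD5 over the sorted (name, hash) pairs of the
    directory's entries; subdirectories contribute their own tree
    hash, so equal hashes mean the whole subtree is identical.
    db maps each relative path to its hash, so the full database
    is built in a single scan.
    """
    db = {}
    h = hashlib.md5()
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            entry, subdb = tree_hash(path)
            for sub, subhash in subdb.items():
                db[os.path.join(name, sub)] = subhash
        else:
            entry = file_hash(path)
        db[name] = entry
        h.update(name.encode() + b"\0" + entry.encode() + b"\0")
    return h.hexdigest(), db
```

Since the per-file hashes fall out of the same scan, the one pass gives you both the quick subtree check and the full file-level database.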

Even better, it would be easy to give the client an option to output a
list of each file that needs to be transferred (or deleted), rather than
transferring them. Then it'd be easy for folks with limited disk space
on their internet-connected machines to only keep the hash database
online, and to whip up a script to automate ftping the files a few at
a time, or whatever.
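Given two such hash databases (server's and client's), the listing step is just a dictionary comparison; a sketch, assuming the {path: hash} mapping from a scan like the one above:

```python
def plan_updates(server_db, client_db):
    """Compare two {path: hash} databases.

    Returns (fetch, delete): paths whose server hash differs from
    (or is missing on) the client, and paths the client has but the
    server no longer does. Files with matching hashes are skipped
    even if their timestamps differ -- the MD5 veto.
    """
    fetch = sorted(p for p, h in server_db.items()
                   if client_db.get(p) != h)
    delete = sorted(p for p in client_db if p not in server_db)
    return fetch, delete

fetch, delete = plan_updates(
    {"a.c": "1111", "b.c": "2222"},
    {"a.c": "1111", "b.c": "9999", "old.c": "3333"})
# fetch == ["b.c"], delete == ["old.c"]
```

The output is exactly the list a low-disk client could hand to an ftp script instead of doing the transfers itself.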

With clever design, the hash database used by the client could be used
by the server, so mirrors wouldn't need to do a scan at all, only the
master server would.

I bet this wouldn't take a whole lot more work than trying to fix sup,
and is probably much less work in the long run.