tech-userlevel: Re: calendar vs. NIS

Subject: Re: calendar vs. NIS
To: tech-userlevel <tech-userlevel@netbsd.org>
From: Joerg Klemenz <joerg@gmx.net>
List: tech-userlevel
Date: 11/07/2002 15:43:59

Matthias Buelow wrote:
> 
> Just split it into two loops: one to gather UIDs; then sort the UIDs
> thus found.  Then make a second go over the new list, doing the
> calendar stuff on each UID.  You can either remove duplicates from
> the sorted list before going throug it for the calendar processing,
> or, since it's sorted, you can check if the next entry is the same
> UID as the current one, and just skip it.

This sounds like massive overkill to me. The vast majority of systems
have an already semi-sorted passwd file with <100 users. Calendar is
not a high-end database application. 

I just looked at the code a little: it has three gotos and never
closes its temp file, but throws away the descriptor and unlinks the
pathname (with UID 0).

Also an
#include </dev/zero>

in the calendar file will grow the memory usage of calendar forever. I
just tried it out on a simulated machine (what else can you do with
this weather): the process gets killed in seconds when the swap runs
out. A malicous user can kill the calendar -a process. CPP should
_really_ check its input files!

> This entirely removes the highly suboptimal duplicates checking from
> the passwd-walking loop.  In contrast to your linear-list 
> duplicates-finding solution, which runs on O(n^2), this method would
> be in O(n log n), which should be optimal for that application.

The reverse sorted list I showed in the other post should attack that.
It has almost no performance drop for normal users while it fixes NIS
without beeing overly complex. Exponential runtime is not a problem if
the worst case scenario (100000+ users on NIS) can be handled in
seconds or minutes.
The real bottleneck is NIS itself followed by the getpw* routines. The
runtime of the sorting list routine probably makes little difference.

	joerg  <joerg@gmx.net>