Subject: Re: nore on disk stats
To: None <dennis@Ipsilon.COM>
From: Charles Hannum <Charles-Hannum@deshaw.com>
List: tech-kern
Date: 11/16/1995 21:28:48
   (2) You cite "performance reasons" as making this "impractical" (in all
       situations?), and give network I/O as an example.  Yet I'd point
       out that doing a full read of the routing table, and interface
       list, already delays all network I/O through the kernel (see
       net/rtsock.c, in sysctl_rtable()), and I've rewritten code for real
       life routers which used that system call when the size of that table
       was about 80% of the entire kernel.

In a multitasking system, this is a different issue.  What happens if
some user decides to read the routing table 1000000 times?  They could
seriously hose network performance.  Note that I'm *not* arguing for
the current implementation; it's definitely suboptimal.

   > 2) Neither the current tools nor any competing proposal guarantees
   > atomicity.

   See above.  Reading the routing table and the interface list is already
   atomic.

No, it's not.  There are cases (mainly paging) where a process will be
preempted while sysctl_rtable() is reading the table, and the table
will change behind its back.  (In fact, it looks like it may be
possible for this to cause a stray pointer reference.  People forget
about paging frequently.)  Note that for the case of paging over NFS,
having a lock on the routing table at this point could cause you to
deadlock.
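
To make the window concrete, here is a toy userland model, not kernel
code.  Every name in it (rt_dump, sleep_hook, the gen counter) is made
up for illustration.  The hook stands in for "the copy to user space
faulted and something else ran"; the generation check is just one
possible way to notice the change, not something anyone has proposed
in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAXRT 8

struct rtent { int dst; int live; };

struct rtable {
	struct rtent ent[MAXRT];
	unsigned gen;			/* bumped on every change */
};

/* Hypothetical: models whatever runs while the dumping process sleeps. */
typedef void (*sleep_hook)(struct rtable *);

int
rt_add(struct rtable *t, int dst)
{
	for (int i = 0; i < MAXRT; i++) {
		if (!t->ent[i].live) {
			t->ent[i].dst = dst;
			t->ent[i].live = 1;
			t->gen++;
			return 0;
		}
	}
	return -1;			/* table full */
}

int
rt_delete(struct rtable *t, int dst)
{
	for (int i = 0; i < MAXRT; i++) {
		if (t->ent[i].live && t->ent[i].dst == dst) {
			t->ent[i].live = 0;
			t->gen++;
			return 0;
		}
	}
	return -1;			/* no such route */
}

/* Example hook: another process deletes route 3 while we sleep. */
void
drop_route_3(struct rtable *t)
{
	(void)rt_delete(t, 3);
}

/*
 * Copy live destinations into out[].  Returns the count, or -1 if the
 * table changed while the walk was in progress.
 */
int
rt_dump(struct rtable *t, int *out, sleep_hook hook)
{
	unsigned gen;
	int n = 0;

	assert(t != NULL && out != NULL);
	gen = t->gen;
	for (int i = 0; i < MAXRT; i++) {
		if (!t->ent[i].live)
			continue;
		out[n++] = t->ent[i].dst;
		if (hook != NULL)
			hook(t);	/* "page fault": other code runs here */
		if (t->gen != gen)
			return -1;	/* table changed behind our back */
	}
	return n;
}
```

With no hook the walk returns every live entry.  With drop_route_3 as
the hook, the walk reports -1 instead of handing back a list that no
longer matches the table.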

   >   I can think of some examples of all of these.  For netstat(1), or some
   >   other interested piece of software, to read the kernel routing table
   >   now requires about 3 system calls: a sysctl(2) to find out the size of
   >   the thing, a call to sbrk()/mmap() to acquire the (possibly very large
   >   chunk of) memory, and another call to sysctl(2) to fetch an atomic snapshot
   >   of the table.
   >
   > First of all, this is an oversimplification.  There are actually a few
   > system calls done per route, as you can see just by looking at the
   > function p_rtentry() or ktrace output.

   Hardly.  In fact I don't care if netstat(1) makes mistakes, I'll just
   run it again.  I care a lot if my routing protocol implementation (i.e.
   "other interested piece of software") makes mistakes, however, and for
   that the process described is no over-simplification at all.

It looks like I was mistaken about how the routing socket works; I
thought you needed to chase the socket address pointers explicitly.
However, the above still applies.
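
For reference, the size-then-fetch sequence described above looks
roughly like the sketch below.  fake_sysctl() is a made-up stand-in
that only mimics the oldp/oldlenp convention of sysctl(2) (a NULL
buffer asks for the size; a too-small buffer fails with ENOMEM), so
that the sketch runs anywhere.  The table, its sizes, and the
rt_grow_after_probe knob are all assumptions for illustration.  The
retry loop covers the table growing between the two calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel table. */
char	rt_table[256];
size_t	rt_table_len = 64;
size_t	rt_grow_after_probe = 0;	/* simulates growth between calls */

/* Hypothetical stand-in for sysctl(2) on a table node. */
int
fake_sysctl(void *oldp, size_t *oldlenp)
{
	assert(oldlenp != NULL);
	if (oldp == NULL) {		/* size probe */
		*oldlenp = rt_table_len;
		rt_table_len += rt_grow_after_probe;
		rt_grow_after_probe = 0;
		return 0;
	}
	if (*oldlenp < rt_table_len) {	/* table outgrew the buffer */
		errno = ENOMEM;
		return -1;
	}
	memcpy(oldp, rt_table, rt_table_len);
	*oldlenp = rt_table_len;
	return 0;
}

/* Fetch a copy of the table; the caller frees it.  NULL on failure. */
void *
fetch_table(size_t *lenp)
{
	for (;;) {
		size_t need;
		void *buf;
		int error;

		if (fake_sysctl(NULL, &need) == -1)	/* call 1: size */
			return NULL;
		need += need / 8 + 16;			/* slack for growth */
		buf = malloc(need);			/* call 2: memory */
		if (buf == NULL)
			return NULL;
		if (fake_sysctl(buf, &need) == 0) {	/* call 3: fetch */
			*lenp = need;
			return buf;
		}
		error = errno;
		free(buf);
		if (error != ENOMEM)
			return NULL;
		/* The table grew in between; probe again. */
	}
}
```

Note that nothing in this loop says whether the copy itself is atomic;
that depends entirely on what the kernel side does during the fetch.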

   > Secondly, even if the above weren't the case, the snapshot is not
   > atomic.  There is no lock on the tables, and no lock on the memory
   > they're being copied to, and network interrupts are not blocked while
   > the table is copied.  As I said above, this would be undesirable for
   > performance reasons.

   Wrong (are we looking at the same kernel?).  The snapshot produced by
   sysctl(2) is certainly atomic.  Network hardware interrupts aren't blocked,
   of course, but the whole operation is run at splsoftnet() which effectively
   blocks changes (net/rtsock.c, in sysctl_rtable()).  And I directly disagree
   with your opinion of what is "undesirable for performance reasons" based
   on experience with using this stuff in hard situations.

See above.

   Doing atomic reads of some tables is still necessary.

Okay; suppose I'm convinced of this.  What would you propose doing,
that allows reading the tables from crash dumps and across the network
(both of which are important for debugging)?