Subject: PPP crashes
From: Michael L. VanLoon -- Iowa State University
List: current-users
Date: 10/24/1994 22:47:41
Arrgh, this is really starting to piss me off, and I'm hoping you guys
can give me a clue how to track this thing down, and/or an idea of
what might be causing it.

Background: Zenith 386/25 running NetBSD-current as a SLIP/PPP/ethernet
router.  With current binaries from around April, the machine routing
SLIP packets from one SLIP interface and one ethernet interface, with
no PPP active, had an uptime of 73 days.

I started using the machine in mid-July.  We updated the binaries back
up to current in mid-July, and again at the beginning of August.  I
started running a PPP interface on the machine at that time, in
addition to the SLIP and ethernet interfaces already active.  In
mid-July, I couldn't even run a sup on the machine
through either the SLIP or PPP interface without crashing it before
the sup finished.  An mbuf (or something) bug fix around the end of
July fixed that.

However, ever since then, the machine would still crash roughly once a
week.  Well, it doesn't generally crash, which is annoying as hell, it
just hangs hard with a bunch of vm_fault things scrolling off the
screen, so I have to drive to campus and reboot it -- at least if it
crashed, it would reboot itself.  Anyway, no matter how many hundreds
of kernels I've built, with any mix of options, it still managed to
crash about once a week with one PPP interface, one SLIP interface,
and one ethernet interface.

The last couple weeks, however, explorer has moved his SLIP line over
to PPP, and we've added a third PPP test line.  Now the machine
hangs every day or two.  The few times I've run a kernel with DDB
in it and actually dumped a stack trace, it seems to die in a
different place each time, but all seem to be directly connected to
pppstart().  All look like something trying to reference a page that
doesn't exist or something.  Occasionally I've seen references to
multiple frees also.

Another strange data point is that my home machine never crashes like
this.  But 1) my home machine doesn't route any packets -- it's simply
an endpoint, and 2) there's about twice as much traffic going thru the
office machine (two PPP machines feeding off it, as a router thru the
ethernet to the Internet).  I'm pretty sure that the hardware is not
at fault since it had a 73-day uptime immediately prior to our upgrade
and my starting to use PPP on it.  I admit the hardware could have
suddenly gone bad at the same time, but I find it highly unlikely.

At this point, the machine is almost unusable for PPP.  I can't go
SLIP because I use a V.Fast modem, and the phone lines suck in Ames,
and I get hung up at least once a day just from bad lines.  I need the
things that pppd provides to make it easy to have auto-reconnect
scripts -- SLIP just won't cut it.  We NEED to get this PPP bug fixed.

Now, I've built a kernel with the ``config -g'' option, and have the
small one running on the PPP server.  But I have no real idea how to
actually track this down, or even what to look for if I catch the
machine dead again at the debugger in the office -- I'm not real
proficient at the innards of the PPP code, or even the networking code
in general.  HELP!

I've even considered running FreeBSD for the router in hopes that it
would at least be stable running PPP, but I don't really consider that
to be the preferred option at this point.

(I'm not bitching at you guys; I'm just pissed at this damn machine.)

I really need a hand with this, because something is definitely not
right in the PPP code, and it's making it really worthless for us at
the moment, plus frustrating the hell out of me (can you tell? ;-).

Thanks for any advice/helping hands you can lend.

 Michael L. VanLoon
 Iowa State University Computation Center
 Project Vincent Systems Staff
  Free your mind and your machine -- NetBSD free Un*x for PC/Mac/Amiga/etc.