current-users: SLIP/PPP crashes

Subject: SLIP/PPP crashes
To: None <current-users@sun-lamp.cs.berkeley.edu>
From: Michael L. VanLoon -- Iowa State University <michaelv@iastate.edu>
List: current-users
Date: 07/27/1994 15:02:45
My co-worker and I share a Zenith 386/25 in our office that we use for
a SLIP/PPP server.  He has an internal 14.4k modem with a built-in
16550 in it, and I have a 16550 board in the machine connected to my
28.8k external modem.  The machine had about 73 days of uptime while he
was using it alone, running NetBSD-current from the beginning of the
year.  Once we added my serial board to the machine, we had to build a
new kernel, so we upgraded it to -current as of a few weeks ago.

It works just fine for us when traffic is light.  However, every
single time I try to sup current from sun-lamp, I hang the machine
hard.  I don't know why, but my guess is that it's the non-stop heavy
TCP traffic in both directions.  Normal traffic doesn't seem to
trigger it.

He built the kernel that we tried first, and this was already
happening.  So I built a kernel on my machine, from the source
tarballs of the 17th.  The exact same thing happens with the kernel I
built.  My machine at home doesn't crash, but the machine routing the
traffic in our office does.  It doesn't matter whether I'm using SLIP
or PPP; it happens either way, which suggests to me that the problem
isn't in the SLIP/PPP code itself but somewhere else.

We don't have DDB built into the kernel there, because we wanted the
machine to just reboot if it ever crashed.  Unfortunately, these
crashes hang it solid instead of rebooting it, so I have to drive to
campus to fix it. :-(  Here's what I see on the display when it's
hung...

With the kernel from the beginning of the month built on explorer's
box:

vm_fault(f818d90c, 0, 1, 0) -> 5
fatal page fault in supervisor mode
trap type 6 code f8100000 eip f8100ff9 cs f7bf0008 eflags 13292 cr2 f0 cpl ffffffff
panic: trap

With the kernel from the 17th tarballs, built on my machine:

vm_fault(f81cf2c8, 0, 1, 0) -> 5
fatal page fault in supervisor mode
trap type 6 code f8100000 eip ... cs ... eflags 13286 cr2 f0 cpl ffffffff

(Both times it looked like it had scrolled a ton of these on the
display before finally hanging.  In the second example, it looked like
the previous messages all had eflags 13282 in them, until the final
one that hung tight which had the eflags 13286.)
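
To see what actually differs between those eflags values, they can be
decoded bit by bit.  The following is a minimal sketch, assuming the
numbers in the trap messages are hexadecimal and using the standard
i386 EFLAGS bit layout; the helper names are made up for illustration.

```python
# Decode i386 EFLAGS values.
# Assumption: the trap messages print eflags in hexadecimal.
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM",
}

def decode_eflags(value):
    """Return (list of set flag names, IOPL) for an EFLAGS value."""
    names = [name for bit, name in sorted(FLAG_BITS.items())
             if value & (1 << bit)]
    iopl = (value >> 12) & 0x3  # IOPL is the two-bit field at bits 12-13
    return names, iopl

def changed_flags(a, b):
    """Return the names of the flags that differ between two values."""
    diff = a ^ b
    return [name for bit, name in sorted(FLAG_BITS.items())
            if diff & (1 << bit)]

for text in ("13292", "13282", "13286"):
    names, iopl = decode_eflags(int(text, 16))
    print(text, names, "IOPL=%d" % iopl)

print("13282 vs 13286:", changed_flags(0x13282, 0x13286))
```

Under that assumption, 13282 and 13286 differ only in the parity flag,
which is an arithmetic status bit, so the change in the last message
probably isn't significant by itself.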

Does this make any sense to anyone?  I'm going to try to build a new
kernel with this weekend's tarballs, but I have a feeling it's not
going to make any difference, since I haven't seen any report of
anything affecting this lately.

Is this enough information to track down any bugs with?  Has anyone
heard of anything that might affect this?  I'd love to kill this bug
so I can have a stable net connection.
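
Without DDB, one common way to narrow down a trap like this is to look
up the reported eip in the kernel's symbol table (for instance, the
output of nm run on the kernel image) and find the nearest symbol at
or below it.  The following is a minimal sketch of that lookup.  The
symbol names and addresses in SAMPLE are made up for illustration and
are not taken from either kernel; only the eip value comes from the
first trap message.

```python
import bisect

# Hypothetical nm-style output: "address type name".
# These symbols are invented for the example, not from a real kernel.
SAMPLE = """\
f8100e80 T _example_func_a
f8100fc0 T _example_func_b
f8101040 T _example_func_c
"""

def parse_symbols(nm_text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and entries without an address
        addr, kind, name = parts
        if kind in ("T", "t"):  # keep text (code) symbols only
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def find_symbol(symbols, eip):
    """Return (name, offset) of the last symbol at or below eip, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, eip - addr

symbols = parse_symbols(SAMPLE)
hit = find_symbol(symbols, 0xf8100ff9)  # eip from the first trap message
if hit is not None:
    print("%s+0x%x" % hit)
```

With a real symbol listing from the exact kernel that crashed in place
of SAMPLE, the result would name the function the fault occurred in.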

-----------------------------------------------------------------------------
 Michael L. VanLoon                 Iowa State University Computation Center
    michaelv@iastate.edu                    Project Vincent Systems Staff
  Free your mind and your machine -- NetBSD free Un*x for PC/Mac/Amiga/etc.
-----------------------------------------------------------------------------


