Subject: Possible I/F race?
To: Port-SPARC <Port-SPARC@NetBSD.org>
From: Don Yuniskis <auryn@gci-net.com>
List: port-sparc
Date: 11/26/2001 15:00:04
Greetings and Sublimations!
    OK, so I didn't take good enough notes at the time...
mea culpa.

    I was installing/configuring 1.5.2 on a few different
boxes here and noticed some flaky behaviour.  I *think*
this was on an LX box (I seem to recall the network transceiver
was not directly plugged into a DE15 on the *box* but, rather,
on the end of an umbilical -- Thanks Todd!  :>)  And, I am
assuming it was an LX instead of a Classic (the other box that
I have that requires said umbilical) because the LX is a
recent addition and I don't recall seeing this
behaviour [sic] previously when I was working on the Classic.

    I didn't have enough "T"s on my coax (10base2) nearby so
I was sharing a tap -- do something on one box, unplug the
"T" and plug it into the other box, etc.

    I had just rebooted the machine in question (?) and noticed
that the transceiver wasn't connected to the media.  So, I
reached over and unplugged machine A from the coax and plugged
this machine into it in its place.  And, almost simultaneously,
the machine panicked.
    The first time this happened, I thought it was a fluke
or, possibly, flaky hardware (recall that the LX was being
brought up for the first time).
    Some time later, the same set of circumstances happened
and the same result.  "Fool me once, shame on you; fool me
twice, shame on *me*!"  So, I looked at the messages on
the screen and noticed that they were awfully similar to
the set of messages that had appeared the last time
this had happened.  And, I considered it highly unlikely
that I would have coincidentally happened to plug in the
media at *exactly* the same point in the boot process!

    The system panicked just as it was "adding network
aliases", IIRC.

    Now, I know I can remove the transceiver and/or disconnect
and reconnect the media while the system is *up*.  So, if
my actions are causing the panic, then it is some sort of race
that occurs early in the boot sequence -- maybe an unclaimed
interrupt that is caused by the network interface's status
changing?  Recall that the LX (and the Classic, in case it was
that machine!) has a built-in 10baseT port, and I *think* this is
the default "connection" that is used for the interface.
Once the system detects loss of carrier, it switches to
the AUI port.  Perhaps this is the issue (?)

    I can try to make some time to recreate this and jot
down the exact error message (if there *was* one!) if someone
is interested in pursuing it (or submit a send-pr).  Or, if
someone more intimate with the code base can just casually
peruse it and stumble over something "obvious" given these
symptoms...

    Note that all machines are operating fine so it doesn't
appear to be a persistent hardware problem -- just don't
mess with the network connection!  :>

Thx!
--don