Port-i386 archive


Re: 6.0.1 upgrade dhcp problem



    Date:        Mon, 4 Feb 2013 11:10:40 +0100
    From:        Martin Husemann <martin%duskware.de@localhost>
    Message-ID:  <20130204101040.GB3895%mail.duskware.de@localhost>

  | Yes - dhcpcd properly monitors link state. Your link goes up and down,

That's obvious from the diagnostics in Ray's most recent e-mail.
The question is what is doing that, and why.

I suppose it might be just (very strangely) broken hardware, but
it is hard to imagine what could cause the same kind of problem for
two different hardware interfaces (fxp & sk - those are very different beasts.)

The interrupt routing suggestion was one that seemed plausible to me,
except that it is now clear that everything works fine if dhclient is
used, or if the interface is manually configured, and I can't imagine
how that would happen if interrupts were not working properly.

It also can't really be the driver, as Ray's other system had what looks
to be an identical fxp interface (certainly it was using the same driver,
loaded off the same CD). If a buggy driver were doing this, I would have
expected it to affect both fxp interfaces and not the sk interface (or
vice versa), not two different interfaces on one system while sparing
an apparently identical interface on a different system.

I guess now is the time to suspect the power supply - about the only remaining
thing that I can think of that could affect carrier, on different interfaces
(and interface types) and which would apply to one system but not an
identical one.

But if the discs and CD (etc) are all working without any observed glitches
(and they're likely bigger power consumers) then this doesn't seem too
likely - unless there's some motherboard power problem that doesn't
affect external connections.

Ray, does the other system, the one with the fxp0 interface that worked,
also have two network interfaces?   And is it the same kind of motherboard?

I suppose you used the same drop cable when you connected the other
system, so we can rule out a broken (comes and goes) wire in the drop
cable?

I'm just hunting for things that are the same (and different) between those
two systems that could be investigated to see if they're the cause.

It could also be worth doing a fairly tight loop after either configuring
with dhclient, or manually, to see if the
         status: active
is ever changing to
        status: no carrier

Something like:

        while :; do ifconfig fxp0; done

or perhaps better

        while :; do ifconfig fxp0; done | grep status:

You could even make it be

        while :; do ifconfig fxp0; done | grep status: | grep -v active

That is, when dhcpcd is not doing its thing - this is to attempt to find
out if something is causing carrier to come & go randomly all the time
(which would certainly upset dhcpcd, but not dhclient) or whether the
carrier transitions observed while running dhcpcd are being caused by
something that dhcpcd is (directly or indirectly) doing.

You should only need to let that run for 10 or 15 seconds (and there's
no need to paste all the output, just whether or not the status ever
changes while you're watching).
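
If the raw output scrolls past too quickly to read, the same idea can be
sketched as a small filter that prints the status only when it differs from
the previous reading. This is just an illustration, not something tested on
Ray's system; the function name status_changes is made up for this example,
and fxp0 is the interface name from this thread.

```shell
# Read ifconfig output on stdin and print the status value only when it
# changes (e.g. "active" -> "no carrier"). status_changes is a
# hypothetical helper name, not an existing tool.
status_changes() {
    prev=""
    while IFS= read -r line; do
        # Ignore everything except the "status:" line.
        case $line in
            *status:*) ;;
            *) continue ;;
        esac
        # Keep only the text after "status: ".
        cur=${line##*status: }
        if [ "$cur" != "$prev" ]; then
            printf '%s\n' "$cur"
            prev=$cur
        fi
    done
}

# Usage (interrupt with ^C after 10 or 15 seconds):
#   while :; do ifconfig fxp0; done | status_changes
```

If that prints a single "active" and nothing more, carrier stayed up for
the whole time it was being watched.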

You could also try running dhcpcd -K fxp0  (perhaps with the -B option as 
well, as Martin suggested) and see if that is able to configure the address.
-K is meant to compensate for buggy drivers (which I don't think this is,
too many people use the fxp driver not to have noticed if it had that kind
of problem) but it might also paper over problems if the carrier change
events are causing dhcpcd to unconfigure the interface and start again,
rather than something else causing the unconfig which in turn causes
the carrier loss.

That is, at the minute we don't know which is chicken and which is egg...

From your (Ray's) message ...

r.phillips%uq.edu.au@localhost said:
  | It shows the carrier coming and going for some reason  but no IPv4 address
  | being assigned to the NIC.

Yes, it does.   This is the point at which I would have expected the next
ifconfig output to display the address (as from earlier messages I am very
confident that it is actually configuring it, for at least a short time):

dhcpcd[49]: fxp0: leased 192.168.36.135 for 604800 seconds
dhcpcd[49]: forked to background, child pid 83

but the next ifconfig output, which had to be within 1 second was ...

fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
         capabilities=1400<TCP4CSUM_Rx,UDP4CSUM_Rx>
         enabled=0
         address: 00:10:dc:7b:92:a9
         media: Ethernet autoselect (none)
         status: no carrier
         inet6 fe80::210:dcff:fe7b:92a9%fxp0 prefixlen 64 scopeid 0x3

and carrier has now gone (whereas carrier was there just before dhcpcd would
have been configuring the address, that is 1 second earlier).

And this happens every time (which kind of blows away power supply glitching
as a hypothesis - that couldn't be that predictable.)

It really looks as if something in the way that dhcpcd actually configures
the address into the interface - on this system, but not on others using
the same ethernet driver - is causing the carrier loss.

If it isn't that, then the coincidences are just unbelievable.

I don't suppose there's any kind of authorisation issue at the switch,
where the switch is dropping carrier for your system that is having
problems (and dhclient just ignores it) but the system that does work
is configured differently there, and so doesn't get bounced?

That is, another difference between the system that works, and the one that
doesn't, is the MAC address(es) - and while those should be irrelevant
to NetBSD, they might be significant to the switch, and the switch can
certainly control carrier, if it feels the need.

This is still all just speculation & guesswork, and the guesses in this
message are no more (and probably less) likely than those in the previous one.

kre

ps: if you're able, get spanning tree disabled for the port you're connected
to - unless you're planning on running as a bridge, and bridging to a link
that loops back to the switch, there's really no benefit, and lots of pain,
to having spanning tree enabled.        


