NetBSD-Bugs archive
Re: PR/51531 CVS commit: src/usr.sbin/sysinst
The following reply was made to PR kern/51531; it has been noted by GNATS.
From: Andreas Gustafsson <gson%gson.org@localhost>
To: Roy Marples <roy%marples.name@localhost>
Cc: gnats-bugs%netbsd.org@localhost
Subject: Re: PR/51531 CVS commit: src/usr.sbin/sysinst
Date: Wed, 14 Dec 2016 18:39:57 +0200
Roy Marples wrote:
> This sounds like one of two things:
> 1) there is a timing issue
> 2) the link status is flapping
>
> Because at the point of the ifconfig -w command there are no further
> interface changes, nothing should be resetting the PHY at all so one of
> the above must be true.
> I say this because the maximum length from DETACHED -> TENTATIVE ->
> ready to go is 10 seconds once the link is active and your logs show
> it's at TENTATIVE way beyond that which implies link state is flapping
> or 10 seconds is not 10 seconds on this device.
I just ran another pair of tests. First, I made the test script run
"ifconfig -a" in a loop, once per second for 60 seconds, starting two
seconds after the failed "ftp". The "ifconfig -a" output had the
TENTATIVE flag set in the first five runs, and clear in the remaining
55 runs.

Then I did a second run, but now with a 30-second delay between the
failed "ftp" and the first "ifconfig -a" instead of a 2-second delay.
Now, the output had the TENTATIVE flag set in the first six runs, and
clear in the remaining 54.

So, the TENTATIVE flag appears to be set for some 5-6 seconds,
counting from the time when "ifconfig -a" is first run, *regardless*
of when that first run takes place. The only explanation I can think
of is that it must be the "ifconfig -a" itself that starts the timer,
despite supposedly being a read-only operation.
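
For reference, the two runs described above amount to a polling loop
of roughly the following shape. This is a sketch in POSIX sh, not the
actual test script: the function name poll_tentative and its arguments
are made up for illustration, and it assumes the flag shows up in the
command output as the literal string TENTATIVE.

```shell
#!/bin/sh
# Sketch only: run a status command repeatedly and report, per run,
# whether its output contains the string TENTATIVE.
# poll_tentative and its argument order are hypothetical names chosen
# for this illustration; they are not part of the real test script.
poll_tentative() {
    delay=$1       # seconds to wait before the first run
    runs=$2        # total number of runs
    interval=$3    # seconds to wait between runs
    shift 3        # the remaining arguments are the command to run

    sleep "$delay"
    i=1
    while [ "$i" -le "$runs" ]; do
        # Capture stderr too, so a failing command still yields a line.
        if "$@" 2>&1 | grep -q TENTATIVE; then
            echo "run $i: TENTATIVE set"
        else
            echo "run $i: TENTATIVE clear"
        fi
        i=$((i + 1))
        # Do not sleep after the final run.
        if [ "$i" -le "$runs" ]; then
            sleep "$interval"
        fi
    done
}
```

With this, the first test would correspond to
"poll_tentative 2 60 1 ifconfig -a" and the second to
"poll_tentative 30 60 1 ifconfig -a", each started right after the
failed "ftp".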
Is it possible that the initial DAD attempt could somehow get stuck
indefinitely, for example because of the NIC being in an unexpected
initial state after the netboot, and that the "ifconfig -a" could
then cause it to come unstuck?
> Some ideas for how to solve this are
> 1) Implement RFC4429 and apply it to IPv4 as well
This, too, seems like a worthwhile change for reasons unrelated to the
present issue, and it might be a viable work-around, but I don't think
it addresses the root cause.
> (needs a toggle like ifconfig bge0 inet 1.2.3.4/24 optimistic)
> 2) Add a toggle to sysinst to disable DAD
IMO, it needs to work with the default settings. Someone trying to
install NetBSD for the first time and running into this issue is far
more likely to give up and move on to the next OS candidate than to
find out what special settings are needed.
> I don't suppose you could try swapping the interface out with another
> one?
I currently have two machines configured for this automated test.
I already tried the other one, and it behaved the same way.
Unfortunately, even though they have different motherboards and CPUs,
they have the same type of on-board Ethernet interface (Marvell Yukon
Lite) so the possibility remains that this issue could be specific
to that chip or the sk driver.
I'll see if I can find a suitable PCI card (this is not as trivial as
it may sound because it needs to have a working PXE ROM). Or maybe
I'll set up a third machine...
--
Andreas Gustafsson, gson%gson.org@localhost