NetBSD-Bugs archive


Re: PR/51531 CVS commit: src/usr.sbin/sysinst



The following reply was made to PR kern/51531; it has been noted by GNATS.

From: Andreas Gustafsson <gson%gson.org@localhost>
To: Roy Marples <roy%marples.name@localhost>
Cc: gnats-bugs%netbsd.org@localhost
Subject: Re: PR/51531 CVS commit: src/usr.sbin/sysinst
Date: Wed, 14 Dec 2016 18:39:57 +0200

 Roy Marples wrote:
 > This sounds like one of two things:
 > 1) there is a timing issue
 > 2) the link status is flapping
 > 
 > Because at the point of the ifconfig -w command there are no further 
 > interface changes, nothing should be resetting the PHY at all so one of 
 > the above must be true.
 > I say this because the maximum time from DETACHED -> TENTATIVE -> 
 > ready to go is 10 seconds once the link is active and your logs show 
 > it's at TENTATIVE way beyond that which implies link state is flapping 
 > or 10 seconds is not 10 seconds on this device.
 
 I just ran another pair of tests.  First, I made the test script run
 "ifconfig -a" in a loop, once per second for 60 seconds, starting two
 seconds after the failed "ftp".  The "ifconfig -a" output had the
 TENTATIVE flag set in the first five runs, and clear in the remaining
 55 runs.
 
 Then I did a second run, but now with a 30-second delay between the
 failed "ftp" and the first "ifconfig -a" instead of a 2-second delay.
 Now, the output had the TENTATIVE flag set in the first six runs, and
 clear in the remaining 54.
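For anyone wanting to repeat this, here is a minimal POSIX sh sketch of the polling loop described above. The actual test script was not posted, so this is an illustration only. It assumes the TENTATIVE state is visible somewhere in the "ifconfig -a" output and matches it case-insensitively. IFCONFIG_CMD, poll_tentative and count_tentative are hypothetical names, not part of NetBSD.

```shell
#!/bin/sh
# Sketch only: poll the interface state and report, per sample, whether
# the TENTATIVE flag appears in the output. The match is case-insensitive
# because the exact spelling in ifconfig output is an assumption here.

# Command to sample; hypothetical variable so it can be stubbed out.
IFCONFIG_CMD=${IFCONFIG_CMD:-"ifconfig -a"}

# poll_tentative SAMPLES INTERVAL
# Runs $IFCONFIG_CMD SAMPLES times, INTERVAL seconds apart, printing
# "<n> tentative" or "<n> clear" for each run.
poll_tentative() {
    samples=$1
    interval=$2
    n=1
    while [ "$n" -le "$samples" ]; do
        # Word splitting of $IFCONFIG_CMD is intended ("ifconfig" "-a").
        if $IFCONFIG_CMD 2>/dev/null | grep -qi 'tentative'; then
            echo "$n tentative"
        else
            echo "$n clear"
        fi
        n=$((n + 1))
        # No sleep after the final sample.
        if [ "$n" -le "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# count_tentative: reads poll_tentative output on stdin and prints how
# many samples reported the flag.
count_tentative() {
    awk '$2 == "tentative" { c++ } END { print c + 0 }'
}
```

For the 30-second variant, run "sleep 30; poll_tentative 60 1 | count_tentative" after the failed "ftp". It prints the number of samples in which the flag was seen.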
 
 So, the TENTATIVE flag appears to be set for some 5-6 seconds,
 counting from the time when "ifconfig -a" is first run, *regardless*
 of when that first run takes place.  The only explanation I can think
 of is that it must be the "ifconfig -a" itself that starts the timer,
 despite supposedly being a read-only operation.
 
 Is it possible that the initial DAD attempt could somehow get stuck
 indefinitely, for example because of the NIC being in an unexpected
 initial state after the netboot, and that the "ifconfig -a" could
 then cause it to come unstuck?
 
 > Some ideas on how to solve this are:
 > 1) Implement RFC4429 and apply it to IPv4 as well
 
 This, too, seems like a worthwhile change for reasons unrelated to the
 present issue, and it might be a viable work-around, but I don't think
 it addresses the root cause.
 
 >     (needs a toggle like ifconfig bge0 inet 1.2.3.4/24 optimistic)
 > 2) Add a toggle to sysinst to disable DAD
 
 IMO, it needs to work with the default settings.  Someone trying to
 install NetBSD for the first time and running into this issue is far
 more likely to give up and move on to the next OS candidate than to
 find out what special settings are needed.
 
 > I don't suppose you could try swapping the interface out with another 
 > one?
 
 I currently have two machines configured for this automated test.
 I already tried the other one, and it behaved the same way.
 Unfortunately, even though they have different motherboards and CPUs,
 they have the same type of on-board Ethernet interface (Marvell Yukon
 Lite) so the possibility remains that this issue could be specific
 to that chip or the sk driver.
 
 I'll see if I can find a suitable PCI card (this is not as trivial as
 it may sound because it needs to have a working PXE ROM).  Or maybe
 I'll set up a third machine...
 -- 
 Andreas Gustafsson, gson%gson.org@localhost
 

