tech-net archive


Looking for help diagnosing dhcpcd problem



Hi, tech-net!

I've had an issue with dhcpcd on an amd64 machine running NetBSD 10 for months and I'd like to ask for help diagnosing it.

Some history:

This has happened with two different machines at one location, across two different ISPs. The machines share only the bulk of dhcpcd.conf; otherwise:

One machine ran with this setup for years, and only in the last year or so did I notice dhcpcd dying. I then added a crontab entry for "/etc/rc.d/dhcpcd start" to run every half hour or so.
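For reference, the cron workaround could be sketched as a small watchdog that also logs when a restart was actually needed, which gives a timestamp for each death. This is an illustration only: the function name restart_if_dead, the dhcpcd-watchdog log tag and the script path in the comment are hypothetical, and it assumes pgrep(1) and logger(1) are available.

```sh
#!/bin/sh
# Hypothetical watchdog; a crontab(5) entry for it might look like:
#   */30 * * * * /usr/local/sbin/dhcpcd-watchdog
# (the path is an assumption, not something from the setup described above)

restart_if_dead() {
    # $1 = process name to look for; remaining arguments = command to start it
    name=$1
    shift
    if pgrep -x "$name" >/dev/null 2>&1; then
        return 0    # still running, nothing to do
    fi
    # Record when the restart was needed; ignore failures of logger itself
    logger -t dhcpcd-watchdog "$name not running; starting it" 2>/dev/null || true
    "$@"
}

# Intended use from cron:
#   restart_if_dead dhcpcd /etc/rc.d/dhcpcd start
```

Comparing the logged timestamps against lease times would show whether the deaths line up with renewals.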

This was an older AMD machine with Realtek re ethernet interfaces for both the public Internet and the local, NATed network.

I then switched to a newer AMD system and used a dual rge card for both Internet and LAN. However, there are issues [1] with rge, so I bought a dual Broadcom (bge) card.

We then switched ISPs from Frontier to Spectrum, but the problem still occurs. Frontier doesn't offer IPv6, so IPv6 was only turned on recently, but enabling it made no difference to dhcpcd dying.

Interestingly, I have another machine, identical to the first machine used here, with the exact same dhcpcd.conf, running on Optimum. Optimum also doesn't provide IPv6, but I've left the IPv6 lines in place because that machine is a few thousand miles away and I don't want to risk a change there causing problems.

Here's the dhcpcd.conf in current use (comment lines removed):

hostname sage.zia.io
duid
persistent
require dhcp_server_identifier
option rapid_commit
option domain_name_servers, domain_name, domain_search
option classless_static_routes
option interface_mtu
slaac hwaddr
nohook resolv.conf
noipv6rs		# disable router solicitation
allowinterfaces bge0
interface bge0
	timeout 360	# Wait up to six minutes (time for cable modem to boot)
	waitip 4
	ipv6rs		# enable router solicitation
	ia_na 1		# request an IPv6 address
	ia_pd 2 bge1/0	# request a PD and assign it to bge1

There's /etc/dhcpcd.exit-hook which is needed for Internet to work when the address changes:

#!/bin/sh
case "$interface" in
    lo[0-9]* | tun[0-9]*) exit;;
esac
/etc/rc.d/npf reload

There's nothing meaningful in the logs - when dhcpcd dies, it does so silently, so I ran it like so:

ktrace /sbin/dhcpcd -B -d -M -f /etc/dhcpcd.conf

After waiting several days, it died, and I now have a 2,176,126-line ktrace.out ;) The last 2000 lines are here:

https://www.klos.com/~john/ktracedhcpcd.log
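To narrow down a trace of that size, one option is to tally which system calls return "Too many open files". The sketch below assumes kdump(1) prints return lines in the form "RET <syscall> -1 errno 24 Too many open files"; the helper name emfile_by_syscall is hypothetical and the exact column layout may differ between releases.

```sh
#!/bin/sh
# Read kdump(1) output on stdin and print "count syscall" for every system
# call that failed with "Too many open files", most frequent first.
emfile_by_syscall() {
    awk '/Too many open files$/ {
             # the syscall name is the field right after "RET"
             for (i = 1; i < NF; i++)
                 if ($i == "RET")
                     count[$(i + 1)]++
         }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Intended use:
#   kdump -f ktrace.out | emfile_by_syscall
# The excerpt itself can be produced with:
#   kdump -f ktrace.out | tail -n 2000 > ktracedhcpcd.log
```

Knowing whether it is socket, open, pipe or something else that starts failing first would hint at which descriptors are accumulating.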

I see plenty of "Too many open files" messages near the end, even though a normally running dhcpcd process has fewer than 70 open file handles, even after days (on the system where it exits) or months (on the system that's far away). The system where it exits has kern.maxfiles = 16384, so the system-wide limit shouldn't be the issue.
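One caveat: "Too many open files" is the message for EMFILE, the per-process descriptor limit (ulimit -n), whereas exhausting kern.maxfiles is reported as ENFILE, "Too many open files in system". A rough way to watch for descriptor growth over time is sketched below. It assumes fstat(1) output with the descriptor in the fourth column and pgrep(1) for finding the processes; the function names count_fds and log_dhcpcd_fds are hypothetical.

```sh
#!/bin/sh
# Count numbered descriptors in fstat(1) output read from stdin.
# Rows whose FD column is "wd", "root", "text" etc. are not counted.
count_fds() {
    awk 'NR > 1 && $4 ~ /^[0-9]/ { n++ } END { print n + 0 }'
}

# Print one timestamped line per running dhcpcd process.
log_dhcpcd_fds() {
    for pid in $(pgrep -x dhcpcd); do
        n=$(fstat -p "$pid" | count_fds)
        echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$pid fds=$n"
    done
}

# Intended use, e.g. once a minute from cron, appending to a log file:
#   log_dhcpcd_fds >> /var/log/dhcpcd-fds.log
# For comparison, the per-process limit of the current shell:
#   ulimit -n
```

If the count climbs steadily in one of the processes before the exit, that would point to a descriptor leak rather than a one-off burst.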

Also, this appears to happen when the ISP would either be giving a new lease or a new IP address.

Also, running dhcpcd with -d showed lots and lots of lines like these:

bge0: Router Advertisement from fe80::201:5cff:fe6b:4846
bge0: executing: /libexec/dhcpcd-run-hooks ROUTERADVERT
Reloading NPF ruleset /etc/npf.conf

This happens every five seconds or so, which seems... excessive.
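One experiment would be to make the exit hook reload NPF only for events where an address or delegated prefix may have changed, and skip ROUTERADVERT. The sketch below is untested here; it assumes dhcpcd-run-hooks exports $reason alongside $interface (as documented in dhcpcd-run-hooks(8)), and the list of reasons that trigger a reload is a guess that would need adjusting. The function name should_reload is hypothetical.

```sh
#!/bin/sh
# Sketch of an /etc/dhcpcd.exit-hook that filters on $reason.

should_reload() {
    # $1 = interface, $2 = reason; prints "yes" or "no"
    case "$1" in
        lo[0-9]* | tun[0-9]*) echo no; return;;
    esac
    case "$2" in
        # IPv4 lease events (assumed list)
        BOUND | RENEW | REBIND | REBOOT | EXPIRE) echo yes;;
        # DHCPv6 and prefix delegation events (assumed list)
        BOUND6 | RENEW6 | REBIND6 | REBOOT6 | DELEGATED6) echo yes;;
        # ROUTERADVERT, CARRIER and everything else: no reload
        *) echo no;;
    esac
}

# Only act when called by dhcpcd, i.e. when $reason is set
if [ -n "${reason:-}" ] && [ "$(should_reload "$interface" "$reason")" = yes ]; then
    /etc/rc.d/npf reload
fi
```

If dhcpcd stops dying with this in place, the frequent hook runs would be a likely suspect; if not, they can probably be ruled out.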

Does anyone have any thoughts, suggestions or observations about what I might be doing wrong, or what I could try differently?

Thanks!
John


[1] https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58047
    https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57694
    https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57972

