tech-net archive


IPv6 forwarding fail = mbuf leak?



Okay, I want to investigate this further, but it's been a few days and
I haven't got the round tuits yet.  So I'll put it out here in case
it's relevant to anyone else - which it may well be; see the last
paragraph.

My house network's router, including uplink, is a 4.0.1 i386 machine.
(Yes, I know 4.0.1 is no longer officially supported; I'm not asking
for support here.  If I wanted support, I'd file a PR.  Just throwing
out a heads up in case the problem still exists; if nobody else picks
this up, I'll track it down myself one of these days.  And if
anyone does manage to find it before I do and cares to say so, bonus.)

Recently, thanks to a failure upstream from me, its IPv6 uplink went
away.  (As in, nothing responded at the address it was expecting to see
it at: in IPv4 terms, nothing answered the ARP request; the IPv6
analogue is a Neighbor Discovery neighbor solicitation.)  At the same time, the machine
started wedging.  The wedges proved to be some kind of "out of network
memory" condition; rebooting helped...for about a day.

I built a kernel with MBUFTRACE and it turned out to be an mbuf leak,
charged (according to netstat -mssv) to "vlan2 rx".  vlan2 is/was
the main house-facing vlan.  I set up a cronjob to run netstat -m
periodically and reboot if it got close to the limit.  This job
rebooted the machine a bit more than once a day - not good, but I
prefer brief downtimes to wedging while I try to figure out what's
wrong.  (Okay, not strictly total wedging, but for my main house
router, stopping talking with the network amounts to much the same
thing in practice.)
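The cron job itself isn't shown here.  As a rough sketch only, the check it performs might look like this in C.  The function names are hypothetical, the `netstat -m` line format it matches (a count followed by "mbufs in use") is an assumption to verify against your own system's output, and the limit and percentage are whatever you choose.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract N from a line of the ASSUMED form
 * "N mbufs in use..." as printed by `netstat -m`.  Returns -1 when the
 * line does not have that form.
 */
long parse_mbufs_in_use(const char *line)
{
    long n;
    int end = -1;

    /* %n is only reached if the literal text after the number matched. */
    if (sscanf(line, " %ld mbufs in use%n", &n, &end) == 1 && end > 0)
        return n;
    return -1;
}

/* Returns 1 once in_use has reached pct percent of limit, else 0. */
int near_limit(long in_use, long limit, int pct)
{
    assert(limit > 0 && pct > 0 && pct <= 100);
    return in_use * 100 >= limit * pct;
}
```

A wrapper would read `netstat -m` output line by line (for instance through popen(3)), feed each line to parse_mbufs_in_use(), and schedule a reboot when near_limit() returns 1.  Matching the whole phrase via %n, rather than trusting sscanf's return value alone, keeps an unrelated line that happens to start with a number from being mistaken for the mbuf count.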

I started looking into various ways to figure out more details of where
the mbufs were going.  My first few attempts failed badly (the first
one panicked when autoconf first attached the interface; the second, as
soon as I brought it up).

Before I got anything better working, I got fed up with IPv6 not
working and fixed it (well, fixed most of it myself and got the
relevant person to fix the rest).

The mbuf leak stopped dead.  The machine's been up for multiple days
now, and vlan2 rx usage is still at only 1.

This leads me to a very strong suspicion that there's an mbuf leak in
the code paths used when an IPv6 packet is forwarded to a next hop that
isn't answering IPv6 Neighbor Discovery (IPv6's address-to-MAC mapping
protocol).  The house network would try to speak to the v6 world
occasionally even without v6 connectivity, so I would expect a low
level of outgoing v6 traffic.
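For anyone chasing this: Neighbor Discovery (RFC 4861) has a sender queue packets for a neighbor while address resolution is pending, so a packet forwarded toward an unresponsive next hop sits in that queue until it is sent, displaced by a newer packet, or dropped when resolution gives up.  A leak would mean one of those exits forgets to free it.  The toy model below is not NetBSD's nd6 code; every name in it (struct nbr, pkt_alloc(), MAX_TRIES, and so on) is hypothetical, and holding a single packet per neighbor is an assumed simplification.  It just spells out the bookkeeping that has to balance.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; we only track how many are live. */
struct pkt { int id; };
static long live_pkts;

struct pkt *pkt_alloc(int id)
{
    struct pkt *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->id = id;
    live_pkts++;
    return p;
}

void pkt_free(struct pkt *p)
{
    if (p == NULL)
        return;
    assert(live_pkts > 0);  /* accounting sanity check */
    free(p);
    live_pkts--;
}

/* Hypothetical neighbor-cache entry awaiting address resolution. */
struct nbr {
    struct pkt *hold;   /* packet parked until the neighbor answers */
    int tries;          /* timer ticks (solicitations) so far */
};

#define MAX_TRIES 3     /* assumption: give up after this many timer ticks */

/* Output path, link-layer address still unknown: park the packet. */
void nbr_output_unresolved(struct nbr *n, struct pkt *p)
{
    pkt_free(n->hold);  /* a newer packet displaces (and must free) the old one */
    n->hold = p;
}

/* Neighbor answered: "send" the held packet (here, just free it). */
void nbr_resolved(struct nbr *n)
{
    pkt_free(n->hold);
    n->hold = NULL;
    n->tries = 0;
}

/* Retransmit timer: returns 1 once resolution is abandoned. */
int nbr_timer(struct nbr *n)
{
    if (++n->tries < MAX_TRIES)
        return 0;       /* a real stack would re-send a solicitation here */
    pkt_free(n->hold);  /* giving up must release the held packet too */
    n->hold = NULL;
    return 1;
}
```

Drop the pkt_free() call from nbr_output_unresolved() and live_pkts grows with every packet sent toward the dead neighbor; drop the one in nbr_timer() and it grows by one each time resolution gives up.  Either would produce slow, steady growth when only a trickle of v6 traffic is heading out.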

The machine actually is not quite 4.0.1; it's 4.0.1 plus my fixes.  But
I am moderately sure none of them has any bearing (as
one simple example, the route in question does not go out an srt).

As I said, I'll track it down someday if nobody else does.  But, unless
that code has been reworked between 4.x and now, the bug may well still
exist, in which case someone might want to look into it in more modern
NetBSD.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

