Subject: pkg/32621: ucarp pkg doesn't work
To: None <pkg-manager@netbsd.org, gnats-admin@netbsd.org,>
From: None <g.mcgarry@ieee.org>
List: pkgsrc-bugs
Date: 01/24/2006 23:05:01
>Number:         32621
>Category:       pkg
>Synopsis:       ucarp pkg doesn't work
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    pkg-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 24 23:05:01 +0000 2006
>Originator:     Gregory McGarry
>Release:        NetBSD 2.0, NetBSD-current
>Organization:
>Environment:
NetBSD 2.0 w/ ex0
NetBSD-current w/ tlp0
>Description:
Using ucarp 1.1, the same version that pkgsrc was recently updated to.

The UCARP master machine sends multicast "heartbeat" packets onto the network, which are received by the backup machines.  Any backup machine will assume the role of master if the master machine goes offline.  When the master machine resumes, the backup machine detects its multicast heartbeat packets and relinquishes the master role.

However, if the master machine is running NetBSD, it receives its own multicast heartbeat packets, interprets them as coming from another master machine, and immediately falls back to the backup role.  Once the multicast heartbeat signal is missing, it switches back to the master role, detects its own multicast heartbeat packet again, and immediately resumes the backup role.  This ping-pong effect continues indefinitely.
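
The ping-pong described above can be illustrated with a toy model.  This is only a sketch under stated assumptions, not ucarp's actual state machine: the names node, tick and count_transitions are hypothetical, and the model assumes that every heartbeat a master sends is handed straight back to the sender.  The ignore_own flag models dropping packets whose source address is our own.

```c
#include <stdint.h>

enum role { BACKUP, MASTER };

/* Hypothetical toy model; none of these names come from ucarp. */
struct node {
    uint32_t addr;     /* our own source address */
    enum role role;    /* current role */
    int ignore_own;    /* 1 = drop heartbeats we sent ourselves */
};

/*
 * One time step.  A MASTER sends a heartbeat and (by assumption) sees
 * that same packet again; a BACKUP hears nothing and takes over.
 */
static void tick(struct node *n, int *transitions)
{
    if (n->role == MASTER) {
        uint32_t src = n->addr;          /* heartbeat we just sent */

        if (n->ignore_own && src == n->addr)
            return;                      /* our own packet: stay master */
        n->role = BACKUP;                /* mistaken for another master */
        (*transitions)++;
    } else {
        n->role = MASTER;                /* no heartbeat heard: take over */
        (*transitions)++;
    }
}

/* Run the model for a number of steps and count the role changes. */
static int count_transitions(int ignore_own, int ticks)
{
    struct node n = { 0x0a000001u, BACKUP, ignore_own };
    int transitions = 0;

    for (int i = 0; i < ticks; i++)
        tick(&n, &transitions);
    return transitions;
}
```

In this model, count_transitions(0, 10) changes role on every step, while count_transitions(1, 10) changes role once (backup to master) and then stays put.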

>How-To-Repeat:
Run ucarp 1.1 on a NetBSD machine and let it take the master role.
>Fix:
Does this happen on all machines?  I have seen it on tlp and ex hardware.  I'm not sure whether it is expected behaviour or an issue with the multicast filters on these NICs.

Anyway, the following patch simply checks if the multicast heartbeat packet was sent by us:

--- carp.c.orig 2006-01-24 14:44:07.000000000 -0800
+++ carp.c      2006-01-24 14:45:00.000000000 -0800
@@ -428,6 +428,16 @@
     dest = ntohl(iphead.ip_dst.s_addr);
     proto = iphead.ip_p;    
 
+#ifdef DEBUG
+    printf("source=%ld (%ld), srcip=%ld(%ld)\n", source, (iphead.ip_src.s_addr), ntohl(srcip.s_addr), srcip.s_addr);
+#endif
+
+    /*
+     * Don't process our own multicasts.
+     */
+    if (iphead.ip_src.s_addr == srcip.s_addr)
+       return;
+
     switch (proto) {
     case IPPROTO_CARP: {
         struct carp_header ch;


With this change UCARP works well.  I have used it on some very large public networks.  Thinking about it, I wonder whether this failure is the source of people's interest in integrating CARP into the kernel.  IMHO, this is not a protocol that belongs in the kernel, and getting UCARP to work correctly is the right solution.