pkgsrc-Bugs archive

pkg/32621: ucarp pkg doesn't work



>Number:         32621
>Category:       pkg
>Synopsis:       ucarp pkg doesn't work
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    pkg-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jan 24 23:05:01 +0000 2006
>Originator:     Gregory McGarry
>Release:        NetBSD 2.0, NetBSD-current
>Organization:
>Environment:
NetBSD 2.0 w/ ex0
NetBSD-current w/ tlp0
>Description:
Using ucarp 1.1.  This is the version that was recently updated in pkgsrc.

The UCARP master machine sends multicast "heartbeat" packets onto the network,
which are received by the backup machines.  A backup machine assumes the role
of master if the master machine goes offline.  When the original master
resumes, the backup machine detects its multicast heartbeat packets and
relinquishes the master role.

However, if the master machine is running NetBSD, it receives its own
multicast heartbeat packets, interprets them as coming from another master
machine, and immediately falls back to the backup role.  Once it is a backup
it stops sending heartbeats, so the heartbeat signal goes missing and it
switches back to the master role.  It then detects its own multicast
heartbeat packet again and immediately resumes the backup role.  This
ping-pong effect continues indefinitely.
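
As a rough illustration of the ping-pong, here is a toy model of a single
node.  It is not ucarp's real state machine (no timers, no advertisement
skew); the names count_role_changes and hears_own_adverts are made up for
this sketch.  Each tick, a master sends one heartbeat, and the node becomes
backup if it hears a heartbeat and master if it hears none.

```c
#include <assert.h>
#include <stdbool.h>

enum role { ROLE_BACKUP, ROLE_MASTER };

/*
 * Toy model only (hypothetical, not taken from ucarp).
 * Count how often a lone node changes role over `ticks` rounds.
 * hears_own_adverts models an interface that delivers the node's own
 * multicast heartbeats back to it.
 */
static int count_role_changes(int ticks, bool hears_own_adverts)
{
    enum role role = ROLE_MASTER;
    int changes = 0;

    assert(ticks >= 0);
    for (int t = 0; t < ticks; t++) {
        /* Only a master sends heartbeats, so only a master can hear its own. */
        bool heard = (role == ROLE_MASTER) && hears_own_adverts;
        /* A heard heartbeat means "someone else is master": step down. */
        enum role next = heard ? ROLE_BACKUP : ROLE_MASTER;

        if (next != role)
            changes++;
        role = next;
    }
    return changes;
}
```

In this model a node that hears its own heartbeats changes role on every
tick, while a node that does not hear them stays master throughout.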

>How-To-Repeat:
Try running ucarp 1.1 as the master on a NetBSD machine.
>Fix:
Does this happen on all machines?  I have seen it on tlp and ex hardware.  I'm
not sure whether it is expected behaviour or an issue with the multicast
filters on these NICs.

Anyway, the following patch simply checks if the multicast heartbeat packet was 
sent by us:

--- carp.c.orig 2006-01-24 14:44:07.000000000 -0800
+++ carp.c      2006-01-24 14:45:00.000000000 -0800
@@ -428,6 +428,16 @@
     dest = ntohl(iphead.ip_dst.s_addr);
     proto = iphead.ip_p;    
 
+#ifdef DEBUG
+    printf("source=%ld (%ld), srcip=%ld(%ld)\n", source, (iphead.ip_src.s_addr), ntohl(srcip.s_addr), srcip.s_addr);
+#endif
+
+    /*
+     * Don't process our own multicasts.
+     */
+    if (iphead.ip_src.s_addr == srcip.s_addr)
+       return;
+
     switch (proto) {
     case IPPROTO_CARP: {
         struct carp_header ch;

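The check in the patch can also be read in isolation.  The following is a
minimal sketch, assuming both addresses are IPv4 and held in network byte
order (as s_addr is); is_own_advert is a hypothetical helper name and does
not exist in ucarp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not part of ucarp): true when a received packet's
 * IPv4 source address equals the address we send heartbeats from.
 */
static bool is_own_advert(const struct in_addr *pkt_src,
                          const struct in_addr *local_src)
{
    assert(pkt_src != NULL && local_src != NULL);
    /* Both fields are in network byte order, so no ntohl() is needed. */
    return pkt_src->s_addr == local_src->s_addr;
}
```

A receive path would drop the packet when this returns true, before looking
at the protocol field.  Note that this assumes no other host sends
heartbeats from the same source address.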

With this change UCARP works well.  I have used it on some very large public
networks.  Thinking about it, I wonder whether this failure is the source of
people's interest in integrating CARP into the kernel.  IMHO, this is not a
protocol that belongs in the kernel, and getting UCARP working correctly is
the right solution.



