NetBSD-Users archive


Re: Corosync on NetBSD



On Fri, Sep 17, 2010 at 11:29 AM, haad <haaaad%gmail.com@localhost> wrote:

> Hi,
>
> On Fri, Sep 17, 2010 at 11:51 AM, Adam Hoka <adam.hoka%gmail.com@localhost> wrote:
> > On Thu, 16 Sep 2010 09:14:01 +0200
> > Stephan Wiebusch <stephanwib%googlemail.com@localhost> wrote:
> >
> >> Hi,
> >>
> >> I tried to make the Corosync 1.2.8 cluster software work on NetBSD 5.0.2.
> >> There were some issues to fix in the source and make files. After that,
> >> it compiles and runs. Unfortunately it doesn't do anything except eat 100%
> >> CPU. I sent a list of the build issues to the Corosync mailing list. They
> >> were:
> >>
> >>
> >>
> >> => lib/coroipcc.c
> >> -There are some "#ifdef COROSYNC_BSD" statements which include some
> >> madvise() calls. There is a MADV_NOSYNC flag being used which is only
> >> available on FreeBSD. I commented these lines out.
> >> -There is a "semun" union required which is not defined in "sys/sem.h" on
> >> NetBSD, in contrast to FreeBSD. I had to add it to this file:
> >>
> >>         union semun {
> >>                 int     val;            /* value for SETVAL */
> >>                 struct  semid_ds *buf;  /* buffer for IPC_STAT & IPC_SET */
> >>                 u_short *array;         /* array for GETALL & SETALL */
> >>         };
>
> To me it seems that we should add this to sem.h. See [1]
>
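A minimal sketch of the local workaround described above, assuming the application declares the union itself (which is what the POSIX semctl() page in [1] expects). HAVE_UNION_SEMUN and sem_roundtrip() are hypothetical names used only for illustration, not part of Corosync or NetBSD; unsigned short stands in for u_short to avoid depending on BSD typedefs.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * HAVE_UNION_SEMUN is a hypothetical macro that a configure check
 * could define on systems whose <sys/sem.h> already declares the union.
 */
#ifndef HAVE_UNION_SEMUN
union semun {
	int              val;    /* value for SETVAL */
	struct semid_ds *buf;    /* buffer for IPC_STAT & IPC_SET */
	unsigned short  *array;  /* array for GETALL & SETALL */
};
#endif

/*
 * Hypothetical helper: create a private set with one semaphore, set it
 * to `initial`, read the value back and remove the set again.
 * Returns the value read, or -1 on error.
 */
int
sem_roundtrip(int initial)
{
	union semun arg;
	int id, val;

	id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (id == -1)
		return -1;

	arg.val = initial;
	if (semctl(id, 0, SETVAL, arg) == -1) {
		semctl(id, 0, IPC_RMID);
		return -1;
	}

	val = semctl(id, 0, GETVAL);
	semctl(id, 0, IPC_RMID);
	return val;
}
```

With a guard like this, the same declaration could be shared by every file that needs it instead of being pasted into each one.
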
> >> =>exec/totemip.c
> >> -There is an #include for <net/if_var.h>. This file is not present on
> >> NetBSD, so I just dropped that.
> >>
> >> =>exec/logsys.c
> >> -The same problem with madvise() and MADV_NOSYNC.
> >
> > Are you sure that the code will do the same after that?
>
> These calls were added by FreeBSD guys in this commit [2]. To me it
> seems that we can ignore those calls as we do not specify anything
> like that in our mmap/madvise manual pages. Have you tried to run it
> with gdb?
>

Yes. The corosync process has 4 LWPs:

# ps axs | grep corosync
  0  3837     1 3957   4    4 191  0 23708  2708 parked   I-   ? 19:28.84 ./corosync
  0  3837     1 3957   3    4 191  0 23708  2708 select   I-   ? 19:28.84 ./corosync
  0  3837     1 3957   2    4 191  0 23708  2708 parked   I-   ? 19:28.84 ./corosync
  0  3837     1 3957   1    4 191  0 23708  2708 -        R    ? 19:28.84 ./corosync

In "live" mode (attached to the running process), it seems that gdb cannot
handle LWPs (is this a NetBSD issue?):

0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
(gdb)
(gdb) info thr
(gdb) thr 2
Thread ID 2 not known.
(gdb) bt
#0  0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
#1  0xbbbafb14 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#2  0xbbbac8b1 in sem_wait () from /usr/lib/libpthread.so.0
#3  0x0804dbcf in corosync_exit_thread_handler (arg=0x0) at main.c:198
#4  0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#5  0xbbafd670 in swapcontext () from /usr/lib/libc.so.12


What I can do is to kill the process with SIGABRT and then analyse the core
file:

(gdb) info thr
  4 process 69373  0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
  3 process 134909  0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
  2 process 200445  0xbbadab67 in poll () from /usr/lib/libc.so.12
* 1 process 265981  0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbbadb5a7 in _lwp_park () from /usr/lib/libc.so.12
#1  0xbbbafb14 in pthread_cond_wait () from /usr/lib/libpthread.so.0
#2  0xbbbac8b1 in sem_wait () from /usr/lib/libpthread.so.0
#3  0x0804dbcf in corosync_exit_thread_handler (arg=0x0) at main.c:198
#4  0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#5  0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 2
[Switching to thread 2 (process 200445)]#0  0xbbadab67 in poll () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbbadab67 in poll () from /usr/lib/libc.so.12
#1  0xbbbad0f9 in poll () from /usr/lib/libpthread.so.0
#2  0x08050269 in prioritized_timer_thread (data=0x0) at timer.c:127
#3  0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#4  0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 3
[Switching to thread 3 (process 134909)]#0  0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbbb14697 in _lwp_exit () from /usr/lib/libc.so.12
#1  0xbbbb11f6 in pthread_exit () from /usr/lib/libpthread.so.0
#2  0xbbbc0f75 in logsys_worker_thread (data=0x0) at logsys.c:733
#3  0xbbbb19df in pthread_create () from /usr/lib/libpthread.so.0
#4  0xbbafd670 in swapcontext () from /usr/lib/libc.so.12
(gdb) thr 4
[Switching to thread 4 (process 69373)]#0  0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0xbbbae202 in pthread_rwlock_unlock () from /usr/lib/libpthread.so.0
#1  0x00000000 in ?? ()



>
> >> =>exec/coroipcs.c
> >> -Once more the madvise() issue.
> >> -And once more a missing semun union.
> >>
>
> [1] http://www.opengroup.org/onlinepubs/000095399/functions/semctl.html
> [2] http://www.mail-archive.com/openais%lists.linux-foundation.org@localhost/msg02767.html
>
> --
>
>
> Regards.
>
> Adam

