Strange semaphore behavior

To: NetBSD User Maillist <netbsd-users%NetBSD.org@localhost>
Subject: Strange semaphore behavior
From: BERTRAND Joël <joel.bertrand%systella.fr@localhost>
Date: Thu, 7 Feb 2019 18:16:22 +0100

	Hello,

	I have been developing RPL/2 (http://www.rpl2.net) for a long time.
This software was mainly written on Linux and Solaris, but I have tested
it on NetBSD 6 and 7 without any particular trouble.

	I have installed NetBSD-8 on a new workstation, and the computations I
launch there abort with an RPL/2 system error. The same computations run
without any trouble on Solaris or Linux. There may be bugs in this
software, so I have tried to debug it.

	RPL/2 aborts on a semaphore operation. I have therefore replaced the
regular sem_wait and sem_post with:

#   define sem_wait(a) ({ int value; sem_getvalue(a, &value); \
            printf("[%d-%llu] Semaphore %s (%p) "\
            "waiting at %s() " \
            "line #%d <%d>\n", (int) getpid(), (unsigned long long) \
                    pthread_self(), \
            #a, a, __FUNCTION__, __LINE__, value), fflush(stdout); \
            sem_wait(a); })
#   define sem_post(a) ({ int value; sem_getvalue(a, &value); \
            printf("[%d-%llu] Semaphore %s (%p) "\
            "posting at %s() " \
            "line #%d <%d>\n", (int) getpid(), (unsigned long long) \
                    pthread_self(), \
            #a, a, __FUNCTION__, __LINE__, value), fflush(stdout); \
            sem_post(a); })
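
	A possible refinement of these macros, given only as a sketch: a
function-based wrapper that also logs the return code, errno and the
value after the call. traced_sem_op(), TRACED_SEM_WAIT() and
TRACED_SEM_POST() are hypothetical names, not RPL/2 code. Note that
sem_getvalue() only returns a snapshot, so the logged values can be stale
as soon as another thread touches the semaphore.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of RPL/2): performs sem_wait() or
 * sem_post() and logs the value before and after the call, the return
 * code and the errno text. errno is restored for the caller. */
static int traced_sem_op(sem_t *sem, int is_wait, const char *name,
        const char *func, int line)
{
    int before = -1, after = -1, ret, saved_errno;

    assert(sem != NULL);

    (void) sem_getvalue(sem, &before);   /* snapshot, may be stale */
    ret = is_wait ? sem_wait(sem) : sem_post(sem);
    saved_errno = errno;
    (void) sem_getvalue(sem, &after);    /* snapshot, may be stale */

    /* The uintptr_t cast accepts pthread_t as a pointer or an integer. */
    fprintf(stderr, "[%d-%llu] Semaphore %s (%p) %s at %s() line #%d "
            "<%d -> %d> ret=%d errno=%s\n",
            (int) getpid(),
            (unsigned long long) (uintptr_t) pthread_self(),
            name, (void *) sem, is_wait ? "waiting" : "posting",
            func, line, before, after, ret,
            ret != 0 ? strerror(saved_errno) : "none");

    errno = saved_errno;
    return ret;
}

#define TRACED_SEM_WAIT(s) traced_sem_op((s), 1, #s, __func__, __LINE__)
#define TRACED_SEM_POST(s) traced_sem_op((s), 0, #s, __func__, __LINE__)
```

	Unlike the macros above, this variant prints its line only after the
call returns, so a sem_wait() that blocks forever leaves no trace.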

	Of course, all source files were recompiled. I obtain a very long
output file that contains:

[2871-135822618777600] Semaphore &((*s_etat_processus).semaphore_fork)
(0x7b87aab6fc50) waiting at librpl_analyse() line #1021 <2>
LAST ERROR: Invalid argument
ERROR 2009 AT librpl_analyse() FROM analyse-conv.c LINE 1028
[2871-135822618777600] librpl_analyse() from analyse-conv.c at line
1028: BACKTRACE only defined in glibc
+++Système : Erreur dans la gestion des processus [2871]

	OK. The faulty line is:

	while(sem_wait(&((*s_etat_processus).semaphore_fork)) != 0)
	{
		...
	}

and errno is EINVAL. A few lines above, I have written:

	if (sem_post(&((*s_etat_processus).semaphore_fork)) != 0)
	{
		...
	}

which triggers no error. Of course, I have verified that this semaphore
was not destroyed.

	The real source code is:

#   ifndef SEMAPHORES_NOMMES
        if (sem_post(&((*s_etat_processus).semaphore_fork)) != 0)
#   else
        if (sem_post((*s_etat_processus).semaphore_fork) != 0)
#   endif
    {
        (*s_etat_processus).erreur_systeme = d_es_processus;
        return;
    }

#   ifndef SEMAPHORES_NOMMES
        while(sem_wait(&((*s_etat_processus).semaphore_fork)) != 0)
#   else
        while(sem_wait((*s_etat_processus).semaphore_fork) != 0)
#   endif
    {
        if (errno != EINTR)
        {
            (*s_etat_processus).erreur_systeme = d_es_processus;
            return;
        }
    }

(SEMAPHORES_NOMMES, i.e. "named semaphores", is undefined on NetBSD.)

	Next, I ran grep on the output file to check all operations on this
semaphore:

$ grep 0x7b87aab6fc50 out
...
[2871-135822618777600] Semaphore &((*s_etat_processus).semaphore_fork)
(0x7b87aab6fc50) waiting at librpl_analyse() line #1021 <1>
// SEM 1->0 OK
[2871-135822618777600] Semaphore &((*s_etat_processus).semaphore_fork)
(0x7b87aab6fc50) posting at librpl_analyse() line #1011 <0>
// SEM 0->1 OK
[2871-135822618777600] Semaphore &((*s_etat_processus).semaphore_fork)
(0x7b87aab6fc50) waiting at librpl_analyse() line #1021 <1>
// SEM 1->0 OK
[2871-135822618777600] Semaphore &((*s_etat_processus).semaphore_fork)
(0x7b87aab6fc50) posting at librpl_analyse() line #1011 <1>
// SEM 1->2 NOK !!!! Sem should be equal to 0 before post !

	This semaphore is initialized with:

	sem_init(&((*s_etat_processus).semaphore_fork), 0, 0);

and is not shared between processes, only between threads.

	Thus, in the same process (same PID) and the same thread (same TID), a
call to sem_wait() seems not to change the semaphore value. Of course,
there is no reason for the value to be greater than 1.

	This code is built with:
schwarz# /usr/pkg/gcc8/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/pkg/gcc8/bin/gcc
COLLECT_LTO_WRAPPER=/usr/pkg/gcc8/libexec/gcc/x86_64--netbsd/8.2.0/lto-wrapper
Target: x86_64--netbsd
Configured with: ../gcc-8.2.0/configure --disable-libstdcxx-pch
--with-system-zlib --enable-nls --with-libiconv-prefix=/usr
--enable-__cxa_atexit --with-gxx-include-dir=/usr/pkg/gcc8/include/c++/
--disable-libssp --enable-languages='c obj-c++ objc fortran c++'
--enable-shared --enable-long-long --with-local-prefix=/usr/pkg/gcc8
--enable-threads=posix --with-boot-ldflags='-static-libstdc++
-static-libgcc -Wl,-R/usr/pkg/lib ' --with-gnu-ld --with-ld=/usr/bin/ld
--with-gnu-as --with-as=/usr/bin/as --with-arch=nocona
--with-tune=nocona --with-fpmath=sse --prefix=/usr/pkg/gcc8
--build=x86_64--netbsd --host=x86_64--netbsd
--infodir=/usr/pkg/gcc8/info --mandir=/usr/pkg/gcc8/man
Thread model: posix
gcc version 8.2.0 (GCC)

and compilation flags are:
CFLAGS = -g -O2 -mtune=native -march=native -fno-strict-overflow
-malign-double -Wall -funsigned-char -Wno-pointer-sign
(on an i7-4770 workstation)

	I have tried, without success, to write a simple program that triggers
this 'bug' quickly. In my test program, the first bug appears only after
the creation of more than 3000 threads... How can I determine whether the
error comes from my code or from a NetBSD bug?
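
	For reference, a reproducer along these lines could look like the
sketch below. It is not the test program mentioned above: worker() and
run_stress() are hypothetical names, and giving each thread its own
semaphore is an assumption about what is worth testing. Every return
code is checked, including the one from sem_init().

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Each thread initializes its own semaphore, posts, waits, checks that
 * the value is back to 0, then destroys it. Returns non-NULL on error. */
static void *worker(void *arg)
{
    sem_t sem;
    int value = -1;
    intptr_t failed = 0;

    (void) arg;

    if (sem_init(&sem, 0, 0) != 0)
    {
        perror("sem_init");
        return (void *) (intptr_t) 1;
    }

    if (sem_post(&sem) != 0 || sem_wait(&sem) != 0)
    {
        perror("sem_post/sem_wait");
        failed = 1;
    }
    else if (sem_getvalue(&sem, &value) != 0 || value != 0)
    {
        fprintf(stderr, "unexpected semaphore value %d\n", value);
        failed = 1;
    }

    if (sem_destroy(&sem) != 0)
    {
        perror("sem_destroy");
        failed = 1;
    }

    return (void *) failed;
}

/* Creates and joins nthreads threads one after another and returns the
 * number of threads that could not be created or reported an error. */
static int run_stress(int nthreads)
{
    int i, rc, failures = 0;

    assert(nthreads >= 0);

    for (i = 0; i < nthreads; i++)
    {
        pthread_t tid;
        void *result = NULL;

        rc = pthread_create(&tid, NULL, worker, NULL);
        if (rc != 0)
        {
            fprintf(stderr, "pthread_create #%d: %s\n", i, strerror(rc));
            failures++;
            continue;
        }

        if (pthread_join(tid, &result) != 0 || result != NULL)
        {
            fprintf(stderr, "thread #%d reported an error\n", i);
            failures++;
        }
    }

    return failures;
}
```

	Calling run_stress(4000) from main() would go past the 3000 threads
mentioned above; the program has to be linked with the threads library
(for example with -pthread).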

	Best regards,

	JB

Follow-Ups:
- Re: Strange semaphore behavior, sem_init() fails with errno 4294967295 (-1)
  - From: BERTRAND Joël
