Subject: lib/30846: problem interrupting select() via signal in a multi-threaded process
To: None <lib-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <pvinci@avdat.com.au>
List: netbsd-bugs
Date: 07/27/2005 04:28:00
>Number:         30846
>Category:       lib
>Synopsis:       problem interrupting select() via signal in a multi-threaded process
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 27 04:28:00 +0000 2005
>Originator:     Peter Vinci
>Release:        NetBSD 2.0
>Organization:
Aviation Data Systems
>Environment:
	
	
System: NetBSD tb5.avdat.com.au 2.0 NetBSD 2.0 (GENERIC) #0: Wed Dec 15 14:17:29 EST 2004 root@tb4.avdat.com.au:/mnt/netbsd/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
I am experiencing difficulties interrupting the select() call with a signal in
a multi-threaded process.

For example, I have a process which uses pthread_sigmask to mask SIGALRM,
spawns a single thread which promptly goes to sleep, unmasks SIGALRM with
pthread_sigmask again and then blocks without timeout on select.

A handler for SIGALRM is installed beforehand with sigaction().
Within the signal handler I compare pthread_self() against the recorded thread
IDs to make sure the handler is running within the main thread.

If I then send the SIGALRM signal to the process, either with alarm() or with
kill externally, select returns -1 with errno set to EINTR as expected.
If another SIGALRM is generated, either internally or externally, select no
longer returns, nor does the handler get executed; it appears as if the signal
is masked.

I have also tried using SIGUSR1, with which I experienced the same problem.
If I handle both signals simultaneously, each signal will interrupt select()
exactly one time only.

If I don't spawn the thread, select gets interrupted every time a signal is
sent to it.

I have tried running the program under NetBSD 2.0 and NetBSD 2.02 with the
results described above. I also tried the same program under Linux, kernel
version 2.4.1.8, where it did seem to behave correctly.


>How-To-Repeat:

Below is sample code which reproduces this problem.

#include <stdio.h>
#include <string.h>   /* strcpy() */
#include <signal.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>

struct _thread_data
{
  pthread_t thread_id;
  char name[100];
};

struct _thread_data thread_data_list[10];
int data_index = 0;

void SignalHandler(int sig)
{
  int i;
  for (i = 0; i<data_index; i++)
  {
    if (thread_data_list[i].thread_id == pthread_self())
      break;
  }

  fprintf(stderr, "Signal: %d caught by %s\n", sig, thread_data_list[i].name);

  if (sig == SIGALRM)
    fprintf(stderr, "alarm caught\n");

  if (sig == SIGUSR1)
    fprintf(stderr, "usr1 caught\n");
}

void* thread_handle(void* arg)
{
  struct _thread_data t_data;

  t_data.thread_id = pthread_self();
  strcpy(t_data.name, (char*)arg);
  thread_data_list[data_index++] = t_data;

  fprintf(stderr, "Name %s created\n", (char*)arg);

  while (1)
  {
    sleep(1);
  }
}

int main(int argc, char** argv)
{
   int result;
   pthread_t id;
   struct _thread_data t_data;
   sigset_t set;
   struct sigaction sigact;

   sigact.sa_sigaction = NULL;
   sigemptyset(&sigact.sa_mask);
   sigact.sa_flags = 0;
   sigact.sa_handler = SignalHandler;

   /* set to handle SIGALRM */
   if (sigaction(SIGALRM, &sigact, NULL))
   {
      fprintf(stderr, "Error: sigaction for SIGALRM failed\n");
      return 1;
   }

   /* set to handle SIGUSR1 */
   if (sigaction(SIGUSR1, &sigact, NULL))
   {
      fprintf(stderr, "Error: sigaction for SIGUSR1 failed\n");
      return 1;
   }

  t_data.thread_id = pthread_self();
  strcpy(t_data.name, "main");
  thread_data_list[data_index++] = t_data;

  sigemptyset(&set);
  sigaddset(&set, SIGALRM);
  sigaddset(&set, SIGUSR1);
  pthread_sigmask(SIG_SETMASK, &set, NULL);

  pthread_create(&id, NULL, thread_handle, (void*)("t1"));
  sleep(1);

  sigemptyset(&set);
  pthread_sigmask(SIG_SETMASK, &set, NULL);

  while (1)
  {
    alarm(1);

    result = select(1, NULL, NULL, NULL, NULL);

    /* check for error condition */
    if (result == -1)
    {
       /* continue if interrupted */
       if (errno == EINTR)
       {
          fprintf(stderr, "interrupted\n");
          continue;
       }

       /* we have an error */
       fprintf(stderr, "Error: select() failed with errno = %d\n", errno);
       return 1;
    }
  }

  return 0;
}

>Fix:
	

>Unformatted: