tech-kern archive


fd-passing (mis)behaviour: confirm/refute?



I'm seeing some peculiar misbehaviour out of SCM_RIGHTS passing of file
descriptors through AF_LOCAL/SOCK_DGRAM sockets on NetBSD 5.2.  I'm
wondering if someone can try my test program on something more recent
and/or give me some pointers as to where to look to fix it.

Most briefly, if two processes are sending descriptors via SCM_RIGHTS
at a very high rate (~400000 such messages per second), it can cause
two _other_ processes to get surprising errors, like ENOTCONN when
sending to one end of a socketpair - see the comment header on the test
program (pointer below) for more.
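
For readers unfamiliar with the mechanism, here is a minimal sketch of
passing one descriptor via SCM_RIGHTS over an AF_LOCAL/SOCK_DGRAM
socketpair.  It is not the test program mentioned here and makes no
attempt to reproduce the high-rate behaviour; the function names
send_fd, recv_fd and fd_roundtrip are hypothetical, chosen only for
illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Control buffer.  A fixed size is used (assumption: 128 bytes is ample
 * for one descriptor) because CMSG_SPACE() is not a compile-time
 * constant on every system.  The union forces cmsghdr alignment.
 */
union ctlbuf {
    struct cmsghdr hdr;
    char buf[128];
};

/* Send one payload byte with fd attached as SCM_RIGHTS; 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov;
    union ctlbuf ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int));
    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one message; return the first passed descriptor, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov;
    union ctlbuf ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(sock, &msg, 0) < 0)
        return -1;
    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS &&
            cm->cmsg_len >= CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
            return fd;
        }
    }
    return -1;
}

/*
 * Pass the read end of a pipe across a socketpair, then check that the
 * received descriptor reads what was written to the pipe.  0 on success.
 */
int fd_roundtrip(void)
{
    int sv[2], p[2], got, rv = -1;
    char c = 0;

    if (socketpair(AF_LOCAL, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], p[0]) == 0 && (got = recv_fd(sv[1])) >= 0) {
        if (write(p[1], "k", 1) == 1 && read(got, &c, 1) == 1 && c == 'k')
            rv = 0;
        close(got);
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return rv;
}
```

Calling fd_roundtrip() from a main() exercises one send and one receive
in a single process.  A stress test of the kind described would instead
loop send_fd() and recv_fd() in separate processes and check errno on
any failure.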

This came about because I wanted to add something to AF_LOCAL that
meant meddling with unp_internalize() and unp_externalize().  When I
wrote my stress-test program, I saw this misbehaviour.  So I rolled
back my changes - and I still saw it.

So I'm wondering if anyone else can try it.  The only machines I have
access to at the moment either (a) are running my mutant 5.2 (which
shows the misbehaviour) or earlier or (b) are someone else's production
machines.  And this misbehaviour makes me think there are bugs in the
SCM_RIGHTS codepaths, leading me to fear misbehaviour affecting other
users, maybe even crashes, if I try this on someone else's production
machine.  I have numerous other changes in my kernels, but none that I
would _expect_ to have any bearing on this.

The test program is on ftp.rodents-montreal.org in
/mouse/misc/unfdstress.c.  It did compile for me on 8.0 (I test-built
it, though I didn't run it, on an 8.0 guest login) and my mutant 5.2;
I'd be interested to hear any test results anyone cares to report.

I'd also be interested in any thoughts anyone has on the question of
what may be behind the peculiar misbehaviour.  It seems to me that one
pair of processes pushing SCM_RIGHTS messages at a high rate shouldn't
break two unrelated processes' use of SCM_RIGHTS, but it also occurs to
me that there are a lot of system-wide resources that the system can
run short of (processes and open files, just for two, not that either
of those is relevant here), and I may be running into one of those.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

