tests/lib/libpthread/t_mutex:mutex6

To: tech-userlevel%netbsd.org@localhost
Subject: tests/lib/libpthread/t_mutex:mutex6
From: Robert Elz <kre%munnari.OZ.AU@localhost>
Date: Thu, 23 Nov 2017 17:34:54 +0700
I have been spending a few minutes examining this test, which
fails from time to time, to see if I can see why...

My conclusion is that the test is a total crock of sh*t - it
is a race condition on a short leash carrying around a large
box of bugs, hoping to avoid spilling any.

And failing.

The whole thing should simply be deleted, it is useless for
anything.

First the test (from its name) can be presumed to be a test of
mutexes - and sure enough, there is one, which guards a critical
region where two threads do code like

	if (start == 0) start = ME;

just before they exit.   The test ultimately succeeds or fails
based upon whether or not the "correct" thread exits first - after
both threads have finished, main() checks the value of "start" and
expects to see the value for the thread it expected to have finished
first.  (First, because only that one will see start == 0, by the time
the other gets there, start will have been altered to identify the first.)

All that sounds fine - provided that the main program can correctly
diagnose which thread will be finished first ... except that for this
to be a useful test of the correct working of the mutex, we really
want the two threads to be finishing about the same time - if we know
that one will finish before the other, then the mutex is really just
there for show, and isn't really being tested at all.

And that's what the test thinks it is doing - the threads run at
different priorities, doing (it seems) a whole bunch of computation,
so one should be getting the cpu, and the other waiting for the
first to allow it some time.

First problem (or possible problem, I don't know enough about our
threads implementation to know if this is an issue or not) - if
the threads get scheduled on different (physical) cpus (which might
have been unlikely years ago) then they're both going to run in
parallel, their priorities would be irrelevant (this doesn't apply
if somehow we guarantee that two threads cannot both be running
on different CPUs at the same time - but to me, that would make
threads go from just barely a useful programming technique, to one
so useless as to not be worth implementing.)

Second, in order to allow the "other" thread to work, each thread
periodically interrupts its computation with a sleep(1) - that should
allow the lower priority thread to work while the higher priority
thread sleeps - next problem, the code is assuming that it is doing
enough computation that it won't finish what it is asked to do in
less than a second .. with ever increasing CPU speeds and a computation
loop that never alters, this gets less and less likely to be true.
If the computation takes less than a second, then each of the threads
does its bit while the other is sleeping for a second (even assuming that
they are sharing one physical cpu), and some of the time, they're both
just sleeping.

With this, it is anyone's guess which of the threads will finish first,
the two of them are operating in lock step, my turn, your turn ...
so what will usually matter is who starts first - the main program
starts the high priority thread first, so usually it will take its
turn first, and so finish first.   But there's certainly no guarantee
of that, and the priorities would have no effect.

Of course, that is true only if the computation is taking less than
a second to complete - on a modern high performance amd64 class CPU
one might suspect that there might be a problem, but surely not on
the qemu emulator that bablylon5 uses to run many of the tests,
it is not all that speedy...

So, we need to examine the computation see how much work is being
asked of the cpu, in between the one second sleeps, how long can we
expect it to take?

Here's the computation loop .. (as it is actually executed, ignore
what is written in the code .. for that see a bit lower down)

	for (i = 0; i < 20; i++) {
		low_cnt = 0;
		sleep(1);
	}

That's it.   The computation between sleeps amounts to "i++", and
a test for i < 20, and assigning 0 to a var (a 64 bit var.)

All the rest of the nonsense that's written in the source is simply
optimised away by the compiler, the results are never used, it is
obvious the results are never used, and no optimiser worthy of the
name would leave any of it there doing nothing.

Here's a disassembly of the loop from one of the threads (from amd64),
just to demonstrate...

   0x0000000000001ea9 <+167>:   mov    $0x14,%ebx
   0x0000000000001eae <+172>:   movq   $0x0,0x20337f(%rip)        # 0x205238 <low_cnt>
   0x0000000000001eb9 <+183>:   mov    $0x1,%edi
   0x0000000000001ebe <+188>:   callq  0x13e0 <sleep@plt>
   0x0000000000001ec3 <+193>:   sub    $0x1,%ebx
   0x0000000000001ec6 <+196>:   jne    0x1eae <low_prio+172>

So you don't need to go check out the sources, the actual loop code
as written is:

        long tmp = 0;
        for (int i = 0; i < 20; i++) {
                while (low_cnt < MAX_LOOP) {
                        tmp += (123456789 % 1234) * (987654321 % 54321);
                        low_cnt += 1;
                }
                low_cnt = 0;
                sleep(1);
        }

tmp is not used anywhere else.   The inner while loop, which is
supposed to be where all the CPU time is being used, is simply
gone - the change to low_cnt in it is obviously unwanted, it keeps
getting set back to 0, and tmp is simply unused.  Even if the inner
loop wasn't obviously useless, the computations are just arithmetic
on constants.  Even the ancient Ritchie C compiler would have calculated
	(123456789 % 1234) * (987654321 % 54321)
at compile time and just made the code be doing
		tmp += 1105500;
each time around the loop...  That's not a lot of work, really.
MAX_LOOP is 100 million, so if the inner loop were being executed
it would at least be doing 100,000,000 additions of a constant,
which might take something of the order of a second on a modern CPu
I guess (much more on qemu).   Pity that it doesn't.

Again, sometime in the distant past, the compiler optimisers might
not have been quite so good (though it does not take much to see
through that bogus code) and may have actually cause more work to
be done.   Not any more.   It would be possible to use the result
of the computation (tmp), even just make it be in a volatile var, but
really, what's the point?   This is not supposed to be a test of
thread scheduling it is supposed to be a mutex test, and unless
the two threads finish at almost the same instant, it would not be
(so at least the lack of real computation gives the mutex a chance
to be useful.)

So, please, let's just delete this meaningless test, it isn't (and
was never designed to be) a real test of anything related to mutexes,
which is what it should be, and for any other purpose, it is simply
a waste of time.

kre

ps: if for any reason the test is to be kept, perhaps after fixing,
please delete the \n in the message Martin added earlier this year
(the diagnostic to help work out why it fails ... not really needed,
it is obvious from code reading).   ATF does not cope well with
multi-line messages in its fail/skip/... routines, and anything with
an embedded \n (even when nothing comes after) counts as that, and
causes it to barf.
Prev by Date: Re: Ecessive use of PWD command in /usr/bin/ftp
Next by Date: i386 floating point expertise needed
Previous by Thread: Ecessive use of PWD command in /usr/bin/ftp
Next by Thread: i386 floating point expertise needed
Indexes:
Home | Main Index | Thread Index | Old Index