tech-userlevel archive


Re: A couple of questions



I don't have an immediate answer for the original questions, however...

On Sat, 3 Nov 2012 04:04:58 +0200
Jukka Ruohonen <jruohonen%iki.fi@localhost> wrote:

> Seriously -- while the old saying goes that all tests should be as
> high quality as the production code -- I am not sure we can follow
> this fine principle with tests(7). I can only assume that e.g.
> inducing root to run the tests would reveal numerous potential
> vulnerabilities. Yet, several of the reproducible bugs we've found
> are only available to root, even with rump and all.

Actually, tests need to be even better than production code. The
reason is that ANY test which ever produces the wrong result is not
really a test at all. False positives mean regressions slip through
unnoticed. False negatives mean time and energy wasted hunting for a
bug in the wrong place, or else a creeping habit of ignoring failures
altogether.
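
To make that concrete, here is a minimal sketch in atf-c(3) style;
parse_port() and its contract are entirely made up for illustration.
The first check is weak enough to keep passing after a regression (a
false positive in the sense above), while the tighter checks below it
are not:

/*
 * Minimal sketch only.  parse_port() is a made-up stand-in for the
 * code under test; build roughly as: cc -o t_port t_port.c -latf-c
 */
#include <atf-c.h>
#include <stdlib.h>

/* Trivial stand-in for the code under test (illustration only). */
static int
parse_port(const char *s)
{
        char *end;
        long v;

        v = strtol(s, &end, 10);
        if (*s == '\0' || *end != '\0' || v < 1 || v > 65535)
                return -1;
        return (int)v;
}

ATF_TC(parse_port_basic);
ATF_TC_HEAD(parse_port_basic, tc)
{
        atf_tc_set_md_var(tc, "descr", "parse_port() honours its contract");
}
ATF_TC_BODY(parse_port_basic, tc)
{
        /*
         * Weak check: still passes if a regression makes parse_port()
         * return the wrong (but non-negative) port -- a false positive
         * in the sense above.
         */
        ATF_CHECK(parse_port("80") >= 0);

        /* Tighter checks: a wrong result actually fails. */
        ATF_REQUIRE_EQ(parse_port("80"), 80);
        ATF_REQUIRE_EQ(parse_port("not-a-port"), -1);
}

ATF_TP_ADD_TCS(tp)
{
        ATF_TP_ADD_TC(tp, parse_port_basic);
        return atf_no_error();
}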

On Fri, 2 Nov 2012 18:45:43 -0700 (PDT)
Paul Goyette <paul%whooppee.com@localhost> wrote:

> Is there some clean way to force preemption, even on SMP systems with
> lots of cores?  (On my 24-core machine, the tests succeed less than
> half the time under normal system load.)

And this example is a perfect one. The man page for tests(7) states “If
there is _any failure_ during the execution of the test suite, please
consider reporting it to the NetBSD developers so that the failure can
be analyzed and fixed.” (emphasis in original) It's one thing if
someone who knows the code/tests can spot a spurious false negative and
ignore it (although, how do you distinguish an expected false negative
from a genuine heisenbug? - more on this below), but end users
certainly can't be expected to, especially when they are instructed to
report any failure.
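
For what it's worth, ATF does let a test flag a failure that has
already been analyzed, so it at least doesn't look like a fresh bug to
an end user. A rough sketch (the test case and the PR number are
placeholders, not real ones):

/*
 * Sketch of ATF's expected-failure machinery.  "PR kern/NNNNN" is a
 * placeholder, not a real problem report.
 */
#include <atf-c.h>

ATF_TC(known_flaky);
ATF_TC_HEAD(known_flaky, tc)
{
        atf_tc_set_md_var(tc, "descr", "demonstrates atf_tc_expect_fail()");
}
ATF_TC_BODY(known_flaky, tc)
{
        /*
         * Once the failure has been analyzed and filed, mark it so the
         * result reads "expected_failure" rather than "failed".  If the
         * check later starts passing, ATF reports that too.
         */
        atf_tc_expect_fail("PR kern/NNNNN: scheduler-dependent check");
        ATF_CHECK_EQ(0, 1);     /* stands in for the flaky assertion */
}

ATF_TP_ADD_TCS(tp)
{
        ATF_TP_ADD_TC(tp, known_flaky);
        return atf_no_error();
}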

An example from my professional experience: a certain test would
sometimes fail in automated testing but much more often pass. For a
while it was written off by devs as a race or some other bug in the
test (release engineers often get blamed!). However, as the problem
persisted, a pattern was spotted: the test failed when executed on
certain machines in the farm. Eventually it was tracked down to a
difference in filesystem semantics; a few nodes used a different FS,
and the code made an assumption that wasn't true in all cases. There
really was a bug triggering the intermittent failure. What's more, if
the test farm had been homogeneous, with every node using the same
setup, it probably would never have been spotted.
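
To illustrate the shape of the problem (a made-up reconstruction, not
the actual test from that job), imagine a test which quietly assumes
that readdir(3) returns directory entries in creation order:

/*
 * Hypothetical example: passes on filesystems whose directories happen
 * to be returned in creation order, fails on others, so it looks fine
 * on most of a farm and flaky on a few machines.
 */
#include <sys/stat.h>

#include <atf-c.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

ATF_TC(readdir_order);
ATF_TC_HEAD(readdir_order, tc)
{
        atf_tc_set_md_var(tc, "descr", "first created file is listed first");
}
ATF_TC_BODY(readdir_order, tc)
{
        DIR *d;
        struct dirent *de;
        int fd;

        ATF_REQUIRE(mkdir("dir", 0755) == 0);
        ATF_REQUIRE((fd = open("dir/first", O_CREAT | O_RDWR, 0644)) != -1);
        ATF_REQUIRE(close(fd) == 0);
        ATF_REQUIRE((fd = open("dir/second", O_CREAT | O_RDWR, 0644)) != -1);
        ATF_REQUIRE(close(fd) == 0);

        ATF_REQUIRE((d = opendir("dir")) != NULL);
        while ((de = readdir(d)) != NULL)
                if (de->d_name[0] != '.')
                        break;
        ATF_REQUIRE(de != NULL);

        /*
         * Hidden assumption: the first non-dot entry is the file that
         * was created first.  Nothing guarantees that, so on a
         * differently ordered filesystem this fails even though the
         * code being exercised may be fine -- or it may mask code that
         * depends on the very same ordering.
         */
        ATF_CHECK_STREQ(de->d_name, "first");

        ATF_REQUIRE(closedir(d) == 0);
}

ATF_TP_ADD_TCS(tp)
{
        ATF_TP_ADD_TC(tp, readdir_order);
        return atf_no_error();
}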

So it's not about whether faulty tests constitute a vulnerability
(though I'm sure this is a concern for some), but rather whether we can
deduce from a test failure that there is a bug to find. Any assumption
in the test which might not hold true in all cases is one which could
mask an assumption in the code being tested, and there may be no way to
distinguish between the two. And any race condition in the test could
mask a race in the code (et cetera). If one person's specific setup
causes a test to fail, then anything in the test which could be causing
the failure needs to be eliminated to see if that setup also causes
problems in the code.
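
As a sketch of that last point (the names and the timing are made up),
compare a check that relies on a fixed sleep with one that synchronises
properly before looking at the result:

/*
 * Sketch of a race in the test itself.  Build roughly as:
 *     cc -o t_worker t_worker.c -latf-c -lpthread
 */
#include <atf-c.h>
#include <pthread.h>
#include <unistd.h>

static volatile int done;

static void *
worker(void *arg)
{
        (void)arg;
        done = 1;               /* stand-in for the code under test */
        return NULL;
}

ATF_TC(worker_completes);
ATF_TC_HEAD(worker_completes, tc)
{
        atf_tc_set_md_var(tc, "descr", "worker thread sets its flag");
}
ATF_TC_BODY(worker_completes, tc)
{
        pthread_t t;

        ATF_REQUIRE(pthread_create(&t, NULL, worker, NULL) == 0);

        /*
         * Racy: assumes one second is always enough for the worker to
         * run.  On a loaded machine it sometimes isn't, and the failure
         * looks exactly like a lost update in the code under test.
         */
        sleep(1);
        ATF_CHECK(done == 1);

        /* Race-free: synchronise first, then check the result. */
        ATF_REQUIRE(pthread_join(t, NULL) == 0);
        ATF_CHECK(done == 1);
}

ATF_TP_ADD_TCS(tp)
{
        ATF_TP_ADD_TC(tp, worker_completes);
        return atf_no_error();
}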

That's my 2p worth, anyway.

Julian

-- 
3072D/F3A66B3A Julian Yon (2012 General Use) <pgp.2012%jry.me@localhost>
