Testing

Eric S. Raymond esr at thyrsus.com
Sun Jul 14 00:07:50 UTC 2019


Hal Murray <hmurray at megapathdsl.net>:
> Your writeup focuses on code mutations rather than state space.  (Or maybe I didn't read what you intended.)

Perhaps I could have been clearer.  But those two problems run
together in my mind because of what unifies them and sets them
apart from the GPSD and reposurgeon cases.

It's...hm...maybe a good way to put it is that the structure of the
NTPsec state space and sync algorithms is extremely hostile to
testing.

In reposurgeon, when I want to test a command it's generally not too
difficult to hand-craft a repository with the relevant features, run
the command, look at the output repo and verify that the
transformation is as expected.  In GPSD one of the things that makes
the test suite work so well is that by eyeballing a check file you can
actually see the correctness of the relationship between a sentence
burst from the GPS and the JSON it's transformed into.
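
To make that concrete: the shape of such a check-file test is roughly
the sketch below.  decode_sentences here is a trivial stand-in, not
the real gpsd decoder, and the file names are invented; only the
pattern matters.

    import difflib

    def decode_sentences(text):
        # Stand-in for the real decoder; the property that matters
        # is that the output is a pure function of the captured input.
        return "".join(line + "\n" for line in text.splitlines()
                       if line.startswith("$"))

    def run_check(logfile, checkfile):
        # Transform a captured log, then diff against a check file
        # that a human has eyeballed and blessed.
        with open(logfile) as f:
            actual = decode_sentences(f.read())
        with open(checkfile) as f:
            expected = f.read()
        if actual != expected:
            diff = difflib.unified_diff(expected.splitlines(),
                                        actual.splitlines(),
                                        checkfile, "actual", lineterm="")
            raise AssertionError("\n".join(diff))

Because the transform is a pure function of the captured input, a
human-verified check file is a complete oracle.  That purity is
exactly what NTPsec doesn't give you.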

If you try to do this kind of eyeballing in NTPsec it will make your
brain hurt.  It's not just that the input and output packets are
binary; that's superficial and fixable with textualization tools I can
write in my sleep. Fine, let's say you've done that. You've got an
interleaved stream of input and output timestamps.  How do you reason
through the sync algorithms to know whether the relationships are
correct?
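
To be clear about which part is easy: a textualizer really is a dozen
lines of struct-unpacking.  Something like this sketch, where the
48-byte header layout is RFC 5905's and the output format is
arbitrary:

    import struct

    NTP_TO_UNIX = 2208988800  # seconds from the 1900 to the 1970 epoch

    def textualize(packet):
        # Unpack the 48-byte NTP header (RFC 5905) into one
        # eyeballable line of text.
        (lvm, stratum, poll, precision, rootdelay, rootdisp, refid,
         t_ref, t_org, t_rec, t_xmt) = struct.unpack("!BBbbII4sQQQQ",
                                                     packet[:48])
        def ts(raw):  # NTP 32.32 fixed point -> Unix seconds
            return raw / 2**32 - NTP_TO_UNIX
        return ("li=%d vn=%d mode=%d stratum=%d poll=%d prec=%d "
                "delay=%.6f disp=%.6f refid=%s "
                "ref=%.9f org=%.9f rec=%.9f xmt=%.9f" %
                (lvm >> 6, (lvm >> 3) & 7, lvm & 7, stratum, poll,
                 precision, rootdelay / 2**16, rootdisp / 2**16,
                 refid.hex(), ts(t_ref), ts(t_org), ts(t_rec), ts(t_xmt)))

Everything after that step is the hard part.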

Not only are there time-dependent hidden inputs to the computation from
the kernel clock and PLL, but they're going to be qualitatively
different depending on whether you have an adjtimex or not.  Yes,
sure, you can expose all those inputs in your test loads, but now what
you have is a mess of numbers related through algorithms with an
intractably large number of moving parts. Every bit of state retained
between packet-handler invocations is a moving part. So is every
configuration option.
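
Here's a toy to make the retained-state point concrete.  This is not
NTPsec's actual clock filter, just an eight-sample median, but it
shows the same input producing different outputs depending on hidden
history:

    from collections import deque
    from statistics import median

    class ClockFilter:
        def __init__(self):
            self.samples = deque(maxlen=8)   # state retained across calls

        def update(self, offset):
            # Same offset in, different answer out, depending on
            # everything that arrived before it.
            self.samples.append(offset)
            return median(self.samples)

    f = ClockFilter()
    print(f.update(0.005))   # 0.005
    print(f.update(0.001))   # 0.003
    print(f.update(0.001))   # 0.001 - same input as the previous call,
                             # different output: the history moved

Now multiply that by every piece of per-peer state, every
configuration option, and the kernel's own PLL state.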

And there's no smoothness in the test space. In reposurgeon or GPSD I
can take an existing test, modify it, and often know with reasonable
certainty how the code path traversed will change and where the new
check file will differ. In NTPsec it's nonlinearities and edge
triggers all the way down.

What I found out after writing almost all the mechanics of TESTFRAME
is that once you have it you slam into a wall that better tooling is
zero help with. There's no way to get enough of a causal handle on
what's going on to be sure you can test even simple features like
outlier clipping.  General verification of correctness is *completely*
out of reach; the best you can do is (a) test for
same-input/same-output stability and (b) cover enough code paths to
smoke out the core dumps.
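
The (a) half at least is mechanically trivial.  Given some replay
switch - the --replay spelling below is a placeholder, not
TESTFRAME's actual interface - stability testing is just
run-twice-and-compare:

    import subprocess

    def stable(capture, runs=2):
        # A deterministic daemon must produce byte-identical output
        # from identical captured input.
        outputs = [subprocess.run(["ntpd", "--replay", capture],
                                  capture_output=True, timeout=60).stdout
                   for _ in range(runs)]
        return all(out == outputs[0] for out in outputs)

Note what this does and doesn't buy you: it catches nondeterminism
and regressions, but it says nothing about whether the first run was
*correct*.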

In forty-odd years of software engineering I've never seen another
testing problem this wicked.  I don't really expect to see one
if I live another forty.

> The "known to be interesting" phrase gets back to my query that
> started this thread.  I'm looking for a way to test corner cases.
> Would TESTFRAME have done that?

Given a set of inputs that triggers a corner case, yes.  The
problem is *how do you compose that input load?*

There are simple cases where you can do this.  An obvious one 
is writing perverse configurations to try to crash ntpd on
startup. The problem is that those aren't *interesting* cases -
testing them would be like looking for your car keys under
a streetlight because the light is good even though you
dropped them in the dark two blocks over.
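
That kind of streetlight test is mechanically easy to build, which is
exactly the problem.  A sketch, assuming ntpd's -n (don't detach) and
-c (config file) options; the byte-flipping mutator is deliberately
naive:

    import random, signal, subprocess, tempfile

    def crash_probe(seed_conf, trials=100):
        # Flip a few random bytes in a known-good config, then see
        # whether ntpd survives parsing it at startup.
        base = bytearray(open(seed_conf, "rb").read())
        for _ in range(trials):
            conf = bytearray(base)
            for _ in range(random.randrange(1, 8)):
                conf[random.randrange(len(conf))] = random.randrange(256)
            with tempfile.NamedTemporaryFile(suffix=".conf") as tmp:
                tmp.write(conf)
                tmp.flush()
                try:
                    proc = subprocess.run(["ntpd", "-n", "-c", tmp.name],
                                          capture_output=True, timeout=5)
                except subprocess.TimeoutExpired:
                    continue   # it came up and stayed up: not a crash
                if proc.returncode < 0:   # killed by a signal
                    print("crash:", signal.Signals(-proc.returncode).name)

You'll shake segfaults out of the config parser this way.  You'll
learn nothing about whether the sync algorithms are right.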

> If we don't like TESTFRAME, what else can we do?

In principle? Fuzz-probe for core dumps.  *With* TESTFRAME, we could
test for same-input/same-output stability. And that's about it.

I've had four years to think about this problem. If there were 
a way over, under, or around it I do not think it would have
taken me that long to spot it.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



