My pre-1.0 wishlist

Daniel Franke dfoxfranke at gmail.com
Sun Jun 5 14:35:40 UTC 2016


On 6/5/16, Eric S. Raymond <esr at thyrsus.com> wrote:

> Unless you set up behavioral replicability (that is, an environment in
> which a known sequence of clock readings, I/O events, and other
> syscalls leads to another known sequence, or at least correct
> recognition teatures of same like ntpq -p showing what you expect) you
> don't have testing - because you don't know what output features
> discriminate between success and failure pf the test.

So weaken your notion of replicability from bit-for-bit-consistent
results, to statistical behavior of a linear time-invariant system.
Report test results as p-values rather than pass/fail. If you're
manually testing a client talking to a server 10ms away, and after
several queries you're still seeing deltas of 20ms, then you know
something is horribly broken. If all your deltas are inside 2µs,
that's damned suspicious too. The intuitions you're applying here can
be made rigorous and your testing made replicable by collecting
statistics on delta values from a believed-good baseline and then
applying a KS test to see if the version you're testing follows the
same distribution. You can automate testing like this entirely on real
hardware without having to spoof any inputs at all.
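A minimal sketch of that idea, with a hand-rolled two-sample KS test
(asymptotic p-value approximation) and synthetic delta samples standing
in for measurements from a believed-good baseline and a candidate build;
the names, sample sizes, and delta distributions are all illustrative
assumptions, not anything from a real test rig:

```python
import math
import random

def ks_2samp(a, b):
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, p) where D is the max distance between the empirical
    CDFs and p is the asymptotic p-value approximation.
    """
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, max(0.0, min(1.0, p))

# Synthetic offset deltas in seconds: baseline vs. a candidate drawn
# from the same distribution, and a hypothetical broken build whose
# deltas are shifted by 10 ms.
random.seed(42)
baseline = [random.gauss(0.0, 0.0005) for _ in range(200)]
candidate = [random.gauss(0.0, 0.0005) for _ in range(200)]
broken = [random.gauss(0.010, 0.0005) for _ in range(200)]

d_ok, p_ok = ks_2samp(baseline, candidate)
d_bad, p_bad = ks_2samp(baseline, broken)
print(f"same distribution: D={d_ok:.3f} p={p_ok:.3f}")
print(f"10 ms shift:       D={d_bad:.3f} p={p_bad:.3g}")
```

A real harness would replace the synthetic lists with deltas logged
from actual client/server exchanges and flag the build when the
p-value falls below a chosen threshold.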
