My pre-1.0 wishlist

Daniel Franke dfoxfranke at gmail.com
Sun Jun 5 19:03:02 UTC 2016


On 6/5/16, Eric S. Raymond <esr at thyrsus.com> wrote:
> I will be delighted if you keep having boundary-busting ideas like
> that, because some of them will work.  But I have to be the guy to
> tell you that, in our context, trading a very hard but in principle
> deterministically solvable problem of software forensics and testing
> for a research-grade problem in applied statistics is *not progress*.
> That would be buying more trouble to escape less.

*What* research-grade problem? Dave Mills already solved the
research-grade part of the problem decades ago. The statistics we
should be monitoring are already collected
by ntpd and exported in machine-readable form through ntpq. Sample
these statistics from version A and version B. From there it's a matter
of figuring out whether they line up -- and Kolmogorov showed us how
to do that part close to a *century* ago. Anyway, the fine points of
our statistical methodology are seldom going to matter: I think bugs
like "we degraded our precision by 20%" are going to be pretty rare
compared to "this configuration used to work, and now it's completely
broken".

> Even if we developed a procedure for computing a confidence figure that we
> believed, *how would we sell it to our customers*?

We don't have to sell that to them any more than Tupperware has to
sell me on their quality control methodology before I'll buy a pot
from them. The primary purpose of this testing is not to persuade
users of the absence of bugs; it's to alert us to their presence
before we ship broken code. If we do it right, then we'll develop a
reputation that speaks for itself even to the statistically
illiterate.
