Testing

Wed Dec 6 19:26:28 UTC 2017

Hal Murray <hmurray at megapathdsl.net>:
> 
> esr at thyrsus.com said:
> > Well, normally it would be a longer freeze, but we're doing this to roll out
> > a bug fix on code that has been pretty stable. 
> 
> Your version of "pretty stable" doesn't match mine.

Probably not. To a first approximation I judge "pretty stable" by the
burn-in time on my Pis. If it runs for long periods of time on all six
with no anomalies, it's stable.

Of course, I can't know that about code younger than a couple of weeks
(i.e., since the last farm refresh).  But I can make a pretty
good bet on it by being conservative in a certain very specific way.

I am highly productive and my error rate is very low.  Here is how I do that.
I do not change executable code without having clearly in mind a set of
invariants that the code must satisfy before and after, and then proving
to myself that those invariants are still satisfied.

So, unless I fuck up, every change I make is a correctness-preserving
transformation.  I do fuck that up, but only rarely; it has only happened
on the order of a dozen times during the entire life of this project.
We literally go months at a time between my fumbles.

(About 50% of the time when I do mess up it traces back to being short
of sleep.)

Most programmers don't know how to do this.  They have high error rates
because they *don't* stick to invariant-preserving transformations.
This is what I mean by being conservative in a very specific way.  I
half-joke about my cowboy tendencies, but the truth is I can only
operate in what sometimes looks like a loose and undisciplined way
because of that particular strict conservatism at the bottom.

Most importantly...*until* I can identify what invariants are relevant
and have satisfied myself with at least an informal demonstration of
correctness, I keep my hands *off* the freaking code.

That discipline makes my sustained error rate really low, which has a
profound effect on my working style. It means that my optimax is to
move fast and trust myself not to break things.  Or, to put it
differently, my error rate can't go much lower, so I have little to
gain - and a lot of production to lose - by slowing down.

The burn-in testing on my Pis gives me confidence that I'm not leaving
old, bad bugs in my wake.  So I can be pretty sure that at any given time
my worst undiscovered error is only a week or two deep.

(On other projects I have different - sometimes better - ways of
checking my six.  GPSD and reposurgeon have *really good*
regression test suites. That didn't happen by accident.)

So, when I say the code is "pretty stable", what I actually mean is
that (a) the burn-in testing has bounded the time depth of any serious
problems, and (b) my confidence that recent changes are
correctness-preserving is high.

It is always possible that at any specific time I have erred in
believing this. Very occasionally that will be so. But the way to bet,
based on my long-term performance, is that I'm not wrong.

> You want to make a point release to push out a one line patch.  You are 
> dragging along all the other changes since 1.0.  Sure, we don't know of 
> problems, but there have been lots of opportunities for seemingly innocent 
> edits to break something.

Your assumption is incorrect.  *I* don't want to push out a one-line
patch at all.  I want to clean up HEAD and ship it - move fast and
find out if anything has broken in our accumulated changes as quickly
as possible.

I wrote a paper about this once...

> Your removal of HAVE_KERNEL_PLL is a good example.  You could easily
> have made a typo that didn't break builds.  It's only had a few days
> of testing.

OK, let's take that as an example.  Why wasn't I worried?

Because if you have a colorizing diff like git's, removing a set of #ifdefs
is easy to eyeball-check. You look for balanced cliques of red lines. There
are a couple of *visual* invariants you can verify.

The point is, I knew what invariants to check before I went in.
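As a rough illustration (not part of any NTPsec tooling), the "balanced cliques of red lines" invariant for an #ifdef removal can also be checked mechanically. This is a minimal Python sketch, assuming the input is plain unified-diff text such as `git diff` produces; the helper name `removed_conditionals_balanced` is hypothetical. It only confirms that removed #if/#ifdef/#ifndef lines pair up in order with removed #endif lines across the whole diff (not per file). It cannot tell whether the right branch's code was kept, so it is a necessary check, not a sufficient one; the eyeball pass is still needed.

```python
import re

# Hypothetical helper: matches a removed ("-") line in a unified diff that is
# a C preprocessor conditional directive. Assumes plain `git diff` output.
REMOVED_COND = re.compile(r"^-\s*#\s*(if|ifdef|ifndef|elif|else|endif)\b")


def removed_conditionals_balanced(diff_text: str) -> bool:
    """Return True if removed conditional openers and removed #endifs pair up.

    Removed #if/#ifdef/#ifndef lines push one nesting level; removed #endif
    lines pop one. An #endif with nothing open, or an opener still unclosed
    at the end, means the removal is unbalanced. #elif/#else are matched but
    ignored, since they do not change nesting depth.
    """
    depth = 0
    for line in diff_text.splitlines():
        m = REMOVED_COND.match(line)
        if m is None:
            continue  # context line, added line, header, or non-directive
        kind = m.group(1)
        if kind in ("if", "ifdef", "ifndef"):
            depth += 1
        elif kind == "endif":
            depth -= 1
            if depth < 0:
                return False  # removed #endif with no removed opener
    return depth == 0


if __name__ == "__main__":
    sample = """\
--- a/example.c
+++ b/example.c
@@ -1,6 +1,2 @@
-#ifdef HAVE_KERNEL_PLL
 int pll_status;
-#else
-int pll_status = 0;
-#endif
 int other;
"""
    print(removed_conditionals_balanced(sample))  # True
```

A `False` result means some removed opener lost its #endif or vice versa, which is exactly the kind of typo that can slip past a build.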

> Another good example: A few days before 1.0, you edited all the
> msyslog messages to have a TAG: on the front of the text.  Sure, it
> didn't actually break anything,

Sure, it didn't actually break anything.  QED.

> but you didn't leave much time for fixing typos and there were
> several of them.  That sort of thing looks sloppy to me.

News from planet Earth: Just because it looks sloppy doesn't mean it is.

The typos weren't important. No invariant was busted.  I allowed myself
that change, that late, because I knew no invariant *could* be busted.
I'll make changes like that all day long and not count it against code
stability.

Instability is introduced when you can no longer be sure that your mental
model of the code captures its functional invariants.  In Curry-Howard
terms, you are no longer confident it's the same proof.

Also see: why I didn't let Gary mess with the pivot code.  Only now, a
year later, am I close to fully understanding the invariants around
that.  I wouldn't quite trust myself to alter it yet.

> What would you do if the world wasn't "pretty stable"?

Slow down a lot - be more like a normal programmer.  This is unlikely
to arise unless I have a stroke or something and my ability to reason
about invariants plunges.  We can know this has happened if my error
rate spikes.

> Should we have the technology to release branches?  Is this a good
> opportunity to test/debug that?

I don't like that approach.  I've said so before but maybe didn't
explain it well enough.  When you branch for a "stable" release you
may buy a lower prompt defect rate in that one release, but you increase
the expected time to discovery of the bugs on your unstable branch.
Under a relatively wide range of conditions this actually pushes up
total defect rates over time.

Linux used to have stable and unstable branches.  It doesn't
any more, because Linus noticed this effect.

> I'd vote for including my fix to ntpq direct mode for mrulist.  It
> will take manual intervention.  I had to fix the test program too.
> I put them in the same commit to avoid breaking HEAD, but the test
> program wasn't part of 1.0.

See? You don't really want to ship a point fix either. :-)
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.