Bug 119, ntpdig doesn't do IPv6, fixed

Mon Oct 3 14:13:06 UTC 2016

Hal Murray <hmurray at megapathdsl.net>:
> 
> Thanks for tracking that down.
> 
> > I suspect the reason Hal didn't catch this and fix it instantly is that he
> > is, like the rest of us, really focused on ntpd.  And thus didn't think to
> > test ntpdig when he modified it.
> 
> Can we add some tests that would have caught this?
> 
> Do we need another category of tests?  I don't have a good word.  I'm 
> thinking of a script that gets run nightly/weekly and requires human review 
> to decide if a problem is due to a recent change in the code or a quirk in 
> the environment.

It could be done with an expect/send framework running smoke tests on ntpdig
and ntpq against a known-good IP address; ntpsec.org's would do.
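A minimal sketch of such a nightly smoke run follows. It uses Python's subprocess module rather than an expect/send framework. The command lines are assumptions: it assumes ntpdig accepts -4/-6 to force the address family and that the target answers ntpq's mode 6 queries. Check both against your installed version. Each check passes if the command exits 0 and prints something. Anything else is reported for a human to judge.

```python
import subprocess
import sys
from dataclasses import dataclass

# Assumed known-good target; the message suggests ntpsec.org's address.
DEFAULT_HOST = "ntpsec.org"


@dataclass
class SmokeResult:
    name: str
    ok: bool
    detail: str


def run_check(name, argv, timeout=30.0):
    """Run one command; pass if it exits 0 and writes something to stdout."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return SmokeResult(name, False, f"{argv[0]}: not installed")
    except subprocess.TimeoutExpired:
        return SmokeResult(name, False, f"timed out after {timeout}s")
    ok = proc.returncode == 0 and proc.stdout.strip() != ""
    if ok:
        detail = proc.stdout.strip()
    else:
        detail = proc.stderr.strip() or f"exit status {proc.returncode}"
    return SmokeResult(name, ok, detail)


def smoke_checks(host=DEFAULT_HOST):
    # Hypothetical command lines: forcing -4 and -6 separately makes sure
    # the IPv6 path (the Bug 119 case) is exercised explicitly.
    return [
        ("ntpdig IPv4", ["ntpdig", "-4", host]),
        ("ntpdig IPv6", ["ntpdig", "-6", host]),
        ("ntpq peers", ["ntpq", "-p", host]),
    ]


def main(argv):
    host = argv[1] if len(argv) > 1 else DEFAULT_HOST
    results = [run_check(name, cmd) for name, cmd in smoke_checks(host)]
    for r in results:
        print(f"{'PASS' if r.ok else 'FAIL'}  {r.name}: {r.detail}")
    # A nonzero result only flags the run for review: a failure may be an
    # environment quirk (no IPv6 route, say) rather than a code regression.
    return 0 if all(r.ok for r in results) else 1
```

A cron wrapper would call `sys.exit(main(sys.argv))` and mail the output. Deciding whether a FAIL is a recent change or a quirk of the environment stays a human judgment, as Hal suggests.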

> > Take a lesson, everybody. It's the tests you don't run that'll hurt you. 
> 
> I've worked on at least one project where part of the culture was to collect 
> test cases along with bug fixes, and merge them into the standard test 
> collection.  It's embarrassing how often bugs get reinvented.  (That may be 
> an indication of poor architecture or just a messy area.)

I'm pretty religious about this practice on two of my other projects, GPSD and
reposurgeon.  GPSD has about 125 tests; reposurgeon about 145. On both,
reinvented bugs have been quite rare, though not entirely nonexistent.

There is a noticeable difference in effectiveness; GPSD has a very low
defect rate, reposurgeon a somewhat higher one (though still pretty
good compared to what I see on other projects' bugtrackers).

I think one difference is degree of novelty.  Bug replay is extremely
effective mitigation when your codebase is relatively stable, doing
much the same things it did last year; less so when you routinely try
to add capability.  GPSD is stable that way; reposurgeon is not.
I think NTP might be more like GPSD.

Another difference is algorithmic density.  GPSD is high on that scale
(how many programs include both a pattern-recognition FSM derived from
compiler technology and nontrivial matrix algebra?) but reposurgeon is
stratospheric (several large FSMs, heavy use of exotic graph-traversal
and graph-surgery algorithms, a copy-on-write cache tuned for its
internal data structures).

I think bug replay decreases in effectiveness when your problem space
is so tricky that not being sure what the right thing is looms larger
than implementation mistakes.  The actual time-sync algorithms in NTP are
like that; most of the rest of it (the network plumbing, in particular)
is not.

TESTFRAME was intended in large part as a way to collect bug cases and
rerun them.  The concept stemmed directly from the way GPSD's gpsfake
works.  I put huge effort into it because I know how effective gpsfake
has been.  It is very sad that TESTFRAME turned out to be unworkable;
I don't have a plan B for bug replay yet, but it is something I am
thinking about.
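The bug-replay idea can be pictured with a small golden-output harness. This is in the spirit of gpsfake's captured logs and check files, but it is not how gpsfake or TESTFRAME is actually implemented. Each captured case pairs the input that triggered a bug with the output the fixed code should produce. The harness reruns every case and returns a diff for each one that regresses. The function under test, `address_family`, is hypothetical, and so are the case names. The "bug-119" case is only named after the subject line; it is not that bug's real reproducer.

```python
import difflib
import ipaddress
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ReplayCase:
    bug_id: str    # tracker reference the case was collected with
    given: str     # input that triggered the bug
    expected: str  # output the fixed code produces ("check" data)


def replay(cases: Iterable[ReplayCase], under_test: Callable[[str], str]) -> List[str]:
    """Rerun every captured case; return one unified diff per regression."""
    failures = []
    for case in cases:
        actual = under_test(case.given)
        if actual != case.expected:
            diff = difflib.unified_diff(
                case.expected.splitlines(), actual.splitlines(),
                fromfile=f"{case.bug_id}.expected",
                tofile=f"{case.bug_id}.actual",
                lineterm="")
            failures.append("\n".join(diff))
    return failures


def address_family(text: str) -> str:
    """Hypothetical code under test: classify a literal IP address."""
    return "inet6" if ipaddress.ip_address(text).version == 6 else "inet"


# Illustrative cases; in practice one is added alongside each bug fix.
CASES = [
    ReplayCase("bug-119", "2001:db8::1", "inet6"),
    ReplayCase("baseline-v4", "192.0.2.1", "inet"),
]
```

An empty list from `replay(CASES, address_family)` means no old bug has come back. A version that quietly assumes IPv4 everywhere would produce a diff under `bug-119.expected`.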
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>