A milestone - the first TESTFRAME capture

Wed Dec 9 14:54:18 UTC 2015

Hal Murray <hmurray at megapathdsl.net>:
> Another way to look at things is that intercepting ntp_gettime is wrong.  You 
> aren't checking the code in ntp_gettime.  You should be intercepting the 
> kernel calls to get the time.

Yes, in fact I'm already doing that.  This would probably not be a bad
time for you to audit ntpd/ntp_intercept.c and see if you can spot holes
in my assumptions.

> I think you said you weren't planning to intercept the main select, but 
> replace it with a dispatcher to work off the replay log file.  That will have 
> the same problem.  You won't be testing the select and dispatch logic.

Oh *good*!  Design review.  I haven't gotten enough of that, and with
your domain knowledge you are probably at least as well equipped for
it as anyone else on the team.

You are correct.  I thought through what would be necessary to
exercise that piece, and I did come up with a method - insert a wedge
library to mock the socket and UDP system calls.  I rejected this as
prohibitively complex and platform-dependent.

Fortunately, the main select (which is not actually a select because
the interfaces are connectionless and nonblocking, but it is a similar
polling loop) is very well isolated from the timekeeping logic, and
actually pretty effectively contained in ntp_io.c.  I've said on my
blog that I respect Dave Mills a lot as a system architect, and this
is one of the major reasons why; while the code had a lot of archaisms
and cruft in it, the separation of concerns in the basic architecture
is sound and well executed.

Because that is so, with a relatively small amount of refactoring I
have been able to isolate the whole receive side of the network plumbing
into a function named mainloop() which calls out to (a) the protocol
machine (via the receive() function) and (b) timer(), which calls
various intercept-layer functions.

I can't test mainloop() itself. But by inserting the intercept layer
where it is, I can cover *everything else*, including the loop filter
and the timekeeping logic and all the hair around configuration and
drift and leapseconds.
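
To make the shape of the intercept layer concrete, here is a minimal
sketch of one capture/replay wrapper around a kernel time call.  All
names and the log format are hypothetical, chosen for illustration;
this is not the actual interface in ntpd/ntp_intercept.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical names throughout; not the real ntp_intercept.c API. */
typedef enum { MODE_NONE, MODE_CAPTURE, MODE_REPLAY } intercept_mode;

static intercept_mode mode = MODE_NONE;
static FILE *logfp;	/* capture: event log written; replay: log read */

void intercept_set_mode(intercept_mode m, FILE *fp)
{
    mode = m;
    logfp = fp;
}

/*
 * Get the system time.  In capture mode the real clock is read and the
 * result is logged.  In replay mode the kernel is never consulted; the
 * next logged value is returned instead.  Returns 0 on success, -1 on
 * failure (including a missing or mismatched log record).
 */
int intercept_get_systime(struct timespec *ts)
{
    if (mode == MODE_REPLAY) {
        long long sec;
        long nsec;

        assert(logfp != NULL);
        if (fscanf(logfp, " systime %lld %ld", &sec, &nsec) != 2)
            return -1;
        ts->tv_sec = (time_t)sec;
        ts->tv_nsec = nsec;
        return 0;
    }

    if (clock_gettime(CLOCK_REALTIME, ts) != 0)
        return -1;
    if (mode == MODE_CAPTURE) {
        assert(logfp != NULL);
        fprintf(logfp, "systime %lld %ld\n",
                (long long)ts->tv_sec, (long)ts->tv_nsec);
    }
    return 0;
}
```

Code above the wrapper calls intercept_get_systime() the same way in
every mode, which is why everything above the intercept layer gets
exercised during replay.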

The replay interpreter will replace mainloop(). It's not ideal, but it's
what's practically feasible.

It might be doable to set up a separate test jig for mainloop().  That
would be easier because the NTP configuration machinery would be out
of the picture - we could tell mainloop() to listen on nonroutable
local addresses and just throw packets at it to verify that the
receive hook is seeing the data we expect.

> This is going to be an interesting adventure.
> 
> The reason that your test methodology works so well in GPSD is that there is 
> no internal state.  There is a one-to-one mapping from input to output.

Actually, there is internal state in GPSD.  It is used, for example, to
aggregate entire cycles of NMEA sentences into complete 3D fixes.  The
really crucial difference is that gpsd doesn't change any *external*
state - it does nothing comparable to ntp_set_tod()/ntp_adjtime() or
otherwise tweaking the state of the system outside it.

Accordingly, one of the things replay mode is going to have to do is
mock the timekeeping state of a kernel so that replayed calls think
they're seeing the same data they did during the capture.  It is quite
clear how to mock the system clock, less so how to emulate the control
parameters for the PLL (and that part's not done yet).
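
As a rough illustration of the easy half, a mocked system clock could
look something like the sketch below.  The names are hypothetical, and
the 500 ppm slew limit and the rule that a step cancels any pending
slew are simplifying assumptions of mine, not a description of any
particular kernel.  The PLL control parameters are deliberately left
out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed maximum slew rate, in parts per million. */
#define MAX_SLEW_PPM 500

/* Hypothetical mock of the kernel clock state seen by replayed calls. */
typedef struct {
    int64_t now_ns;	/* simulated system time, nanoseconds */
    int64_t pending_ns;	/* slew requested but not yet applied */
} mock_clock;

void mock_clock_init(mock_clock *c, int64_t start_ns)
{
    c->now_ns = start_ns;
    c->pending_ns = 0;
}

/* Step the clock, as a settimeofday()-style call would.
 * Assumption: stepping discards any outstanding slew. */
void mock_clock_step(mock_clock *c, int64_t new_ns)
{
    c->now_ns = new_ns;
    c->pending_ns = 0;
}

/* Request a gradual adjustment, as an adjtime()-style call would.
 * Returns the adjustment that was still outstanding. */
int64_t mock_clock_slew(mock_clock *c, int64_t delta_ns)
{
    int64_t old = c->pending_ns;

    c->pending_ns = delta_ns;
    return old;
}

/* Advance simulated time by elapsed_ns, applying as much of the
 * pending slew as the rate limit allows over that interval. */
void mock_clock_advance(mock_clock *c, int64_t elapsed_ns)
{
    int64_t budget, applied;

    assert(elapsed_ns >= 0);
    budget = elapsed_ns * MAX_SLEW_PPM / 1000000;
    applied = c->pending_ns;
    if (applied > budget)
        applied = budget;
    if (applied < -budget)
        applied = -budget;
    c->now_ns += elapsed_ns + applied;
    c->pending_ns -= applied;
}
```

The replay interpreter would own one such structure and route every
mocked get/set/adjust call through it, so that the daemon's view of the
clock evolves deterministically from the capture log.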

> I wonder what fraction of the changes we will be making to NTP will show up 
> in the output of ntpd and break a test.

That is a good question and one that I admit worries me a little.

Certainly changes to the timekeeping logic are going to tend to break
the whole test set.  There's an analogy to this in GPSD - changing the
error-bar calculations was always a PITA because it would often break
a lot of regression tests in trivial numeric-mismatch ways.

But even in the worst case the test set should function the way it
does in GPSD, warning developers when some change that should not have
changed visible behavior actually did.  For GPSD that turned out to be
three-quarters of the battle in reducing defect rates to near zip; I'm
hoping for a similar success here.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>