My pre-1.0 wishlist

Sun Jun 5 18:31:02 UTC 2016

Achim Gratz <Stromeko at nexgo.de>:
> If you really want to do full behavioral testing like that, then I'd
> rather go for complete mocking/scaffolding of the environment (including
> kernel calls) to produce the required determinism to make the results
> meaningful.  This way you can do a trace replay faster than real-time
> and ensure that the input to ntpd is identical between runs.  There's
> still the problem of coming up with traces that meaningfully exercise
> the code.

You've just described TESTFRAME.

   ntp_intercept.c - capture and replay logic for NTP environment calls

   Think of ntpd as a complex finite-state machine for transforming a
   stream of input events to output events.  Events are of the
   following kinds:

   1. Startup, capturing command-line switches.

   2. Configuration read (and synchronous DNS call/returns).

   3. Time reports from reference clocks.

   4. Time calls to the host system clock.

   5. Read and write of the system drift file.

   6. Calls to the host's random-number generator.

   7. Calls to adjtime/ntp_adjtime/adjtimex to adjust the system clock.

   8. Calls to ntp_set_tod to set the system clock.

   9. Read of the system leapsecond file.

   10. Packets incoming from NTP peers and others.

   11. Packets outgoing to NTP peers and others.

   12. Read of authkey file.

   13. Termination.

   We must support two modes of operation.  In "capture" mode, ntpd
   operates normally, logging all events.  In "replay" mode, ntpd accepts
   an event-capture log and replays it, processing all input events in a
   previous capture.

   We say that test performance is *stable* when replay mode is
   idempotent - that is, replaying an event-capture log produces an exact
   copy of itself by duplicating the same output events.

   When test performance is stable, we know two things: (1) we have
   successfully captured all inputs of the system, and (2) the code
   has experienced no functional regressions since the event capture.

   We can regression-test the code by capturing logs of production
   behavior and replaying them.  We can also hand-craft tests to probe
   edge cases.  To support the latter case, it is highly desirable that the
   event-capture format be a text stream in an eyeball-friendly,
   readily-editable format.
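
A minimal sketch of what an eyeball-friendly event line could look like,
assuming a one-line-per-event layout of "<kind> <payload>".  The layout,
the struct and the names event_format/event_parse are hypothetical; the
comment above does not specify the actual capture format.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical one-line-per-event record: "<kind> <payload>\n".
 * This layout is an assumption made for illustration only. */
struct event {
    char kind[32];
    char payload[224];
};

/* Format an event as one editable text line.
 * Returns 0 on success, -1 if the line does not fit in buf. */
int event_format(const struct event *ev, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s %s\n", ev->kind, ev->payload);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Parse one text line back into an event.
 * Returns 0 on success, -1 on malformed or oversized input. */
int event_parse(const char *line, struct event *ev)
{
    const char *sp = strchr(line, ' ');
    size_t klen, plen;

    if (sp == NULL || sp == line)
        return -1;                      /* no separator or empty kind */
    klen = (size_t)(sp - line);
    if (klen >= sizeof ev->kind)
        return -1;
    memcpy(ev->kind, line, klen);
    ev->kind[klen] = '\0';

    plen = strcspn(sp + 1, "\n");       /* payload runs to end of line */
    if (plen >= sizeof ev->payload)
        return -1;
    memcpy(ev->payload, sp + 1, plen);
    ev->payload[plen] = '\0';
    return 0;
}
```

Keeping one event per line means a hand-crafted test can be written or
tweaked in any text editor, and two captures can be compared with diff.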

   == Implementation ==

   ntpd needs two new switches: capture and replay.  The capture switch
   says to log all event calls to an event-capture file in addition to
   performing their normal behaviors.  This includes both read events
   (such as refclock inputs) and write events (such as adjtimex calls).
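
A rough sketch of the capture side for one event kind (calls to the
random-number generator), assuming the hypothetical "<kind> <payload>"
line layout.  The names capture_open and intercept_random are invented
for illustration, and standard C rand() stands in for whatever generator
ntpd really calls.

```c
#include <stdio.h>
#include <stdlib.h>

/* Capture sink; NULL means capture mode is off.  Hypothetical global. */
static FILE *capture_log;

/* Turn capture on (or off, with NULL). */
void capture_open(FILE *fp)
{
    capture_log = fp;
}

/* Wrapper for a random-number call: perform the normal behavior,
 * then log the result if capture mode is on. */
int intercept_random(void)
{
    int r = rand();     /* stand-in for the real generator */

    if (capture_log != NULL)
        fprintf(capture_log, "random %d\n", r);
    return r;
}
```

The caller sees the same return value whether or not capture is on, so
logging does not change the behavior being recorded.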

   The replay switch has more complex behavior: it interprets a capture
   file, mocking all event calls with code that looks at each event
   sequentially from the capture.  If a read call is performed, and the
   next data in the log file is for that read call, the logged data is
   returned for the call.  If a write call is performed, the call type and
   call data are compared to the next log data.  If the next event doesn't
   match the expected type or has different data, show the difference and
   terminate - the replay failed.  Otherwise continue.

   Replay succeeds if the event stream reaches the shutdown event with
   no mismatches.
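
The matching rules above can be sketched roughly as follows, assuming the
capture is already loaded into memory as "<kind> <payload>" strings.  The
struct and the names replay_read/replay_write are hypothetical, and the
sketch returns an error code where the real code would terminate.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory capture: one "<kind> <payload>" string per event. */
struct replay {
    const char *const *events;
    size_t count;
    size_t next;                /* index of the next unconsumed event */
};

/* Return the payload of the next event if its kind matches, else NULL. */
static const char *match_kind(const struct replay *rp, const char *kind)
{
    size_t klen = strlen(kind);
    const char *ev;

    if (rp->next >= rp->count)
        return NULL;
    ev = rp->events[rp->next];
    if (strncmp(ev, kind, klen) != 0)
        return NULL;
    if (ev[klen] == ' ')
        return ev + klen + 1;   /* payload follows the separator */
    if (ev[klen] == '\0')
        return ev + klen;       /* event with empty payload */
    return NULL;
}

static void report(const struct replay *rp, const char *kind, const char *data)
{
    fprintf(stderr, "replay: got '%s %s', log has '%s'\n", kind, data,
            rp->next < rp->count ? rp->events[rp->next] : "<end of log>");
}

/* Read call: hand back the logged data; NULL means the replay failed. */
const char *replay_read(struct replay *rp, const char *kind)
{
    const char *payload = match_kind(rp, kind);

    if (payload == NULL) {
        report(rp, kind, "?");
        return NULL;
    }
    rp->next++;
    return payload;
}

/* Write call: compare call type and data with the next logged event.
 * Returns 0 on match, -1 on mismatch (the replay failed). */
int replay_write(struct replay *rp, const char *kind, const char *data)
{
    const char *payload = match_kind(rp, kind);

    if (payload == NULL || strcmp(payload, data) != 0) {
        report(rp, kind, data);
        return -1;
    }
    rp->next++;
    return 0;
}
```

Under these assumptions a replay succeeds when every call matches and the
final replay_write(&rp, "shutdown", "") returns 0.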

   == Limitations ==

   Reference-clock events are not yet intercepted.

The capture end is done.  The replay end turns out to be really hard.
The problems cluster around replaying type 10 events.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>