Testing

Tom Enterline tenterline at gmail.com
Mon Jul 15 14:44:21 UTC 2019


Please excuse an outsider jumping into the conversation.

AIUI, the testing under discussion is what I think of as the system
programming type - if we have inputs A and B to a black box, and the
test reproduces output C exactly, bit-for-bit, then the test is a
success, otherwise it is a complete failure.

I come from a scientific background, where we compare results somewhat
as analog values. If the test result is off from the expected value by
1000%, that's bad. If it's off by 1%, better. If the error is .00001%,
it's probably within achievable accuracy.

NTP is dealing with digital approximations to real-world, analog
values. So if a test outputs a time within 1 nsec of the reference
output, maybe it's 'good'.
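
A minimal sketch of that kind of tolerance check, in Python; the
1-nanosecond bound and the function name are just illustrative
assumptions, not anything the project actually uses:

    # Tolerance-based comparison instead of a bit-for-bit match.
    TOLERANCE_NS = 1.0  # accept results within 1 ns of the reference

    def close_enough(actual_ns, expected_ns, tolerance_ns=TOLERANCE_NS):
        """Return True if the result is within tolerance of the reference."""
        return abs(actual_ns - expected_ns) <= tolerance_ns

    assert close_enough(1_000_000.4, 1_000_000.0)      # off by 0.4 ns: pass
    assert not close_enough(1_000_050.0, 1_000_000.0)  # off by 50 ns: fail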

Also AIUI, one issue is that there is a black box that no one
alive really understands. I suggest creating a simplified version of
the box (level 0) that is easy to understand. Maybe all it does is
echo the last time it received -- no fancy filtering, floating point
tricks, adjtimex, etc. Then all the rest of the machinery (config file
reads, server selection, UDP, etc.) could run and have results that
are easily predictable. Then a level 1 box could add a few more
features that would exercise more of the code, etc.
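
A toy sketch of what such a level-0 box might look like; the class and
method names here are invented for illustration and don't correspond to
anything in ntpd:

    class Level0Box:
        """Trivial stand-in for the sync machinery: no filtering,
        no PLL/adjtimex, no selection logic."""
        def __init__(self):
            self.last_sample = None

        def receive_sample(self, timestamp):
            # Accept one time sample from a (mock) server.
            self.last_sample = timestamp

        def current_estimate(self):
            # The box's notion of the time is just the last sample it saw.
            return self.last_sample

    box = Level0Box()
    box.receive_sample(1563200000.123)
    assert box.current_estimate() == 1563200000.123  # trivially predictable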

I'm not saying this would be easy; I'm sure there isn't an obvious
replaceable black box.

Eric,
If this is one of the many approaches you already considered, I
apologize for wasting your time.

Tom

On Mon, Jul 15, 2019 at 9:04 AM Eric S. Raymond via devel
<devel at ntpsec.org> wrote:
>
> Hal Murray <hmurray at megapathdsl.net>:
> >
> > > It's...hm...maybe a good way to put it is that the structure of the NTPsec
> > > state space and sync algorithms is extremely hostile to testing.
> >
> > I still don't have a good understanding of why TESTFRAME didn't work.  I can't
> > explain it to somebody.
> >
> > We've got
> >   code mutations
> >   hidden variables in the FSM
> >   hostile
> >
> > So what makes it hostile?  Is it more than just complexity?
>
> Once you have TESTFRAME in place, and are logging all the input state
> including things like the system clock and PLL state, and all the output
> state too, it's just complexity.  But that "just" is a doozy.
>
> Let me refocus on the basic question here.  Let's say you've put
> TESTFRAME back in place and finished it.  You can now start ntpd up
> and make captures that include all of the input state of the FSM and
> all of its outputs.
>
> How do you tell that any given capture represents correct operation?
> What check do you apply to the captured I/O history to verify that the
> sync algorithms were functioning as intended when the capture was
> taken?  *That's* the hard part - not the mechanics of TESTFRAME
> itself, which is just tooling.
>
> If you have such a check, then TESTFRAME can be used to verify
> correctness of operation.  You do it the way I built a test suite for
> GPSD. You take a whole bunch of captures. Run your magic check on the
> relationship between input and output to verify that operation
> is correct on each.  Stash the captures in the tests directory.  Then,
> when you change the code, you rerun each of the captures. If actual
> and expected outputs don't diverge, you're good.
>
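For what it's worth, a rough Python sketch of that replay loop; the
tests/captures layout, the file suffixes, and the replay() stub are my
own assumptions, not actual TESTFRAME interfaces:

    import pathlib

    def replay(capture_path):
        """Placeholder: feed the captured inputs back through the daemon
        logic and return the output it produces."""
        raise NotImplementedError

    def run_regressions(test_dir="tests/captures"):
        failures = []
        for capture in sorted(pathlib.Path(test_dir).glob("*.capture")):
            expected = capture.with_suffix(".expected").read_text()
            if replay(capture) != expected:   # or a tolerance check
                failures.append(capture.name)
        return failures
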
> (You still don't know how to compose captures to trigger specified corner
> cases, but there's no point in worrying about that problem until you
> have your check procedure.)
>
> In GPSD, the magic check is just looking at a capture, because the
> correctness of the relationship between input packets and output JSON
> is pretty easy for an unaided Mark I Brain to verify.  In reposurgeon
> it's a little trickier to verify that a load/checkfile pair represents
> correct operation, but not all that difficult for small carefully
> crafted cases.  You end up crafting a lot of small cases - I have 145
> of them.
>
> I don't know how to write that check for NTPsec captures - it sure as
> hell can't be done by eyeballing the packet traffic.  That's the first part
> of what I mean by "hostile to testing"; there are other issues, but until
> we know how to address this one there's little point in even
> enumerating them.
>
> In the absence of such a check procedure for captures,
> TESTFRAME is nearly useless. You can use it to test
> same-input-same-output stability over time but that's about it.
>
> > Why isn't this sort of testing even more valuable when things get complex?
>
> Of course it would be more valuable, because things are complex - if it
> were practical at all, which I don't think it is.  I would be very happy if
> you were to prove me wrong.
>
> > How do we tell that it is working without TESTFRAME?  I eyeball ntpq -p and/or
> > graphs of loopstats and friends.  That's using the stats files as a summary of
> > the internal state.
>
> OK, you have something resembling a check procedure.  I can't do that.  I
> don't know enough about the visible signs of correct vs. incorrect
> operation to trust my ability to tell.
>
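A crude mechanical stand-in for that kind of eyeballing might scan a
loopstats file and flag any clock offset beyond a chosen bound. The
bound below is an arbitrary example; the field layout (MJD, seconds,
offset, frequency, jitter, wander) follows the documented loopstats
format:

    def offsets_within(loopstats_path, bound_s=0.001):
        """Return True if every logged clock offset stays within bound_s seconds."""
        with open(loopstats_path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 3 and abs(float(fields[2])) > bound_s:
                    return False
        return True
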
> Now we get to the next kind of hostility to testing.  How do you compose
> captures so they explore some desired set of transitions in the daemon?
> I know how to do that in GPSD and reposurgeon; I haven't the faintest
> clue how to do it in NTPsec.
>
> And the next. Test-pair brittleness. Reposurgeon tests never break
> once composed unless I decide to deliberately change the behavior of a
> feature. In GPSD test pairs will break only when the leap offset goes
> up, or on era rollover.  Those are rare events because GPSD relies on
> very little hidden or retained state between packet bursts.
>
> ntpd retains a huge amount of state (packet queues for median
> filtering, etc).  That's why reasoning forward from inputs to outputs,
> or backwards from outputs to inputs, would be brutally hard
> even for someone with perfect knowledge of the sync-algorithm
> theory of operation.  There's sensitive dependence on that state.
>
> > Did TESTFRAME capture the stats files?
>
> I think they became part of the TESTFRAME capture.  If they didn't, that would
> be easy to fix.  That's just mechanics, it's not the hard part.
>
> > With a bit more logging, we could probably log enough data so that it would be possible to do the manual verification of what is going on.  We would have to write a memo explaining how it works, maybe that would include chunks of pseudo code.
>
> Good luck. I did not have the knowledge base to do that. If you do, more power to you.
>
> > How much of the problem is that Eric didn't/doesn't understand the way the inner parts of ntpd work?  I've read the descriptions many times but I still don't understand it well enough to explain it to somebody.
>
> Hal, I don't think *anybody* does. Daniel comes close - he can explain
> the theory better than I can. But there's a gap between cup and lip;
> knowing how an idealized NTP-like sync works is not the same
> thing as being able to do detailed enough predictive modeling of the
> implementation to *compose* test cases, the way I can in GPSD or
> reposurgeon.
>
> > I'm pretty sure we gave up on systems that don't support adjtimex.  OpenBSD doesn't have it, but does have enough to slew the clock.  We dropped support for OpenBSD when that shim was removed.
>
> That's not how I remember it.  Both code paths are still present in the codebase.
>
> > How far did you get with TESTFRAME?  Do you remember why you decided to give up?  Was there something in particular, or did you just get tired of banging your head against the wall?
> >
> > How many lines of code went away when you removed it?
>
> commit e3fa301b1ae9d5502f955b47b60fe067e15d0755
> Author: Matt Selsky <matthew.selsky at twosigma.com>
> Date:   Wed Feb 1 02:05:14 2017 -0500
>
>     Remove ntpd flags related to TESTFRAME
>
> commit df63da97a1563572b2f4252d67998e6342f4f207
> Author: Eric S. Raymond <esr at thyrsus.com>
> Date:   Fri Oct 7 00:44:20 2016 -0400
>
>     TESTFRAME: Withdraw the TESTFRAME code.
>
>     There's an incompatible split between KERNEL_PLL and non-KERNEL_PLL
>     capture logs - neither can be interpreted by the replay logic that
>     would work for the other.
>
>     Because we can't get rid of KERNEL_PLL without seriously hurting
>     convergence time, this means the original dream of a single set of
>     regression tests that can be run everywhere by waf check is dead.
>     Possibly to be revived if we solve the slow-convergence problem
>     and drop KERNEL_PLL, but that's far in the future.
>
>     Various nasty kludges could be attempted to partly save the concept
>     by, for example, having two different sets of capture logs.  But, as
>     the architect of TESTFRAME, I have concluded that this would be
>     borrowing trouble we don't need - there are strong reasons to suspect
>     the additional complexity would be a defect attractor.
>
>     One problem independent of the KERNEL_PLL/non-KERNEL_PLL split is that
>     once capture mode was (mostly) working, it became apparent that the
>     log format is very brittle in the sense that captures would easily be
>     rendered invalid for replay by minor logic changes or even changes
>     in tuning parameters for the sync algorithms.
>
> At that time I did not yet fully understand the brittleness problem;
> I now think the last paragraph greatly understates it.
>
> If you revert those, in that order, you'll get most of TESTFRAME
> back.  Some hand-patching will be required because this was before
> Daniel's big refactor landed, but that's a small effort compared to
> recreating TESTFRAME from scratch (which took me months!).  I did
> document the assumptions and architecture pretty carefully.
>
> I think the only piece of mechanics missing is mocking UDP input.
> The code for landing input packets was much hairier then than it
> is now.
>
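For the UDP part at least, injecting canned packets is straightforward
with the stock socket API. A minimal sketch; the address, port, and
payload bytes are illustrative, and a real harness would replay captured
NTP packets with controlled timing:

    import socket

    def inject_packet(payload, host="127.0.0.1", port=123):
        """Send one pre-recorded packet to the daemon under test."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))

    inject_packet(b"\x23" + b"\x00" * 47)  # 48 bytes, LI=0/VN=4/mode=3 client header
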
> > Would it be interesting for me to take a try?
>
> Well, sure.  You're one of the very few who knows the problem space
> (not the code, but the problem space) better than I do.  I don't think
> much of your odds of success, but they're not zero.
>
> > But back to the big picture.  How can we test corner cases?
> >
> > Is it reasonable to look for patterns in the log file?
> >
> > Is it reasonable to look for patterns in the output of ntpq -p?  Graphs?
>
> I think I've implied answers to these questions.
>
> > When you do a Go port, what can you do to make testing easier?
>
> Not much.  I think mocking UDP input will be easier in Go-land, but
> the serious problems are language-independent and not mechanical.
> --
>                 <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>