New blog draft on the TESTFRAME debacle

Eric S. Raymond esr at thyrsus.com
Sun Jan 8 05:40:45 UTC 2017


Hal Murray <hmurray at megapathdsl.net>:
> 
> I don't understand your section about the PLL.  I think it's tangled up in 
> some bogus terminology.  If you say a bit more, I'll try to sort things out.

Alas, "say more" is not really actionable advice.  Can you tell me
what seemed bogus to you?

> What OSes are the odd-balls?  What makes them odd?  Is this just a different 
> API to the kernel that turns into significantly different paths through ntpd?

MacOS, Windows, and ... damn, I don't remember which BSD it was (not
FreeBSD).  What the oddballs lack is ntp_adjtime().  They can only step,
not slew.

> Are there other examples in this area?  If not, I think it would be valuable 
> to be able to run TESTFRAME using only the logs that were captured on a 
> compatible system.

Complexification #1.

> Plan B would be to do something like cross compile using header files that 
> match the system the log file was captured on.  In replay mode, there is no 
> interaction with the kernel so we don't actually need the library/kernel 
> calls that don't exist on the replay system.  We would probably have to error 
> out the real calls in the intercept module.  That doesn't seem like a hard 
> problem.

Complexification #2.
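
For readers who haven't seen the idea, the intercept scheme Hal
describes can be sketched roughly as follows.  This is a hypothetical
Python illustration, not TESTFRAME's actual design: the class name,
the log format, and the "get_time" call are all assumptions.  In
capture mode each call goes to the real function and is logged; in
replay mode the real function is never touched, and any departure from
the logged sequence is an error.

```python
# Hypothetical capture/replay intercept layer (illustration only).

class ReplayError(RuntimeError):
    """Raised when replay diverges from the captured log."""


class Intercept:
    def __init__(self, mode, log=None, real_calls=None):
        if mode not in ("capture", "replay"):
            raise ValueError("mode must be 'capture' or 'replay'")
        self.mode = mode
        self.log = list(log or [])           # entries: (name, args, result)
        self.real_calls = real_calls or {}   # name -> callable, capture only
        self._cursor = 0

    def call(self, name, *args):
        if self.mode == "capture":
            # Talk to the real system and remember what it said.
            result = self.real_calls[name](*args)
            self.log.append((name, args, result))
            return result
        # Replay: never touch the real call, only the log.
        if self._cursor >= len(self.log):
            raise ReplayError("log exhausted at call to %s" % name)
        logged_name, logged_args, result = self.log[self._cursor]
        if (logged_name, tuple(logged_args)) != (name, args):
            raise ReplayError("expected %s%r, got %s%r"
                              % (logged_name, tuple(logged_args), name, args))
        self._cursor += 1
        return result


if __name__ == "__main__":
    cap = Intercept("capture", real_calls={"get_time": lambda: 42.0})
    cap.call("get_time")
    rep = Intercept("replay", log=cap.log)   # no real calls available
    print(rep.call("get_time"))
```

Note what the sketch makes visible: any code change that alters the
order, number, or arguments of intercepted calls raises ReplayError
against every previously captured log.  That is the log-stability
problem in miniature.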

Yes, I'd already thought of these.

What I fear at this point about attempting to revive TESTFRAME is
that I would be committing myself to an increasingly elaborate set of
kludges in order to sort of halfway make the concept work, only to find
that replay logs aren't actually stable enough to pay back the effort.

You are doing nothing to reassure me about this, I'm afraid.  You've
proposed the same damned-if-that-don't-look-like-a-rathole possibilities
I came up with.

After having had my hopes dashed several times, I'm going to need more
confidence in a good outcome before I even gesture in that direction.

> > Every leap-second insertion requires us to rebuild some check files. 
> 
> Is that for gpsd or ntpd?

gpsd.  We don't have such check files for ntpd.

> I'd like a few more examples of free parameters.
> 
> > Here are some changes we've already made in NTPsec that break replay: the
> > default poll interval, the allowable minima and maxima the protocol machine
> > hunts in, and the default of minsane.  In effect, the entire logic of the
> > sync algorithms is a gigantic free parameter with 
> 
> What did we change in the way of the default poll interval?  or minsane?
> 
> I think we changed the allowable minpoll, but that doesn't change anything 
> unless you use it and old test logs won't have used it.

Checking...

commit 0544f2229822e89b8a62c2aa7659306593ff4a89
Author: Gary E. Miller <gem at rellim.com>
Date:   Mon Sep 26 12:59:14 2016 -0700

    Fix maxpoll for refclocks.
    
    An undefined maxpoll is not always set to the default maxpoll
    (NTP_MAXDPOLL).  For a local refclock the default max poll is
    the minpoll.

commit a3047c7a375877436d422e04a138aace7ce1bd06
Author: Eric S. Raymond <esr at thyrsus.com>
Date:   Sat Jul 23 18:23:33 2016 -0400

    Allow minpoll to be set as low as 0.  Useful on fast networks.
    
    chrony allows minpoll 0 and doesn't blow up the net, so this is
    probably safe.  A higher lower limit made sense when WAN capacity was
    much lower and we didn't want aggressive configurations grabbing a
    lot of it. But today the worst NTP can do is spit in the ocean
    compared to, say, video streaming traffic.
    
    Note that this does not change the default minpoll value, just the
    lowest you can explicitly set.  The documentation lied about that,
    claiming it was 4 when it had been 3 since an unexplained change in
    2008.

Hm, it seems I misremembered this stuff.  I'm not sure I found all the
relevant commits, though.  I'll delete that sentence for now and do a
more thorough search.

> Even if we did change something like the default poll interval, we should be 
> able to patch old config files used for a TESTFRAME run to restore the old 
> default.

Complexification #3.  If you don't yet understand why I get nervous when
you talk about patching around such problems, please step back a bit and
think about the O(n**2) possibilities for bad interactions between the
kludges.  Every instinct of mine is yelling "DON'T GO THERE!"
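
The O(n**2) growth is easy to check by counting pairs.  A minimal
Python sketch follows; the kludge labels are shorthand for the three
proposals in this thread, and the function name is made up for the
example.

```python
# Count the pairwise interactions among n workarounds: n*(n-1)/2.
from itertools import combinations


def pairwise_interactions(kludges):
    """Return every unordered pair of kludges that could interact."""
    return list(combinations(kludges, 2))


if __name__ == "__main__":
    # Shorthand labels for the three proposals above (illustrative).
    kludges = [
        "replay only logs from compatible systems",
        "cross-compile against the capture system's headers",
        "patch old configs to restore old defaults",
    ]
    for a, b in pairwise_interactions(kludges):
        print("%s  <->  %s" % (a, b))
    # The pair count grows quadratically with the number of kludges.
    for n in (3, 5, 10):
        print(n, len(pairwise_interactions(range(n))))
```

Three kludges give three pairs to worry about; ten would give
forty-five, and that is counting only pairwise interactions.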
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

