My first positive structural change to NTP

Sun Jun 26 02:23:01 UTC 2016

Today, after a false start yesterday and a correction, I completed a
patch sequence that makes a significant structural change that isn't just
removing cruft.

This is kind of a first. Yes, I've made some pretty dramatic changes to
the code over the last year, but other than the not-yet-successful TESTFRAME
scaffolding they were almost all bug fixes, refactorings, or creative removals.
The one exception, JSON reporting from ntpdig, was rather trivial.

What I've succeeded in doing is almost completely removing from the
code the assumption that refclock addresses necessarily have the
special form 127.127.t.u.  The only code that still believes this is
in the ntp.conf configuration parser, and the only reason *it* still
believes this in order not to break the existing syntax of refclock
declarations.

(In fact, clock addresses do still have this form internally, but that
is only to avoid surprising older ntpq instances; nothing in the
NTPsec code requires it.)

I've also made substantial progress towards eliminating driver-type
magic numbers from the code. The table that used to indirect from
driver-type numbers to driver-type shortnames is gone; instead, the
driver shortname string is what it should be - an element of the
driver method table - and there is only one type-number-to-driver
indirection, a table in refclock_conf.c.

This is all clearing the decks for a big user-visible change.  I'm going
to fix the frighteningly awful refclock declaration syntax.  Consider
this example:

----------------------------------------------------------------------
# Uses the shared-memory driver, accepting fixes from a running gpsd
# instance watching one PPS-capable GPS. Accepts in-band GPS time (not
# very good, likely to have jitter in the 100s of milliseconds) on one
# unit, and PPS time (almost certainly good to 1 ms or less) on
# another.  Prefers the latter.

# GPS Serial data reference (NTP0)
server 127.127.28.0
fudge 127.127.28.0 refid GPS

# GPS PPS reference (NTP1)
server 127.127.28.1 prefer
fudge 127.127.28.1 refid PPS
----------------------------------------------------------------------

The misleading "server" keyword for what is actually a reference clock.  The
magic 127.127.t.u address, which is the only way you *know* it's a reference
clock.  Some attributes of the clock being specified in a mystery 'fudge' 
command only tied in by the magic server address. The magic driver type number
28. The fail is strong here. The only excuse for this garbage (and it's not
much of one - Mills was smart enough to know better) is that it was designed
decades ago in a more primitive time.

Here's how I think it should look:

----------------------------------------------------------------------
refclock shm unit 0 refid GPS
refclock shm unit 1 prefer refid PPS
----------------------------------------------------------------------

No magic IPv4 address, no split syntax, no driver type number (it's
been replaced by the driver shortname "shm").  It should be less work to
get the rest of the way to this (while still supporting the old syntax
for backward compatibility) than I've done already - I've already written
the grammar, only the glue code still needs doing.

An unobvious benefit of this change is that the driver reference pages
are going to become a lot less mystifying.  I can still remember how
and why my head hurt on first reading them.  Removing the magic addresses
and mystery numbers will help a lot.

Along the way I learned a lot about how ntpq and mode 6 responses
work.  (Like NTP in general, it's an odd combination of elegant
structural ideas with an astonishing accumulation of cruft on top.)
In order to remove the magic-address assumptions from ntpq I had to
add another variable, "displayname", to the set you get back when you
request information about a peer. In effect, ntpd gets to say "*this*
is how you should label this peer", and ntpq uses that to decorate the 
clock entries in its -p output.

This has the minor downside that new ntpqs will display 127.127.28.0
(rather than "SHM(0)") when querying Classic ntpd, which doesn't ship
that variable. Oh well...almost everyone disables remote querying
anyway.  It was the right thing to do; ntpq has no business knowing
about driver type numbers.

(Grrrrr...Actually, *nobody* has any business knowing about driver
type numbers.  Things that have names should be referred to by name.
Making humans maintain a level of indirection from names to numbers is
perverse, that's the kind of detail we have computers to track. Or, to
put it slightly differently, "1977 called - it wants its ugly kluge
back.")

It's easy for codebases this size to wind up as huge balls of mud.  There
are several nearly equivalent ways to describe my job as a systems architect;
one of them centers on enforcing proper separation of concerns so collapse-to-
mudball is prevented.  The changes I've just described are a significant
step in the good direction.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>