ntpd: program structure

Sun Jan 6 16:51:53 UTC 2019

Hal Murray via devel <devel at ntpsec.org>:
> 
> The current code is a combined client and server.  I think we should split 
> that into separate programs.  Maybe not right away, but we should at least 
> start thinking that way.
> 
> The server side is simple.  It gets time from the system.  There are a few 
> other parameters that either need to come from the system (via ntp_adjtime?) 
> or from the server via shared memory or similar.
> 
> We should be able to run multi-threaded with only minimal changes.  It would 
> need locks for updating various statistical counters.  There are probably 
> others, but I can't think of them.
> 
> We would probably want to restructure things a bit.  For example, a thread (or 
> several) would be associated with a socket rather than going through a layer.
> 
> The client is pretty much what we have.  Just ignore the small amount of 
> server code.  If we run it on another port, we can run it at the same time as 
> a server.
> 
> I'm not sure what to do with ntpq.  Most of what I use it for (peers, -p) is 
> talking to the client but the server would be listening on port 123.
> 
> Anyway, I think that thinking about them as separate parts will help our 
> discussions.
> We should be able to improve performance on busy servers.

I have, unsurprisingly, spent a lot of time thinking about how to partition
the existing code into smaller pieces.  I had a strong incentive: reducing
global complexity would ipso facto choke off attack vectors.

Some of you may recall that my original project plan included a
project named REFCLOCK - breaking up ntpd into a pure network service
daemon talking to a local refclock manager.  This is a slightly
different partitioning from Hal's proposal here, but raises many of
the same design issues.

The biggest problem with any attempt to break up ntpd into multiple
separate programs is that it would almost necessarily force changes in
the way NTP configuration works that would be (a) user-visible, and
(b) not backward compatible.  The only ways around having such a
configuration break I was able to think up were so complicated and
ugly that they seemed like non-starters in themselves.

Given what we believe about the conservatism of our current and
potential userbase, a compatibility break in configuration would be a
hell of a major cost and not to be undertaken without a really
compelling reason.

Then there is, as Hal notes, the fact that ntpq has a unitary
client/server assumption built in.  I would not consider this a
compelling strike against breaking up the ntpd monolith by itself;
people sophisticated enough to use ntpq in anything other than
peer-listing mode would also be sophisticated enough to understand
whatever changes we made to it.  But it certainly adds to *our*
expected complexity cost for breaking up the code in any significant
way.

Improving performance on busy servers is *not*, in my judgment, a
compelling reason.  NTP simply doesn't load a modern processor very much; I
know this from paying careful attention to load averages on my RasPis
and from profiling to locate hotspots in the code.  And if we want to
reduce load, the low-hanging fruit is getting rid of the
once-per-second tick handler.

Enforcing separation of functions in order to harden the code would be
a better reason. The reason that never happened is that the cost and
payoff gradients changed as I was successfully ripping out 75% of the
old code.  There are so few refclocks left that isolating them no longer
looks like as big a win as it used to.

I think Hal's ideas about using multithreading are quite sound, but
that C is not the language to do them in.  We've been kicking around
the idea of a move to Go; better, I think, to defer more exploitation
of multithreading until and unless we make that jump and the primitive
set we have available is more tractable, less likely to introduce subtle
defects.

I've gone through several rounds of looking at the repartitioning
problem, mentally trying different ways to carve ntpd apart, and every
time around I've reached the same conclusion: it's too complicated to
be worthwhile.

On the other hand, I can also imagine how this might change.

If we keep stripping down and simplifying the ntpd internals, we may
reach a point at which the internal data flows between different
pieces are so narrowly and clearly defined that the codebase begs to
be carved apart into symbiont processes at the implied joints.

I am keenly aware that part of my job as the sysarch is to *know if and
when this happens* and to keep pushing us towards where it might.

One such obvious cutpoint is the interface from the configuration
parser to everything else.  If we could get all that packaged into a
single context structure, eliminating all globals, some possibilities
would open up.  In fact I'd have done this already if I hadn't judged it
better to not do anything potentially destabilizing while we were waiting
on Cisco.
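
To make the shape of that concrete, here is a minimal sketch of a
context structure filled in by the parser and handed to consumers as
an argument.  Every name in it (struct ntp_config, config_init,
config_set, poll_interval_seconds), the field list, and the 0..17
bounds are assumptions made up for illustration; none of it is taken
from the ntpd source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context structure: field and function names are
 * illustrative assumptions, not identifiers from ntpd. */
struct ntp_config {
    int  minpoll;            /* log2 of the shortest poll interval, seconds */
    int  maxpoll;            /* log2 of the longest poll interval, seconds */
    bool serve_clients;      /* answer requests from the network? */
    char driftfile[256];     /* empty string means "none configured" */
};

/* Fill in illustrative defaults so no field is ever uninitialized. */
void config_init(struct ntp_config *cfg)
{
    assert(cfg != NULL);
    cfg->minpoll = 6;
    cfg->maxpoll = 10;
    cfg->serve_clients = true;
    cfg->driftfile[0] = '\0';
}

/* Parse a poll exponent; the 0..17 bounds are an assumption. */
static bool parse_poll(const char *value, int *out)
{
    char *end;
    long v = strtol(value, &end, 10);
    if (end == value || *end != '\0' || v < 0 || v > 17)
        return false;
    *out = (int)v;
    return true;
}

/* Stand-in for the parser's output path: apply one "key value" pair.
 * Returns false for an unknown key or a bad value. */
bool config_set(struct ntp_config *cfg, const char *key, const char *value)
{
    assert(cfg != NULL && key != NULL && value != NULL);
    if (strcmp(key, "minpoll") == 0)
        return parse_poll(value, &cfg->minpoll);
    if (strcmp(key, "maxpoll") == 0)
        return parse_poll(value, &cfg->maxpoll);
    if (strcmp(key, "driftfile") == 0) {
        if (strlen(value) >= sizeof cfg->driftfile)
            return false;
        strcpy(cfg->driftfile, value);
        return true;
    }
    return false;
}

/* A consumer reads the context it is given instead of a global. */
int poll_interval_seconds(const struct ntp_config *cfg)
{
    assert(cfg != NULL);
    return 1 << cfg->minpoll;
}
```

The point of the exercise is that once every consumer takes a
pointer like this, the structure becomes the whole interface between
the parser and the rest of the code, which is what would make it a
plausible place to cut.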

Actually I think the really hard part in any repartition is going to be
prying the refclocks loose from the loopfilter code. *That* is a headache
and a half, and eliminating the once-per-second tick in favor of a
demand-paced request queue should definitely happen sooner.
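
As a rough illustration of what "demand-paced" could mean, here is a
minimal sketch of a deadline-ordered event queue: instead of waking
every second to check whether anything is due, the main loop asks the
queue how long it may sleep.  All names (struct event_queue,
eq_schedule, eq_sleep_time, eq_run_due, count_hit) and the fixed
capacity are hypothetical; this is not how ntpd is structured today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 16            /* illustrative fixed capacity */

typedef void (*event_fn)(void *arg);

struct event {
    long     due;                /* absolute time in seconds */
    event_fn fn;
    void    *arg;
};

/* Events are kept sorted by due time, earliest first. */
struct event_queue {
    struct event ev[MAX_EVENTS];
    size_t       count;
};

/* Insert an event in due-time order; false if the queue is full. */
bool eq_schedule(struct event_queue *q, long due, event_fn fn, void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->count == MAX_EVENTS)
        return false;
    size_t i = q->count;
    while (i > 0 && q->ev[i - 1].due > due) {
        q->ev[i] = q->ev[i - 1];
        i--;
    }
    q->ev[i] = (struct event){ due, fn, arg };
    q->count++;
    return true;
}

/* How long the caller may sleep before work is due.
 * Returns -1 when nothing is scheduled at all. */
long eq_sleep_time(const struct event_queue *q, long now)
{
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    long wait = q->ev[0].due - now;
    return wait > 0 ? wait : 0;
}

/* Run every event due at or before now; returns how many ran.
 * Each event is removed before its handler runs, so a handler may
 * reschedule itself, provided it picks a time later than now. */
size_t eq_run_due(struct event_queue *q, long now)
{
    assert(q != NULL);
    size_t ran = 0;
    while (q->count > 0 && q->ev[0].due <= now) {
        struct event e = q->ev[0];
        for (size_t i = 1; i < q->count; i++)
            q->ev[i - 1] = q->ev[i];
        q->count--;
        e.fn(e.arg);
        ran++;
    }
    return ran;
}

/* Example handler: bump the int counter passed as arg. */
void count_hit(void *arg)
{
    ++*(int *)arg;
}
```

In a real daemon the value from eq_sleep_time would become the
timeout of whatever call the main loop blocks in, so an idle process
would wake only for packets and for the next scheduled poll.
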
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.