Wonky NTP startup and the incremental-configuration problem

Thu Jun 9 22:49:50 UTC 2016

Mark Atwood <fallenpegasus at gmail.com>:
> It looks like there is no obviously good route forward.
> 
> My first inclination is to change ntpsec to do what chrony does re saving
> the drift stats, and once we see that NTPsec can restart converge roughly
> as well as chrony, we rip out the runtime conf code.  Maybe even use the
> same filesystem file format as chrony for that data?

That is an interesting idea, and probably sound.

My present plan is to do the simplest thing first:

1. Try to improve convergence time by fixing the #68 startup bug.
Maybe that will reduce the performance hit of a restart to a level that doesn't
make Gary mutter imprecations.  Be aware that this might bring some pushback
from people who like a sloppy but fast start.  If that's not enough...

2. Figure out what state ntpd needs to get a running start, and write it
to disk to be reread on the next startup.
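The second step can be made concrete with a small sketch.  This is a minimal, hypothetical Python illustration, not ntpd code: the WarmStartState record, its two fields, the JSON file layout, save_state()/load_state(), and the one-day staleness limit are all assumptions made for the example.  It shows the shape of the idea.  Write the state atomically (temp file, then rename) so a crash can never leave a half-written file, and on startup reject anything missing, malformed, implausible, or stale, so the daemon falls back to an ordinary cold start.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WarmStartState:
    """HYPOTHETICAL record of what a restart might need; the field
    choice is an assumption, not ntpd's actual state set."""
    freq_ppm: float   # last frequency correction, in PPM (what a drift file holds)
    saved_at: float   # wall-clock time of the save, seconds since the epoch


def save_state(path: str, state: WarmStartState) -> None:
    """Write state atomically: temp file in the same directory, fsync,
    then rename over the target, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".warmstart-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(state), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)  # don't leave temp files behind on failure
        raise


def load_state(path: str, max_age_s: float = 86400.0,
               now: Optional[float] = None) -> Optional[WarmStartState]:
    """Return the saved state, or None (meaning: do a cold start) if the
    file is missing, malformed, out of range, or older than max_age_s.
    The one-day default is an ASSUMED value for illustration."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            raw = json.load(f)
        state = WarmStartState(freq_ppm=float(raw["freq_ppm"]),
                               saved_at=float(raw["saved_at"]))
    except (OSError, ValueError, KeyError, TypeError):
        return None
    # Plausibility bound: classic NTP clamps frequency at 500 PPM.
    if abs(state.freq_ppm) > 500.0:
        return None
    # Reject saves from the future or from too long ago.
    if not 0.0 <= now - state.saved_at <= max_age_s:
        return None
    return state
```

Rejecting a stale or out-of-range file, rather than trusting it, means the worst case is the same cold start we have today instead of a loop seeded with bad data.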

Neither will be easy, alas.  We're talking about the PLL code,
which is second only to the network hairball in its ability to erode my sanity.
But the first might not be painful if I get lucky.

> My own experience with the MySQL internals, the F5 3DNS internals, and the
> Digeo Moxi internals, is that runtime configuration to a running process
> adds huge amounts of complexity and hair to the code.    I suspect that
> that is also the case in NTP.

Mark, I do not merely "suspect" this. :-)

>                               A usually better approach is to make
> process restart fast and safe, or if the process MUST be long running, make
> the subsystems isolated and restartable.

Yes, that's the direction I'm pushing.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>