Wonky NTP startup and the incremental-configuration problem

Hal Murray hmurray at megapathdsl.net
Fri Jun 10 07:31:07 UTC 2016


esr at thyrsus.com said:
> ntpq has dangerous operations that tweak parameters of the time-sync
> algorithms on the fly - operations that can be triggered remotely. Or so I
> gather from things Hal Murray has said; my outside view is weak here, I've
> never explored those operations. 

ntpq can be used to tweak things, but it takes a password.
I've never used it that way.


esr at thyrsus.com said:
> How it should work is that there is just one way to hack your configuration,
> modifying ntp.conf, and restarting the daemon to reread it is a low-cost
> operation that produces only transient synchronization glitches.  Of course
> this would also imply faster crash recovery.

It won't help with system crashes.
ntpd doesn't crash often enough for that to be a problem.

> 1. Why is convergence from a standing start so slow?

We should collect a few examples.  In particular, compare various OSes.
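
One way to get comparable numbers across OSes is to measure time-to-settle
from loopstats.  A rough sketch in Python, assuming loopstats is enabled and
that the first three fields of each line are MJD, seconds past midnight, and
clock offset in seconds.  The 1 ms threshold and the function names are made
up for illustration; "good enough" is still an open question.

```python
def parse_loopstats(lines):
    """Yield (seconds_since_first_sample, offset_seconds) for each line."""
    start = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        # Assumed layout: MJD, seconds past midnight, offset in seconds, ...
        t = int(fields[0]) * 86400 + float(fields[1])
        if start is None:
            start = t
        yield t - start, float(fields[2])


def time_to_converge(samples, threshold=0.001):
    """Elapsed seconds at the start of the final run of samples whose
    |offset| is below threshold, or None if the log ends out of bounds."""
    settled = None
    for elapsed, offset in samples:
        if abs(offset) < threshold:
            if settled is None:
                settled = elapsed
        else:
            settled = None  # went out of bounds again, start over
    return settled
```

Feed it the lines of a loopstats file captured from a cold start on each
system and compare the results.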

Mills was very good about that sort of stuff.  But lots of people have 
"fixed" things in the kernel.  A while ago, Linux rewrote the time keeping 
code in the kernel.  They may have broken one of his assumptions.

It might be interesting to try really old kernels.

Does anybody have access to an old DEC Alpha running Tru64?  1/2 :)  I'm 
pretty sure Mills was happy with them and I doubt if anybody has made changes 
in that area.  (I remember some comment about their crystal being stable 
under temperature.  I think it was SAW.)


esr at thyrsus.com said:
> 2. If there is a fundamental reason for the slowness, shouldn't it
>    be possible to dump some kind of state that would allow ntpd
>    to reread it and resume from a running start? The key question
>    is whether we can identify that state. 

That feels like more complexity than it is worth.


esr at thyrsus.com said:
> (This is also why I haven't yet removed the SAVECONFIG code, much as I'd
> like to. Shortly after the Penguicon meeting I found that the beliefs we
> based the decision to remove it on were inaccurate - it is not in itself a
> potential security hole, and it has a real use when runtime config
> operations are allowed, which is to dump your actual configuration so you
> can check the cumulative results of your tweaks.) 

It's disabled by default.  In my opinion it's not really useful, since it 
writes entries out in an order of its own choosing, in effect sorting the 
file.  I think it also drops comments.  So it's not a useful way to maintain 
a config file.

------------

There may be a start-too-slow bug.  I think I may have seen it a few times, 
but there was enough else going on that I haven't looked into it carefully.  
The case with PPS may be different.

What are your goals?  What is good enough?  What is not good enough?


-----------

gem at rellim.com said:
>         a. the '-g' startup algorithm is acting perversely.  Ntpd just 

That's an interesting possibility.  Is that based on solid observations or 
just a wild guess?


gem at rellim.com said:
> Then put your NMEA refclock at the top of the ntp.conf and watch the 'fun'. 

That sounds like a handy way to trigger issue #68.

----------


fallenpegasus at gmail.com said:
> My first inclination is to change ntpsec to do what chrony does re saving
> the drift stats, and once we see that NTPsec can restart converge roughly 

ntpd already saves the drift.  Its basic algorithm doesn't use the other 
stuff that chronyd saves.

---------

esr at thyrsus.com said:
> 1. First, try to improve convergence time by fixing the #68 startup bug.
> Maybe that will lower the performance hit of a restart to a level that
> doesn't make Gary mutter imprecations.  Be aware that this might bring some
> pushback from people who like a sloppy but fast start.  If that's not
> enough...

Sounds good.

> 2. Second, figure out what state ntpd needs to get a running start and toss
> it to disk for reread on next startup. 

I can't think of a lot that isn't in the config file or drift file.  Each 
peer has a buffer of the last 8 samples.  That can be reloaded in a dozen 
seconds using iburst.
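
The "dozen seconds" figure can be checked with back-of-envelope arithmetic.
A minimal sketch, assuming an iburst of 8 packets spaced 2 seconds apart
(those defaults are my assumption here, and the function name is made up);
network round-trip time comes on top of this.

```python
def iburst_fill_time(entries=8, spacing_s=2.0):
    """Seconds from the first packet of a burst to the last one.

    entries   -- size of the per-peer buffer to refill (assumed 8)
    spacing_s -- gap between burst packets in seconds (assumed 2 s)
    """
    return (entries - 1) * spacing_s


print(iburst_fill_time())  # 14.0
```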

The drift file may be an hour old (or older if the drift hasn't changed 
much, but then it doesn't matter).  That will turn into a startup transient, 
but I'd like to see examples before we consider this a serious problem.
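
To put a rough size on that transient: a frequency error of N ppm piles up
N microseconds of offset per second until the loop corrects it.  A minimal
sketch of the arithmetic; the function name and the example numbers are
invented for illustration, not measurements.

```python
def offset_from_stale_drift(freq_error_ppm, elapsed_s):
    """Offset in seconds accumulated while running with a frequency
    estimate that is wrong by freq_error_ppm for elapsed_s seconds,
    assuming nothing corrects it in the meantime."""
    return freq_error_ppm * 1e-6 * elapsed_s


# Hypothetical case: the saved drift is off by 0.5 ppm and it takes
# 10 minutes before the loop pulls the frequency back in.
error_s = offset_from_stale_drift(0.5, 600)
print(f"{error_s * 1e3:.2f} ms")  # roughly 0.3 ms
```

Whether a few hundred microseconds counts as a serious problem depends on
the goals, which is why real examples would help.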


-- 
These are my opinions.  I hate spam.




